Accelerating NLP Application Development with Foundation Models: A Pixability Case Study

Snorkel AI

Technology Category	Analytics & Modeling - Machine Learning; Analytics & Modeling - Natural Language Processing (NLP)
Applicable Industries	Cement; Education
Applicable Functions	Product Research & Development; Warehouse & Inventory Management
Use Cases	Chatbots; Virtual Training
Services	Data Science Services; Training

Challenge

Pixability, a data and technology company, helps advertisers accurately target content and audiences on YouTube. With viewers watching over 700 million hours of YouTube content daily, Pixability must continuously and accurately categorize billions of videos to ensure ads run on brand-suitable content. Its existing natural language processing (NLP) model for classifying videos did not perform well enough. Labeling training data was slow because the team relied on external data labeling services that required multiple iterations. Collaboration was constrained because domain experts and data scientists had limited time to resolve ambiguous labels. Additionally, valuable information in video titles, descriptions, content, and tags was difficult to normalize.

About Customer

Pixability is a data and technology company that enables advertisers to target the right content and audience on YouTube, a platform where viewers watch over 700 million hours of content daily. It uses machine learning to automatically identify and categorize YouTube content, helping advertisers maximize their reach on suitable content and optimize ad spend. By providing granular insights into how suitable content is for a brand, Pixability helps advertisers ensure their ads are seen by the right audience, thereby improving the return on their video ad spend.

Solution

Pixability turned to Snorkel Flow’s Data-centric Foundation Model Development workflow to build an NLP application in less time than it took a third-party data labeling service to label a single dataset. This workflow allowed Pixability to scale the number of classes it could classify to over 600 while also increasing model accuracy to over 90%. The team used Snorkel Flow’s Foundation Model Warm Start with zero-shot learning to jump-start training data creation. They then used Foundation Model Prompt Builder to develop and refine prompts that corrected out-of-the-box foundation model (FM) errors and drew more domain-specific knowledge from various FMs. These prompts asked the FM to classify videos based on their descriptions. This programmatic approach to labeling data with knowledge from foundation models generated 500,000 labeled training data points, which were used to train a model with 90% accuracy. The team was also able to unlock multi-label NLP capabilities, providing more specific classifications for videos.
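
The idea of prompting a foundation model to classify a video from its description can be illustrated with a minimal Python sketch. This is not Snorkel Flow's API or Pixability's actual prompt: the class list, the prompt wording, and the names `build_prompt`, `parse_answer`, `label_with_fm`, and `keyword_fm` are all hypothetical, and `keyword_fm` is a keyword-matching stand-in for a real model call so the example runs offline.

```python
from typing import Callable, List, Optional

# Hypothetical class list; the case study mentions over 600 classes.
CLASSES = ["basketball", "hockey", "cooking"]


def build_prompt(description: str, classes: List[str]) -> str:
    """Build a zero-shot classification prompt from a video description."""
    options = ", ".join(classes)
    return (
        f"Classify the YouTube video described below into one of: {options}.\n"
        f"Description: {description}\n"
        "Answer with the category name only."
    )


def parse_answer(answer: str, classes: List[str]) -> Optional[str]:
    """Map the raw model answer to a known class, or None to abstain."""
    cleaned = answer.strip().lower().rstrip(".")
    return cleaned if cleaned in classes else None


def label_with_fm(
    description: str, fm: Callable[[str], str], classes: List[str]
) -> Optional[str]:
    """Label one description by prompting the model and parsing its reply."""
    return parse_answer(fm(build_prompt(description, classes)), classes)


def keyword_fm(prompt: str) -> str:
    """Stand-in for a real foundation model call (assumption, not a real API)."""
    text = prompt.split("Description:", 1)[1].lower()
    if "puck" in text or "nhl" in text:
        return "Hockey."
    if "dunk" in text or "nba" in text:
        return "basketball"
    return "I am not sure"


if __name__ == "__main__":
    for desc in ["Top 10 NHL puck battles", "How to bake bread"]:
        print(desc, "->", label_with_fm(desc, keyword_fm, CLASSES))
```

Returning `None` when the answer does not match a known class lets the labeler abstain rather than guess, which is a common choice when programmatic labels are later combined and used as training data.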

Impact #1

Using Snorkel Flow’s Data-centric Foundation Model Development workflow, Pixability was able to create a model in weeks instead of months. This not only accelerated their product roadmap by several months but also unlocked new capabilities that will help them provide deeper insights and improved services to their customers. Labeling data programmatically in-house gave the Pixability team greater control over NLP training data creation and faster iteration, freeing capacity to expand to more use cases. The increased granularity of video classification, from broad categories like 'sports' to more specific ones like 'basketball' or 'hockey', allows Pixability to better place their customers’ ads on the most suitable YouTube content, thereby improving the return on customer video ad spend and satisfaction with Pixability’s services.
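
To make the granularity point concrete, the sketch below shows one way fine-grained labels could coexist with broad ones in a multi-label setting. It is an illustration only: the `PARENT` mapping, the label names beyond 'sports', 'basketball', and 'hockey', and the functions `expand_labels` and `matches_targets` are assumptions, not Pixability's taxonomy or matching logic.

```python
from typing import Dict, Iterable, List

# Hypothetical fine-to-broad mapping; only sports/basketball/hockey
# come from the case study text.
PARENT: Dict[str, str] = {
    "basketball": "sports",
    "hockey": "sports",
    "baking": "food",
}


def expand_labels(fine_labels: Iterable[str]) -> List[str]:
    """Return the fine labels plus their broad parent categories, sorted."""
    labels = set(fine_labels)
    for label in list(labels):
        parent = PARENT.get(label)
        if parent is not None:
            labels.add(parent)
    return sorted(labels)


def matches_targets(video_labels: Iterable[str], targets: Iterable[str]) -> bool:
    """True if any fine or broad label of the video is in the target set."""
    return bool(set(expand_labels(video_labels)) & set(targets))


if __name__ == "__main__":
    print(expand_labels(["hockey"]))
    print(matches_targets(["hockey"], ["basketball"]))
```

With only the broad label, a campaign targeting basketball could not be told apart from one targeting hockey; with fine labels, both the broad and the specific targeting remain possible.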

Benefit #1	Built an NLP application in less time than it took a third-party data labeling service to label a single dataset.
Benefit #2	Scaled up the number of classes they could classify to over 600.
Benefit #3	Increased model accuracy to over 90%.
