Pinterest Cuts AI Costs by 90% by Gutting the Vision Layer of a Frontier Model
PINTEREST'S STRATEGY TO CUT AI COSTS BY 90%
Pinterest has cut its AI costs by 90%. With approximately 620 million monthly active users, the company concluded that running a frontier model for every image recommendation was not financially sustainable. In response, Chief Technology Officer Matt Madrigal led an initiative to reduce the cost of its AI operations. The work centered on the vision layer of the Qwen3-VL model, a key component of Pinterest's image recommendation system.
HOW PINTEREST GUTTED QWEN3-VL'S VISION LAYER
To cut those costs, Pinterest gutted the vision layer of the Qwen3-VL model and rebuilt that part of the architecture. Madrigal's team essentially “ripped out” the vision encoder, the component responsible for interpreting images, and replaced it with proprietary multimodal embeddings. The change reduced costs and also improved performance, with a 30% increase in accuracy for image recommendations. By fine-tuning the model on its own data, Pinterest has shown that data quality can matter more than model size.
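The article does not describe how the proprietary embeddings are wired into the model. One common pattern for this kind of swap is a small learned projection that maps a precomputed image embedding into the language model's token-embedding space, so the image occupies a "soft token" position where the vision encoder's output would otherwise go. The sketch below illustrates only that idea in plain Python. Every name and dimension in it (`make_projection`, `build_input_sequence`, `EMBED_DIM`, `MODEL_DIM`) is hypothetical, and the weights are random rather than trained.

```python
import random

def make_projection(in_dim, out_dim, seed=0):
    """Return a randomly initialised (out_dim x in_dim) weight matrix.

    Assumption: in a real system these weights would be learned during
    fine-tuning; random values are used here only to make the sketch run.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

def project(weights, vec):
    """Matrix-vector product: map an embedding into the model's hidden size."""
    if any(len(row) != len(vec) for row in weights):
        raise ValueError("embedding size does not match projection input size")
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def build_input_sequence(image_embedding, text_token_embeddings, weights):
    """Prepend the projected image embedding to the text token embeddings.

    The projected vector stands in for the output a vision encoder
    would otherwise have produced for the image.
    """
    image_token = project(weights, image_embedding)
    return [image_token] + list(text_token_embeddings)

EMBED_DIM = 4   # hypothetical size of the precomputed image embedding
MODEL_DIM = 6   # hypothetical hidden size of the language model

weights = make_projection(EMBED_DIM, MODEL_DIM)
pin_embedding = [0.2, -0.5, 0.1, 0.9]                   # looked up, not computed per request
text_tokens = [[0.0] * MODEL_DIM, [1.0] * MODEL_DIM]    # toy text token embeddings

sequence = build_input_sequence(pin_embedding, text_tokens, weights)
print(len(sequence), len(sequence[0]))  # prints: 3 6
```

The point of the design is that no image is encoded at request time: the model consumes a vector that already exists, and only the small projection runs per request.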
THE IMPACT OF CUSTOMIZING OPEN-SOURCE MODELS AT PINTEREST
Customizing open-source models has been a foundational strategy for Pinterest, letting the company build on existing technology while tailoring it to its own needs. Heavy investment in this customization has improved its visual search and discovery capabilities. The company has a history of using models such as Google’s BERT and OpenAI’s CLIP, and its recent work on Qwen3-VL takes that customization further. Madrigal explained that fine-tuning these models with proprietary data produces results better aligned with Pinterest's operational requirements, while also driving down costs.
PINTEREST'S INNOVATIONS IN VISUAL DISCOVERY AND RECOMMENDATION
Pinterest has long worked on visual discovery and recommendation technologies, and the modifications to Qwen3-VL continue that work. The company’s conversational shopping assistant, Navigator 1, was built on the Qwen3-VL framework and shows the customized model in practical use. By incorporating proprietary visual embeddings and image metadata, Pinterest has improved its ability to deliver personalized experiences. Because metadata for pins and images is precomputed, the model can be updated and retrained regularly so that recommendations stay relevant. Pinterest sees this as improving user satisfaction and strengthening its position in a competitive market.
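The cost benefit of precomputing can be illustrated with a minimal sketch: the expensive embedding step runs once per pin in an offline batch job, and the serving path only reads stored results. This is an assumption about the general pattern, not a description of Pinterest's pipeline. `PinFeatureStore`, `embed_image`, and the sample pins are all hypothetical, and `embed_image` is a hash-based stand-in for a real embedding model.

```python
import hashlib

def embed_image(image_bytes, dim=4):
    """Stand-in for an expensive embedding model (assumption for illustration).

    Derives a deterministic pseudo-embedding from a hash of the image bytes.
    """
    digest = hashlib.sha256(image_bytes).digest()
    return [b / 255.0 for b in digest[:dim]]

class PinFeatureStore:
    """Toy in-memory store for precomputed pin embeddings and metadata."""

    def __init__(self):
        self._rows = {}
        self.embed_calls = 0  # counts how often the expensive step runs

    def precompute(self, pins):
        """Offline batch job: pins is an iterable of (pin_id, image_bytes, metadata)."""
        for pin_id, image_bytes, metadata in pins:
            self.embed_calls += 1
            self._rows[pin_id] = {
                "embedding": embed_image(image_bytes),
                "metadata": metadata,
            }

    def lookup(self, pin_id):
        """Serving path: a plain dictionary read, with no model call."""
        return self._rows.get(pin_id)

store = PinFeatureStore()
store.precompute([
    ("pin-1", b"fake image bytes 1", {"title": "Oak desk"}),
    ("pin-2", b"fake image bytes 2", {"title": "Linen sofa"}),
])

# Many requests reuse the same precomputed rows.
for _ in range(1000):
    store.lookup("pin-1")

print(store.embed_calls)  # prints: 2
```

Rerunning `precompute` on a schedule refreshes the stored rows, which is one way the regular updates described above could be supported without adding work to each request.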
BOOSTING ACCURACY: PINTEREST'S APPROACH TO EMBEDDINGS
The shift to proprietary embeddings has significantly boosted the accuracy of Pinterest's image recommendation systems. By focusing on high-quality data and fine-tuning its models, Pinterest has made the recommendations it serves more relevant. The approach puts data quality ahead of sheer model size, a principle Madrigal articulated during a recent podcast discussion. Capturing and using metadata effectively gives Pinterest a more nuanced understanding of user preferences and behaviors, which in turn supports a more engaging and personalized experience. As Pinterest continues this work, the approach could influence how visual discovery and recommendation systems are built across the industry.