In the 2025 post Text Embedding Benchmark for Recommender Systems, we benchmarked text embedding models on similarity-based recommendation. Within six months of that post's publication, Alibaba Cloud and Google each released a new generation of open-source text embedding models: qwen3-embedding and embeddinggemma, respectively. gorse-cli has also recently gained a benchmarking feature for text embedding models. This post uses gorse-cli and the playground dataset to run a comprehensive benchmark of popular open-source text embedding models.
Embedding models encode multimodal information such as images and text into high-dimensional vectors. Online services such as search engines and recommender systems can then estimate how closely two pieces of content are related by measuring the distance between their embedding vectors. Text embeddings are the most widely used kind. Major AI service providers offer text embedding APIs, and many open-source text embedding models are available for self-hosting. The current mainstream evaluation standard for text embedding models is MTEB. However, MTEB does not assess how text embedding models perform in recommender systems, which is the gap this post attempts to address.
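
To make the idea of comparing embedding vectors concrete, here is a minimal sketch of similarity-based lookup. It assumes cosine similarity as the measure, which is one common choice and not necessarily what gorse-cli uses internally. The item names and three-dimensional vectors are made up for illustration; real text embeddings have hundreds or thousands of dimensions and come from a model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, k=2):
    """Return the k items whose vectors are closest to the query item's vector."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != query_id  # never recommend the item to itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings (assumption: not produced by any real model).
embeddings = {
    "space_opera": [0.9, 0.1, 0.0],
    "sci_fi_thriller": [0.8, 0.2, 0.1],
    "romantic_comedy": [0.0, 0.2, 0.9],
}

neighbors = most_similar("space_opera", embeddings, k=2)
for item_id, score in neighbors:
    print(f"{item_id}: {score:.3f}")
```

In this toy data, `sci_fi_thriller` ranks above `romantic_comedy` as a neighbor of `space_opera` because its vector points in nearly the same direction. A similarity-based recommender applies the same ranking to embeddings of real item descriptions.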
