Notes on evaluating Brazilian-Portuguese embeddings, the statistics behind the leaderboard, and practical model engineering.
The leaderboard now spans 169 models, 131 tasks, a retrieval benchmark with private data, and image, audio and video modalities, but it still has nothing in native Portuguese. Where MTEB-PT fits.
The MTEB team extends the playbook to video and audio with a 23-task benchmark. What it found, and why the method matters for text embeddings too.
The leaderboard’s top model is a closed API, yet the cost–quality frontier is shallow and a free open-weight model ties the leader.
Translated benchmarks quietly flatten the differences between models. Here is what changes when you evaluate on Portuguese that was written in Portuguese.
A multilingual model spends most of its parameters on tokens you never use. We cut EmbeddingGemma-300M to 157M for Portuguese, with zero training.