MTEB-PT

A Text Embedding Benchmark for Brazilian Portuguese

Native, not translated. 54 embedding models ranked on 16 Brazilian-Portuguese tasks, built only from text written in Portuguese (no machine-translated benchmarks), with confidence intervals, significance tests, and an analysis of which tasks actually separate models.
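The page does not say how its confidence intervals are computed. One common choice is a percentile bootstrap over tasks, shown here only as an assumption. You resample the 16 per-task scores with replacement, recompute the mean each time, and read off the 2.5th and 97.5th percentiles. The per-task scores below are invented for illustration, and `bootstrap_mean_ci` is a hypothetical helper, not part of the benchmark's scripts.

```python
import random
import statistics


def bootstrap_mean_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-task scores (assumed method).

    Resamples tasks with replacement and returns (mean, lower, upper).
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(scores)
    # Mean of each bootstrap resample, sorted so percentiles are index lookups.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int(alpha / 2 * n_resamples)  # e.g. 50 of 2000 for a 95% CI
    hi_idx = n_resamples - 1 - lo_idx      # symmetric upper index
    return statistics.fmean(scores), means[lo_idx], means[hi_idx]


# Hypothetical per-task scores for one model on 16 tasks (not real benchmark data).
scores = [0.81, 0.74, 0.69, 0.77, 0.62, 0.85, 0.71, 0.66,
          0.79, 0.73, 0.58, 0.88, 0.70, 0.75, 0.64, 0.80]
mean, lo, hi = bootstrap_mean_ci(scores)
print(f"mean={mean:.3f}  95% CI=[{lo:.3f}, {hi:.3f}]")
```

Resampling tasks instead of individual queries treats the task set as a sample from a wider population of Portuguese tasks. That is why intervals on a 16-task mean can be wider than the gaps between neighbouring ranks.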


The leaderboard

54 models on native Brazilian-Portuguese tasks, ranked by the 16-task mean. The top 15 are shown below; the full interactive table, the IRT (item response theory) ranking, and per-category views are on Hugging Face.
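Several neighbouring ranks in the table share a score (ranks 2 and 3 are both 0.733, as are ranks 8 and 9 at 0.722). Rank order alone therefore cannot tell you whether two models really differ. The page mentions significance tests but does not name one. A paired sign-flip permutation test over per-task differences is one standard option, sketched here as an assumption with made-up scores. `paired_sign_flip_test` is a hypothetical helper.

```python
import random
import statistics


def paired_sign_flip_test(a, b, n_permutations=10000, seed=0):
    """Two-sided paired permutation test on per-task score differences.

    Under the null hypothesis that the two models are exchangeable on every
    task, each difference a[i] - b[i] is equally likely to have either sign.
    Returns (observed mean difference, p-value).
    """
    if len(a) != len(b):
        raise ValueError("both models must be scored on the same tasks")
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(statistics.fmean(diffs))
    extreme = 0
    for _ in range(n_permutations):
        # Flip the sign of each task's difference with probability 1/2.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.fmean(flipped)) >= observed:
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return statistics.fmean(diffs), p_value


# Hypothetical per-task scores for two models on the same 16 tasks.
model_a = [0.81, 0.74, 0.69, 0.77, 0.62, 0.85, 0.71, 0.66,
           0.79, 0.73, 0.58, 0.88, 0.70, 0.75, 0.64, 0.80]
model_b = [0.80, 0.75, 0.70, 0.76, 0.60, 0.86, 0.70, 0.67,
           0.78, 0.74, 0.57, 0.87, 0.71, 0.74, 0.65, 0.79]
diff, p = paired_sign_flip_test(model_a, model_b)
print(f"mean difference={diff:+.4f}  p={p:.3f}")
```

Pairing by task removes between-task variance, which is usually much larger than the gap between two strong models. This makes a paired test far more sensitive than comparing two independent confidence intervals.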

Figure: open-weight models, quality (16-task mean) versus size (parameter count, log scale). The dashed line is the Pareto frontier: the best 16-task mean reachable at each parameter budget.
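The frontier in the figure can be computed directly from (parameters, score) pairs. The sketch assumes a model is on the frontier when it scores strictly higher than every model with the same or fewer parameters. It uses only the nine open-weight rows of the top-15 table on this page. The full 54-model set may include smaller models that also lie on the frontier, so this is illustrative and not a reproduction of the plot.

```python
def pareto_frontier(models):
    """Return models that beat every model with no more parameters.

    `models` is a list of (name, params_in_billions, mean16). Sorting by size
    ascending (and score descending within a size) lets one pass keep each
    model whose score strictly exceeds the best score seen so far.
    """
    frontier = []
    best = float("-inf")
    for name, params, score in sorted(models, key=lambda m: (m[1], -m[2])):
        if score > best:
            frontier.append((name, params, score))
            best = score
    return frontier


# Open-weight rows from the top-15 table (params in billions, 16-task mean).
open_models = [
    ("Qwen3-Embedding-8B", 8, 0.733),
    ("Octen-Embedding-8B", 8, 0.728),
    ("embeddinggemma-300m", 0.3, 0.726),
    ("harrier-oss-v1-27b", 27, 0.722),
    ("Qwen3-Embedding-4B", 4, 0.718),
    ("KaLM-Embedding-Gemma3-12B-2511", 11.8, 0.711),
    ("F2LLM-v2-8B", 8, 0.711),
    ("harrier-oss-v1-0.6b", 0.6, 0.699),
    ("F2LLM-v2-14B", 14, 0.691),
]

for name, params, score in pareto_frontier(open_models):
    print(f"{name:32s} {params:>5}B  {score:.3f}")
```

On these rows only two models survive: embeddinggemma-300m at 0.3B and Qwen3-Embedding-8B at 8B. Every larger open model in the top 15 scores below Qwen3-Embedding-8B.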

#	Model	Access	Params	License	Mean (16 tasks)
1	gemini-embedding-001	Closed	n/a	proprietary	0.744
2	text-embedding-3-large	Closed	n/a	proprietary	0.733
3	Qwen3-Embedding-8B	Open	8B	Apache-2.0	0.733
4	gemini-embedding-2	Closed	n/a	proprietary	0.731
5	Octen-Embedding-8B	Open	8B	Apache-2.0	0.728
6	embeddinggemma-300m	Open	300M	Gemma	0.726
7	voyage-4-large	Closed	n/a	proprietary	0.724
8	harrier-oss-v1-27b	Open	27B	MIT	0.722
9	embed-v4	Closed	n/a	proprietary	0.722
10	Qwen3-Embedding-4B	Open	4B	Apache-2.0	0.718
11	KaLM-Embedding-Gemma3-12B-2511	Open	11.8B	Tencent-KaLM	0.711
12	F2LLM-v2-8B	Open	8B	Apache-2.0	0.711
13	text-embedding-3-small	Closed	n/a	proprietary	0.710
14	harrier-oss-v1-0.6b	Open	0.6B	MIT	0.699
15	F2LLM-v2-14B	Open	14B	Apache-2.0	0.691



The paper

A full write-up (benchmark design, the statistical layer, IRT task discrimination, and a cross-leaderboard validity analysis) is in preparation and will be posted as an arXiv preprint (cs.CL). A citation will appear here when it is live.

Contribute

Submit a model

Want your embedding model on the leaderboard? We accept submissions through either channel; pick whichever fits. Every score is reproducible from public scripts, so each new row can be audited.

Hugging Face discussion

Share the model ID and any prompt or pooling details. Best for a quick request.


GitHub issue

Prefer the code side? File an issue or a pull request on the benchmark repository.


From the blog

MTEB in 2026: an evaluation ecosystem, and the Portuguese gap

MVEB: embeddings go to video

Are closed embedding APIs worth paying for in Portuguese?

Why a native Brazilian-Portuguese embedding benchmark?

Trimming embedding vocabularies: half the parameters, ~99% of the score
