Stanford HELM
ai_benchmark
Overview
Open source: ✓
Use case: holistic multi-metric evaluation of language models across accuracy, fairness, robustness, and efficiency
Knowledge graph stats
Claims: 6
Avg confidence: 97%
Avg freshness: 99%
Last updated: yesterday
Trust distribution: 100% unverified
Governance: not assessed
Stanford HELM (product), also known as: HELM
Holistic Evaluation of Language Models framework by Stanford CRFM for transparent multi-metric evaluation
alternative to
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| OpenCompass | ○ Unverified | High | Fresh | 1 |
evaluates
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| LLMs across 7 metrics including accuracy, calibration, robustness, and fairness | ○ Unverified | High | Fresh | 1 |
open source
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| true | ○ Unverified | High | Fresh | 1 |
primary use case
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| holistic multi-metric evaluation of language models across accuracy, fairness, robustness, and efficiency | ○ Unverified | High | Fresh | 1 |
first released
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| 2022 | ○ Unverified | High | Fresh | 1 |
developed by
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| Stanford Center for Research on Foundation Models | ○ Unverified | High | Fresh | 1 |