primary use case

[published]static · preferred

measuring LLM factual accuracy with short fact-seeking questions having single correct answers

ConfidenceRankTemporalMethod
High (97%)preferredstatichuman_curated

Sources

SourceDomainScoreAI
primary_use_case