benchmark score

[published] static · preferred

SWE-bench Verified: 77.2%

Confidence   Rank       Temporal  Method
High (97%)   preferred  static    human_curated

Sources

Source  Domain  Score  AI
benchmark_score