HumanEval
concept · ai_benchmark
Overview
Open source: Yes
Use case: evaluating functional correctness of code generated from docstrings
Alternative to: LiveCodeBench
Knowledge graph stats
Claims: 7
Avg confidence: 97%
Avg freshness: 99%
Last updated: yesterday
Trust distribution: 100% unverified
Governance

HumanEval (concept)

An OpenAI benchmark of 164 hand-written Python programming problems for evaluating code generation.


used by

Google · Trust: Unverified · Confidence: High · Freshness: Fresh · Sources: 1

alternative to

LiveCodeBench · Trust: Unverified · Confidence: High · Freshness: Fresh · Sources: 1

evaluates

Python code generation from function signatures and docstrings · Trust: Unverified · Confidence: High · Freshness: Fresh · Sources: 1
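A HumanEval problem pairs a function signature plus docstring (the prompt) with hidden unit tests; a completion is counted correct only if those tests pass. The sketch below illustrates that check using the published HumanEval problem fields (`prompt`, `test`, `entry_point`); the example problem is a hypothetical stand-in, not an actual benchmark task, and the real harness runs completions in a sandboxed subprocess with timeouts, which this sketch omits.

```python
# Minimal sketch of HumanEval-style functional-correctness checking.
# WARNING: exec() with no sandboxing -- illustration only.

# Hypothetical problem in the HumanEval record format.
problem = {
    "prompt": 'def add(a, b):\n    """Return the sum of a and b."""\n',
    "test": "def check(candidate):\n    assert candidate(1, 2) == 3\n",
    "entry_point": "add",
}

def is_correct(problem: dict, completion: str) -> bool:
    """Run the problem's unit tests against prompt + completion."""
    program = problem["prompt"] + completion + "\n\n" + problem["test"]
    env: dict = {}
    try:
        exec(program, env)                      # define function and check()
        env["check"](env[problem["entry_point"]])  # run the hidden tests
        return True
    except Exception:
        return False
```

A correct completion such as `"    return a + b"` passes; an incorrect one such as `"    return a - b"` fails the assertion and is counted wrong.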

open source

true · Trust: Unverified · Confidence: High · Freshness: Fresh · Sources: 1

primary use case

evaluating functional correctness of code generated from docstrings · Trust: Unverified · Confidence: High · Freshness: Fresh · Sources: 1
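Functional correctness on HumanEval is conventionally reported as pass@k: the probability that at least one of k sampled completions passes the unit tests. The HumanEval paper (Chen et al., 2021) gives an unbiased estimator computed from n generated samples of which c are correct; a minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the HumanEval paper:
    1 - C(n-c, k) / C(n, k), i.e. one minus the probability that
    k samples drawn from n contain none of the c correct ones."""
    if n - c < k:
        return 1.0  # too few incorrect samples to fill a draw of k
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 1 correct completion out of 2 samples, pass@1 is 0.5; with no correct completions it is 0.0 for any k ≤ n.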

first released

2021 · Trust: Unverified · Confidence: High · Freshness: Fresh · Sources: 1

created by

OpenAI · Trust: Unverified · Confidence: High · Freshness: Fresh · Sources: 1


Claim count: 7 · Last updated: 4/9/2026