RLHF
ai_safety
Overview
Developed by: OpenAI
Use case: aligning language models with human preferences via reward modeling
Knowledge graph stats
Claims: 11
Avg confidence: 94%
Avg freshness: 99%
Last updated: yesterday
Trust distribution
100% unverified
Governance
Not assessed
RLHF
concept
Reinforcement Learning from Human Feedback, the primary alignment technique for modern LLMs
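The core training signal of the technique can be shown in a minimal sketch, assuming a PyTorch setting; the function name, tensor shapes, and `beta` coefficient below are illustrative assumptions, not any lab's published implementation. The policy is scored by a learned reward model but penalized for drifting from a frozen reference model via a KL term.

```python
import torch

def rl_step_reward(reward, logp_policy, logp_ref, beta=0.1):
    # Reward-model score minus a KL penalty that keeps the
    # fine-tuned policy close to the frozen reference model.
    # (Names and beta value are illustrative assumptions.)
    kl = (logp_policy - logp_ref).sum(dim=-1)
    return reward - beta * kl

# Toy batch: 4 sampled responses of 16 tokens each.
reward = torch.randn(4)           # scalar reward-model scores
logp_policy = -torch.rand(4, 16)  # per-token log-probs under the policy
logp_ref = -torch.rand(4, 16)     # per-token log-probs under the reference
signal = rl_step_reward(reward, logp_policy, logp_ref)
print(signal.shape)  # torch.Size([4]): one training signal per response
```

The KL penalty is the standard guard against the policy over-optimizing the reward model's scores at the expense of fluent, faithful outputs.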
first released
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| 2017 | ○Unverified | High | Fresh | 1 |
used by
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| Google DeepMind | ○Unverified | High | Fresh | 1 |
| Anthropic | ○Unverified | High | Fresh | 1 |
| OpenAI | ○Unverified | High | Fresh | 1 |
primary use case
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| aligning language models with human preferences via reward modeling | ○Unverified | High | Fresh | 1 |
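The reward modeling named in this claim is typically trained on pairwise human preferences. A minimal sketch follows, assuming PyTorch; the 768-dim response embeddings, the network shape, and the `preference_loss` helper are hypothetical, but the objective is the standard Bradley-Terry pairwise loss: the human-preferred response should outscore the rejected one.

```python
import torch
import torch.nn.functional as F

# Hypothetical reward head: any network mapping a response
# representation to a scalar score plays the same role.
reward_model = torch.nn.Sequential(
    torch.nn.Linear(768, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 1),
)

def preference_loss(chosen_emb, rejected_emb):
    # Bradley-Terry pairwise objective: maximize the probability
    # that the preferred response scores above the rejected one.
    r_chosen = reward_model(chosen_emb)
    r_rejected = reward_model(rejected_emb)
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy batch: 4 preference pairs of 768-dim response embeddings.
chosen = torch.randn(4, 768)
rejected = torch.randn(4, 768)
loss = preference_loss(chosen, rejected)
loss.backward()  # gradients flow only into the reward model
```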
implemented by
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| ChatGPT | ○Unverified | High | Fresh | 1 |
| Claude | ○Unverified | High | Fresh | 1 |
| GPT-4 | ○Unverified | High | Fresh | 1 |
developed by
| Value | Trust | Confidence | Freshness | Sources |
|---|---|---|---|---|
| OpenAI | ○Unverified | High | Fresh | 1 |
| Anthropic | ○Unverified | Moderate | Fresh | 1 |
| DeepMind | ○Unverified | Moderate | Fresh | 1 |