How HeyOtto scored 95% on the KORA benchmark

In 2026, the KORA (Kids Online Risk Assessment) benchmark is one of the clearest public ways to compare how safely AI systems respond to child and teen scenarios. HeyOtto's self-reported 95% score is a strong trust signal. This page explains how we achieved it, how KORA scores systems, and what the number does and does not prove.


95%: HeyOtto's reported KORA benchmark score
25: critical child-safety risk categories in the benchmark
30+: experts, psychologists, and researchers cited by KORA
Open: methodology anyone can inspect and rerun

How Otto achieved such a high score

Our KORA result did not come from a single safety filter added at the end. It reflects years of designing Otto specifically for children and teens, with developmental safety shaping the model behavior itself.

Age-adaptive responses

Otto is designed to respond differently for children, tweens, and teens so answers stay developmentally appropriate instead of treating every child like an adult user.

Safer handling of sensitive topics

The model is tuned to handle topics like self-harm, bullying, sexual content, exploitation, and dangerous activities with protective, age-aware responses.

Safety-first refusal behavior

When a request crosses a safety boundary, Otto is built to redirect, de-escalate, and avoid providing instructions that could put a child at risk.

Built for kids from day one

HeyOtto was built specifically for children and families rather than adapting a general adult AI later, which helps safety and developmental concerns show up in core model behavior.

How the KORA benchmark scores systems

According to KORA's benchmark documentation, the framework was created to evaluate how safely AI systems respond to child and teen prompts. The process is transparent, research-driven, and designed to be repeatable.

KORA defines the risk categories

KORA worked with child-safety experts, psychologists, and researchers to create a benchmark spanning 25 critical child-safety risk areas.

Synthetic conversations simulate real use

The benchmark uses large volumes of realistic synthetic conversations designed to mirror the types of questions children and teens ask. No real children are involved.

Responses are evaluated systematically

Model responses are judged for whether they are safe and developmentally appropriate, with automated evaluations reviewed against human judgment.

Results are transparent and repeatable

KORA publishes an open methodology and makes clear that the benchmark is a research tool, not a certification or guarantee of real-world performance.

KORA also explicitly notes that the benchmark is a floor, not a ceiling. It does not fully measure product-level safeguards around the model, such as parental controls, moderation workflows, or crisis escalation.
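To make the scoring idea concrete, here is a minimal Python sketch of how a headline percentage could be derived from judged responses, and how automated judgments could be checked against human labels. It is an illustration only, not KORA's actual scoring code: it assumes the score is a simple fraction of responses judged safe, and the category names, sample data, and function names (`pass_rates`, `agreement_rate`) are all hypothetical. KORA's published methodology is the authority on how scores are really computed.

```python
from collections import defaultdict

# Hypothetical judged results: (risk_category, judged_safe).
# Illustrative data only; these are not KORA results.
judgments = [
    ("self-harm", True),
    ("self-harm", True),
    ("bullying", True),
    ("bullying", False),
    ("dangerous activities", True),
]


def pass_rates(judgments):
    """Return (per-category pass rates, overall pass rate) as fractions.

    Assumption: a score is the share of responses judged safe.
    """
    if not judgments:
        raise ValueError("no judgments to score")
    totals = defaultdict(int)
    passes = defaultdict(int)
    for category, safe in judgments:
        totals[category] += 1
        passes[category] += int(safe)
    per_category = {c: passes[c] / totals[c] for c in totals}
    overall = sum(passes.values()) / sum(totals.values())
    return per_category, overall


def agreement_rate(automated, human):
    """Fraction of items where the automated judge matches the human label."""
    if len(automated) != len(human):
        raise ValueError("label lists must have the same length")
    if not automated:
        raise ValueError("no labels to compare")
    matches = sum(a == h for a, h in zip(automated, human))
    return matches / len(automated)


per_category, overall = pass_rates(judgments)
print(f"overall pass rate: {overall:.0%}")
for category, rate in sorted(per_category.items()):
    print(f"  {category}: {rate:.0%}")
```

Reporting per-category rates alongside the overall number matters because a high overall percentage can hide a weak category. The agreement check is one simple way to see how far an automated judge can be trusted before its verdicts are aggregated.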

What the 95% score proves

Otto performs strongly on benchmarked child-safety scenarios.
The model is built to respond more safely than adult-first AI systems.
The result is grounded in a public methodology people can inspect.

What the 95% score does not prove

It is not a certification, regulatory approval, or guarantee.
It does not mean Otto is perfect or that risk is ever zero.
It does not fully measure product wrappers like dashboards, alerts, or monitoring.

FAQ

Quick answers for search engines, AI overviews, journalists, and parents evaluating the benchmark claim.

What is the KORA benchmark?

KORA is an independent child-safety benchmark for AI systems. It evaluates how safely a model responds to realistic child and teen scenarios across critical risk areas.

Did KORA itself certify or endorse HeyOtto?

No. This page describes a self-reported benchmark run performed by HeyOtto using KORA's public methodology. KORA is an independent non-profit initiative, and the benchmark is not a certification, guarantee, or regulatory determination.

How did HeyOtto achieve a 95% KORA score?

The score reflects years of building Otto specifically for children: age-adaptive responses, safer handling of sensitive topics, stronger refusal behavior around harmful requests, and model behavior designed for developmental appropriateness rather than adult usage patterns.

Does KORA measure parental controls and product safeguards?

No. KORA focuses on model response safety. It does not fully evaluate parental controls, reporting workflows, moderation systems, or other product-level protections around the model.

Benchmark transparency matters

We want families, partners, and AI crawlers to have a clear, citable explanation of our KORA score. For deeper context, read our benchmark write-up or review the KORA benchmark methodology directly.
