
Sup AI
AI ensemble that scored #1 on Humanity's Last Exam
About
All large language models hallucinate, but their outputs differ from model to model. Sup AI runs multiple LLMs in parallel, drawn from a pool of 339, and synthesizes their responses by assessing the confidence of each segment. High-entropy segments, which indicate probable hallucinations, are down-weighted; low-entropy segments, which suggest greater accuracy, are up-weighted. This approach scores 52.15 on Humanity's Last Exam, 7.41 points ahead of the next-best individual model. A $10 starter credit is available. Card verification is required, but there are no automatic charges.
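The listing does not say how Sup AI splits, aligns, or scores responses. The sketch below is one illustrative way to weight segments by entropy. It assumes each model returns a candidate text for a given segment along with per-token probability distributions (for example, top-k alternatives). Each candidate is scored by its mean Shannon entropy, and its vote is down-weighted exponentially as that entropy rises. The `Candidate` class, `fuse_segment`, and the `beta` sharpness parameter are hypothetical names, not Sup AI's API or its actual method.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical structure: one model's proposed text for a single answer segment."""
    model: str
    text: str
    # Per-token probability distributions (e.g. over top-k alternatives); assumed input format.
    token_dists: list[list[float]]


def token_entropy(dist: list[float]) -> float:
    """Shannon entropy (in nats) of one token's distribution, renormalized to sum to 1."""
    total = sum(dist)
    if total <= 0:
        return 0.0
    return -sum((p / total) * math.log(p / total) for p in dist if p > 0)


def segment_entropy(c: Candidate) -> float:
    """Mean per-token entropy of a candidate segment; higher means less confident."""
    if not c.token_dists:
        return 0.0
    return sum(token_entropy(d) for d in c.token_dists) / len(c.token_dists)


def fuse_segment(candidates: list[Candidate], beta: float = 2.0) -> tuple[str, dict[str, float]]:
    """Give each candidate weight exp(-beta * entropy), sum weights per distinct text,
    and return the text with the largest total weight plus the per-text scores."""
    scores: dict[str, float] = defaultdict(float)
    for c in candidates:
        scores[c.text] += math.exp(-beta * segment_entropy(c))
    best = max(scores, key=scores.get)
    return best, dict(scores)


# Example: one confident model versus two uncertain models that agree with each other.
cands = [
    Candidate("model_a", "Paris", [[0.97, 0.02, 0.01]]),
    Candidate("model_b", "Lyon", [[0.40, 0.35, 0.25]]),
    Candidate("model_c", "Lyon", [[0.40, 0.35, 0.25]]),
]
text, scores = fuse_segment(cands)
print(text, scores)
```

Here a plain majority vote would pick "Lyon", but the single low-entropy "Paris" candidate (weight of roughly 0.73) outweighs the two high-entropy "Lyon" candidates (roughly 0.23 combined). This shows how entropy weighting can override simple agreement. The exponential form is one choice among many: larger `beta` values penalize uncertainty more sharply.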
Launched
April 7, 2026 (Week 5)