A scientifically rigorous benchmark platform that evaluates Large Language Models through adversarial debates
Last seen: November 22nd at 1:47pm