Evaluation tasks for foundation models and LLM agents. Benchmark their capabilities, safety, risks, and performance.