CARE ET

ARPA-H seeks proposals to develop new evaluation technologies for detecting hallucinations and other inaccuracies in medical Large Language Model (LLM) output for patient-facing applications. The Chatbot Accuracy and Reliability Evaluation (CARE) ET aims to produce tools and technologies that evaluate LLM output with the efficiency of computational methods and the accuracy of human experts. By creating scalable and cost-effective LLM evaluation technologies, CARE seeks to accelerate the creation of trustworthy medical chatbots and enable access to more reliable health-related information.