Singapore Launches One of the World’s First LLM Evaluation Toolkits
On May 31, the Singapore Ministry of Communications and Information launched the open-beta AI Verify Project Moonshot (“Project Moonshot”) during the 2024 Asia Tech x Singapore (ATxSG) conference to advance AI safety and security through providing baselines for integrated benchmarking, red teaming, and testing. As one of the world’s first large language models (LLMs) evaluation toolkits, this open-source toolkit aims to enable businesses to better assess their applications against specific benchmarks, such as understanding of local languages and cultural contexts. The tool also employs attack modules to test whether applications can be manipulated into “misbehaving,” such as producing violent content and making inappropriate statements. Project Moonshot was developed by the AI Verify Foundation and the Infocomm Media Development Authority (IMDA). They worked closely with industry partners including IBM, DataRobot, Singtel, and Temasek to ensure its alignment with industry needs. This marks a key step towards Singapore’s continued efforts to play a role in the development of global AI standards.