Large Language Model Assessment in English Contexts

by Zhenhui (Jack) Jiang, Xiaoyu Miao, Jiaxin Li (蒋镇辉, 苗霄宇, 李佳欣)
HKU Business School Shenzhen Research Institute

Please refer to the report for details on metrics, tasks and models.
Updated 02/2024.

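Rank | Model                        | Organization        | Response Access | Natural Language | Disciplinary Knowledge | Safety & Responsibility | Overall Score
1    | GPT4-Turbo                   | OpenAI              | API             | 91.01            | 76.77                  | 78.04                   | 82.89
2    | Gemini Pro                   | Google              | Web             | 85.96            | 68.24                  | 81.18                   | 78.95
3    | Llama2 70B                   | Meta                | API             | 80.09            | 60.89                  | 85.12                   | 75.27
4    | GPT4                         | OpenAI              | API             | 83.99            | 76.64                  | 54.88                   | 73.70
5    | 文心一言4 (ERNIEBot-4)          | Baidu               | API             | 81.78            | 67.32                  | 67.86                   | 73.33
6    | Claude2                      | Anthropic           | Web             | 77.88            | 65.35                  | 75.24                   | 73.13
7    | GPT3.5-Turbo                 | OpenAI              | API             | 83.12            | 63.26                  | 59.45                   | 70.27
8    | 商汤日日新 (SenseNova)           | SenseTime           | API             | 74.11            | 64.04                  | 69.21                   | 69.53
9    | 通义千问2.0 (qwen-max)          | Alibaba             | API             | 76.39            | 55.91                  | 69.45                   | 67.90
10   | MiniMax (abab5.5-chat)       | MiniMax             | API             | 70.81            | 61.68                  | 49.55                   | 62.08
11   | 讯飞星火 v3.0 (iFlytek Spark)   | iFlytek             | API             | 70.24            | 55.25                  | 56.06                   | 61.55
12   | 智谱清言 (ChatGLM3)             | Tsinghua & Zhipu AI | API             | 70.66            | 45.41                  | 65.95                   | 61.24
13   | 百川大模型 (Baichuan2)           | Baichuan AI         | API             | 63.67            | 53.81                  | 61.60                   | 59.93
14   | 360智脑 (360GPT_S2_V9)         | 360                 | API             | 68.95            | 51.31                  | 53.78                   | 59.14
15   | 悟道·天鹰 (AquilaChat-7B)        | BAAI                | API             | 56.82            | 26.51                  | 56.61                   | 47.00
16   | BLOOMZ-7B                    | BigScience          | API             | 51.44            | 32.15                  | 47.32                   | 44.10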
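The rank column should be reproducible from the overall scores. The following minimal Python sketch re-derives it, assuming that rank is assigned purely by descending overall score. The report itself is the authority on the ranking method. For illustration, model names are shortened to their English identifiers, and the helper name `rank_by_overall` is hypothetical.

```python
# Overall scores copied from the table (English model identifiers only).
OVERALL_SCORES = {
    "GPT4-Turbo": 82.89,
    "Gemini Pro": 78.95,
    "Llama2 70B": 75.27,
    "GPT4": 73.70,
    "ERNIEBot-4": 73.33,
    "Claude2": 73.13,
    "GPT3.5-Turbo": 70.27,
    "SenseNova": 69.53,
    "qwen-max": 67.90,
    "abab5.5-chat": 62.08,
    "iFlytek Spark v3.0": 61.55,
    "ChatGLM3": 61.24,
    "Baichuan2": 59.93,
    "360GPT_S2_V9": 59.14,
    "AquilaChat-7B": 47.00,
    "BLOOMZ-7B": 44.10,
}


def rank_by_overall(scores):
    """Return (rank, model, score) tuples, highest overall score first.

    Assumption: rank depends only on the overall score. Tied scores would
    need an explicit tie-break rule, which this sketch does not model.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(rank, model, score) for rank, (model, score) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for rank, model, score in rank_by_overall(OVERALL_SCORES):
        print(f"{rank:>2}  {model:<20}  {score:6.2f}")
```

Under this assumption, GPT4 (73.70) places fourth, ahead of ERNIEBot-4 (73.33) and Claude2 (73.13), and every other rank matches the table.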