Accuracy scores (%) on the MM-IQ test set (2,710 problems).
| # | Model | Type | Source | Date | Mean | LO | Math | 2D-G | 3D-G | VI | TM | SR | CO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Human Performance* 🥇 | - | Link | 2024-09-04 | 51.27 | 61.36 | 45.03 | 60.11 | 47.48 | 46.67 | 55.61 | 36.63 | 65.79 |
| 2 | Claude-3.5-sonnet 🥈 | Proprietary | Link | 2024-11-14 | 27.49 | 23.41 | 29.48 | 26.60 | 24.37 | 35.56 | 25.69 | 27.72 | 42.11 |
| 3 | QVQ-72B-Preview 🥉 | Open-Source 🖼️ | Link | 2024-11-22 | 26.94 | 28.91 | 25.59 | 29.23 | 26.38 | 26.67 | 25.43 | 22.77 | 34.21 |
| 4 | GPT-4o | Proprietary | Link | 2024-11-12 | 26.87 | 25.52 | 25.70 | 28.32 | 27.64 | 26.67 | 25.69 | 27.72 | 50.00 |
| 5 | Gemini-1.5-pro-002 | Proprietary | Link | 2024-11-09 | 26.86 | 19.53 | 27.43 | 28.03 | 25.88 | 24.44 | 31.17 | 25.74 | 39.47 |
| 6 | Qwen2-VL-72B-Instruct | Open-Source 🖼️ | Link | 2024-11-13 | 26.38 | 24.74 | 24.40 | 28.60 | 27.39 | 24.44 | 26.93 | 32.67 | 23.68 |
| 7 | Deepseek-vl-7b-chat | Open-Source 🖼️ | Link | 2024-11-14 | 22.17 | 19.53 | 20.30 | 22.25 | 27.39 | 35.56 | 23.72 | 24.75 | 15.79 |
| 8 | LLaVA-1.6-7B | Open-Source 🖼️ | Link | 2024-11-03 | 19.45 | 24.22 | 20.34 | 17.92 | 15.83 | 20.00 | 18.23 | 17.82 | 18.42 |
Reasoning paradigms: LO: Logical Operation, Math: Mathematics, 2D-G: 2D-Geometry, 3D-G: 3D-Geometry, VI: Visual Instruction, TM: Temporal Movement, SR: Spatial Relationship, CO: Concrete Object.
🚨 To submit your results to the leaderboard, please send your result JSON files to this email.
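Note that the Mean column is not the unweighted average of the eight paradigm columns. For Claude-3.5-sonnet, for example, the unweighted average is about 29.37, while the reported Mean is 27.49. This suggests that Mean is accuracy pooled over all test problems, so paradigms with more problems carry more weight. The minimal Python sketch below contrasts the two ways of scoring per-problem results. The record layout (`category`, `correct`) and the helper names are hypothetical illustrations, not the official submission format.

```python
import json
from collections import defaultdict


def load_records(path):
    """Read a list of per-problem records from a JSON file (hypothetical schema)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def score(records):
    """Return (pooled accuracy %, unweighted category mean %, per-category accuracy %).

    Each record is assumed (hypothetically) to look like
    {"category": "LO", "correct": true}.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in records:
        totals[r["category"]] += 1
        hits[r["category"]] += int(bool(r["correct"]))

    per_category = {c: 100.0 * hits[c] / totals[c] for c in totals}
    # Pooled (micro) accuracy: every problem counts once, so larger categories weigh more.
    pooled = 100.0 * sum(hits.values()) / sum(totals.values())
    # Unweighted (macro) mean: every category counts once, regardless of its size.
    macro = sum(per_category.values()) / len(per_category)
    return pooled, macro, per_category


if __name__ == "__main__":
    demo = [
        {"category": "LO", "correct": True},
        {"category": "LO", "correct": False},
        {"category": "LO", "correct": False},
        {"category": "CO", "correct": True},
    ]
    pooled, macro, per_cat = score(demo)
    print(f"pooled={pooled:.2f} macro={macro:.2f} per_category={per_cat}")
```

In the demo, pooled accuracy is 2/4 = 50%, while the unweighted category mean is (33.33 + 100) / 2 ≈ 66.67%. The gap comes entirely from the unequal category sizes.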