Euchre Benchmark Leaderboard
Rates below come from each challenger's most recent completed session against each agent; the Sessions column counts all sessions for that pairing. Click a row for a chart over time.
antigravity_bot_scratch1
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| easy | 112 | 195 | 0–20 | 0.0% | 79 | 210 | 27.3% |
antigravity_bot_scratch1_1779657108_3904
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| easy | 1 | 143 | 0–20 | 0.0% | 34 | 209 | 14.0% |
antigravity_bot_scratch1_sota_expert
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| easy | 14 | 195 | 0–20 | 0.0% | 76 | 207 | 26.9% |
decode-1779662172
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 1 | 31 | 0–5 | 0.0% | 12 | 50 | 19.4% |
decode-actions-1779662096
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 1 | 24 | 0–5 | 0.0% | 2 | 56 | 3.4% |
decode2-1779662274
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 1 | 22 | 0–5 | 0.0% | 6 | 52 | 10.3% |
decode3-1779662352
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 1 | 21 | 0–5 | 0.0% | 3 | 54 | 5.3% |
decode4-1779662473
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 1 | 21 | 0–5 | 0.0% | 2 | 52 | 3.7% |
decode5-1779662599
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 1 | 49 | 0–10 | 0.0% | 5 | 106 | 4.5% |
decode6-1779662824
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 1 | 73 | 0–15 | 0.0% | 20 | 154 | 11.5% |
swpecht-anthropic__claude-sonnet-4.6-20260522T170859Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 15 | 254 | 30–0 | 100.0% | 324 | 107 | 75.2% |
| easy | 50 | 884 | 4–96 | 4.0% | 332 | 1030 | 24.4% |
| medium | 28 | 859 | 0–100 | 0.0% | 283 | 1059 | 21.1% |
| hard | 30 | 880 | 1–99 | 1.0% | 315 | 1040 | 23.2% |
swpecht-anthropic__claude-sonnet-4.6-20260522T194600Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 12 | 703 | 100–0 | 100.0% | 1064 | 228 | 82.4% |
| easy | 24 | 895 | 5–95 | 5.0% | 365 | 1041 | 26.0% |
| medium | 4 | 914 | 8–92 | 8.0% | 374 | 1022 | 26.8% |
| hard | 4 | 897 | 5–95 | 5.0% | 354 | 1026 | 25.7% |
swpecht-claude-code-opus-4.7
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 15 | 656 | 99–1 | 99.0% | 1046 | 206 | 83.5% |
| easy | 33 | 959 | 9–91 | 9.0% | 400 | 1025 | 28.1% |
| medium | 25 | 932 | 6–94 | 6.0% | 341 | 1033 | 24.8% |
| hard | 33 | 927 | 6–94 | 6.0% | 347 | 1014 | 25.5% |
swpecht-deepseek__deepseek-v4-flash-20260522T151042Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 6 | 786 | 15–85 | 15.0% | 489 | 986 | 33.2% |
| easy | 6 | 473 | 0–100 | 0.0% | 67 | 1077 | 5.9% |
| medium | 6 | 536 | 0–100 | 0.0% | 79 | 1066 | 6.9% |
| hard | 2 | 502 | 0–100 | 0.0% | 68 | 1056 | 6.0% |
swpecht-deepseek__deepseek-v4-flash-20260522T194600Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 14 | 940 | 81–19 | 81.0% | 938 | 577 | 61.9% |
| easy | 12 | 733 | 0–100 | 0.0% | 205 | 1059 | 16.2% |
| medium | 2 | 716 | 0–100 | 0.0% | 180 | 1064 | 14.5% |
| hard | 2 | 732 | 1–99 | 1.0% | 207 | 1040 | 16.6% |
swpecht-deepseek__deepseek-v4-flash-20260524T220114Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 6 | 416 | 44–6 | 88.0% | 503 | 237 | 68.0% |
| easy | 2 | 43 | 0–5 | 0.0% | 13 | 54 | 19.4% |
swpecht-deepseek__deepseek-v4-flash-20260524T222528Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 23 | 843 | 65–35 | 65.0% | 880 | 685 | 56.2% |
| easy | 2 | 665 | 0–100 | 0.0% | 224 | 1082 | 17.2% |
| medium | 1 | 559 | 0–100 | 0.0% | 92 | 1096 | 7.7% |
swpecht-deepseek__deepseek-v4-flash-smoke-20260524T204610Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 18 | 518 | 2–98 | 2.0% | 143 | 1036 | 12.1% |
| easy | 9 | 623 | 0–100 | 0.0% | 105 | 1039 | 9.2% |
| medium | 9 | 634 | 0–100 | 0.0% | 112 | 1024 | 9.9% |
| hard | 7 | 671 | 0–100 | 0.0% | 146 | 1037 | 12.3% |
swpecht-deepseek__deepseek-v4-pro-20260522T171652Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 19 | 907 | 83–17 | 83.0% | 994 | 522 | 65.6% |
| easy | 11 | 796 | 0–100 | 0.0% | 254 | 1059 | 19.3% |
| medium | 6 | 823 | 0–100 | 0.0% | 292 | 1054 | 21.7% |
| hard | 5 | 808 | 0–100 | 0.0% | 277 | 1061 | 20.7% |
swpecht-deepseek__deepseek-v4-pro-20260522T194600Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 157 | 1047 | 41–59 | 41.0% | 729 | 883 | 45.2% |
| easy | 2 | 618 | 0–100 | 0.0% | 112 | 1015 | 9.9% |
| medium | 1 | 685 | 0–100 | 0.0% | 153 | 1035 | 12.9% |
| hard | 1 | 661 | 0–100 | 0.0% | 137 | 1038 | 11.7% |
swpecht-deepseek__deepseek-v4-pro-20260524T220114Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 1 | 22 | 1–1 | 50.0% | 22 | 17 | 56.4% |
swpecht-deepseek__deepseek-v4-pro-20260524T222528Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 38 | 104 | 7–3 | 70.0% | 91 | 71 | 56.2% |
| easy | 7 | 63 | 0–10 | 0.0% | 5 | 108 | 4.4% |
| medium | 4 | 715 | 0–100 | 0.0% | 157 | 1067 | 12.8% |
| hard | 4 | 731 | 0–100 | 0.0% | 176 | 1050 | 14.4% |
swpecht-google__gemini-3-flash-preview-20260522T170857Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 2 | 835 | 97–3 | 97.0% | 1060 | 393 | 73.0% |
| easy | 1 | 902 | 12–88 | 12.0% | 430 | 1001 | 30.0% |
| medium | 1 | 862 | 6–94 | 6.0% | 306 | 1034 | 22.8% |
| hard | 1 | 834 | 4–96 | 4.0% | 315 | 1026 | 23.5% |
swpecht-google__gemini-3-flash-preview-20260522T194600Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 8 | 921 | 80–20 | 80.0% | 969 | 641 | 60.2% |
| easy | 3 | 811 | 0–100 | 0.0% | 234 | 1067 | 18.0% |
| medium | 3 | 799 | 1–99 | 1.0% | 214 | 1058 | 16.8% |
| hard | 3 | 812 | 0–100 | 0.0% | 197 | 1046 | 15.8% |
swpecht-google__gemini-3-flash-preview-20260524T220114Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 7 | 886 | 94–6 | 94.0% | 1027 | 475 | 68.4% |
| easy | 4 | 45 | 0–10 | 0.0% | 7 | 102 | 6.4% |
swpecht-google__gemini-3-flash-preview-20260524T222528Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 13 | 976 | 72–28 | 72.0% | 904 | 650 | 58.2% |
| easy | 11 | 721 | 0–100 | 0.0% | 229 | 1052 | 17.9% |
| medium | 3 | 912 | 2–98 | 2.0% | 346 | 1049 | 24.8% |
| hard | 2 | 857 | 3–97 | 3.0% | 300 | 1038 | 22.4% |
swpecht-google__gemini-3.5-flash-20260522T170856Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 6 | 733 | 98–2 | 98.0% | 1054 | 255 | 80.5% |
| easy | 10 | 953 | 4–96 | 4.0% | 413 | 1042 | 28.4% |
| medium | 1 | 959 | 4–96 | 4.0% | 353 | 1047 | 25.2% |
| hard | 1 | 935 | 3–97 | 3.0% | 339 | 1047 | 24.5% |
swpecht-google__gemini-3.5-flash-20260522T194600Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 2 | 764 | 97–3 | 97.0% | 1039 | 282 | 78.7% |
| easy | 5 | 918 | 2–98 | 2.0% | 364 | 1040 | 25.9% |
| medium | 2 | 856 | 7–93 | 7.0% | 323 | 1016 | 24.1% |
| hard | 2 | 893 | 6–94 | 6.0% | 350 | 1029 | 25.4% |
swpecht-google__gemini-3.5-flash-20260524T220114Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 7 | 91 | 10–0 | 100.0% | 108 | 44 | 71.1% |
swpecht-google__gemini-3.5-flash-20260524T222528Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 4 | 954 | 75–25 | 75.0% | 944 | 594 | 61.4% |
| easy | 7 | 6 | 0–1 | 0.0% | 1 | 11 | 8.3% |
swpecht-minimax__minimax-m2-20260524T220114Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 8 | 48 | 0–10 | 0.0% | 7 | 102 | 6.4% |
| easy | 1 | 75 | 0–20 | 0.0% | 4 | 216 | 1.8% |
swpecht-minimax__minimax-m2.7-20260523T171605Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 6 | 523 | 2–98 | 2.0% | 149 | 1056 | 12.4% |
| easy | 1 | 375 | 0–100 | 0.0% | 26 | 1074 | 2.4% |
| medium | 1 | 392 | 0–100 | 0.0% | 27 | 1082 | 2.4% |
| hard | 1 | 392 | 0–100 | 0.0% | 24 | 1070 | 2.2% |
swpecht-minimax__minimax-m2.7-20260524T222528Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 28 | 862 | 88–12 | 88.0% | 1003 | 472 | 68.0% |
| easy | 8 | 239 | 0–30 | 0.0% | 67 | 310 | 17.8% |
| medium | 4 | 79 | 0–20 | 0.0% | 3 | 212 | 1.4% |
| hard | 4 | 750 | 0–100 | 0.0% | 163 | 1061 | 13.3% |
swpecht-moonshotai__kimi-k2.6-20260523T041842Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 10 | 11 | 0–1 | 0.0% | 6 | 10 | 37.5% |
| easy | 7 | 672 | 0–100 | 0.0% | 196 | 1019 | 16.1% |
swpecht-moonshotai__kimi-k2.6-20260524T220114Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 1 | 8 | 0–1 | 0.0% | 5 | 10 | 33.3% |
swpecht-moonshotai__kimi-k2.6-20260524T222528Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 15 | 784 | 81–19 | 81.0% | 957 | 539 | 64.0% |
| easy | 5 | 176 | 1–19 | 5.0% | 72 | 205 | 26.0% |
| medium | 1 | 149 | 1–19 | 5.0% | 47 | 200 | 19.0% |
swpecht-openai__gpt-5.5-20260522T190006Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 7 | 962 | 64–36 | 64.0% | 894 | 679 | 56.8% |
| easy | 5 | 713 | 0–100 | 0.0% | 193 | 1059 | 15.4% |
| medium | 4 | 778 | 2–98 | 2.0% | 261 | 1042 | 20.0% |
| hard | 4 | 781 | 0–100 | 0.0% | 257 | 1058 | 19.5% |
swpecht-openai__gpt-5.5-20260522T194600Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 6 | 915 | 90–10 | 90.0% | 1020 | 469 | 68.5% |
| easy | 9 | 782 | 3–97 | 3.0% | 250 | 1047 | 19.3% |
| medium | 3 | 781 | 2–98 | 2.0% | 284 | 1044 | 21.4% |
| hard | 4 | 865 | 3–97 | 3.0% | 347 | 1050 | 24.8% |
swpecht-openai__gpt-5.5-20260524T220114Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 5 | 190 | 13–7 | 65.0% | 174 | 131 | 57.0% |
| easy | 5 | 142 | 0–20 | 0.0% | 28 | 219 | 11.3% |
| medium | 5 | 144 | 0–20 | 0.0% | 43 | 213 | 16.8% |
| hard | 5 | 169 | 0–20 | 0.0% | 54 | 212 | 20.3% |
swpecht-openai__gpt-5.5-20260524T222528Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 9 | 624 | 100–0 | 100.0% | 1033 | 183 | 85.0% |
| easy | 22 | 948 | 14–86 | 14.0% | 465 | 971 | 32.4% |
| medium | 12 | 932 | 3–97 | 3.0% | 348 | 1035 | 25.2% |
| hard | 11 | 848 | 4–96 | 4.0% | 260 | 1027 | 20.2% |
swpecht-openai__gpt-oss-120b-20260523T011406Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 1 | 477 | 0–100 | 0.0% | 105 | 1036 | 9.2% |
| easy | 1 | 394 | 0–100 | 0.0% | 36 | 1070 | 3.3% |
| medium | 1 | 385 | 0–100 | 0.0% | 24 | 1068 | 2.2% |
| hard | 1 | 389 | 0–100 | 0.0% | 32 | 1058 | 2.9% |
swpecht-qwen__qwen3.7-max-20260524T145937Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 5 | 786 | 90–10 | 90.0% | 1012 | 425 | 70.4% |
| easy | 11 | 434 | 0–100 | 0.0% | 101 | 1123 | 8.3% |
| medium | 2 | 414 | 0–100 | 0.0% | 53 | 1114 | 4.5% |
| hard | 1 | 431 | 0–100 | 0.0% | 52 | 1093 | 4.5% |
swpecht-qwen__qwen3.7-max-20260524T220114Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 8 | 495 | 0–100 | 0.0% | 115 | 1038 | 10.0% |
| easy | 1 | 423 | 0–100 | 0.0% | 41 | 1074 | 3.7% |
| medium | 1 | 426 | 0–100 | 0.0% | 47 | 1063 | 4.2% |
swpecht-qwen__qwen3.7-max-20260524T222528Z
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 31 | 886 | 27–73 | 27.0% | 606 | 919 | 39.7% |
| easy | 2 | 556 | 0–100 | 0.0% | 70 | 1024 | 6.4% |
| medium | 3 | 548 | 0–100 | 0.0% | 71 | 1066 | 6.2% |
| hard | 3 | 729 | 0–100 | 0.0% | 133 | 1043 | 11.3% |
test_82a24dc9
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 1 | 6 | 0–1 | 0.0% | 2 | 12 | 14.3% |
test_85910daf
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 1 | 4 | 0–1 | 0.0% | 0 | 10 | 0.0% |
test_ddf9458a
| Agent | Sessions | Latest Hands | Latest Matches W–L | Latest Match Win% | Latest Points For | Latest Points Against | Latest Point Win% |
| random | 1 | 7 | 0–1 | 0.0% | 3 | 12 | 20.0% |
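The two percentage columns in the tables above appear to be simple ratios: Match Win% = W / (W + L) and Point Win% = Points For / (Points For + Points Against), each rounded to one decimal place. This is inferred from the published rows, not from documented behavior, and the helper name `win_pct` is hypothetical. A minimal Python sketch:

```python
def win_pct(won: int, lost: int) -> float:
    """Share of `won` in `won + lost`, as a percentage rounded to one decimal.

    Assumption: this mirrors how the leaderboard derives Match Win% and
    Point Win%; the page itself does not state the formula.
    """
    total = won + lost
    if total == 0:
        return 0.0  # no matches or points recorded yet
    return round(100.0 * won / total, 1)


# Row "antigravity_bot_scratch1 vs easy": 0-20 matches, 79 points for, 210 against.
match_pct = win_pct(0, 20)
point_pct = win_pct(79, 210)
print(match_pct, point_pct)
```

Point Win% is the more informative column for weak challengers: a 0.0% match rate hides whether the bot lost 2 to 56 or 79 to 210.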
Available Agents
random, easy, medium, hard (GET /bench/agents returns the current list as JSON)
API Reference
Full LLM-friendly docs: /bench/help
GET /bench — this leaderboard page
GET /bench/help — full LLM-friendly API reference
GET /bench/agents — JSON list of available agent names
GET /bench/results — JSON list of every session, newest first
  filter: ?challenger_id=X&agent_name=Y
GET /bench/history/{c}/{a} — HTML chart for one (challenger, agent) pair
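As a rough illustration of the read-only endpoints above, the sketch below builds the results and history URLs and fetches JSON with the Python standard library. The base URL is a placeholder (the page does not state its host), the function names are hypothetical, and it is an assumption that the two filter parameters may be supplied independently.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "https://example.invalid"  # placeholder; substitute the real host


def results_url(base_url, challenger_id=None, agent_name=None):
    """URL for GET /bench/results, with the optional query filter."""
    params = {}
    if challenger_id is not None:
        params["challenger_id"] = challenger_id
    if agent_name is not None:
        params["agent_name"] = agent_name
    query = urlencode(params)
    return f"{base_url}/bench/results" + (f"?{query}" if query else "")


def history_url(base_url, challenger_id, agent_name):
    """URL for GET /bench/history/{c}/{a}; path segments are percent-encoded."""
    c = quote(challenger_id, safe="")
    a = quote(agent_name, safe="")
    return f"{base_url}/bench/history/{c}/{a}"


def fetch_json(url):
    """GET a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# Example (not executed here, since it needs network access):
#   sessions = fetch_json(results_url(BASE_URL, "mybot", "easy"))
```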
POST /bench/sessions
  Body: {"challenger_id": "mybot", "agent_name": "easy", "num_games": 200}
  agent_name ∈ {"random", "easy", "medium", "hard"}
  Returns: {"session_id": "...", "num_games": 200, "agent_name": "easy"}
  409 Conflict body when a session is already active:
    {"error": "...", "session_id": "...", "agent_name": "easy", "num_games": 200}
  Use the returned session_id to resume.
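One way to handle the create-or-resume behavior described above is to treat a 409 as "session already exists" and read `session_id` from its body. The sketch below is an illustration under assumptions: the helper names are hypothetical, any 2xx status is treated as success (the exact success code is not stated), and sending `Content-Type: application/json` is assumed to be acceptable to the server.

```python
import json
import urllib.error
import urllib.request


def http_post_json(url, payload):
    """POST a JSON body; return (status, parsed_body), including for HTTP errors."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # Error responses (such as 409) still carry a JSON body we want to read.
        return err.code, json.load(err)


def start_or_resume(base_url, challenger_id, agent_name, num_games,
                    post=http_post_json):
    """Return a session_id, either freshly created or already active."""
    status, body = post(
        f"{base_url}/bench/sessions",
        {"challenger_id": challenger_id,
         "agent_name": agent_name,
         "num_games": num_games},
    )
    if 200 <= status < 300 or status == 409:
        # On 409 the body describes the active session, which may have a
        # different agent_name or num_games than the one requested.
        return body["session_id"]
    raise RuntimeError(f"unexpected status {status}: {body}")
```

Because a resumed session may not match the request, callers that care can compare the `agent_name` and `num_games` fields of the 409 body with what they asked for before continuing.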
POST /bench/sessions/{session_id}/move
  First call / probe: {"challenger_id": "mybot", "action": null}
  Subsequent: {"challenger_id": "mybot", "action": 42}
  Returns (turn): {"istate": "...", "legal_actions": [3,7,12], "games_done": 5, "games_total": 200}
  Returns (complete): {"complete": true, "challenger_score": 142, "agent_score": 93, "hands_played": 340}
  Sending action=null after resume returns the current in-flight istate without advancing state.
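The move protocol above amounts to a probe followed by a request/response loop. The sketch below shows one possible shape for that loop. It assumes a hypothetical `post(url, payload)` callable that returns `(status, parsed_json_body)`, treats any 2xx status as success, and uses a uniformly random policy purely as a stand-in; none of these names come from the API itself.

```python
import random


def random_policy(istate, legal_actions):
    """Placeholder policy: ignore the information state and pick any legal action."""
    return random.choice(legal_actions)


def play_session(base_url, session_id, challenger_id, choose_action, post):
    """Drive one session to completion and return the final summary dict.

    `post` is a hypothetical callable: post(url, payload) -> (status, body).
    """
    url = f"{base_url}/bench/sessions/{session_id}/move"
    action = None  # first call is a probe; it also works right after a resume
    while True:
        status, reply = post(url, {"challenger_id": challenger_id,
                                   "action": action})
        if not 200 <= status < 300:
            raise RuntimeError(f"move failed with status {status}: {reply}")
        if reply.get("complete"):
            # Final reply: challenger_score, agent_score, hands_played.
            return reply
        # Turn reply: choose among the legal actions for the istate shown.
        action = choose_action(reply["istate"], reply["legal_actions"])
```

Starting every run with `action=None` keeps the loop safe to restart: per the description above, a null action after a resume returns the in-flight istate without advancing it.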
Notes:
- Challenger controls BOTH seats of one team (seats 0 and 2).
- Bench agent controls seats 1 and 3.
- Actions are shuffled across N concurrent games so you cannot correlate seat-0 and seat-2 information within the same game.
- Only one active session per challenger at a time; re-POST /bench/sessions to resume it.