9x9 Scalability study

9787 games played so far.
Computed by bayeselo

Last update: Wed Feb 13 16:09:39 EST 2008

The purpose of this study is to determine how additional computing power affects the playing strength of scalable 9x9 computer Go programs. The programs being tested use UCT, a popular new technique that combines Monte Carlo simulations (randomly played games, in this case) with best-first tree search.

This study tests 13 versions of two different programs which play matches between each other in a competition for Elo rating points. Elo ratings are a direct measure of playing strength. Each version of a particular program spends twice as much effort choosing its move as the previous version, as measured by the number of random games played to choose a move.
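
To make "Elo rating points" concrete, here is a minimal sketch of the standard logistic Elo formula, which turns a rating difference into an expected score. The function name expected_score is made up for this illustration, and bayeselo's own model may differ in detail, so this should be read as the textbook formula rather than the exact computation behind the ratings in this study.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the
    standard logistic Elo model (400 points = 10:1 odds)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Expected score for a few rating advantages over an 1800-rated opponent.
    for advantage in (0, 100, 200, 400):
        score = expected_score(1800 + advantage, 1800)
        print(f"{advantage:>4} Elo advantage -> expected score {score:.3f}")
```

Under this formula, equal ratings give an expected score of 0.5 and a 400 point advantage gives about 0.909.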

Matches are scheduled between players using a random scheduling algorithm where players of similar strength are much more likely to play each other, but any pairing is still possible. This works far better for assessment purposes than, for instance, a round robin, where matches between players of considerably different skill are common. Such lopsided games are almost always won by the stronger player, so they reveal little about either player's rating.
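
The study does not describe the scheduling algorithm itself, so the following is only a sketch of one way to get the behaviour described above: opponents are drawn at random with a weight that decays with the rating difference. The function pick_opponent, the exponential weighting, and the scale parameter of 200 are all assumptions made for this illustration. The ratings are taken from the rating chart.

```python
import math
import random

# A few current ratings from the rating chart, used as sample input.
ratings = {
    "Mogo_10": 2469,
    "FatMan_11": 2417,
    "Mogo_05": 1763,
    "Mogo_01": 694,
}


def pick_opponent(player, ratings, scale=200.0, rng=random):
    """Pick a random opponent for `player`, favouring similar ratings.

    Every other player has a non-zero weight, so any pairing remains
    possible, but the weight shrinks exponentially with the rating gap.
    `scale` (hypothetical) controls how quickly it shrinks.
    """
    candidates = [name for name in ratings if name != player]
    weights = [
        math.exp(-abs(ratings[name] - ratings[player]) / scale)
        for name in candidates
    ]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(2008)  # fixed seed so the run is repeatable
    counts = {}
    for _ in range(1000):
        opponent = pick_opponent("Mogo_10", ratings, rng=rng)
        counts[opponent] = counts.get(opponent, 0) + 1
    print(counts)
```

With these weights Mogo_10 meets FatMan_11 in the large majority of games, while a game against Mogo_01 is rare but not excluded.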

The amount of computing effort required to test these programs at higher levels is substantial and several people have contributed computing resources to this effort. On a single computer this study would require many months of computing effort in order to get enough data to be statistically meaningful.

The two programs playing in this study are Mogo and FatMan. Mogo is probably the strongest 9x9 Go-playing program in the world at this time (early 2008), and FatMan is a very simple and generic UCT-based program that plays on CGOS, a game server just for Go-playing programs. Additionally, a popular program called "Gnugo" plays in this study as a fixed point of reference at 1800 Elo.

In the following table, there are 13 versions of each program, labeled 01 - 13. Mogo_01 does only 64 Monte Carlo simulations in the evaluation portion of the tree search. Mogo_02 doubles this, and each subsequent version doubles the number of simulations of the previous one.

The same formula applies to FatMan, except the number of simulations is adjusted upward to correspond in strength (at least roughly) with the much stronger Mogo and thus FatMan_01 does 1024 simulations. To put it another way, FatMan_01 needs 1024 simulations to be roughly equal in strength to Mogo_01 which does 64 simulations.

We can compute the number of simulations for any level as:

Mogo simulations at level N = 64 * 2^(N-1)
FatMan simulations at level N = 1024 * 2^(N-1)
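
As a quick check of the two formulas, the sketch below computes the simulation count for any program and level. The names BASE_SIMULATIONS and simulations are made up for this illustration; the base values 64 and 1024 come from the text above.

```python
# Simulations performed at level 1, as stated in the text.
BASE_SIMULATIONS = {"Mogo": 64, "FatMan": 1024}


def simulations(program: str, level: int) -> int:
    """Number of simulations per move: base * 2^(level - 1)."""
    if level < 1:
        raise ValueError("levels start at 1")
    return BASE_SIMULATIONS[program] * 2 ** (level - 1)


if __name__ == "__main__":
    for level in (1, 2, 13):
        print(
            f"level {level:02d}: "
            f"Mogo {simulations('Mogo', level):>9,}  "
            f"FatMan {simulations('FatMan', level):>9,}"
        )
```

At level 13, for example, this gives 262,144 simulations for Mogo and 4,194,304 for FatMan.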

Rating Chart

Rank Name Elo + - Games Score Opponent
1 bigMogo_18 3065 58 54 354 86% 2569
2 Mogo_18 2979 52 50 375 81% 2543
3 Mogo_17 2977 51 49 381 81% 2557
4 bigMogo_16 2964 68 65 208 78% 2554
5 Mogo_16 2959 41 39 582 81% 2519
6 Mogo_15 2893 40 39 579 76% 2491
7 Mogo_14 2815 37 37 602 70% 2512
8 Mogo_13 2757 36 36 634 67% 2489
9 Mogo_12 2659 35 35 633 60% 2463
10 FatMan_14 2635 56 55 286 64% 2426
11 Mogo_11 2580 37 37 623 55% 2451
12 FatMan_13 2569 38 38 573 60% 2367
13 FatMan_12 2516 37 37 618 53% 2405
14 Mogo_10 2469 38 38 644 54% 2344
15 FatMan_11 2417 38 38 602 54% 2310
16 Mogo_09 2339 40 41 616 51% 2255
17 FatMan_10 2298 41 41 624 49% 2244
18 Mogo_08 2270 40 41 635 48% 2245
19 FatMan_09 2205 43 43 610 47% 2200
20 Mogo_07 2063 45 45 598 50% 2029
21 FatMan_08 2059 44 44 604 48% 2060
22 Mogo_06 1979 42 42 615 46% 2025
23 FatMan_07 1918 44 44 599 49% 1915
24 FatMan_06 1815 43 43 587 45% 1896
25 Gnugo-3.7.11 1800 43 43 612 42% 1915
26 Mogo_05 1763 42 42 590 45% 1839
27 FatMan_05 1692 40 41 592 44% 1768
28 Mogo_04 1591 43 44 584 42% 1732
29 FatMan_04 1554 46 47 577 40% 1723
30 FatMan_03 1395 47 48 583 36% 1646
31 Mogo_03 1337 47 48 582 34% 1622
32 FatMan_02 1098 48 49 578 26% 1561
33 Mogo_02 1017 50 51 572 21% 1568
34 FatMan_01 874 51 53 559 15% 1520
35 Mogo_01 694 61 46 563 6% 1534
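
One way to read the chart is to look at how many Elo points each doubling adds. The sketch below does this for the Mogo_01 through Mogo_18 rows, with the ratings copied from the chart above. The variable names are made up for this illustration. Since the + and - margins in the chart are mostly in the range of 35 to 60 points, the individual differences are noisy and only the overall trend should be taken seriously.

```python
# Elo ratings of Mogo_01 .. Mogo_18, copied from the rating chart.
mogo_elo = [
    694, 1017, 1337, 1591, 1763, 1979, 2063, 2270, 2339,
    2469, 2580, 2659, 2757, 2815, 2893, 2959, 2977, 2979,
]

# Elo gained by each doubling: rating(level N+1) - rating(level N).
gains = [after - before for before, after in zip(mogo_elo, mogo_elo[1:])]

if __name__ == "__main__":
    for level, gain in enumerate(gains, start=1):
        print(f"Mogo_{level:02d} -> Mogo_{level + 1:02d}: {gain:+d} Elo")
    print(f"average gain per doubling: {sum(gains) / len(gains):.1f} Elo")
```

The first doublings are each worth over 300 points in this data, while the last ones add far less.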

Command line program invocation (weakest level)
Mogo: mogo --9 --nbTotalSimulations 64 --playsAgainstHuman 0
FatMan: FatMan -l 1 -r       (level 1 = 1024 simulations)
Gnugo-3.7.11: gnugo --mode gtp --capture-all-dead --chinese-rules --min-level 8 --max-level 8 --positional-superko

Rating plot

X axis: Each CPU doubling
Y axis: Elo rating

Rating plot with Bezier smoothing