Starcraft Eval

ML experiments that use featurized SC2 and Broodwar replays to predict the winner of a game more accurately than guessing.

In Starcraft, judging who has the advantage usually requires an expert to carefully review the current game state (e.g. the number of expansions, unit counts, and unit positions for each player). This assessment makes for entertaining commentary, but it can also support analysis, such as deciding how one-sided or back-and-forth a game was. In this project I experiment with using replay files to automate this type of analysis.


In a 1v1 Starcraft game, randomly guessing the winner is correct 50% of the time at any given point, so I set out to build a model that improves on that baseline.

Misc - SC2 vs. Broodwar

Broodwar (i.e. Starcraft 1, the predecessor to SC2) is still very popular (especially in South Korea). I personally find it more entertaining to watch than SC2. However, SC2 replay files contain much richer data that lends itself well to creating these models.

A Broodwar replay contains the bare minimum of data needed to reproduce the game, mostly player actions. This makes it very difficult to determine basic stats such as how many units each player has at any given point: it's obvious when someone builds a unit, but not when that unit dies. The game must be simulated to obtain that information.

SC2 First Experiment

My first experiments with SC2 replays yielded 62.5% accuracy on the test set. I scraped tens of thousands of replays online, parsed each replay with sc2reader, and produced a training example every 10 seconds. Each example contained a lot of information about the game state at that point, plus a label for which player won. Of the replays processed, 9/10 became training data and 1/10 became test data. I evaluated by training for 3 epochs over the entire training set, running the trained model on all of the test data, and counting the correct classifications to arrive at that percentage.
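The sampling, split, and scoring described above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebook: `make_examples`, `split_replays`, `accuracy`, and the `state_at` callable are hypothetical names, and the real features came from sc2reader rather than from a toy function.

```python
import random

SAMPLE_INTERVAL_SECONDS = 10  # one training example per 10 seconds


def make_examples(game_length_seconds, state_at, winner):
    """Return one (second, features, label) tuple per sampling step.

    state_at is a hypothetical callable that returns the feature list
    describing the game state at a given second.
    """
    return [
        (second, state_at(second), winner)
        for second in range(0, game_length_seconds + 1, SAMPLE_INTERVAL_SECONDS)
    ]


def split_replays(replay_ids, test_fraction=0.1, seed=0):
    """Split whole replays (not individual examples) into train and test."""
    ids = sorted(replay_ids)
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]


def accuracy(predictions, labels):
    """Fraction of predictions that match the true winner."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)
```

Splitting by replay rather than by example keeps every snapshot of a given game on the same side of the split, so the test score is not inflated by near-duplicate game states.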

I have a few ideas on how to improve it, such as getting misc_stats to work or featurizing actions (since that appeared to work well with BW). I'd also like to run over the test data while ignoring the first minute of the game (then the first two minutes, and so on) to test how the prediction rate changes. It should get higher with each additional minute ignored.
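The minute-by-minute check could look something like the sketch below. It has not been run on the real data; `accuracy_by_skipped_minutes` and the `(second, predicted_winner, actual_winner)` tuple layout are assumptions made for illustration.

```python
def accuracy_by_skipped_minutes(results, max_minutes):
    """Accuracy over test examples after dropping the first N minutes.

    results is a list of (second, predicted_winner, actual_winner) tuples
    (a hypothetical layout). Returns {minutes_skipped: accuracy}.
    """
    table = {}
    for minutes in range(max_minutes + 1):
        # Keep only examples sampled at or after the cutoff.
        kept = [(p, y) for second, p, y in results if second >= minutes * 60]
        if not kept:
            break  # no examples left this late in the game
        table[minutes] = sum(p == y for p, y in kept) / len(kept)
    return table
```

If the expectation holds, the values in the returned table should rise as the number of skipped minutes grows.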

Check out the PDF at the bottom for a Jupyter notebook of the training.

BW First Experiment

With the limited number of features I had, mostly featurized actions, I was only able to get slightly above guessing, around 55%.
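One simple way to featurize actions is to count, per player, how many actions of each type have been issued up to a given point. The sketch below assumes actions have already been parsed into `(second, player, action_type)` tuples; that layout, the action type names, and `featurize_actions` are hypothetical and are not taken from the actual experiment.

```python
from collections import Counter


def featurize_actions(actions, action_types, upto_second):
    """Count actions per (player, type) issued up to upto_second.

    actions is a list of (second, player, action_type) tuples with
    players numbered 0 and 1 (an assumed layout). Returns a flat list:
    player 0's counts for each action type, then player 1's.
    """
    counts = Counter(
        (player, kind)
        for second, player, kind in actions
        if second <= upto_second
    )
    return [counts[(player, kind)] for player in (0, 1) for kind in action_types]
```

For example, `featurize_actions(actions, ["train", "build"], 15)` produces a four-element vector covering both players. Counts like these describe what each player ordered, not what is still alive, which fits the limitation of Broodwar replays noted earlier.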