by Ibai Burgos



A tool to obtain statistics from PGN files

Initial idea

Well, let's start from the very beginning. After finishing the EvAU (the Spanish university entrance exam), in the first semester of my university course, I started learning some basic C programming.

At first, I was quite excited, since it was the first time I had programmed seriously. After improving and learning more, I wanted to apply my programming skills and knowledge to the computer-chess world, and I knew C was going to be good enough for what I wanted to achieve.

Therefore, after some time thinking, I came to the conclusion that it would be a good idea to create a program capable of obtaining statistics from commented PGN databases. Before starting, I took some inspiration from publicly available GitHub repositories, such as:

GitHub - chris13300/pgnStats: Tool to get some statistics from commented games.
GitHub - ebemunk/pgnstats: parses PGN files and extracts statistics for chess games

Coding time

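After having a clear initial idea, it was time to focus on what really mattered: programming. For that purpose, I decided to use CLion, a highly customizable JetBrains IDE with plugin support. After that, I needed to decide which stats I wanted my program to obtain, and I settled on the following ones: the total number of games, the number of draws, white wins and black wins, the average game duration, the average ply count, the average depth per move and the average time per move.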

So, before diving into the functions to obtain the statistics, I defined some constants in a header file called types.h, which also contains the function prototypes:

#ifndef TYPES_H
#define TYPES_H

#include <stdio.h> // needed for the FILE type used in the prototypes

#define MAX_MOVES 1024 // also used as the size of the line buffer in deleteTags
#define N 50

void getstats(FILE *, FILE *);   // gets some statistics of the input pgn file
void deleteTags(FILE *, FILE *); // deletes the Result, GameStartTime and GameEndTime tags of the input pgn file and writes the output to another file
void getavgGD(FILE *, FILE *);   // gets the average game duration of the input pgn file
void getavgPC(FILE *, FILE *);   // gets the average PlyCount of the input pgn file
void getavgD(FILE *, FILE *);    // gets the average depth per move of the input pgn file
void getavgT(FILE *, FILE *);    // gets the average time per move of the input pgn file

#endif // TYPES_H


As we can see in the code snippet above, each function takes two file pointers as arguments: the first one refers to the input file and the second one refers to the output file. In addition to all the functions to obtain statistics, I also needed a "deleteTags" function that removes the conflicting tags in the PGN file that would otherwise distort the results. The function is quite simple:

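#include <string.h> // strncmp

void deleteTags(FILE *input, FILE *output)
{
    char line[MAX_MOVES];
    // Read the input file line by line
    while (fgets(line, sizeof line, input) != NULL) {
        // Skip the tags that distort the statistics (the numbers are the prefix lengths)
        if (strncmp(line, "[Result ", 8) == 0 || strncmp(line, "[GameStartTime ", 15) == 0 || strncmp(line, "[GameEndTime ", 13) == 0)
            continue;
        // Otherwise, write the line to the output file
        fputs(line, output);
    }
}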

After executing this piece of code, the PGN file changes from being like this:

[Event "*"]
[Site "*"]
[Date "*"]
[Round "*"]
[White "*"]
[Black "*"]
[Result "*"]
[ECO "B06"]
[GameDuration "00:04:18"]
[GameEndTime "2023-05-07T01:26:40.703 Hora de verano romance"]
[GameStartTime "2023-05-07T01:22:22.403 Hora de verano romance"]
[Opening "Robatsch (Modern) defense"]
[PlyCount "148"]
[TimeControl "60+1"]

to being like this:

[Event "*"]
[Site "*"]
[Date "*"]
[Round "*"]
[White "*"]
[Black "*"]
[ECO "B06"]
[GameDuration "00:04:18"]
[Opening "Robatsch (Modern) defense"]
[PlyCount "148"]
[TimeControl "60+1"]
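
As a side note, the hand-written prefix lengths passed to strncmp in deleteTags are easy to miscount. A minimal sketch of one alternative, assuming a hypothetical helper called isFilteredTag and a hypothetical FILTERED_TAGS table (neither is part of the tool), lets strlen compute each length instead:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: the same three tag prefixes that deleteTags filters out. */
static const char *const FILTERED_TAGS[] = {
    "[Result ",
    "[GameStartTime ",
    "[GameEndTime ",
};

/* Returns true when the line starts with one of the filtered tag prefixes. */
bool isFilteredTag(const char *line)
{
    size_t count = sizeof FILTERED_TAGS / sizeof FILTERED_TAGS[0];

    for (size_t i = 0; i < count; i++) {
        /* The prefix length is computed, not counted by hand. */
        size_t len = strlen(FILTERED_TAGS[i]);
        if (strncmp(line, FILTERED_TAGS[i], len) == 0)
            return true;
    }
    return false;
}
```

With a helper like this, the condition in deleteTags would reduce to if (isFilteredTag(line)), and filtering another tag would only mean adding one string to the table.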

So, finally, after filtering out unwanted tags, the PGN file is ready for some statistical analysis, a task carried out by the remaining functions.
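
To give an idea of what such a function might look like, here is a minimal sketch of averaging the GameDuration tag. The names durationToSeconds and averageGameDuration are hypothetical, and the actual getavgGD in the repository may work differently; the sketch only assumes that the tag has the hh:mm:ss form shown above:

```c
#include <stdio.h>

/* Converts a [GameDuration "hh:mm:ss"] tag line to seconds.
   Returns -1 when the line is not a GameDuration tag. */
long durationToSeconds(const char *line)
{
    int h, m, s;

    if (sscanf(line, "[GameDuration \"%d:%d:%d\"]", &h, &m, &s) != 3)
        return -1;
    return h * 3600L + m * 60L + s;
}

/* Reads a PGN stream line by line and returns the average game duration
   in seconds, or 0.0 when no GameDuration tag was found. */
double averageGameDuration(FILE *input)
{
    char line[1024];
    long total = 0;
    long games = 0;

    while (fgets(line, sizeof line, input) != NULL) {
        long secs = durationToSeconds(line);
        if (secs >= 0) {
            total += secs;
            games++;
        }
    }
    return games > 0 ? (double)total / (double)games : 0.0;
}
```

For the tag shown earlier, [GameDuration "00:04:18"], durationToSeconds gives 4 * 60 + 18 = 258 seconds.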

Compiling is fun

As many people may know, C is a compiled programming language, not an interpreted one, so code written in C must be compiled into an executable before it can be run.

Therefore, after finishing the code, I thought it would be a good idea to create a makefile for my little project, in order to make things easier for the people who wanted to use the tool. I learned a little bit about makefiles at university but I was not quite sure about their use and how to create one for my project. After some research, I managed to create a very simple makefile, which looks like this:

CFLAGS=-Wall -I.
all: PgnStats
PgnStats: main.o functions.o; $(CC) -o PgnStats main.o functions.o
clean: ;rm -f *.o

The makefile includes a clean rule to delete the *.o files (the compiled object files) and compiles with $(CC), make's default C compiler variable (cc, which is usually gcc on Linux). Even though this is already fine, I also created a very simple Windows script that does basically the same thing:

gcc -c functions.c main.c
gcc -o PgnStats functions.o main.o
del *.o

In summary, both options are valid and successfully create a functional PgnStats executable file.


And finally, the greatest and most important part: testing the tool. For this task, I used some PGNs from various public sources. The first tests showed me that I needed to delete some tags from the PGN file (that is why I created the deleteTags function) and, after fixing everything, the tool was working flawlessly.

Here you can see an example (which assumes you have already compiled the tool and placed it in the same folder as the PGN file you want to obtain statistics from):

To run the tool in Linux:

cd /path/of/the/files
./PgnStats name_of_your_pgn.pgn

To run the tool in Windows:

cd path\of\the\files
PgnStats.exe name_of_your_pgn.pgn

Sample output

The total number of games contained in the pgn file is: 31320
The number of draws is: 30365 (96.95%)
The number of white wins is: 786 (2.51%)
The number of black wins is: 169 (0.54%)
The average game duration is: 198.32 seconds
The average plycount is: 105
The average depth per move is: 29
The average time per move is: 2.05 seconds 
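
The percentages in parentheses are each result count divided by the total number of games. As a rough sketch (percentage and formatResultLine are hypothetical names, not the tool's own functions), the arithmetic could be written like this:

```c
#include <stdio.h>

/* Share of count within total, as a percentage; 0.0 when total is 0. */
double percentage(long count, long total)
{
    return total > 0 ? 100.0 * (double)count / (double)total : 0.0;
}

/* Writes one result line in the style of the sample output into buf.
   Returns the value of snprintf (the length of the formatted line). */
int formatResultLine(char *buf, size_t size, const char *label, long count, long total)
{
    return snprintf(buf, size, "The number of %s is: %ld (%.2f%%)",
                    label, count, percentage(count, total));
}
```

With the figures above, 30365 draws out of 31320 games gives 96.95%, and the three counts add up to the total: 30365 + 786 + 169 = 31320.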

Final conclusions

All in all, as we can see, the tool is able to handle relatively big PGN files quite well (31320 games in the sample output) and it manages to obtain the desired statistics correctly. In my opinion, it is quite useful because it is very easy to use and, for its task, it is a lot faster than a regular chess GUI.

I hope you enjoyed this blog post. In case you want to know more about the project, the code and some documentation, check here:

GitHub - IbaiBuR/PgnStats: Little C program to get some basic stats from a PGN file