
I recently decided to review my graduation project, which was about an application to predict the decision results in a business using Genetic Algorithms. Why? Just because I wanted to practice developing something different, something I am not used to anymore and find the motivation to create new things.
When I was studying at the university I used to play football with my friends, and I remember one of them creating an algorithm to select the teams before playing, and making them as fair as possible. It was not a complex problem, but yet fun to resolve using GAs.
Nowadays I still play with friends, and not just football, I also play other sports, so I have been thinking about developing an application for this purpose, selecting the teams in a way that the game will be fun. Perhaps it doesn’t make sense if we think that we can easily do it when we all know each other, but with the usage of applications to join games where you don’t know people, I think it would be interesting to have this feature.
Altho I mentioned the GAs here, the goal of this small project is to review the theory of GAs and try to analyze if it is worth being used for this particular use case.
As I mentioned before, it’s been a while since I did something related to GA so I need a plan on how to start, a sort of guide or roadmap to define how I am going to make this, so just decided to do it as I did it when I learned:
- Part 1 (This Article): I am going to install Matlab (or even better, search for an alternative open source), explain how the algorithm should work, and write the code. You will find it here
- Part 2: I will migrate the code to JS and using React I will build a simple interface to be able to execute the algorithm.
- Part 3: Finally, I will build a BFF and incorporate register and login screens so people can create games and invite people to join their games, and where the organizer can trigger the algorithm to see different options.
How does it sound? Let’s start!
A little bit of theory!
A GA is a search heuristic algorithm inspired by Charles Darwin’s theory of natural evolution. It reflects the natural process to select the fittest individuals selected for reproduction and move to the next generation.
But before talking about heuristic algorithms we need to understand what alternatives we have, like deterministic algorithms, which always return the same output for the same input, passing always through the same states to get the result.
To explain it with an example, let suppose we have 4 players to play tennis, and we have a player score defined by one number from 1 to 5. We need to find the best combination of players for the game to be level.
The deterministic way to resolved this problem would be calculating all the different options and picking the best team scores, for example:
Pa: 2 – Pb: 4 – Pc: 3 – Pd: 3
Option 1: Pa + Pb vs Pc + Pd => 6 vs 6
Option 2: Pa + Pc vs Pb + Pd => 5 vs 7
Option 3: Pa + Pd vs Pb + Pc => 5 vs 7
As you can see the Option 1 in this case, is the best one, since both teams would have a score of 6. But this problem was simple, just 3 possible solutions and comparing two teams! super simple actually, so let suppose that now, we want to do the same but for a football game 5 vs 5, how many unique combinations should we calculate? The answer is 126.
The way to calculate this is using the formula to calculate the number of possible combinations nCr (if you want to read more about this topic, you can visit this link). Is not complicated to apply the formula, but we need to understand how to use it with our problem, so let see for our 5 vs 5 matches:
- First we need to understand what is n and what is r. Since we are going mix 10 players, n = 10, and because we want to combine teams of 5 players, r = 5
- The previous formula will give us the number of possible teams, and because we want to compare 2 of them at the same time, we should divide it by 2.
- nCr = 252 (in our example) and dividing it by 2 is 126.
Wait! it still sounds feasible to be done in a paper right? But what happens if we increase the number, football 11? Now the number of possible combinations grows to 352716, and now we can start thinking about how useful is to resolve this problem using GA instead of a deterministic algorithm right? And not just that, what if we want to organize a tournament, and we need to compare now not just 2 teams, but the group in general? This problem can become as complex as we want, but let’s leave that part for later, for now, let’s focus on defining two teams.
In short, we can say that when we have more variables, and the problem becomes more complex, it also requires more CPU to calculate the solution, and using a GA is a good way to reduce the time it takes to get to a solution.
How a GA works?
I like explaining them with 5 phases, and each one will be repeated every generation.
- Initial Population: Composed by multiple solutions, where each solution is called Chromosome and every variable used to form a chromosome is called a Gene
- Calculate chromosome weight: At this point we use a Fitness Function to calculate how fit an individual is and assign a score.
- Selection: In this phase, and using the calculated score, we select the individuals that are going to survive and move to the next generation. At this point, it is super important to be careful in the selection of the method, since we could bias our algorithm and make it always arrive to the same solution, and not neccesarily the best one.
- Crossover: This phase emulates the reproduction, and we need here to take different solutions and mix them to create a new one.
- Mutation: Now we pick random individuals and we just change them
The algorithm could end because the next generation will not produce very different results than the previous ones, or because we decided to limit the number of generations to be executed. Either of the reasons needs to use a method to pick the fittest solutions and output them.
Let’s apply this very simple explanation to our problem:

You might notice that developing a solution for this problem using a GA is over-engineer our problem, but anyway let’s do it!
Alternative to Matlab
Searching for different alternatives to Matlab I found this article that mentions 3 main options, and I picked the first one, just because it was the first one, it makes sense right? The name of the app is GNU Octave, available for Linux, Windows, and Mac, and it looks pretty similar to Matlab. I would recommend you install it since I will share the code I am writing.
Let’s code!
Now that we know the tool we are going to use, we can continue writing the code. You can find the full working code here.
Fit Function
This function will be responsible for calculating how “equals” are the teams, so the easiest way to do it is by calculating the sum of the players score in Team A and Team B, then subtracting the values and calculating the modulus, so the formula would be: F1 = | Σ P1..N/2 – Σ P (N/2 + 1) .. N |
When the result of F1 is closer to 0, the team selection is better, and the problem is we need to have. a higher number in order to be able to use the “Roulette Wheel selection” algorithm, so what I did what to calculate the maximum possible difference between the teams, and then I subtracted the result of F1.
So the formula to calculate the max possible score (or worst possible scenario) is F2 = (TeamSize * 10) – TeamSize
And finally, the score would be Score = F2 – F1. Here I am talking about scores, but it is more common to talk about weights, hence you will see a mix of terminology in my code, sorry for that!
function scores = fitCalc(population, players)
[total, len] = size(population);
maxScore = ((len/2)*10 - (len/2));
scores = zeros(1, total);
for i=1:total
selectedPopulation = players(1, population(i, :));
teamA = sum(selectedPopulation(1, 1:(len/2)));
teamB = sum(selectedPopulation(1, (len/2)+1:len));
score = maxScore - abs(teamA - teamB);
scores(1, i) = score; # to invert the definition of Max and Min
endfor
endfunction
Select Function
I will not explain how the Roulette Wheel Selection algorithm works, since there are plenty of articles talking about it, you can even check this Wikipedia page.
But I would like to mention just one thing I was not doing at the beginning which is sorting the weights array before calculating the cumsum, and because of that, I was increasing the selection probability of Chromosomes which the score/weight was not really good.
function choice = wheelSelection(weights)
[weights, indexes] = sort(weights, "ascend");
accumulation = cumsum(weights);
p = rand() * accumulation(end);
chosen_index = 0;
for index = 1 : length(accumulation)
if (accumulation(index) > p)
chosen_index = indexes(1, index);
break
endif
endfor
choice = chosen_index;
endfunction
Crossover Function
At this point I think it is important to stop and explain how I am doing this and why.
There are different techniques to cross two chromosomes, but sadly I couldn’t use them as easily as it looks, since our use case has a particular constraint, which is “a player can’t play in both teams at the same time”.
To make it clearer I will start showing how a one-point crossover algorithm works (I got the image from Wikipedia)

As you can see in the image, the crossing consists of picking one point in both chromosomes and inter-change their values. As shown in the following image, if we do that, we could cause having one or more players in both teams at the same time, and avoid selecting one or more of the players.

So what I decided to do is to mix 1 by one of the elements, and then remove the repeated ones to form the second child. I think the following image can explain it better.

We could say that this crossover algorithm looks like a uniform crossover, but to be honest, I am not 100% sure if that is the name, so just decided to leave it unnamed.
function crossoveredPopulation = crossoverFunction(population, newPopulation, percentage, lastIndex)
[total, len] = size(newPopulation);
[totalExcluded, lenExcluded] = size(population);
totalToCrossover = ceil(percentage * total);
selectedIndexes = zeros(1, totalToCrossover * 2);
crossoveredPopulation = newPopulation;
teamSize = len / 2;
for index=1:totalToCrossover
do
selectedIndex1 = ceil(rand()*(totalExcluded-1) + 1);
selectedIndex2 = ceil(rand()*(totalExcluded-1) + 1);
until any(selectedIndex1 != selectedIndexes) && any(selectedIndex2 != selectedIndexes)
crossedValues = [population(selectedIndex1, :);population(selectedIndex2, :)];
crossedValues = crossedValues(:)';
[crossedValues1, crossedValues2] = unique(crossedValues, "stable");
crossedValues2 = crossedValues2'; generalIndex = lastIndex + index;
crossoveredPopulation(generalIndex, :) = crossedValues1(1, :);
selectedIndexes(1, index) = selectedIndex1;
index += 1;
generalIndex += 1;
crossoveredPopulation(generalIndex, :) = crossedValues2(1, :);
selectedIndexes(1, index) = selectedIndex2;
endfor
endfunction
Mutation Function
I don’t have too much to say about this step, since it is simpler than the crossover. In this case, we just need to randomly select two players, one from Team A and the other from Team B, and swap them, and this chromosome will result in the one injected into the new population.
function mutatedPopulation = mutateFunction(population, newPopulation, percentage, lastIndex)
[total, len] = size(newPopulation);
[totalExcluded, lenExcluded] = size(population);
totalToSelect = ceil(percentage * total);
selectedIndexes = zeros(1, totalToSelect);
mutatedPopulation = newPopulation;
teamSize = len / 2;
for index=1:totalToSelect
do
selectedIndex = ceil(rand()*(totalExcluded-1) + 1);
until any(selectedIndex != selectedIndexes);
leftTeamIndex = ceil(rand()*(teamSize-1) + 1);
rightTeamIndex = ceil(rand()*(teamSize-1) + teamSize + 1);
generalIndex = lastIndex + index;
mutatedPopulation(generalIndex, :) = population(selectedIndex, :);
backupLeftItem = mutatedPopulation(generalIndex, leftTeamIndex);
mutatedPopulation(generalIndex, leftTeamIndex) = mutatedPopulation(generalIndex, rightTeamIndex);
mutatedPopulation(generalIndex, rightTeamIndex) = backupLeftItem;
selectedIndexes(1, index) = selectedIndex;
endfor
endfunction
General config
Finally, we have to talk about how we configure the algorithm to run, and we have basically 5 parameters to play with:
- Population Size: How many chromosomes we want to include. We use 40 which is not a big number, but it is enough for us.
- Generations: How many times we want to run the “evolution” cycle.
- % of Selection: How many chromosomes are going to be picked using the selection algorithm. In our case 70%.
- % of Crossover: How many chromosomes we want to cross, this should be an even number, in our case we use 20%.
- % of Mutation: How many chromosomes we pick and mutate. We use 10%
You could configure your algorithm as you wish and play with it to see the results you get, but keep in mind that your algorithm shouldn’t converge to a solution too fast or not even converge at all, which means having 100% of the population has the best possible result. Why? Because it could mean that you biased your algorithm to arrive always at the same result and you didn’t explore all the possible solutions.
You could also run the mutation algorithm randomly, in my case, I am always running it.
Finally, you should consider that running 10 generations should result in a lower scored population than running 100 generations, but it doesn’t mean that you will have more chromosomes with the best possible result, this means that your population should have an avg score higher than running fewer generations.
Side note: A first attempt was done using the selection algorithm and running the crossover and mutation using the excluded chromosomes in the selection but the results were not good, since the avg score was not growing always. I fixed it and now we use the selected population to pick two chromosomes and cross them, and one of them is selected to be mutated.
Results
After running the algorithm a couple of times we can see very good results, and the graph showing the evolution of the AVG SCORES is also good, since the minimum values keep growing and making the population to evolve, exploring different possible solutions.

I think the results are pretty clear, but just in case I would like to explain that TOTAL_POSSIBLE_TEAMS shows how many teams with the maximum score have been found, but it doesn’t mean they are unique. That is why the ans variable shows only one team in this particular case, because altho it found 15 chromosomes with the best result, the 15s look the same.

Finally, we can see in the plot image, that the minimum values tend to grow while we keep variating the avg score, which means our population evolves, and at the same time we are exploring different results.
Conclusion
We reached the end of the article, and I would like to close it by saying that the first step, review the theory and run an example code has been achieved. It was fun writing it again and analyzing what was happening when it was not working, so now I think we can continue moving to the next step, writing it in JS with React. Remember that I am going to share it in my GitHub account so if you want to see how I migrated it, wait for the second part!
I hope you find this useful, and of course, there are plenty of improving points, I need to keep studying and analyzing what is the best way to do this, for sure play with other selection, crossover, and mutation algorithms, but let’s leave it for later, once we have the application in JS up and running.