Great Network War
As many of you have noticed and experienced, we have been dealing with some server issues over the past few weeks. While single player has run smoothly, online games have been crashing and acting up. Thank you for all the feedback, crash reports and emails; they helped a lot in identifying the problems!
We were concentrating heavily on locating and fixing the issue (or rather, as we learned later, multiple issues that combined together). But we also feel that we owe an explanation to our devoted players. So, if you want to know more, please read on. If not, just be assured that we are doing our best, and the worst problems now seem to be fixed, so the games will hopefully run smoothly again – perhaps even more smoothly than before :)
To understand what happened in our games, we need to go much further back in time. When CGE digital started, we licensed Marmalade, a nice multiplatform C++ system. Such a system is crucial for a small team, as it handles lots of platform-related stuff: many little things are already solved, you get continuous support, and you do not need to rewrite all your games whenever a new version of iOS or Android gets released – the people developing the system do it, and you as a developer can concentrate just on your games.
As we always planned to create more games, we invested some time in building our own CGE system on top of Marmalade – a system perfectly suited to implementing board games. And running on that system, Galaxy Trucker was born. Then Through the Ages. And Codenames was on its way.
But even before Through the Ages was finished, a very unfortunate thing happened – the Marmalade team announced they were going out of business. That meant no further support in the future. And that meant we would need to handle lots of things on our own. It is like driving a car that is slowly falling apart – everything works so far, but you know that whenever a problem occurs, or whenever iOS or Android start requiring some new features (and we knew that sooner or later they would), you will have to fix or add it yourself in a huge, complicated, outdated (and not that well documented) system developed by a different team. That is not something we could afford in the long run, so even though we knew it would be a huge amount of work that would slow down the development of our games, we started preparing for our MarmaladExit. We planned to do it slowly and carefully, so as not to threaten the games already out or halt the development of new titles.
At that moment, the IPv6 problem struck. Some (mostly US) players reported that they were unable to play online, and it turned out that in the US, T-Mobile had moved their mobile internet to IPv6 only, so suddenly we needed to support IPv6 in our games right away. We had not planned on starting our MarmaladExit with the most complex and delicate thing, network communication.
Network communication was formerly handled by a third-party library, RakNet, and it had served us well until then. Unfortunately, RakNet did not support IPv6, which meant that we needed to replace it completely with another library and rework our whole multiplayer system (on both the server and the client side) in the process. And there, our Great Network War started.
We decided to switch libraries seamlessly, so as not to interrupt the games already in progress, which was very difficult. It is like fixing a car engine while the car is running: you need to carefully replace the server part by part, you need to keep perfect backward compatibility, and the server usually has to support multiple protocols at the same time (as it is impossible to update all the clients at the same moment, and we also needed to support the apps in development and the already started internal playtesting of the New Leaders and Wonders expansion). Any little mistake could mean broken games and sad players.
It was also very hard to playtest, as with our limited capacity, we were unable to simulate a real-life environment with hundreds of users online and thousands of games in progress. We just had to be careful and hope for the best.
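To make the "multiple protocols at the same time" point concrete, here is a minimal Python sketch of a server that picks a message parser based on the protocol version a client announces. It is only an illustration under stated assumptions: the two wire formats, the version numbers and all function names are hypothetical and are not taken from our actual server code.

```python
from typing import Callable, Dict

# Hypothetical wire formats, invented for this sketch:
#   version 1 sends "cmd:arg", version 2 sends "CMD arg".
def parse_v1(raw: str) -> dict:
    cmd, _, arg = raw.partition(":")
    return {"cmd": cmd, "arg": arg}

def parse_v2(raw: str) -> dict:
    cmd, _, arg = raw.partition(" ")
    return {"cmd": cmd.lower(), "arg": arg}

# One entry per protocol version the server still has to understand.
PARSERS: Dict[int, Callable[[str], dict]] = {1: parse_v1, 2: parse_v2}

def handle(version: int, raw: str) -> dict:
    """Decode a message with the parser matching the client's version."""
    parser = PARSERS.get(version)
    if parser is None:
        raise ValueError(f"unsupported protocol version {version}")
    return parser(raw)

# Old and new clients end up with the same internal representation.
old_client = handle(1, "play:card7")
new_client = handle(2, "PLAY card7")
```

The point of such a table is that an old client keeps working until its entry is removed, which is what makes a part-by-part replacement possible at all.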
So, when we finally released the update that allowed our US players to enjoy multiplayer, the worst happened – the new library we picked was not optimized for some aspects of such heavy traffic, and was actually buggy under the load of all the games being played. Our server started to become unstable, and the more unstable it was, the bigger the problem grew, because the heaviest traffic always occurs right after the server is restarted, when all clients try to reconnect at the same time.
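The restart spike can be illustrated with a toy simulation in Python. It compares the peak number of connection attempts per second when every client reconnects immediately with the peak when each client waits a random delay first. Random delay ("jitter") is a commonly used way to soften such spikes; it is shown here only as an assumption for illustration, not as a description of what our clients do, and the client count and the 30-second window are made-up numbers.

```python
import random
from collections import Counter

def peak_per_second(delays):
    """Largest number of clients arriving within any one-second bucket."""
    return max(Counter(int(d) for d in delays).values())

def immediate_reconnect(n_clients):
    """Every client retries the instant the server comes back."""
    return [0.0] * n_clients

def jittered_reconnect(n_clients, window_s, rng):
    """Each client waits a random delay within the window before retrying."""
    return [rng.uniform(0, window_s) for _ in range(n_clients)]

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 1000                 # hypothetical number of connected clients

peak_immediate = peak_per_second(immediate_reconnect(n))
peak_jittered = peak_per_second(jittered_reconnect(n, 30.0, rng))
print("peak without jitter:", peak_immediate)
print("peak with jitter:   ", peak_jittered)
```

Without jitter the peak equals the number of clients, which is why a server that is already struggling tends to get hit hardest exactly when it has just come back up.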
There was no way back, so we had to fix things as quickly as possible. Our Great Network War entered its blitz phase. Our team was online almost constantly, watching the server status, ready to restart it in case of problems, while replacing the network solution yet again with another library. Our player support was holding the trenches, trying to fix the games that got broken and soothe the players who were starting to be (rightfully) impatient. (Meanwhile, MarmaladExit started to gain speed, as both iOS and Android required 64-bit support, our push notifications stopped working with the new Android SDK, and more little details popped up that the original Marmalade did not support.)
Anyway, the third multiplayer library, enet, looks like a good choice so far, but we had to implement it very hastily. We actually did it on two battle fronts: the operational part of our team was trying to implement it as quickly as possible, to get our multiplayer running well again, while the strategic part of the team was preparing a long-term solution, even more stable and safe, that works well within our ongoing MarmaladExit (well, when talking about parts of the team, we mean "one guy" and "another guy", we are not that big :)).
When examining the problems, we also realized our database was hitting its limits. We are incredibly happy that there are so many players online and so many games played, but as the database grew, serious lags started to appear, and sometimes it was not clear which part of the problem was caused by the suboptimal network solution and which by the database size. The statistical data from all played games are huge (hundreds of GB), and we do not want to just throw them away – they are very valuable when fine-tuning the game, and Vlaada, the author of the game, is using them to balance things in the upcoming expansion.
To lower the server workload, we decided to move most of the older data to another database. It is like letting passengers disembark at full speed – not as complicated as replacing the engine of a running car, but still, a little error may have huge consequences. That is exactly what happened last weekend, and we are sorry for the shock some of you experienced when your games appeared to be lost for a few hours. We are still tackling some consistency problems that arose before we realized our error and fixed it.
We also disabled online statistics on our website, as it turned out they slowed down the server significantly, too. We are sorry for this; the statistics will return once we implement a more considerate solution that does not increase the server workload. For now, we are concentrating on the online play itself.
We are aware that some decisions we made in the past have bitten us harshly, and some mistakes could have been avoided, but that's life – we hope we will come out of these lessons stronger. At this moment, it seems we have won the main battles. The Great Network War is not over yet: there might be some guerrilla bugs we missed when advancing quickly, and the last big operation that secures the reconquered territories (switching to the independently developed advanced version of the network system) still awaits. And we never know what MarmaladExit challenge will pop up next.
But for now, everything looks optimistic. The server purrs like a happy kitten (which is really welcome, considering how it screamed and cried during the previous weeks), our testers seem to really enjoy the early version of the Through the Ages expansion, Galaxy Trucker approaches Steam fearlessly… and our programmers can't wait to return to their work on Codenames, to implement all the great stuff the artists and designers prepared while the programmers fought their network battles.
Thanks for reading to this point. We believe we are really lucky to have such great players and hope that you will stand by us in future battles as well, even if it may sometimes require a bit more understanding and patience than we have a right to expect. We will try to reward it with the best gaming experience we are able to deliver.