How we configured autoscaling for Datawrapper image exports

Hi, I’m Jakub, the backend team leader at Datawrapper. Today I will dive into the challenges of running a large web application.

Datawrapper is available 24/7, but most people make visualizations on a more regular schedule. Our users create and publish a lot of charts during the day and fewer during the night. Moreover, exceptionally high usage happens during large media events such as elections. We have to adjust our infrastructure to handle demand when it’s high without wasting resources when it’s low.

When talking about software infrastructure, scaling is the practice of having several servers provide your product. Autoscaling is then the practice of automatically adjusting the number of these servers. When usage of your product increases, autoscaling starts more servers. And when the usage drops, it stops them again. So the whole system is neither overloaded because too few servers are running, nor wasting money because too many are. It’s optimized.

To achieve a correct autoscaling configuration, you have to choose the right metric to determine whether servers are currently overloaded or idling. It can be CPU or memory utilization or, in the case I’m looking at today, the number of visualizations that users are trying to export as PNG, PDF, or SVG files. Then you have to decide how to respond to that metric: how many servers should start when it goes too high, and how many should stop when it dips too low?
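To make the idea of "metric in, server count out" concrete, here is a minimal sketch in Python. It is an illustration only, not Datawrapper's production configuration: the function name and the numbers (10 exports per server, between 1 and 20 servers) are assumptions made up for this example.

```python
import math


def desired_servers(
    pending_exports: int,
    exports_per_server: int = 10,  # assumed capacity of one server (hypothetical)
    min_servers: int = 1,          # never stop every server (hypothetical floor)
    max_servers: int = 20,         # cap on spending (hypothetical ceiling)
) -> int:
    """Translate the metric (pending exports) into a target server count."""
    if exports_per_server <= 0:
        raise ValueError("exports_per_server must be positive")
    # Enough servers to work through the backlog, rounded up.
    needed = math.ceil(pending_exports / exports_per_server)
    # Keep the result within the allowed range.
    return max(min_servers, min(max_servers, needed))


print(desired_servers(0))     # 1
print(desired_servers(35))    # 4
print(desired_servers(1000))  # 20
```

The floor and the ceiling matter as much as the metric itself: the floor keeps exports working when the queue is empty, and the ceiling keeps a runaway queue from starting an unlimited number of servers.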

Knowing all the variables, we wrote Datawrapper’s first autoscaling configuration and naturally chose to make things fast. As soon as we registered a high number of visualizations that needed exporting, we started several servers, and we stopped them equally quickly when the number of exports decreased.

Yet starting servers wasn’t fast enough. The number of waiting visualizations often reached unacceptable levels. Visualizations waiting meant users were waiting. And no one likes to wait.

One solution to such a problem is to try to start even faster. You can, for example, reduce the size of each server by installing as little software on it as possible. But what if you could win the race by going slower?

We figured out that a better strategy was to stop our servers much more gradually. The setting that controls this is called a scale-in cooldown: a waiting period that has to pass before more servers may be stopped. So even after a peak in the number of visualizations to export, we still keep a good number of servers available to handle the next peak. The unacceptable waiting times are gone.
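The effect of stopping servers gradually can be sketched with a small simulation in Python. This is a simplified model under stated assumptions, not Datawrapper's real setup: the class name, the capacity of 10 exports per server, and the 300-second cooldown are all hypothetical, and the model restarts its cooldown timer after any scaling step, which real autoscaling services may define differently.

```python
import math
from dataclasses import dataclass


@dataclass
class Autoscaler:
    exports_per_server: int = 10     # assumed capacity per server (hypothetical)
    min_servers: int = 1
    max_servers: int = 20
    scale_in_cooldown: float = 300   # seconds to wait before stopping a server
    servers: int = 1
    last_change: float = float("-inf")  # time of the most recent scaling step

    def step(self, now: float, pending_exports: int) -> int:
        """Update the server count for the current backlog and return it."""
        needed = math.ceil(pending_exports / self.exports_per_server)
        target = max(self.min_servers, min(self.max_servers, needed))
        if target > self.servers:
            # Scale out immediately: users should not wait.
            self.servers = target
            self.last_change = now
        elif target < self.servers and now - self.last_change >= self.scale_in_cooldown:
            # Scale in slowly: one server at a time, only after the cooldown.
            self.servers -= 1
            self.last_change = now
        return self.servers


scaler = Autoscaler()
print(scaler.step(0, 80))    # 8: a peak arrives, servers start right away
print(scaler.step(60, 0))    # 8: queue is empty, but the cooldown holds capacity
print(scaler.step(300, 0))   # 7: cooldown has passed, one server stops
print(scaler.step(360, 75))  # 8: the next peak needs only one extra server
```

In the last step, the second peak needs just one additional server instead of seven, because most of the capacity from the first peak is still running.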

I hope you enjoyed today’s software-focused blog post and, if you are a Datawrapper user, the chance to export your charts at any time of day, even during elections. Next week, our support engineer Shaylee is going to make a brand new Weekly Chart.