Tech Interview: Building a Backend for Millions of Daily Requests, Hourly Peaks, and Rapid Growth
Our backend team needs to ensure our servers can deal with over 100 million requests each month and plan ahead for that number to grow. In this interview, backend developer Igor tells us how the team tackles this task, along with some of the unique challenges that come with digital health apps.
Backend developer, Igor Zmitrovich, gives us insight into how the team faces the challenges that come with MyTherapy’s ever-growing userbase, millions of requests each day, and peaks that come on the hour, every hour. Read how the solutions are the result of an effective Scrum implementation and a company culture that ensures every member of the team can voice their opinions.
MyTherapy’s userbase has grown significantly over the last couple of years. As a backend developer, what challenges have you faced as a result of this growth, and how have you tackled them?
One of the challenges with our app is that most people set their reminders on the hour. So, every hour there is a big peak, and then for the rest of the hour there is relatively little until the next hour, and these peaks put a strain on the backend. On average, around 20,000 requests come flooding in at pretty much the same time, which is pretty significant. The peaks weren’t a big issue when we had a small number of users but as our userbase started growing, some bottlenecks started to appear in our code. We needed to tackle this load because it is always expected and would only get worse as we gain users. The first thing we did was to increase the number of machines, the number of instances that serve this load. We also scale up the server capacity at peak times, so minutes before the peak we start provisioning them and decrease the capacity to the standard level when the peak levels off.
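The scheduled scaling described above can be sketched roughly as follows. This is a minimal illustration, not the team's actual setup: the instance counts, the warm-up and cool-down windows, and the function name are all assumptions made for the example.

```python
from datetime import datetime

# All numbers below are illustrative assumptions, not real MyTherapy settings.
BASE_INSTANCES = 4      # hypothetical standard capacity
PEAK_INSTANCES = 12     # hypothetical capacity for the on-the-hour peak
WARMUP_MINUTES = 5      # start provisioning this many minutes before the hour
COOLDOWN_MINUTES = 10   # keep peak capacity this many minutes after the hour


def desired_instances(now: datetime) -> int:
    """Return how many instances should be running at the given time."""
    minute = now.minute
    in_warmup = minute >= 60 - WARMUP_MINUTES   # shortly before the hour
    in_peak = minute < COOLDOWN_MINUTES         # shortly after the hour
    return PEAK_INSTANCES if (in_warmup or in_peak) else BASE_INSTANCES
```

A scheduler could call a function like this every minute and ask the hosting platform to match the returned number, so that extra capacity is already running when the hourly peak arrives.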
But then another bottleneck appeared, which is the database. We can increase the number of servers but the number of connections our database itself can handle is limited. So, we decided to change how the frontend synchronizes. Previously, these synchronizations were cyclical, but with each request, the time the frontend needs to wait grows exponentially. A better solution is to smooth this load throughout the whole hour when we don’t have such high loads so that we can process all these requests.
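The interview does not say how the load is smoothed across the hour. One common way to do it, shown here purely as an assumption, is to give every device a stable offset within the hour so that background synchronizations do not all fire at the same moment. The function name and the use of a hashed device ID are hypothetical.

```python
import hashlib

SECONDS_PER_HOUR = 3600


def sync_offset_seconds(device_id: str, window: int = SECONDS_PER_HOUR) -> int:
    """Map a device ID to a stable offset (in seconds) inside the window.

    Hashing spreads devices roughly evenly over the window, and the same
    device always gets the same offset, so no coordination is needed.
    """
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window
```

With this approach a device whose offset is 1,234 would run its background synchronization about 20 minutes past each hour, instead of competing with every other device at minute zero.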
The next part, which was a little problematic for us, was checking if the user is logged in. We tackled this by switching to another technology for authenticating a user's session and thereby reduced the number of requests that our accounts microservice needs to handle.
Combined, these developments greatly improve our backend capacity and demonstrate the effort we put into optimizing performance.
With smartpatient becoming a part of Shop Apotheke, we are expecting MyTherapy’s userbase to continue to grow. To what extent do these changes future-proof the backend?
These improvements will help us deal with more users but, of course, we will continue to plan for the future. For example, we are aiming to reduce the number of synchronization requests even more. Basically, if there is nothing new on the backend there is no need to send these requests to the database every time. We can have two separate endpoints, one to send new data and one to fetch new data, and the frontend apps will know when they need to make a new request to fetch new data. Imagine a user has two devices and registers that they took an intake on one of them: we can send a push message to the other device, which then recognizes there is something new and makes an incremental synchronization request. This will also decrease the load on the database in the future. This is our plan and we’re working on it right now.
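The planned split into a "send" endpoint and a "fetch" endpoint, with a push message telling the other device when to fetch, might look something like the sketch below. It is an in-memory model under stated assumptions: the class names, the version counter, and the idea of carrying the latest version in the push message are all hypothetical and not taken from the MyTherapy codebase.

```python
from dataclasses import dataclass, field


@dataclass
class SyncStore:
    """In-memory stand-in for the backend; all names here are hypothetical."""
    version: int = 0
    changes: list = field(default_factory=list)  # (version, device_id, payload)
    pull_count: int = 0                          # how often "fetch" was called

    def push(self, device_id: str, payload: dict) -> int:
        """'Send new data' endpoint: record a change, return the new version."""
        self.version += 1
        self.changes.append((self.version, device_id, payload))
        return self.version

    def pull(self, since: int, device_id: str) -> tuple:
        """'Fetch new data' endpoint: changes from other devices after `since`."""
        self.pull_count += 1
        newer = [payload for version, origin, payload in self.changes
                 if version > since and origin != device_id]
        return newer, self.version


@dataclass
class Device:
    device_id: str
    store: SyncStore
    cursor: int = 0                              # last version this device fetched
    local: list = field(default_factory=list)

    def record(self, payload: dict) -> int:
        """Register something locally and send it to the backend."""
        self.local.append(payload)
        return self.store.push(self.device_id, payload)

    def on_push_message(self, latest_version: int) -> bool:
        """Fetch only if the push message announces something new."""
        if latest_version <= self.cursor:
            return False                         # nothing new, no request made
        newer, self.cursor = self.store.pull(self.cursor, self.device_id)
        self.local.extend(newer)
        return True
```

In this model, a phone recording an intake produces a new version number; the tablet fetches once when it is notified of that version and makes no further requests until a higher version is announced.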
It seems as though a lot of time and resources go into ensuring the backend is well-equipped to handle growth. If the solutions you’ve mentioned weren’t in place and these bottlenecks weren’t properly dealt with, what effect would that have on users?
If someone is using MyTherapy and tells the app that they’ve taken their medication and the frontend sends some data to the backend, the user probably won’t even notice if this request was not served or if there was a timeout error, as the request will just be sent again in a couple of minutes. But if, for example, the user logged out and logged back in again, they might not be able to see all of their medications and data in the app, so they won’t be able to use it properly. Nothing from the backend side will get to the frontend, which is something we clearly want to avoid, especially as we’re dealing with people’s treatment.
Ignoring these issues might not immediately affect users too badly, but it would definitely come back to give us much bigger problems in the future.
In previous interviews on this blog, frontend developers have told us how certain provisions need to be made for our partner programs. Is this true for the backend?
Yeah, our partner programs typically involve very serious medications, so we need to aim for as close to 100% uptime as possible. We have a separate way of treating requests related to our partner programs – they are served by other servers, so even if our base servers are seriously overloaded, partner program requests will not be affected. Of course, the aim is to make sure our base server doesn’t get overloaded, but this extra precaution is taken for partner programs because of the nature of the software and the fact they are related to these important medications.
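The isolation described above, where partner program requests are served by other servers, can be illustrated with a very small routing sketch. The server names and the round-robin selection are assumptions made for the example; the interview only states that partner traffic is handled by separate servers.

```python
from itertools import cycle

# Hypothetical server names, used only for illustration.
BASE_POOL = cycle(["base-1", "base-2", "base-3"])
PARTNER_POOL = cycle(["partner-1", "partner-2"])


def pick_server(is_partner_request: bool) -> str:
    """Choose the next server from the pool matching the request type.

    Because the pools are separate, a flood of base requests never
    consumes capacity from the partner pool.
    """
    pool = PARTNER_POOL if is_partner_request else BASE_POOL
    return next(pool)
```

However many base requests arrive, the partner pool is untouched, so a partner request is still handed to a partner server.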
In describing the changes that have been made, you mentioned things like switching to another authentication technology. As a team, how do you decide which solutions to implement and which technology to use?
I think it’s important that we don’t have things forced upon us from above; we always decide from our side which solution we think would be best to use. It’s in everyone’s interest for us to find the best solutions, to help have fewer bugs and less downtime, so we are trusted to make the decisions we think are best. Communication is important. So, we have multiple guild meetings every sprint to discuss what is required, what we have planned, and what has been implemented, and to share knowledge. There are always opportunities to talk about what we are doing. We also have a discussion where we can ask questions about the current sprint, if there’s anything we’re not sure about, or if we want other people’s opinion about the best way to implement something.
So, there is a lot of knowledge being shared. And even though we are in multiple Scrum teams, it’s always good to have a wider perspective of what’s going on and what other teams’ functionalities are.
I get the impression from you and other developers who I’ve spoken to that the Scrum framework works really well at smartpatient. Why do you think that is, and do you think it will continue to work as we grow as a company?
Personally, I’m happy that we are working with Scrum and I find it works for us. We develop a lot of new stuff and Scrum is very good for getting fast feedback on code that has already been developed. We can develop our functionalities every two weeks and those functionalities are presented internally first and then by our product owners to our partners. It’s quite a dynamic environment and I think Scrum’s agile framework works very well for this.
We have five Scrum teams at the moment, which is going to increase, and we’re already aware of some potential problems that can come with that and are planning for them. For example, we are working on splitting functionalities further within the teams, so we will have clearer responsibility for every part of the app. I think we are aware that adding more teams can make things a little messy, but we are being proactive and already working on some solutions for this.
With the team growing and new Scrum teams being created, what would you say to any backend developer reading this who might be considering applying at smartpatient?
Personally, I really enjoy working here and don’t plan on moving any time soon! We have very open and communicative people and, within the whole company, quite an international vibe. We have a lot of input regarding MyTherapy; product owners always discuss new functionalities with developers and ask for our propositions and our perspective. It’s only after receiving this feedback from us that product owners make decisions and plan the stories.
If you’re not only strong from a technical perspective, but also capable of working in this open environment, then I think smartpatient is a very good match for you.
We are hiring!
Are you interested in working on a multi-million user app and in an environment where your opinions matter? Check out our job listings here: Careers.