In the middle of March I came to Switzerland for just under two weeks, and it was during this time that Italy went from “no public gatherings” to “everything is closed - no one in or out”. As the Coronavirus was approaching pandemic territory I started to become really interested in the data behind the virus. I live in the UK and I was (and am still) amazed by the lack of response from the government there, and at the same time concerned that in five days the situation could change so significantly that I would not be allowed back into the country.
I thought it would be cool to build a graph showing the data over time, perhaps against other territories, to try and establish when countries would start taking certain actions. As I searched around, I found there wasn’t a good API to use out of the box for what I wanted. I thought this would be a cool mini-hackathon so I decided to build a quick API to expose this data.
The API is available at covid19api.com.
After some googling, the best data source I could find was the GitHub repo from Johns Hopkins University. They update the repo every day with the latest data collated from various sources. I decided to start with the time series data as it served the requirements I wanted to fill.
I’m a big fan of Go so I naturally chose it as the language for the server application. The application ended up being just shy of 500 lines of code and broadly works as follows:
- Every six hours the files are fetched from the GitHub repo above
- Any current files are backed up to keep historical data
- The new files are then parsed and saved into the database
- An API is exposed to give access to the data
- Statistics are collected to show overall API usage
The API queries all run pretty efficiently with responses generally being returned in 150ms. The biggest route is for all data, which takes 4s to return 8MB, which is pretty good performance. There’s nothing really special going on here: indexes added to MySQL and normal Go code are more than sufficient (there isn’t even any caching yet).
On the data input side, I’m making fair use of goroutines but again - nothing crazy. The three files are loaded in parallel every six hours and this takes around 10 seconds in total.
There is a lot of room for improvement, which I plan to work on, but for now I have focused on time to market due to the rapid spread of the Coronavirus and the potentially diminishing returns of this API.
Why Build This API
I want to provide third parties with clean data through a nice, fast JSON API so that web, mobile and other applications can be built using it. If the data is easier to integrate there should hopefully be more innovation - there are currently only two apps on the App Store, which I find interesting considering the huge focus on this.
I plan to work on the code enough to make it open source soon, as well as add better error checking, testing and logging. I’ll also add some performance improvements as well as additional, detailed routes. Finally I’ll work on some documentation, probably through Postman, to help get people up and running.
The response so far has been great and really quick - it seems this has filled a need which I’m happy about. The project’s landing page is covid19api.com if you’re interested in finding out more. Feel free to reach out to me about features, bugs or anything else.