Over the last few months we’ve been working hard on expanding the range of statistics we cover here at Timetric, and the other day we passed the one-million-series mark. We thought you might want to know how we’ve done it, especially as these series aren’t static: we automatically check each one for changes at regular intervals, and thousands are updated every day. Check our front page for the most recently updated.
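We haven’t described here how changes are detected. One simple approach, sketched purely as an assumption, is to fingerprint each download with a hash and compare it against the fingerprint recorded on the previous check. The `check_for_changes` function and its in-memory state dict are hypothetical; a real scheduler would persist the fingerprints and run the check on a timer.

```python
import hashlib
from typing import Callable, Dict, List


def fingerprint(payload: bytes) -> str:
    """Return a SHA-256 hex digest identifying the exact content of a download."""
    return hashlib.sha256(payload).hexdigest()


def check_for_changes(
    sources: Dict[str, Callable[[], bytes]],
    last_seen: Dict[str, str],
) -> List[str]:
    """Fetch every source and return the IDs whose content differs from last time.

    `sources` maps a (hypothetical) series ID to a zero-argument fetch function.
    `last_seen` maps series IDs to the fingerprint from the previous run and is
    updated in place, so the next call only reports genuinely new changes.
    """
    changed = []
    for series_id, fetch in sources.items():
        digest = fingerprint(fetch())
        if last_seen.get(series_id) != digest:
            changed.append(series_id)
            last_seen[series_id] = digest
    return changed


if __name__ == "__main__":
    state: Dict[str, str] = {}
    sources = {"example-series": lambda: b"2009,1.0\n2010,1.1\n"}
    print(check_for_changes(sources, state))  # first run reports the series
    print(check_for_changes(sources, state))  # same content, so nothing reported
```

Hashing the raw download keeps the check cheap and format-agnostic; the (assumed) trade-off is that cosmetic changes to a file, such as a new timestamp in a header, would also count as a change.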
All the data in Timetric is uploaded by a subsystem we call the “Big Dataset Uploader”. It pulls in data from various organizations’ websites, FTP servers, or wherever else it’s to be found, beats it into the right shape, then uploads it. Ideally we’d get all our data from consistent, well-defined web services; the World Bank’s API is a good example for others to follow in this regard.
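As a rough illustration of the fetch, reshape and upload stages, here is a minimal sketch. The CSV layout, the column names and the `fetch`/`upload` callables are all assumptions standing in for whatever source and upload mechanism is actually involved; this is not the Big Dataset Uploader’s real code.

```python
import csv
import io
from datetime import date
from typing import Callable, List, Tuple

Observation = Tuple[date, float]


def reshape(raw_csv: str, date_col: str, value_col: str) -> List[Observation]:
    """Turn a publisher's CSV into a date-sorted list of (date, value) pairs.

    Rows whose value is blank or non-numeric (e.g. ".." placeholders) are skipped.
    Dates are assumed to be in ISO format (YYYY-MM-DD).
    """
    series = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        try:
            value = float(row[value_col].strip())
        except ValueError:
            continue  # missing or placeholder value
        series.append((date.fromisoformat(row[date_col].strip()), value))
    series.sort()
    return series


def run_upload(
    fetch: Callable[[], str],
    upload: Callable[[List[Observation]], None],
    date_col: str = "date",
    value_col: str = "value",
) -> int:
    """Fetch raw data, reshape it, hand it to the uploader; return the row count."""
    series = reshape(fetch(), date_col, value_col)
    upload(series)
    return len(series)


if __name__ == "__main__":
    sample = "date,value\n2010-01-01,3.5\n2009-01-01,3.1\n2011-01-01,..\n"
    received: List[Observation] = []
    count = run_upload(lambda: sample, received.extend)
    print(count, received)
```

Passing `fetch` and `upload` in as functions keeps the reshaping logic independent of where the data comes from (HTTP, FTP, a local file) and where it goes, which makes each source easy to test in isolation.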
In general, though, the biggest help has been using proven, open source software components. We’ve been able to draw on the wealth of knowledge around them and contribute back wherever possible: Toby’s sunburnt library, a Python interface to the Solr search engine, is one example.
We’ve been building Timetric almost entirely in Python. That might surprise you, since there are much faster compiled languages, but it’s worked out well for us. Python has great libraries; in particular, NumPy gives it very good numeric performance for a scripting language. With a small team, the productivity and maintainability advantages more than compensate for any performance hit.
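To illustrate the kind of numeric work where NumPy helps, here is a small, hypothetical example: computing period-on-period percentage changes for a series, once with a plain Python loop and once with whole-array NumPy operations. The function names are ours, not Timetric’s. The vectorized version is typically much faster, but exact timings depend on the machine.

```python
import timeit

import numpy as np


def pct_change_loop(values):
    """Period-on-period percentage change, computed one element at a time."""
    return [100.0 * (values[i] / values[i - 1] - 1.0) for i in range(1, len(values))]


def pct_change_vectorized(values: np.ndarray) -> np.ndarray:
    """The same calculation expressed as whole-array NumPy operations."""
    return 100.0 * (values[1:] / values[:-1] - 1.0)


if __name__ == "__main__":
    series = np.random.default_rng(0).uniform(50.0, 150.0, size=100_000)
    # Both versions must agree before comparing their speed.
    assert np.allclose(pct_change_loop(series), pct_change_vectorized(series))
    print("loop:      ", timeit.timeit(lambda: pct_change_loop(series), number=5))
    print("vectorized:", timeit.timeit(lambda: pct_change_vectorized(series), number=5))
```

The vectorized version pushes the per-element arithmetic down into NumPy’s compiled code, so the Python interpreter only dispatches a handful of array operations instead of one operation per data point.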