<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Timetric Blog &#187; Uncategorized</title>
	<atom:link href="http://blog.timetric.com/category/uncategorized/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.timetric.com</link>
	<description></description>
	<lastBuildDate>Fri, 27 May 2011 12:29:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Unsolicited advice for large governmental data providers</title>
		<link>http://blog.timetric.com/2011/05/23/unsolicited-advice-for-large-governmental-data-providers/</link>
		<comments>http://blog.timetric.com/2011/05/23/unsolicited-advice-for-large-governmental-data-providers/#comments</comments>
		<pubDate>Mon, 23 May 2011 16:30:40 +0000</pubDate>
		<dc:creator>Toby White</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.timetric.com/?p=763</guid>
		<description><![CDATA[An ideal data source We source data from a number of large national, and trans-national, statistical bodies, like the Office for National Statistics here in the UK, or Eurostat. Downloading useful data from organizations like this is sometimes a tricky &#8230; <a href="http://blog.timetric.com/2011/05/23/unsolicited-advice-for-large-governmental-data-providers/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h1>An ideal data source</h1>
<p>We source data from a number of large national, and trans-national, statistical bodies, like the <a href="http://www.statistics.gov.uk">Office for National Statistics</a> here in the UK, or <a href="http://ec.europa.eu/eurostat">Eurostat</a>. Downloading useful data from organizations like this is sometimes a tricky job &#8211; although publishing data is usually part of their <em>raison d&#8217;être</em>, they&#8217;re not usually thinking of people like us &#8211; Big Data geeks &#8211; when making their data available. And often, their methods of making data available have been essentially unchanged for the past ten or fifteen years, and even then are probably based on processes predating the Internet.</p>
<p>One of the sources of value Timetric adds is simply making this data more widely available and accessible. But it&#8217;s also true that there&#8217;s so much more we could do if we could put our minds to using this data in new and exciting ways, rather than expending expertise on working out the best way to map old-fashioned data publication workflows to a web-centric way of working. So it&#8217;s an interesting question to ask &#8211; in an ideal world, how would a large statistical organization publish data for us?</p>
<p>There are three aspects to this question:</p>
<ol>
<li>Data transfer and formats</li>
<li>Metadata formats and reconciliation</li>
<li>Update frequency and notifications</li>
</ol>
<h2>1. Data transfer and formats</h2>
<p>For us, the easiest <em>data</em> to deal with probably comes, perhaps counter-intuitively, from the ONS and Eurostat. That&#8217;s despite the fact that both present their data as fairly obscure, more-or-less undocumented dumps of what are (at a guess) 1980s-era databases.</p>
<p>However, in both of these cases, we can download the entire database in just a few files, largely one per data release, each containing anywhere from several thousand to tens of thousands of series. We don&#8217;t have to run any queries to express which data we&#8217;d like; everything simply lives at a predictable URL. We don&#8217;t want to make hundreds of queries to get different subsets of the data; we mostly just want it all (though see below).</p>
<p>The formats and the URL schemes could be documented much better &mdash; but we&#8217;ve already done the job of reverse engineering them. As long as they don&#8217;t change significantly, it&#8217;s a trivially-repeatable set of operations to get the files, and extract the data from them. And each source is yielding a huge quantity of valuable data, so for that up-front investment in time, we get a good payoff.</p>
<p>For a new source, we&#8217;d be quite happy with anything along those lines. We don&#8217;t mind a bit of time in writing a parser for a new data format, or even in reverse-engineering some URL construction. That up-front cost isn&#8217;t a huge investment if there&#8217;s a lot of high-quality data, repeatably downloadable, waiting for us afterwards. That said, obviously we&#8217;d much rather have the data in a well-documented, simple format, and minimize that up-front investment. You can&#8217;t go very far wrong with CSV files lying behind well-established URLs.</p>
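<p>As a rough illustration of why this setup is so convenient for a bulk consumer, here is a minimal Python sketch. The URL pattern, the release name and the CSV column layout (<tt>series_id,period,value</tt>) are all hypothetical, invented for this example; no real provider&#8217;s scheme is implied.</p>

```python
import csv
import io
import urllib.request
from collections import defaultdict

# Hypothetical URL scheme: one CSV dump per data release.
BASE_URL = "http://stats.example.org/dumps/{release}.csv"


def dump_url(release):
    """Build the predictable URL for one release's dump file."""
    return BASE_URL.format(release=release)


def parse_dump(text):
    """Parse CSV text with columns series_id,period,value into a dict
    mapping each series id to a list of (period, value) pairs."""
    series = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        series[row["series_id"]].append((row["period"], float(row["value"])))
    return dict(series)


def fetch_release(release):
    """Download one whole release in a single GET and parse it."""
    with urllib.request.urlopen(dump_url(release)) as response:
        return parse_dump(response.read().decode("utf-8"))
```

<p>One request per release brings back every series in it, and there are no query parameters to guess.</p>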
<p>What we really <em>don&#8217;t</em> like are API endpoints built around the idea that you only want a few series at a time, and that you&#8217;ll be choosing them by hand. It&#8217;s no fun making thousands of HTTP requests to fetch each and every data series (neither for us, having to track successes, failures and retries, nor for the servers, having to deal with us flooding their API). It&#8217;s also no fun trying to work out various combinations of query parameters until we get just what we want. That&#8217;s especially painful when they&#8217;re query parameters for forms designed originally to be driven by human interactions. But even when they are aimed at computer downloads, there are still far too many API developers who haven&#8217;t thought about API discoverability. (And we definitely don&#8217;t want these forms submitted by POST. Bang, there goes your cache, and our chances of getting data quickly.) All in all, we&#8217;d rather just have data dumps to download.</p>
<p>In short, APIs which are good for exposing small quantities of data to individual users aren&#8217;t very good for exposing large quantities of data for reuse on a large scale. And formats don&#8217;t really matter at all.</p>
<h2>2. Metadata schemata, formats, and reconciliation</h2>
<p>Again, surprisingly, there&#8217;s something to be said for the Eurostat approach to this &#8211; but this time not the ONS. Eurostat have a fairly cryptic set of metadata codes, encoded in a rather bizarre way within the data, which only directly apply to their own data, and are probably the result of several decades of semi-random accretion. There are no international standards in use here. On the other hand, they are well-documented, and once you&#8217;ve worked out how to extract and decode them, you&#8217;ve got a nice, consistent set of metadata across tens of thousands of data sets. That&#8217;s a far better state of affairs than with some data suppliers, who give us little or no metadata, and certainly don&#8217;t have a well-documented background to their metadata terms (collection methods, statistical processes, industrial classifications etc).</p>
<p>(The ONS, by comparison, are not useful in this regard. They are very precise about their metadata, and have reams upon reams of well-written documentation about statistical standards. However, almost none of this metadata can be linked up with their data in any automatic way. The data themselves come with nothing except very short titles, often with enigmatically and inconsistently abbreviated technical terms.)</p>
<p>If you&#8217;re a large well-established national or trans-national body, and you&#8217;ve got your own internal metadata &mdash; please just expose it! At least that way, we can arrange all your data consistently with respect to itself, and probably start linking the obvious bits of metadata across multiple sources. We&#8217;d much rather have that now, than wait on a perfect standard further down the road.</p>
<p>On the other hand, if you&#8217;re starting from scratch these days, you could do much better. Our lives would be made much easier if people used metadata which was drawn from some standardized vocabulary, so we could reconcile metadata between different suppliers. If you were beginning the process today, the obvious place to start is with <a href="http://en.wikipedia.org/wiki/SDMX">SDMX</a> (and see the <a href="http://www.ecb.int/stats/services/sdmx/html/tutorial.en.html">tutorial from the European Central Bank</a>).</p>
<p>At the moment, we have to do lots of that reconciliation ourselves. You can automate surprisingly large amounts of the work, but by no means all. It definitely still requires human intervention, and often from someone who&#8217;s fairly economically literate. An enormous amount of the work we&#8217;ve done has gone into building tools to let us leverage that human intervention as much as possible, to develop semi-automated workflows for metadata reconciliation.</p>
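<p>To give a flavour of what &#8220;semi-automated&#8221; can mean, here is a deliberately simplified Python sketch. It is not a description of our actual tooling: the standard vocabulary, the codes and the similarity cutoff are all made up for illustration. Labels that closely match a known term are reconciled automatically, and everything else lands in a queue for a human to look at.</p>

```python
import difflib

# Hypothetical standard vocabulary: label text mapped to a shared code.
STANDARD_TERMS = {
    "gross domestic product": "GDP",
    "consumer price index": "CPI",
    "unemployment rate": "UNEMP",
}


def reconcile(supplier_labels, cutoff=0.8):
    """Split supplier labels into automatic matches and a review queue.

    Labels whose closest standard term scores at least `cutoff` are
    matched automatically; everything else is left for a human."""
    matched, needs_review = {}, []
    for label in supplier_labels:
        candidates = difflib.get_close_matches(
            label.lower(), list(STANDARD_TERMS), n=1, cutoff=cutoff
        )
        if candidates:
            matched[label] = STANDARD_TERMS[candidates[0]]
        else:
            needs_review.append(label)
    return matched, needs_review
```

<p>String similarity copes with small spelling variations, but an abbreviated title still needs someone who knows what it stands for, which is exactly where the human intervention comes in.</p>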
<p>In short, ideally, everyone would use internationally-recognized standards of metadata and reporting. But if they don&#8217;t, or can&#8217;t yet, the most useful thing they can do <em>now</em> is to document as much of their internal metadata system as possible, make it available for reuse, and mark up as much of their data with it as they can. Making that available now would be an immediate gain for everyone. Waiting around for people to map their internal metadata systems on to SDMX doesn&#8217;t help anyone nearly as much.</p>
<h2>3. Update frequency and notifications</h2>
<p>For most data providers on this scale, different series are updated at different times, on different release schedules. A naïve approach to dealing with this is to simply download the entire dataset daily, and reprocess it to find what&#8217;s changed. This has several problems:</p>
<ul>
<li>it costs <em>us</em> quite a bit of processing time, much of which is entirely unnecessary, meaning data isn&#8217;t available as quickly as it should be,</li>
<li>it costs the data provider in terms of bandwidth on data that&#8217;s downloaded unnecessarily, just so we could check it&#8217;s not changed,</li>
<li>it leaves open the question of <em>when</em> we should do this downloading. We want the data as soon as it&#8217;s released &#8211; but we have no way of finding out when that is. All we can do is download frequently enough that we aren&#8217;t likely to be too slow in catching new data (while not running afoul of either of the other two problems).</li>
</ul>
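<p>In code, the naïve approach boils down to something like the following Python sketch. The data layout (a dict of series ids mapped to lists of data points) is an assumption made for the example. Note that every series in the dump has to be hashed on every run, even when nothing at all has changed.</p>

```python
import hashlib
import json


def fingerprint(points):
    """Return a stable SHA-256 digest of one series' data points."""
    payload = json.dumps(points, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def changed_series(dump, known_fingerprints):
    """Compare a freshly downloaded dump against stored fingerprints.

    Returns the ids of series that are new or whose data has changed,
    and updates `known_fingerprints` in place."""
    changed = []
    for series_id, points in sorted(dump.items()):
        digest = fingerprint(points)
        if known_fingerprints.get(series_id) != digest:
            changed.append(series_id)
            known_fingerprints[series_id] = digest
    return changed
```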
<p>There are various ways around this. The ONS, for example, always makes data releases at 09:30 UK time (or very shortly thereafter), so that&#8217;s when we check their site. Unfortunately, they don&#8217;t tell you (in a machine-readable way) what has changed, so we still have to process an awful lot of unchanged data.</p>
<p>An easy way for them, or indeed anyone, to do this right is just to use HTTP timestamps on data dump files. We could simply do a HEAD on the URL, check whether the data has changed, and download the contents only if they were new.</p>
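<p>A minimal sketch of that check in Python, using only the standard library, might look like this. It assumes the server sends a <tt>Last-Modified</tt> header on its dump files and that we keep the time of our previous download as a timezone-aware datetime; the function names are our own invention.</p>

```python
import urllib.request
from email.utils import parsedate_to_datetime


def is_newer(last_modified_header, last_seen):
    """Decide whether a dump needs re-downloading.

    `last_modified_header` is the raw Last-Modified value (or None);
    `last_seen` is the timezone-aware datetime of our previous download."""
    if last_modified_header is None:
        # No timestamp from the server: we have to assume it changed.
        return True
    return parsedate_to_datetime(last_modified_header) > last_seen


def dump_has_changed(url, last_seen):
    """Issue a HEAD request and compare Last-Modified with `last_seen`."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return is_newer(response.headers.get("Last-Modified"), last_seen)
```

<p>A conditional GET with an <tt>If-Modified-Since</tt> header would achieve the same thing in a single round trip, with the server answering 304 Not Modified when there is nothing new.</p>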
<p>If they wanted to be even more helpful, they could provide notification services for us to subscribe to, letting us know as soon as data was updated. But to be perfectly honest, I wouldn&#8217;t be very concerned about whether they did that. It&#8217;s another moving part to go wrong, and I&#8217;d still be very tempted to poll their URLs anyway, to check for updates that the notifications had missed.</p>
<p>Finally, if I were a data provider, I&#8217;d make very sure I had a good cache in front of everything. If you are doing what we want in terms of data dumps, then there&#8217;s nothing easier to cache than GETs to a set of unchanging URLs, with relatively-infrequently changing contents at each URL. Even if you are using something else, then caching should be an important part of your technical strategy.</p>
<p>People make mistakes, and they will end up accidentally downloading too much, too often, or accidentally letting a badly-written client run loose. You can shout at them and deny them access, but nobody&#8217;s going to be happy about that, whoever&#8217;s fault it is, and that&#8217;s still going to leave you open to the risk of people overloading you before you get round to banning them. (See the <a href="http://www.theregister.co.uk/2011/02/04/crime_mapping_police_uk/">police.uk</a> fiasco for how not to handle this!)</p>
<h2>Thinking about your users</h2>
<p>Sometimes I wonder if perhaps we&#8217;re atypical of the sorts of users that statistical organizations need to deal with &mdash; but on reflection, I don&#8217;t think that&#8217;s true. (And to the extent it is, I believe we&#8217;re in the vanguard of a much broader spectrum of people who want to reuse data like this.) We&#8217;re probably higher volume than most users right now, but we&#8217;re probably also more likely to be prepared to deal with poorly designed systems, and persist when many wouldn&#8217;t. Organizations that put time and effort into making data useful to users like us will find their data much more widely used than those that don&#8217;t.</p>
<p>Ultimately, if an organization like the ones we&#8217;re discussing here wants to make its data more useful to everyone, there&#8217;s one big lesson to be learnt, and that is: &#8220;You&#8217;re not going to be able to predict what your users want&#8221;. The best job a public data organization can do is make as much of the data (and metadata!) available as it can, in as untrammelled a form as possible. Don&#8217;t try to second-guess what people want to do &#8211; just let them have everything. As long as you don&#8217;t stand in their way, people will use and re-use your data in ways you could never have foreseen.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.timetric.com/2011/05/23/unsolicited-advice-for-large-governmental-data-providers/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Sharing Timetric with your colleagues</title>
		<link>http://blog.timetric.com/2010/07/08/sharing-timetric-with-your-colleagues/</link>
		<comments>http://blog.timetric.com/2010/07/08/sharing-timetric-with-your-colleagues/#comments</comments>
		<pubDate>Thu, 08 Jul 2010 15:08:17 +0000</pubDate>
		<dc:creator>Andrew Walkingshaw</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.timetric.com/?p=426</guid>
		<description><![CDATA[At Timetric, we reckon the most important way you can use data is to use it to understand things and persuade people. So we&#8217;ve been busy building things to help you out with that, and here&#8217;s a new feature which &#8230; <a href="http://blog.timetric.com/2010/07/08/sharing-timetric-with-your-colleagues/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>At Timetric, we reckon the most important way you can use data is to use it to understand things and persuade people. So we&#8217;ve been busy building things to help you out with that, and here&#8217;s a new feature which has come from that: you can now share indexes on Timetric with your friends and colleagues by email!</p>
<p>A lot of our pages now have an email button:</p>
<p><img style="margin: 0 auto; display: block;" src="http://d1bqgrdx21papg.cloudfront.net/images/email-button.0d5c02ac.png" alt="email button" /></p>
<p>If you click that button where you see it, and fill out the form, we&#8217;ll handle the rest. Here&#8217;s how it works:</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="640" height="505" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/ect4Q_9wjH4&amp;hl=en_US&amp;fs=1" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="640" height="505" src="http://www.youtube.com/v/ect4Q_9wjH4&amp;hl=en_US&amp;fs=1" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>And your friend will get an email like this:</p>
<p><a href="http://blog.timetric.com/files/2010/07/Picture-7.png"><img class="aligncenter size-full wp-image-437" title="Shared index email from Timetric" src="http://blog.timetric.com/files/2010/07/Picture-7.png" alt="Shared index email from timetric.com" width="771" /></a></p>
<p>Really simple and, we hope, really useful. Let us know what you think.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.timetric.com/2010/07/08/sharing-timetric-with-your-colleagues/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Timetric Portfolios &#8211; now for your iPhone or iPod Touch</title>
		<link>http://blog.timetric.com/2010/04/15/timetric-portfolios-now-for-your-iphone-or-ipod-touch/</link>
		<comments>http://blog.timetric.com/2010/04/15/timetric-portfolios-now-for-your-iphone-or-ipod-touch/#comments</comments>
		<pubDate>Thu, 15 Apr 2010 17:08:18 +0000</pubDate>
		<dc:creator>Andrew Walkingshaw</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.timetric.com/?p=310</guid>
		<description><![CDATA[Update: As of the 21st September 2010, we&#8217;ve ended the Timetric Portfolios experiment, and we&#8217;ve pulled it offline. We launched Timetric Portfolios a couple of weeks ago, and we&#8217;re delighted to see people using it to track their investments (or &#8230; <a href="http://blog.timetric.com/2010/04/15/timetric-portfolios-now-for-your-iphone-or-ipod-touch/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><strong>Update</strong>: As of the 21st September 2010, we&#8217;ve ended the Timetric Portfolios experiment, and we&#8217;ve pulled it offline. </p>
<hr />
<div id="pics" style="text-align: center;">
    <img src="http://finance.timetric.com/images/portfolios-iphone-view.png">
</div>
<p>We launched <a href="http://finance.timetric.com/portfolios">Timetric Portfolios</a> a couple of weeks ago, and we&#8217;re delighted to see people using it to track their investments (or just work out how a set of shares <em>would</em> have done).</p>
<p>Now you can carry your portfolios around with you, too: <a href="http://finance.timetric.com/doc/portfolios-iphone/">Timetric Portfolios for iPhone (or iPod Touch)</a>!</p>
<p><a href="http://itunes.apple.com/app/timetric-portfolios/id366648499">Get it from the UK or US app stores.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.timetric.com/2010/04/15/timetric-portfolios-now-for-your-iphone-or-ipod-touch/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Investment announcement!</title>
		<link>http://blog.timetric.com/2010/03/24/investment-announcement/</link>
		<comments>http://blog.timetric.com/2010/03/24/investment-announcement/#comments</comments>
		<pubDate>Wed, 24 Mar 2010 12:25:00 +0000</pubDate>
		<dc:creator>Andrew Walkingshaw</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.timetric.com/?p=306</guid>
		<description><![CDATA[23rd March, 2010: Investment announcement Timetric is delighted to announce that it has recently closed an investment round. Participants in the round included Stefan Glänzer, Alex Zubillaga, Sherry Coutu, Matteo Stefanel and Sean Park and Udayan Goyal of Nauiokas Park. &#8230; <a href="http://blog.timetric.com/2010/03/24/investment-announcement/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><strong>23rd March, 2010: Investment announcement<br />
</strong></p>
<p>Timetric is delighted to announce that it has recently closed an investment round. Participants in the round included Stefan Glänzer, Alex Zubillaga, Sherry Coutu, Matteo Stefanel and Sean Park and Udayan Goyal of Nauiokas Park. Timetric intend to use the capital raised to accelerate the rollout of their network of novel statistical services.</p>
<p><strong>About Timetric<br />
</strong></p>
<p>Timetric build services which make statistics useful. They include <a href="http://timetric.com/">timetric.com</a>, a leading aggregator of public statistical data, and <a href="http://finance.timetric.com/portfolios/">Timetric Portfolios</a>, a radically simple and social tool for analysing stock portfolios. All of their services are built on top of the Timetric Platform, their class-leading proprietary service for publishing, analysing, and performing calculations on very large quantities of time-varying statistical data. Their customers include the Guardian and United Business Media.</p>
<p>Timetric was founded by Andrew Walkingshaw, Toby White and Dan Wilson in mid-2008. They were winners at London Mini Seedcamp 2009; the company is now based in Clerkenwell, London, having relocated from Cambridge towards the end of last year.</p>
<p>Enquiries to press@timetric.com.</p>
<p>Timetric Ltd is a company registered in England and Wales, number 7133675, with registered office address Level 9, 107 Cheapside, London EC2V 6DN.</p>
<hr />
<p>We&#8217;re on <a href="http://eu.techcrunch.com/2010/03/24/timetric-closes-seed-funding-for-statistics-on-speed-platform/">Techcrunch Europe</a> too! Big day for us.</p>
<p>And with that, two more bits of news. We launched <a href="http://finance.timetric.com/portfolios/">Timetric Portfolios</a>, and you should check it out. Also: <a href="http://timetric.com/biz/jobs/">we&#8217;re hiring</a>. Get in touch!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.timetric.com/2010/03/24/investment-announcement/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Sunburnt: a python-solr interface</title>
		<link>http://blog.timetric.com/2010/02/08/sunburnt-a-python-solr-interface/</link>
		<comments>http://blog.timetric.com/2010/02/08/sunburnt-a-python-solr-interface/#comments</comments>
		<pubDate>Mon, 08 Feb 2010 17:08:33 +0000</pubDate>
		<dc:creator>Toby White</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.timetric.com/?p=231</guid>
		<description><![CDATA[Over the last few months, we&#8217;ve been hard at work behind the scenes at Timetric, and a few of the results are now to be seen on the website. If you&#8217;ve been paying close attention, you might have noticed the &#8230; <a href="http://blog.timetric.com/2010/02/08/sunburnt-a-python-solr-interface/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Over the last few months, we&#8217;ve been hard at work behind the scenes at Timetric, and a few of the results are now to be seen on the website. If you&#8217;ve been paying close attention, you might have noticed the appearance of machine tags, and of the ability to search series by value.</p>
<p>These are both reflections of one of the biggest changes we&#8217;ve made – we&#8217;ve entirely replaced the search infrastructure the site runs on. We&#8217;re now backed by <a href="http://lucene.apache.org/solr/">Apache Solr</a>, and we&#8217;ve written a new Python-Solr interface, called <a href="http://github.com/tow/sunburnt/">sunburnt</a>.</p>
<p>We let users search using both free-text search and drill-down tagging. We used to run these on a combination of <a href="http://code.google.com/p/djangosearch/">Postgres-backed full-text search</a> and <a href="http://code.google.com/p/django-tagging/">django-tagging</a>, but this combination <a href="#shortcomings">wasn&#8217;t particularly satisfactory</a>. Unsurprisingly, when you&#8217;re trying to add search infrastructure to a site, what you really want is a proper search-engine backend.</p>
<p>For a mature, full-featured, well-supported open-source search engine, the choice boils down to <a href="http://lucene.apache.org/solr/">Solr</a> or <a href="http://xapian.org">Xapian</a>. We were strongly tempted by the latter, since there&#8217;s no shortage of Xapian expertise around Cambridge, but in the end we were swayed by the Apache licensing of Solr, rather than Xapian&#8217;s GPL.</p>
<p>And although there are <a href="http://eaddrinu.se/blog/2010/sunburnt.html">a number of existing Python-Solr interfaces</a>, none of them did what we wanted, which was to provide an intelligent and robust Pythonic API, which lets you pass arbitrary objects in and out of Solr. So we built our own, and called it <a href="http://tow.github.com/sunburnt/">sunburnt</a>.</p>
<p>Sunburnt is most directly comparable to <a href="http://haystacksearch.org">Haystack</a>, but with a couple of major differences. Firstly, it&#8217;s not <a href="http://haystacksearch.org/docs/faq.html#when-should-i-not-be-using-haystack">restricted to Django model data</a>, and secondly, it&#8217;s schema-driven rather than schema-generating &mdash; it lets you construct your own Solr schema, and automatically derives all the type-checking/conversion/coercion code necessary to map your objects to and from the Solr index, when constructing queries and exchanging data.</p>
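<p>To make &#8220;schema-driven&#8221; a little more concrete, here is a toy Python sketch of the general idea. It is not sunburnt&#8217;s actual code: the function names and the mapping from Solr field type classes to Python converters are assumptions made purely for illustration. The schema XML is read once, and the conversion rules for every field fall out of it.</p>

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumed mapping from Solr field type classes to Python converters;
# a real schema may use many more classes than these.
CONVERTERS = {
    "solr.StrField": str,
    "solr.TextField": str,
    "solr.IntField": int,
    "solr.FloatField": float,
    "solr.DateField": lambda value: datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ"),
}


def load_converters(schema_xml):
    """Read Solr schema XML text and return {field name: converter}."""
    root = ET.fromstring(schema_xml)
    type_classes = {
        t.get("name"): t.get("class") for t in root.iter("fieldType")
    }
    return {
        f.get("name"): CONVERTERS[type_classes[f.get("type")]]
        for f in root.iter("field")
    }


def coerce(document, converters):
    """Convert a dict of raw values using the schema-derived converters,
    rejecting any key the schema does not define."""
    unknown = set(document) - set(converters)
    if unknown:
        raise ValueError("fields not in schema: %s" % sorted(unknown))
    return {name: converters[name](value) for name, value in document.items()}
```

<p>The point of the design is that the schema stays the single source of truth: change a field&#8217;s type there, and the checking and conversion follow without touching any Python.</p>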
<p>The only documentation at the moment is the examples below, but the code is all <a href="http://github.com/tow/sunburnt/">up on Github</a>. Patches and contributions are more than welcome!</p>
<h4>Sunburnt in use</h4>
<p>To start indexing and querying, you initialize a <tt>SolrInterface</tt> with your schema. At the moment, you need to do this by passing in the schema XML; sunburnt won&#8217;t query the Solr server for its schema.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> sunburnt
solr_interface = sunburnt.<span style="color: black;">SolrInterface</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;http://localhost:8983&quot;</span>, <span style="color: #483d8b;">&quot;schema.xml&quot;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>To index objects, <tt>add()</tt> them to the interface. sunburnt doesn&#8217;t care what form the data comes in, so long as</p>
<ul>
<li>if it looks like an object, it has attributes named according to fields defined in the schema</li>
<li>if it looks like a dictionary, it has keys named according to fields defined in the schema</li>
</ul>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">class</span> Document<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, title, contents<span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: black;">title</span> = title
        <span style="color: #008000;">self</span>.<span style="color: black;">contents</span> = contents
&nbsp;
documents = <span style="color: black;">&#91;</span>
   <span style="color: black;">&#123;</span><span style="color: #483d8b;">&quot;title&quot;</span>:<span style="color: #483d8b;">&quot;This is a dictionary&quot;</span>, <span style="color: #483d8b;">&quot;contents&quot;</span>:<span style="color: #483d8b;">&quot;Lorem ipsum&quot;</span><span style="color: black;">&#125;</span>,
   Document<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;This is an object&quot;</span>, <span style="color: #483d8b;">&quot;dolor&quot;</span><span style="color: black;">&#41;</span>
<span style="color: black;">&#93;</span>
&nbsp;
solr_interface.<span style="color: black;">add</span><span style="color: black;">&#40;</span>documents<span style="color: black;">&#41;</span></pre></div></div>

<p>If you haven&#8217;t set your Solr instance up to do <a href="http://wiki.apache.org/solr/SolrConfigXml/#Update_Handler_Section">autocommit</a>, then you might want to commit your documents to the index:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">solr_interface.<span style="color: black;">commit</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>after which the documents are searchable. The API is fairly close to that offered by Haystack (and indeed <a href="http://docs.djangoproject.com/en/dev/ref/models/querysets/">Django&#8217;s QuerySet</a>) &#8211; unsurprisingly, since they&#8217;re solving similar problems.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">solr_interface.<span style="color: black;">query</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;This&quot;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>does what you might expect, searching on the default field.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">solr_interface.<span style="color: black;">query</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;This&quot;</span><span style="color: black;">&#41;</span>.<span style="color: #008000;">filter</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;dictionary&quot;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>while chaining with <tt>filter()</tt> allows you to choose <a href="http://wiki.apache.org/solr/CommonQueryParameters#fq">which parts of your queries are cached by Solr</a>.</p>
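<p>At the HTTP level, that distinction comes down to two request parameters: the scored main query goes in <tt>q</tt>, and each filter goes in its own <tt>fq</tt>. Here&#8217;s a minimal sketch of the resulting query string, using only the standard library. The helper name <tt>solr_select_params</tt> is made up for illustration and isn&#8217;t part of sunburnt.</p>

```python
from urllib.parse import urlencode


def solr_select_params(query, filters=()):
    """Build the query string for a Solr select request.

    `query` becomes the scored main query (q); each entry in `filters`
    becomes a separate filter query (fq), which Solr can cache on its own.
    This is a hypothetical helper, not sunburnt code.
    """
    params = [("q", query)]
    params.extend(("fq", clause) for clause in filters)
    return urlencode(params)


print(solr_select_params("This", filters=["dictionary"]))
# prints: q=This&fq=dictionary
```

<p>Clauses that recur across many searches are the natural candidates for <tt>fq</tt>, since each one is cached independently of the main query.</p>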
<p>For fields representing numbers or dates, searching by range is useful, for example</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">solr_interface.<span style="color: black;">query</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;This&quot;</span>, last_modified__gt=<span style="color: #483d8b;">&quot;2009-01-01&quot;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>if you have a last_modified field in your schema. Queries <a href="http://wiki.apache.org/solr/SolrFacetingOverview">can be faceted</a> &#8211; if you had tags on your objects, you might do this:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">solr_interface.<span style="color: black;">query</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;This&quot;</span><span style="color: black;">&#41;</span>.<span style="color: black;">facet_by</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;tags&quot;</span>, limit=<span style="color: #ff4500;">20</span>, mincount=<span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span></pre></div></div>

<p>and if you wanted to search for similar documents, you can do a <a href="http://wiki.apache.org/solr/MoreLikeThis">more-like-this query</a> (in this case, looking for similarity in the tags field)</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">solr_interface.<span style="color: black;">query</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;This&quot;</span><span style="color: black;">&#41;</span>.<span style="color: black;">mlt</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;tags&quot;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>sunburnt doesn&#8217;t support all of the Solr API, but it gives you access to a goodly portion, and all of these operations are chainable.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">solr_interface.<span style="color: black;">query</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;This&quot;</span><span style="color: black;">&#41;</span>.<span style="color: #008000;">filter</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;dictionary&quot;</span><span style="color: black;">&#41;</span>.\
    facet_by<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;tags&quot;</span>, limit=<span style="color: #ff4500;">20</span>, mincount=<span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span>.\
    query<span style="color: black;">&#40;</span>last_modified__gt=<span style="color: #483d8b;">&quot;2009-01-01&quot;</span><span style="color: black;">&#41;</span>.<span style="color: black;">paginate</span><span style="color: black;">&#40;</span>rows=<span style="color: #ff4500;">10</span><span style="color: black;">&#41;</span></pre></div></div>
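<p>To give a feel for how this sort of chaining can work, here&#8217;s a toy builder in which every method returns a fresh copy, so partial queries can be reused safely. It is emphatically not sunburnt&#8217;s implementation: <tt>ToyQuery</tt>, <tt>range_clause</tt>, and the choice to join clauses with <tt>AND</tt> are all assumptions made for the sketch. The range syntax itself is standard Solr, where square brackets are inclusive and curly brackets exclusive.</p>

```python
import copy


def range_clause(field, op, value):
    """Turn a Django-style lookup suffix into a Solr range clause."""
    if op == "gt":
        return "%s:{%s TO *}" % (field, value)
    if op == "gte":
        return "%s:[%s TO *]" % (field, value)
    if op == "lt":
        return "%s:{* TO %s}" % (field, value)
    if op == "lte":
        return "%s:[* TO %s]" % (field, value)
    raise ValueError("unsupported lookup: " + op)


class ToyQuery:
    """Toy chainable query builder (illustrative only, not sunburnt)."""

    def __init__(self):
        self.q = []       # clauses for the scored main query
        self.fq = []      # clauses sent as separate filter queries
        self.rows = None  # page size, if any

    def _clone(self):
        return copy.deepcopy(self)

    @staticmethod
    def _clauses(terms, lookups):
        clauses = list(terms)
        for key, value in sorted(lookups.items()):
            field, _, op = key.partition("__")
            if op:
                clauses.append(range_clause(field, op, value))
            else:
                clauses.append("%s:%s" % (field, value))
        return clauses

    def query(self, *terms, **lookups):
        new = self._clone()
        new.q.extend(self._clauses(terms, lookups))
        return new

    def filter(self, *terms, **lookups):
        new = self._clone()
        new.fq.extend(self._clauses(terms, lookups))
        return new

    def paginate(self, rows):
        new = self._clone()
        new.rows = rows
        return new

    def params(self):
        """Return the request parameters as a list of (name, value) pairs."""
        params = [("q", " AND ".join(self.q) or "*:*")]
        params.extend(("fq", clause) for clause in self.fq)
        if self.rows is not None:
            params.append(("rows", str(self.rows)))
        return params


base = ToyQuery().query("This")
narrowed = base.filter("dictionary").query(
    last_modified__gt="2009-01-01T00:00:00Z").paginate(rows=10)
print(narrowed.params())
print(base.params())  # the original object is untouched by the chaining
```

<p>Note that Solr date fields expect a full timestamp such as <tt>2009-01-01T00:00:00Z</tt>, so the toy passes one through verbatim rather than converting a bare date.</p>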

<p>Finally, having set up a query, you can get a result object back by <tt>execute()</tt>ing the query. This bit of the API is still a bit rough around the edges, but</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">r = solr_interface.<span style="color: black;">query</span><span style="color: black;">&#40;</span>...<span style="color: black;">&#41;</span>.<span style="color: black;">execute</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
r.<span style="color: black;">result</span> <span style="color: #808080; font-style: italic;"># has the main query results</span>
r.<span style="color: black;">facet_counts</span> <span style="color: #808080; font-style: italic;"># has the faceting results</span>
r.<span style="color: black;">more_like_these</span> <span style="color: #808080; font-style: italic;"># has any more_like_these results</span></pre></div></div>

<p>and if you poke around in that object, it has all the rest of the information that Solr provides.</p>
<h4>Sunburnt in practice</h4>
<p>The code is now running live on Timetric, and is problem-free for us. We&#8217;ve been able to throw away scads of code working around <a href="#shortcomings">djangosearch/django-tagging shortcomings</a>, and performance is significantly faster all round, especially for anything regarding tagging. Most usefully though, it&#8217;s provided us with a platform to start experimenting with new navigation features much more rapidly.</p>
<p><a name="shortcomings"></a></p>
<h5>Postscript: djangosearch/django-tagging shortcomings</h5>
<p>djangosearch, though easy to set up (if you&#8217;re already using postgres), offers very little in the way of control over various parameters and options you might want to tune, and requires filtering/escaping of some queries. (Searching for a string with &#8220;£&#8221; in it causes interesting errors!)</p>
<p>django-tagging had slightly more serious issues:</p>
<ul>
<li>we had to maintain our own fork of the codebase due to a couple of long-standing issues; corner cases which upstream weren&#8217;t interested in fixing. (One that kept biting us was its lack of support for models with non-integer primary keys. Easily fixed, but <a href="http://code.google.com/p/django-tagging/issues/detail?id=15">support is still not included in upstream django-tagging</a>.)</li>
<li>extending its functionality turned out to involve writing a lot of hand-tuned and often inherently slow SQL. Writing related-tags functionality was particularly painful &#8211; it involves inverting the index, which is very time-consuming, so we had to do that offline.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.timetric.com/2010/02/08/sunburnt-a-python-solr-interface/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>OAuth 1.0a and autodiscovery</title>
		<link>http://blog.timetric.com/2009/09/17/oauth-1-0a-and-autodiscovery/</link>
		<comments>http://blog.timetric.com/2009/09/17/oauth-1-0a-and-autodiscovery/#comments</comments>
		<pubDate>Thu, 17 Sep 2009 13:59:59 +0000</pubDate>
		<dc:creator>Toby White</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.timetric.com/?p=213</guid>
		<description><![CDATA[OAuth 1.0a As of last Monday, we&#8217;ve upgraded timetric.com to cope with the OAuth 1.0a workflow. If you follow these things, you can hardly have avoided noticing that there was a big fuss in April this year, when a vulnerability &#8230; <a href="http://blog.timetric.com/2009/09/17/oauth-1-0a-and-autodiscovery/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h5>OAuth 1.0a</h5>
<p>As of last Monday, we&#8217;ve upgraded timetric.com to cope with the <a href="http://oauth.net/core/1.0a">OAuth 1.0a</a> workflow.</p>
<p>If you follow these things, you can hardly have avoided noticing that there was a <a href="http://news.cnet.com/8301-13577_3-10225103-36.html">big fuss</a> in April this year, when a vulnerability in the OAuth protocol was discovered. When it was <a href="http://oauth.net/advisories/2009-1">made public</a>, it turned out to be less a technical than a social vulnerability. The OAuth workflow involves several transactions, and the exchange of multiple tokens. In version 1.0, there was an opportunity for a malicious third party to step into the exchange, and by tricking the end user (essentially, by phishing them into clicking a link) gain their credentials.</p>
<p>Still, social vulnerabilities are as important as technical ones, and the OAuth team rapidly developed version 1.0a of the workflow, which avoids the problem. In the interim, since upgrading existing servers and clients is hard work, and since the issue can be mitigated by anti-phishing provisions, it&#8217;s been standard practice to carry on supporting the 1.0 workflow, while attaching big warnings everywhere, and that&#8217;s what we&#8217;ve been doing.</p>
<p>However, we&#8217;ve now finished implementing a 1.0a-compliant server for timetric, so that 1.0a-capable clients can take advantage of the improved workflow. But because most clients don&#8217;t yet support 1.0a, our server currently supports both 1.0 and 1.0a transactions. Doing this has involved borrowing from, and extending, both <a href="http://oauth.googlecode.com/svn/code/python/oauth/">python-oauth</a> and <a href="http://code.welldev.org/django-oauth/">django-oauth</a>. We&#8217;ve fed our changes back to the upstream authors of both projects, we&#8217;ve tested our codebase, and it&#8217;s now running live on timetric.</p>
<p>(As of writing, our changes haven&#8217;t yet made it into the upstream version of django-oauth, but you can get a hold of what we&#8217;ve done from our <a href="http://github.com/timetric/django-oauth/">fork on github</a>.)</p>
<p>We&#8217;ll continue to support the 1.0 workflow for the immediately-foreseeable future, but obviously at some point we&#8217;ll want to retire it in favour of the more secure 1.0a. For those of you who&#8217;ve written OAuth clients, I can highly recommend <a href="http://mojodna.net/2009/05/20/an-idiots-guide-to-oauth-10a.html">this blog post</a> as a very nice overview of the changes in the workflow, and what you need to do to 1.0a-enable a client.</p>
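<p>For a rough idea of what supporting both workflows means on the server side: in 1.0a the client sends <tt>oauth_callback</tt> when it asks for a request token, and must later present the <tt>oauth_verifier</tt> it was handed after the user approved access. The sketch below shows one way a server might tell the two flows apart at the token-exchange step. It is an assumption-laden illustration, not the code running on timetric: <tt>RequestToken</tt> and <tt>may_exchange</tt> are hypothetical names, and real token storage, signatures, and expiry are left out.</p>

```python
import hmac
import secrets


class RequestToken:
    """Minimal stand-in for a stored request token (hypothetical model)."""

    def __init__(self, callback=None):
        self.key = secrets.token_hex(8)
        # 1.0a clients supply oauth_callback up front; 1.0 clients do not,
        # so callback stays None for the legacy flow.
        self.callback = callback
        self.authorized = False
        self.verifier = None

    @property
    def is_10a(self):
        return self.callback is not None

    def authorize(self):
        """Record user approval; returns the verifier for 1.0a tokens."""
        self.authorized = True
        if self.is_10a:
            self.verifier = secrets.token_hex(8)
        return self.verifier


def may_exchange(token, supplied_verifier=None):
    """Decide whether a request token may be swapped for an access token."""
    if not token.authorized:
        return False
    if not token.is_10a:
        # Legacy 1.0 flow: there is no verifier to check.
        return True
    if supplied_verifier is None:
        return False
    # Constant-time comparison of the verifier the client sends back.
    return hmac.compare_digest(token.verifier, supplied_verifier)
```

<p>The verifier is what closes the hole: it only ever reaches the client via the user who actually approved the request, so an attacker who merely started the flow has nothing valid to present.</p>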
<h5>OAuth autodiscovery</h5>
<p>While I was poking around with the OAuth code, I also managed to address a niggle I&#8217;ve had for a long time with OAuth. OAuth uses three separate URL endpoints to manage the token request/exchange process. These need to be published somewhere, and then any OAuth clients need to know these service-specific URLs. This is annoying: practically, because it makes it hard to write a generic OAuth client framework; and also because it offends the RESTian purist in me &#8211; resources should be machine-discoverable, dammit.</p>
<p>Anyway, it turns out that there is an experimental <a href="http://oauth.net/discovery/">OAuth auto-discovery spec</a>, which piggybacks off the XRDS resource discovery scheme. It&#8217;s not final, and it seems there&#8217;s not a lot of active development on it, but I thought I&#8217;d try it out anyway. Having implemented it as an experiment, I&#8217;m actually quite happy with it. All timetric OAuth resources are now completely auto-discoverable, knowing nothing but the xrds mimetype.</p>
<p>The workflow goes like this: firstly, ask for the location of the XRDS resource description, by using content-negotiation on whatever OAuth-protected resource you&#8217;re trying to gain access to:</p>
<p>Request:<br />
<code>GET /resource-of-interest<br />
Accept: application/xrds+xml</code></p>
<p>Response:<br />
<code>HTTP/1.1 302 Found<br />
X-XRDS-Location: /xrds.xrds<br />
Location: http://timetric.com/xrds.xrds<br />
[...]</code></p>
<p>then follow the redirect:</p>
<p>Request:<br />
<code>GET /xrds.xrds</code></p>
<p>Response:<br />
<code>HTTP/1.1 200 OK<br />
Content-Type: application/xrds+xml<br />
[...]<br />
<br />
&lt;?xml version="1.0" encoding="UTF-8"?&gt;<br />
&lt;XRDS xmlns="xri://$xrds"&gt;<br />
 [...]</code></p>
<p>The XRDS file has a well-defined XML format, and the client can parse it to pull out the location of the OAuth endpoints. This means that you can write a very generic OAuth client library; all the library needs to be told is the location of any interesting oauth-protected resources, and now it can find out everything it needs to negotiate the OAuth workflow. Helpfully, you can also use the XRDS file to advertise which forms of OAuth negotiation you support &#8211; parameters in the URI, or as HTTP headers, different signature schemes, and so on.</p>
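<p>As a sketch of the client side of that workflow, the two helpers below resolve the XRDS location from the first response&#8217;s headers and then pull the advertised endpoints out of the XRDS document. This is illustrative rather than our actual client code: the function names are made up, headers are assumed to arrive as a plain dict with the casing shown above, and the parser deliberately ignores XML namespaces so it doesn&#8217;t depend on my remembering the XRD namespace correctly.</p>

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def xrds_location(resource_url, headers):
    """Resolve the XRDS document URL from a discovery response's headers."""
    location = headers.get("X-XRDS-Location") or headers.get("Location")
    if location is None:
        return None
    # The header may be relative (as in the exchange above), so resolve it
    # against the resource that was originally requested.
    return urljoin(resource_url, location)


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def services_by_type(xrds_text):
    """Map each Type URI found in an XRDS document to its endpoint URIs."""
    services = {}
    for element in ET.fromstring(xrds_text).iter():
        if local_name(element.tag) != "Service":
            continue
        types = [child.text.strip() for child in element
                 if local_name(child.tag) == "Type" and child.text]
        uris = [child.text.strip() for child in element
                if local_name(child.tag) == "URI" and child.text]
        for type_uri in types:
            services.setdefault(type_uri, []).extend(uris)
    return services


print(xrds_location("http://timetric.com/resource-of-interest",
                    {"X-XRDS-Location": "/xrds.xrds"}))
# prints: http://timetric.com/xrds.xrds
```

<p>A generic client would then look up the Type URIs that the discovery draft assigns to the request-token, authorization, and access-token endpoints in the returned mapping, instead of hard-coding three URLs per service.</p>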
<p>I&#8217;ve had an immediate benefit, because now it&#8217;s made my test framework much simpler &#8211; I don&#8217;t need to store arbitrary strings denoting my OAuth URLs, nor manipulate them every time I run tests against differently-named test servers. Since the spec isn&#8217;t final, this scheme is obviously liable to change &#8211; but it&#8217;s a nice example of how to make a service machine-discoverable.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.timetric.com/2009/09/17/oauth-1-0a-and-autodiscovery/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>API improvements</title>
		<link>http://blog.timetric.com/2009/08/13/api-improvements/</link>
		<comments>http://blog.timetric.com/2009/08/13/api-improvements/#comments</comments>
		<pubDate>Thu, 13 Aug 2009 18:24:54 +0000</pubDate>
		<dc:creator>Toby White</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.timetric.com/?p=204</guid>
		<description><![CDATA[As Dan alluded to yesterday, this week we made a new API release. Previously, our API basically only let you add and retrieve data. This has been useful to a whole lot of people, but there&#8217;s much more that &#8230; <a href="http://blog.timetric.com/2009/08/13/api-improvements/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>As Dan <a href="http://blog.timetric.com/2009/08/new-dashboard/">alluded to yesterday</a>, this week we made a new API release.</p>
<p>Previously, our API basically only let you add and retrieve data. This has been useful to a whole lot of people, but there&#8217;s much more that you can do with Timetric.</p>
<p>The new release involves several features. There&#8217;s a bit of improvement to existing functionality to make life a bit easier when uploading data; but more excitingly, we&#8217;ve opened up access to even more of the capabilities of the timetric platform.</p>
<h3>Search endpoints</h3>
<p>When building applications on top of Timetric, one of the things we&#8217;ve been asked for is the ability to retrieve lists of relevant data. This might simply be to get hold of all of your own series, or it might be a list of tagged series, or it might be a complex search query.</p>
<p>For all of these, we&#8217;ve <a href="http://timetric.com/help/httpapi-query/">exposed search endpoints</a> that let you do powerful queries across our data. You can search through the full text of our titles and descriptions, over tags, and by user. This means you can build much more useful interactive interfaces on top of Timetric. In fact, these are exactly the same endpoints that timetric.com uses internally when you browse our data.</p>
<h3>Calculated series</h3>
<p>Through the timetric.com website, you&#8217;ve always had the ability to build model calculations, and to filter series. We&#8217;ve now <a href="http://timetric.com/help/httpapi-models/">exposed this at the API level</a> as well, so you can build these models and filters programmatically.</p>
<h3>Cross-domain requests</h3>
<p>If you&#8217;re a web developer, you&#8217;ll be all too familiar with the headaches of restrictions on cross-domain requests. In many cases, there are perfectly good security-related reasons for them, but these restrictions make writing some web applications much harder than it ought to be.</p>
<p>Fortunately, the newest generation of browsers (Firefox 3.5, IE8, and Safari 4) let you make secure cross-domain requests directly &#8211; so long as the server supports it (see <a href="https://developer.mozilla.org/en/HTTP_access_control">https://developer.mozilla.org/en/HTTP_access_control</a>). Since this is such a useful feature &#8211; for us as much as anyone else &#8211; we&#8217;ve enabled it so you can use it too, and build much more exciting Timetric mashups in modern browsers.</p>
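<p>On the server, supporting this mostly means sending an <tt>Access-Control-Allow-Origin</tt> header with your responses. Here&#8217;s a minimal sketch as WSGI middleware, assuming you&#8217;re happy to allow every origin. It isn&#8217;t how timetric.com is actually wired up, and <tt>allow_cross_origin</tt> and <tt>hello_app</tt> are names invented for the example.</p>

```python
def allow_cross_origin(app, allowed_origin="*"):
    """Wrap a WSGI app so every response carries a CORS allow-origin header."""

    def wrapped(environ, start_response):
        def cors_start_response(status, headers, exc_info=None):
            # Append the header without mutating the app's own list.
            headers = list(headers) + [
                ("Access-Control-Allow-Origin", allowed_origin)]
            return start_response(status, headers, exc_info)

        return app(environ, cors_start_response)

    return wrapped


def hello_app(environ, start_response):
    """Trivial WSGI app standing in for a real API endpoint."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


cors_app = allow_cross_origin(hello_app)
```

<p>This only covers simple requests. Anything that triggers a preflight <tt>OPTIONS</tt> request needs the server to answer that too, which the sketch leaves out.</p>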
<h3>Easier uploading</h3>
<p>And finally, we had feedback from several people about ways in which we could make pushing data into the platform through the API a bit easier. The details are probably uninteresting unless you like constructing HTTP messages yourself (which I do, but it&#8217;s not everyone&#8217;s cup of tea!) so I&#8217;ll simply point you at the <a href="http://timetric.com/help/httpapi-data/">new documentation</a>. In short, you can POST data directly, rather than having to multipart-encode it.</p>
<h3>So &#8230;</h3>
<p>If you&#8217;re a developer, get out there and play! We&#8217;re always happy to <a href="http://getsatisfaction.com/inklingsoftware/">get any feedback</a> &#8211; positive or negative!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.timetric.com/2009/08/13/api-improvements/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Timetric&#039;s New Logo</title>
		<link>http://blog.timetric.com/2009/08/07/timetrics-new-logo/</link>
		<comments>http://blog.timetric.com/2009/08/07/timetrics-new-logo/#comments</comments>
		<pubDate>Fri, 07 Aug 2009 14:48:01 +0000</pubDate>
		<dc:creator>Dan Wilson</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.timetric.com/?p=183</guid>
		<description><![CDATA[It&#8217;s been a busy month in Timetric Towers, so this post is waaay overdue, but I really want to highlight the excellent new logos designed for us by Kate Abbass (@kateabbass). We love them — they&#8217;re simple yet distinctive. And &#8230; <a href="http://blog.timetric.com/2009/08/07/timetrics-new-logo/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s been a busy month in Timetric Towers, so this post is waaay overdue, but I really want to highlight the excellent new logos designed for us by <a href="http://www.kateabbass.co.uk/">Kate Abbass</a> (<a href="http://twitter.com/kateabbass">@kateabbass</a>).</p>
<div style="text-align: center;"><img class="alignnone size-full wp-image-184" title="Timetric Logo" src="http://blog.timetric.com/wp-content/uploads/2009/08/Timetric-Logo.png" alt="Timetric Logo" width="171" height="69" /> <img class="alignnone size-full wp-image-185" title="Timetric Mark" src="http://blog.timetric.com/wp-content/uploads/2009/08/Timetric-Marks.png" alt="Timetric Mark" width="74" height="73" /></div>
<p>We love them &#8211; they&#8217;re simple yet distinctive. And Andrew&#8217;s happy that we finally have some Helvetica in the site! Look out for them on a blog near you soon&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.timetric.com/2009/08/07/timetrics-new-logo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Upgrading the Timetric backend.</title>
		<link>http://blog.timetric.com/2009/05/01/upgrading-the-timetric-backend/</link>
		<comments>http://blog.timetric.com/2009/05/01/upgrading-the-timetric-backend/#comments</comments>
		<pubDate>Fri, 01 May 2009 18:21:33 +0000</pubDate>
		<dc:creator>Toby White</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.timetric.com/?p=103</guid>
		<description><![CDATA[As you might have noticed, this week involved a flurry of activity at Timetric HQ &#8211; on Wednesday we pre-announced downtime for a database upgrade on Friday, but we ended up accelerating our schedule, and doing the upgrade on Thursday &#8230; <a href="http://blog.timetric.com/2009/05/01/upgrading-the-timetric-backend/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>As you might have noticed, this week involved a flurry of activity at Timetric HQ &#8211; on Wednesday we <a href="http://blog.timetric.com/2009/04/timetric-upgrades-on-friday/">pre-announced downtime for a database upgrade on Friday</a>, but we ended up <a href="http://twitter.com/timetric/status/1658245302">accelerating our schedule, and doing the upgrade on Thursday instead</a>. We thought it might be useful to offer an explanation of what was going on!</p>
<p>The backstory is that Timetric runs on a pair of databases for its backend. One is a traditional RDBMS (Postgres, as it happens), which is used for storing all metadata; the other is a non-relational DB, used for storing all our timeseries data. It&#8217;s the latter which was at issue &#8211; we needed to make the fairly major change from using <a href="http://hadoop.apache.org/hbase/">HBase</a> to using <a href="http://tokyocabinet.sourceforge.net/">Tokyo Cabinet/Tyrant</a>.</p>
<p>When we first started building Timetric, we knew we wanted to use a non-relational DB for our timeseries data. There&#8217;s a number of options out there (for a recent overview, see Bob Ippolito&#8217;s talk &#8220;<a href="http://blip.tv/file/1949416/">Drop ACID and think about data</a>&#8221; at PyCon 2009). We did some initial experimentation, and ended up going with HBase. It seemed like a good match &#8211; it has timestamped versioning for all its data, which seemed to fit data that&#8217;s inherently time-based; it&#8217;s a high-profile project, at the time just adopted by Apache; it&#8217;s got a lively, helpful developer community; the codebase seemed relatively robust, and was comprehensible (so much so that despite not really being a Java programmer, I was able to offer a few minor patches easily enough). And it has a very nice scaling story, up to billions of rows, which is nice to have in reserve!</p>
<p>Its major downside was that its performance wasn&#8217;t really yet ready for use behind an interactive website &#8211; but we were still in early alpha at the time. Performance improvements were high up the developer team&#8217;s wishlist, and we didn&#8217;t seem to be the only people interested in using it as a backing store for a web application, so we had hopes that things would improve. And indeed they did &#8211; current HBase performance is significantly better than it was 6 months ago.</p>
<p>Nevertheless, a few weeks ago, we decided that we couldn&#8217;t carry on working with HBase indefinitely; we took another look around, and made the choice to migrate to Tyrant.</p>
<p>There were a few reasons for this:</p>
<ul>
<li>HBase performance was improving, but it was becoming apparent that the loads we were placing on it were atypical; it was coping, but we were having to do some relatively baroque optimizations within the web application layer to get acceptable performance for large datasets.</li>
<li>We&#8217;ve had people getting in touch about running private Timetric instances. For these, having one single, huge, database for holding all our data is less important &#8211; rather, we need multiple, per-instance databases. Scaling at the high-end is less important, and management at the low-end is more important; HBase isn&#8217;t trivial to manage &#8211; it&#8217;s got a lot of moving parts, and automating the management of large-scale deployments is a complex task (see, for example, <a href="http://www.smartfrog.com">SmartFrog</a>).</li>
<li>Most importantly, when you&#8217;re managing multiple instances, reliability becomes a far greater concern. Not that it&#8217;s more important, but there are more points of failure; managing that reliably is hard. HBase is a large system, with a relatively small user-base &#8211; while we&#8217;ve been in a position to deal with running it behind Timetric.com, we weren&#8217;t confident in our ability to do so behind multiple instances of the platform, for multiple customers.</li>
</ul>
<p>Fundamentally, we need a much simpler, more easily manageable, and faster solution. Fortunately, Tyrant fulfils all of these criteria for us. </p>
<ul>
<li>It&#8217;s astonishingly fast &#8211; so much so that we&#8217;ve actually switched memcached <em>off</em> in a number of situations, because it&#8217;s quicker simply to get the data straight from the source. (This is partly because <a href="http://docs.djangoproject.com/en/dev/topics/cache/">Django&#8217;s interface to memcached</a> requires you to use <a href="http://docs.python.org/library/pickle.html">pickle</a>, which is <a href="http://inkdroid.org/journal/2008/10/24/json-vs-pickle/">orders of magnitude slower</a> than using <a href="http://pypi.python.org/pypi/simplejson/">simplejson</a> for simple data structures.)</li>
<li>Setup is ridiculously simple. Honestly, why can&#8217;t all databases be this simple? Compilation is a bog-standard &#8220;./configure &amp;&amp; make &amp;&amp; make install&#8221;, there&#8217;s literally no configuration necessary to get a database going which is optimised for most common use cases, and there&#8217;s one command to start &amp; stop it.</li>
<li>Despite some reasonably hard pushing at it, we&#8217;ve had no hint of data corruption &#8211; or even transaction failures &#8211; against Tyrant. And the knowledge that it&#8217;s been heavily stress-tested elsewhere gives a nice warm fuzzy feeling.</li>
</ul>
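<p>The serialization point in the first bullet is easy to check for yourself. Here&#8217;s a small sketch that times pickle against JSON round trips on a made-up series. It uses the standard library&#8217;s <tt>json</tt> module (the descendant of simplejson), and the numbers depend heavily on your Python version, so treat it as a way of measuring rather than as evidence for any particular ratio.</p>

```python
import json
import pickle
import timeit

# A small, plain data structure of the kind a cache might hold (made-up data).
series = {
    "title": "Example series",
    "points": [[i, i * 0.5] for i in range(1000)],
}


def roundtrip_pickle(value):
    """Serialize and deserialize with pickle."""
    return pickle.loads(pickle.dumps(value))


def roundtrip_json(value):
    """Serialize and deserialize with JSON."""
    return json.loads(json.dumps(value))


if __name__ == "__main__":
    for name, func in [("pickle", roundtrip_pickle), ("json", roundtrip_json)]:
        seconds = timeit.timeit(lambda: func(series), number=200)
        print("%-6s %.4f s for 200 round trips" % (name, seconds))
```

<p>If the two come out close on your setup, the case for bypassing the cache rests on the speed of the store alone.</p>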
<p>So, over the last few weeks, we&#8217;ve been planning this changeover; writing the new backend interface, testing, bugfixing, load-testing, optimizing, etc., with the aim of making the changeover today.</p>
<p>However, yesterday morning, events overtook us. The server logs started showing worrying error messages &#8211; the sort you really don&#8217;t like to see &#8211; about missing data. On further investigation, it turned out that our HBase instance was dropping data on the floor, left, right and centre.</p>
<p><strong><em>Fortunately, we keep good backups!! I&#8217;ll emphasise that before going any further.</em><span style="font-weight: normal;"> (I should also note that we still have no idea why the data corruption started happening &#8211; although clearly it was HBase dropping the data, I&#8217;m not laying all the blame on its head; we haven&#8217;t had time to find out what might have prompted the problem.)</span></strong></p>
<p>We deliberated briefly about what to do &#8211; we knew we had backups, and could restore them to HBase &#8211; we&#8217;ve tested that process before, so we knew how long it would take, a matter of a few hours. On the other hand, we&#8217;d been planning to do the migration the next day anyway. The migration would take a bit longer than simply restoring the backups; and because we&#8217;d been forced into it, would take slightly longer than was originally planned. On the whole, though, we thought it was best to cut our losses and kickstart the migration immediately.</p>
<p>So, the result was what you saw yesterday! Stress levels were fairly high all day (not helped by one of our number being in self-imposed quarantine &#8211; Dan came down with a cold (not swine flu!) yesterday morning, so kept himself at home, though he was working hard online). Nevertheless, the migration went almost entirely smoothly; much better than the worst case scenarios I&#8217;d dreamt up, and as you saw we were up again before the day was out.</p>
<p>And the result is fairly impressive, I think. Currently, most of the improvements are either fairly well-hidden, or amount to support for additional features which we&#8217;ll be building out in the next short while. But the most noticeable gain is that the whole site is much snappier now &#8211; page load times, especially for large series, have dropped dramatically. Upload times are hugely improved as well, which is letting us do some <a href="http://twilight.timetric.com/">quite exciting things</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.timetric.com/2009/05/01/upgrading-the-timetric-backend/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Timetric upgrades on Friday</title>
		<link>http://blog.timetric.com/2009/04/29/timetric-upgrades-on-friday/</link>
		<comments>http://blog.timetric.com/2009/04/29/timetric-upgrades-on-friday/#comments</comments>
		<pubDate>Wed, 29 Apr 2009 16:16:07 +0000</pubDate>
		<dc:creator>Andrew Walkingshaw</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[maintenance]]></category>
		<category><![CDATA[upgrades]]></category>

		<guid isPermaLink="false">http://blog.timetric.com/?p=97</guid>
		<description><![CDATA[We&#8217;re about to roll out a major upgrade to our backend, which will let us do some exciting new things. Unfortunately, though, we&#8217;ll have to take the site offline to move everything across. We&#8217;re going to do that this &#8230; <a href="http://blog.timetric.com/2009/04/29/timetric-upgrades-on-friday/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>We&#8217;re about to roll out a major upgrade to our backend, which will let us do some exciting new things.</p>
<p>Unfortunately, though, we&#8217;ll have to take the site offline to move everything across. We&#8217;re going to do that this Friday (1st May); we&#8217;ll go as fast as we can, but you should expect <a href="http://timetric.com/">Timetric</a> to be offline all day, UK time, on Friday.</p>
<p>Sorry for the inconvenience!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.timetric.com/2009/04/29/timetric-upgrades-on-friday/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

