<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Timetric Blog &#187; search</title>
	<atom:link href="http://blog.timetric.com/category/search/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.timetric.com</link>
	<description></description>
	<lastBuildDate>Fri, 27 May 2011 12:29:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>DJUGL talk: Scaling search to a million pages with Solr, Python and Django</title>
		<link>http://blog.timetric.com/2010/07/27/djugl-talk-scaling-search-to-a-million-pages-with-solr-python-and-django/</link>
		<comments>http://blog.timetric.com/2010/07/27/djugl-talk-scaling-search-to-a-million-pages-with-solr-python-and-django/#comments</comments>
		<pubDate>Tue, 27 Jul 2010 14:32:59 +0000</pubDate>
		<dc:creator>Toby White</dc:creator>
				<category><![CDATA[infrastructure]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://blog.timetric.com/?p=472</guid>
		<description><![CDATA[Thanks to everyone who came along last night to DJUGL, to see me (and Nicholas Tollervey, and Mat Clayton) speak. My topic for the night was &#8220;Scaling search to a million pages, with Solr, Python and Django&#8221;. I&#8217;ve put the &#8230; <a href="http://blog.timetric.com/2010/07/27/djugl-talk-scaling-search-to-a-million-pages-with-solr-python-and-django/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Thanks to everyone who came along last night to <a href="http://groups.google.com/group/django-london">DJUGL</a>, to see me (and <a href="http://twitter.com/ntoll">Nicholas Tollervey</a>, and <a href="http://twitter.com/matclayton">Mat Clayton</a>) speak.</p>
<p>My topic for the night was &#8220;Scaling search to a million pages, with Solr, Python and Django&#8221;. I&#8217;ve put the <a href="http://www.slideshare.net/tow21/scaling-solr">slides up at SlideShare</a> (<a href="http://dl.dropbox.com/u/1942316/SolrMillionsOfDocs.pdf">direct PDF link</a>) if anyone wants them.</p>
<p>The <em>tl;dr</em> is summarized on the last-but-one slide. If you want to scale your search across millions of pages, and still get good results for your users, then you need to pay attention to some details at the small scale, and some details at the large scale.</p>
<p>At the small scale, you need to spend time thinking about how to construct your index schema. What queries do you want to be able to run, and what information do you need to present when your search results come back? The shape of your index schema needs to be driven entirely by the answers to these two questions, and that depends heavily on the shape of your data, and the way your users want to interact with it.</p>
<p>On the large scale, each installation will have its own problems, but three things you&#8217;ll almost certainly need to pay attention to are:</p>
<ul>
<li>Decoupling reading from and writing to the index. They have very different performance characteristics (and writing presents special problems if you&#8217;re updating documents as well as adding brand new documents).</li>
<li>Working out the right balance of adding/committing/optimizing data. This will be driven by the frequency with which you add data, and how soon you need to be able to serve results from newly-added data. Must it be immediate, or can you wait seconds/minutes/hours?</li>
<li>Fine-tuning your tokenizers/analyzers. Although small and fiddly, this is an issue which will bite you more and more as a corpus of data grows. You&#8217;ll need to tweak your indexing algorithms away from the defaults; extracting relevant results from a pile of a million documents is much harder than from a few thousand.</li>
</ul>
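<p>As a rough illustration of the add/commit balance in the second point, here is a minimal sketch that buffers documents and only commits once a batch is large enough or old enough. It assumes nothing about Solr itself: <code>send_batch</code> and <code>commit</code> are hypothetical callables standing in for whatever client you use, and the thresholds are placeholder values, not recommendations.</p>

```python
import time


class BatchingIndexer:
    """Buffer documents; send and commit when a size or age limit is hit.

    `send_batch` and `commit` are hypothetical callables supplied by the
    caller. They stand in for a real index client and are not taken from
    any particular library.
    """

    def __init__(self, send_batch, commit, max_docs=1000,
                 max_wait_seconds=60.0, clock=time.monotonic):
        self._send_batch = send_batch
        self._commit = commit
        self._max_docs = max_docs
        self._max_wait = max_wait_seconds
        self._clock = clock
        self._pending = []
        self._oldest = None  # time at which the oldest pending doc arrived

    def add(self, doc):
        if not self._pending:
            self._oldest = self._clock()
        self._pending.append(doc)
        # The age check only runs when a document arrives; a real system
        # would also flush from a timer so a quiet period cannot strand docs.
        if self._should_flush():
            self.flush()

    def _should_flush(self):
        if len(self._pending) >= self._max_docs:
            return True
        return self._clock() - self._oldest >= self._max_wait

    def flush(self):
        if not self._pending:
            return
        self._send_batch(list(self._pending))
        self._commit()
        self._pending.clear()
        self._oldest = None
```

<p>Lowering <code>max_wait_seconds</code> makes new data searchable sooner at the cost of more frequent commits; raising <code>max_docs</code> does the opposite. Which way to lean depends on the answer to the "seconds/minutes/hours" question above.</p>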
<p>I also took the opportunity to plug my Python/Solr library, <a href="http://timetric.com/about/opensource/#sunburnt">sunburnt</a>. It&#8217;s a work in progress, but it&#8217;s battle-tested here at Timetric. If you&#8217;re trying to use Solr in any interesting Python project, I think its API is worth a look.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.timetric.com/2010/07/27/djugl-talk-scaling-search-to-a-million-pages-with-solr-python-and-django/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>On journalism</title>
		<link>http://blog.timetric.com/2009/07/16/on-journalism/</link>
		<comments>http://blog.timetric.com/2009/07/16/on-journalism/#comments</comments>
		<pubDate>Thu, 16 Jul 2009 22:44:40 +0000</pubDate>
		<dc:creator>Andrew Walkingshaw</dc:creator>
				<category><![CDATA[about us]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[news]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://blog.timetric.com/?p=172</guid>
		<description><![CDATA[Andrew Walkingshaw of Timetric at News Innovation from Martin Belam on Vimeo. From News Innovation: London. Thank you to Martin Belam for his very sympathetic editing! Andrew here; Dan and I took part in NewsInnovation:London at NESTA on Friday 10th &#8230; <a href="http://blog.timetric.com/2009/07/16/on-journalism/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><object width="600" height="345"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=5583442&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=1&amp;color=01AAEA&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=5583442&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=1&amp;color=01AAEA&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="600" height="345"></embed></object>
<p><a href="http://vimeo.com/5583442">Andrew Walkingshaw of Timetric at News Innovation</a> from <a href="http://vimeo.com/currybet">Martin Belam</a> on <a href="http://vimeo.com">Vimeo</a>.</p>
<p>From <a href="http://newsinnovationlondon.eventbrite.com/">News Innovation: London</a>. Thank you to <a href="http://currybet.net/">Martin Belam</a> for <a href="http://www.currybet.net/cbet_blog/2009/07/andrew_walkinshaw_timetric.php">his very sympathetic editing</a>!</p>
<p><a href="http://www.lexical.org.uk/">Andrew</a> here; <a href="http://dan-wilson.co.uk">Dan</a> and I took part in NewsInnovation:London at <a href="http://www.nesta.org.uk">NESTA</a> on Friday 10th July (just under a week ago). I spoke alongside <a href="http://www.tomski.com">Tom Loosemore</a> of <a href="http://www.4ip.org.uk/">4iP</a> about building tools to help journalists, and members of the public, get more out of public data.</p>
<p>If you&#8217;ve wondered what we&#8217;re doing with <a href="http://www.guardian.co.uk/datablog">the Guardian</a>, and more importantly <em>why</em> we&#8217;re doing it, the video covers some of what we&#8217;ve been working on and our reasons for it.</p>
<p>The video&#8217;s about four minutes long, and if nothing else, it&#8217;s a good opportunity to see (and poke fun at) what one of us on the Timetric team looks like! There is a serious point here, though: to give people better tools to find, view, share and analyse data is to empower people to make better-informed, more considered, smarter judgements — and it occurs to us that the best journalism has the same goal: to help people to understand and to decide. That&#8217;s something which <em>really</em> matters.</p>
<p>Thank you to <a href="http://www.markng.co.uk/">Mark</a> and the other organisers for putting on such an inspirational event.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.timetric.com/2009/07/16/on-journalism/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New search and new graphs!</title>
		<link>http://blog.timetric.com/2009/03/25/new-search-and-new-graphs/</link>
		<comments>http://blog.timetric.com/2009/03/25/new-search-and-new-graphs/#comments</comments>
		<pubDate>Wed, 25 Mar 2009 18:47:57 +0000</pubDate>
		<dc:creator>Andrew Walkingshaw</dc:creator>
				<category><![CDATA[plotting]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[user interface]]></category>

		<guid isPermaLink="false">http://blog.timetric.com/?p=80</guid>
		<description><![CDATA[We&#8217;ve just added a couple of new features to Timetric which we think you&#8217;ll find useful. Up until now, it&#8217;s been a bit tricky to get a quick overview of the data in an area; you&#8217;ve needed to save all &#8230; <a href="http://blog.timetric.com/2009/03/25/new-search-and-new-graphs/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve just added a couple of new features to <a href="http://timetric.com/">Timetric</a> which we think you&#8217;ll find useful. Up until now, it&#8217;s been a bit tricky to get a quick overview of the data in an area; you&#8217;ve needed to save all the series to your dashboard individually before you could plot them against each other &#8211; which meant you had to be logged in! &#8211; and, on top of that, the different sorts of search we had (by tag and by free-text) weren&#8217;t as well integrated as they could have been.</p>
<p>Well, we&#8217;ve changed all that.</p>
<p><a href="http://timetric.com/search/?q=employment">Take employment data as an example:</a></p>
<div id="attachment_81" class="wp-caption aligncenter" style="width: 597px"><img class="size-full wp-image-81" title="picture-4" src="http://blog.timetric.com/wp-content/uploads/2009/03/picture-4.png" alt="Search results for &quot;Employment&quot; on Timetric" width="587" height="415" /><p class="wp-caption-text">Search results for &quot;Employment&quot; on Timetric</p></div>
<p>Two big changes here &#8211; firstly, there are sparklines, so you can get a feel for all the data in front of you immediately; secondly, you now get all the relevant tags at the top of the page on every search result, so you can immediately start filtering through the search results to find what you&#8217;re interested in. <a href="http://timetric.com/tags/utilities/?n=20&amp;q=Employment">Let&#8217;s say that&#8217;s the utilities sector.</a></p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-86" title="picture-5" src="http://blog.timetric.com/wp-content/uploads/2009/03/picture-5.png" alt="picture-5" width="587" height="415" /></p>
<p>The stars on the right here, if you&#8217;re logged in, immediately save series to <a href="http://timetric.com/dashboard">your dashboard</a>; they&#8217;re gold for each series you&#8217;re already watching. But the bigger change is on the left; if you check the series you&#8217;re interested in&#8230;</p>
<div id="attachment_87" class="wp-caption aligncenter" style="width: 597px"><img class="size-full wp-image-87" title="picture-6" src="http://blog.timetric.com/wp-content/uploads/2009/03/picture-6.png" alt="Selecting series in Timetric search results" width="587" height="415" /><p class="wp-caption-text">Selecting series in Timetric search results</p></div>
<p>and then hit &#8220;Overlay&#8221; or &#8220;Versus&#8221;, which you&#8217;ll find at the start and end of the search results page:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-88" title="picture-7" src="http://blog.timetric.com/wp-content/uploads/2009/03/picture-7.png" alt="picture-7" width="587" height="415" /></p>
<p>you can start making plots straight from search results.</p>
<div id="attachment_89" class="wp-caption aligncenter" style="width: 597px"><img class="size-full wp-image-89" title="picture-8" src="http://blog.timetric.com/wp-content/uploads/2009/03/picture-8.png" alt="Transportation employment's much more seasonal in Alaska than in Alabama. " width="587" height="415" /><p class="wp-caption-text">Transportation employment&#39;s much more seasonal in Alaska than in Alabama. </p></div>
<p>You can even get the embed code to put a graph into your blog straight from there:</p>
<p><object width="520" height="520" data="http://timetric.com/swf/corr.swf" type="application/x-shockwave-flash"><param name="wmode" value="opaque" /><param name="allowFullScreen" value="true" /><param name="flashvars" value="data=http%3A%2F%2Ftimetric.com%2Fembed%2FLXPnVnb9RweJgrRVAX1gtw%2CjsiQ6PVIROyT0r3VwzWSjw%2Fversus%2F" /><param name="src" value="http://timetric.com/swf/corr.swf" /><param name="bgcolor" value="#FFFFFF" /><param name="allowfullscreen" value="true" /></object></p>
<p>And that gives us a chance to mention another new feature which a few of you&#8217;ve been asking for – if you hover your mouse over the points in this graph, you&#8217;ll see each measurement in the scatter plot labelled with the time it comes from.</p>
<p>The big changes to graphing here are actually under the covers, though: after this, we&#8217;ll be able to make some really exciting improvements in the near future.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.timetric.com/2009/03/25/new-search-and-new-graphs/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Finding things</title>
		<link>http://blog.timetric.com/2008/11/04/finding-things/</link>
		<comments>http://blog.timetric.com/2008/11/04/finding-things/#comments</comments>
		<pubDate>Tue, 04 Nov 2008 16:09:47 +0000</pubDate>
		<dc:creator>Andrew Walkingshaw</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://blog.timetric.com/?p=5</guid>
		<description><![CDATA[Hi there! I&#8217;m Andrew Walkingshaw, and &#8211; amongst other things &#8211; I&#8217;m responsible for helping you to find the data you&#8217;re looking for on Timetric. That means two things, really; I build ways for you to find the data already &#8230; <a href="http://blog.timetric.com/2008/11/04/finding-things/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Hi there! I&#8217;m <a href="http://blog.timetric.com/about-andrew">Andrew Walkingshaw</a>, and &#8211; amongst other things &#8211; I&#8217;m responsible for helping you to find the data you&#8217;re looking for on <a href="http://timetric.com/">Timetric</a>.</p>
<p>That means two things, really; I build ways for you to find the data already in Timetric, and I find and upload new sources of data which we think you might be interested in. I&#8217;m going to write about the first of these today, but if there&#8217;s data out there which you&#8217;d like us to get, leave us a suggestion on <a href="http://getsatisfaction.com/inklingsoftware">our Get Satisfaction site</a> and we&#8217;ll see what we can do.</p>
<p>Anyway, about finding data. If you&#8217;ve used sites like <a href="http://flickr.com/">Flickr</a> or <a href="http://delicious.com/">Delicious</a>, then Timetric&#8217;s going to feel pretty familiar &#8211; like them, we use search and <a href="http://en.wikipedia.org/wiki/Tag_(metadata)">tagging</a> to help you get to the data you&#8217;re after.</p>
<p>When you log into Timetric, the first thing you see is your dashboard.</p>
<div id="attachment_6" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.timetric.com/wp-content/uploads/2008/11/picture-1.png"><img class="size-medium wp-image-6" title="picture-1" src="http://blog.timetric.com/wp-content/uploads/2008/11/picture-1-300x157.png" alt="" width="300" height="157" /></a><p class="wp-caption-text">Dashboard</p></div>
<p>Any nuggets you&#8217;re watching appear in here. Up in the top right hand corner, though, there&#8217;s a search box; if you type something in there and hit &#8220;Search&#8221;, then you get back a page of search results.</p>
<div id="attachment_7" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.timetric.com/wp-content/uploads/2008/11/picture-2.png"><img class="size-medium wp-image-7" title="picture-2" src="http://blog.timetric.com/wp-content/uploads/2008/11/picture-2-300x185.png" alt="Search results" width="300" height="185" /></a><p class="wp-caption-text">Search results</p></div>
<p>I searched for &#8220;bond&#8221; here, and I&#8217;ve got back a page of information about bond prices. Let&#8217;s say I&#8217;m not interested in data from Moody&#8217;s: if I search for &#8220;bond -Moody&#8221;, then I can exclude them from the search results.</p>
<div id="attachment_8" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.timetric.com/wp-content/uploads/2008/11/picture-3.png"><img class="size-medium wp-image-8" title="picture-3" src="http://blog.timetric.com/wp-content/uploads/2008/11/picture-3-300x189.png" alt="Search results for &quot;bond -Moody&quot;" width="300" height="189" /></a><p class="wp-caption-text">Search results for &quot;bond -Moody&quot;</p></div>
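<p>To make the exclusion syntax concrete, here is a small sketch of how a query like "bond -Moody" could be split into required and excluded terms. It is an illustration only: it uses plain case-insensitive substring matching, which is an assumption made for brevity and not a description of how Timetric's search works, and the function names and sample titles are made up.</p>

```python
def parse_query(query):
    """Split a query into (required_terms, excluded_terms).

    A token with a leading "-" is treated as an exclusion; everything
    else is required. Terms are lower-cased for matching.
    """
    include, exclude = [], []
    for token in query.split():
        if token.startswith("-") and len(token) > 1:
            exclude.append(token[1:].lower())
        else:
            include.append(token.lower())
    return include, exclude


def matches(title, query):
    """Return True if `title` has every required term and no excluded one."""
    include, exclude = parse_query(query)
    text = title.lower()
    has_all = all(term in text for term in include)
    has_excluded = any(term in text for term in exclude)
    return has_all and not has_excluded


# Hypothetical series titles, for illustration only.
titles = [
    "Moody's AAA corporate bond yield",
    "UK 10-year government bond yield",
]
kept = [t for t in titles if matches(t, "bond -Moody")]
```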
<p>Under each of these search results, you can see a list of tags (in light grey). When you upload data, you can give a list of tags to remember it by &#8211; categories, more or less. A lot of the data here comes from the UK National Statistics dataset from http://www.statistics.gov.uk/, and we&#8217;ve tagged that &#8220;National Statistics&#8221; &#8211; so if you click on that tag:</p>
<p><a href="http://blog.timetric.com/wp-content/uploads/2008/11/picture-4.png"><img class="aligncenter size-medium wp-image-9" title="picture-4" src="http://blog.timetric.com/wp-content/uploads/2008/11/picture-4-300x182.png" alt="" width="300" height="182" /></a></p>
<p>&#8230; then you get back a list of everything tagged with that tag &#8211; for this dataset, about 2000 pages&#8217; worth! So you&#8217;ll need to be able to filter that. From this page, you can filter by another tag &#8211; the most common tags found alongside the ones you&#8217;re filtering by are listed at the top of the page. Let&#8217;s filter by &#8220;monthly&#8221;:</p>
<div id="attachment_10" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.timetric.com/wp-content/uploads/2008/11/picture-5.png"><img class="size-medium wp-image-10" title="picture-5" src="http://blog.timetric.com/wp-content/uploads/2008/11/picture-5-300x185.png" alt="Multi-tag filters" width="300" height="185" /></a><p class="wp-caption-text">Multi-tag filters</p></div>
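<p>The "most common tags found alongside the ones you're filtering by" idea can be sketched in a few lines. This is an assumed, in-memory version for illustration: the <code>related_tags</code> function and the sample data are hypothetical, and a real installation would more likely ask the search index for these counts.</p>

```python
from collections import Counter


def related_tags(series_tags, active_tags, limit=5):
    """Count tags that co-occur with every tag in `active_tags`.

    `series_tags` maps a series id to its list of tags. Returns up to
    `limit` (tag, count) pairs, most common first, ties broken by name.
    """
    active = set(active_tags)
    counts = Counter()
    for tags in series_tags.values():
        tag_set = set(tags)
        if active <= tag_set:  # series carries all of the active tags
            counts.update(tag_set - active)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]


# Hypothetical data, for illustration only.
series_tags = {
    "a": ["National Statistics", "monthly", "employment"],
    "b": ["National Statistics", "monthly", "prices"],
    "c": ["National Statistics", "quarterly", "employment"],
    "d": ["bonds", "monthly"],
}
suggestions = related_tags(series_tags, ["National Statistics", "monthly"])
```

<p>Each extra tag you filter by shrinks the set of matching series, so the suggested tags get more specific as you drill down.</p>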
<p>We&#8217;re based in Cambridge, about an hour away from London, so I&#8217;m most interested in what&#8217;s going on in the south of England. I can search within these results by typing &#8220;south&#8221; into the search box there.</p>
<div id="attachment_11" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.timetric.com/wp-content/uploads/2008/11/picture-6.png"><img class="size-medium wp-image-11" title="picture-6" src="http://blog.timetric.com/wp-content/uploads/2008/11/picture-6-300x188.png" alt="Tagging and searching together" width="300" height="188" /></a><p class="wp-caption-text">Tagging and searching together</p></div>
<p>So you can use tags and searching together to find the data you&#8217;re interested in. If there&#8217;s something you&#8217;re looking for which you can&#8217;t find, though, <a href="http://getsatisfaction.com/inklingsoftware">let us know</a> and we&#8217;ll try and help you out.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.timetric.com/2008/11/04/finding-things/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

