<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Googling for graphs</title>
	<atom:link href="http://www.bit-player.org/2007/googling-for-graphs/feed" rel="self" type="application/rss+xml" />
	<link>http://bit-player.org/2007/googling-for-graphs</link>
	<description>An amateur's outlook on computation and mathematics.</description>
	<pubDate>Tue, 07 Feb 2012 08:14:54 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.3</generator>
		<item>
		<title>By: New Year’s To-Do List &#171; Learning Computation</title>
		<link>http://bit-player.org/2007/googling-for-graphs#comment-2836</link>
		<dc:creator>New Year’s To-Do List &#171; Learning Computation</dc:creator>
		<pubDate>Fri, 30 Apr 2010 23:47:34 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=125#comment-2836</guid>
		<description>[...] every month or so, so that you can shame me into staying on target. (Hat tip to Brian Hayes at bit-player for [...]</description>
		<content:encoded><![CDATA[<p>[...] every month or so, so that you can shame me into staying on target. (Hat tip to Brian Hayes at bit-player for [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: mersin web tasarım hosting</title>
		<link>http://bit-player.org/2007/googling-for-graphs#comment-1562</link>
		<dc:creator>mersin web tasarım hosting</dc:creator>
		<pubDate>Wed, 19 Dec 2007 00:47:35 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=125#comment-1562</guid>
		<description>thanks</description>
		<content:encoded><![CDATA[<p>thanks</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: brian</title>
		<link>http://bit-player.org/2007/googling-for-graphs#comment-1560</link>
		<dc:creator>brian</dc:creator>
		<pubDate>Fri, 14 Dec 2007 04:26:54 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=125#comment-1560</guid>
		<description>Yes, if you load the page 50,000 times, I get away clean but Google puts you on the no-fly list.</description>
		<content:encoded><![CDATA[<p>Yes, if you load the page 50,000 times, I get away clean but Google puts you on the no-fly list.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kurt</title>
		<link>http://bit-player.org/2007/googling-for-graphs#comment-1559</link>
		<dc:creator>Kurt</dc:creator>
		<pubDate>Fri, 14 Dec 2007 02:31:25 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=125#comment-1559</guid>
		<description>Well, this raises more questions for me.  The IP number from which the chart request comes belongs to the person browsing, not the host site.  Of course, the web browser is supposed to pass along the referring URL along with other info about the requester, so Google can still keep track of which host sites are generating a lot of requests.  And I suppose that any site which generates 50,000 hits per day can afford to generate its own graphs in-house.</description>
		<content:encoded><![CDATA[<p>Well, this raises more questions for me.  The IP number from which the chart request comes belongs to the person browsing, not the host site.  Of course, the web browser is supposed to pass along the referring URL along with other info about the requester, so Google can still keep track of which host sites are generating a lot of requests.  And I suppose that any site which generates 50,000 hits per day can afford to generate its own graphs in-house.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: brian</title>
		<link>http://bit-player.org/2007/googling-for-graphs#comment-1558</link>
		<dc:creator>brian</dc:creator>
		<pubDate>Thu, 13 Dec 2007 15:02:50 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=125#comment-1558</guid>
		<description>Yes, they could surely cache the most popular graphs. The question I'm curious about is whether it's worthwhile to do so. Obviously there's a storage cost for keeping things, but more important there's a computational cost. You've got to examine every URL that comes in to see if it matches one of the saved graphs. The routine might go like this:

1. Receive a URL.
2. Apply a hash function.
3. Look up the hash in a table.
&#160; &#160; &#160;  3a. If you've never seen this hash before, set a counter to 1.
&#160; &#160; &#160;  3b. If you've seen the hash before, increment its counter.
4. Compare the count with the popularity threshold:
&#160; &#160; &#160;  (count &#60; threshold) --&#62; generate a fresh graph
&#160; &#160; &#160;  (count = threshold) --&#62; generate a fresh graph and cache it
&#160; &#160; &#160;  (count &#62; threshold) --&#62; retrieve the cached copy

(I'm ignoring the possibility of hash collisions, which could make matters worse.)
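
As a rough illustration, here is a minimal Python sketch of the routine above. The class name PopularityCache, the render callback, the default threshold of 3 and the choice of SHA-256 are all assumptions made up for the example, not anything Google is known to do.

```python
import hashlib
from collections import Counter
from typing import Callable, Dict


class PopularityCache:
    """Cache a rendered graph once its URL has been requested `threshold` times."""

    def __init__(self, render: Callable[[str], bytes], threshold: int = 3) -> None:
        self.render = render          # hypothetical stand-in for the chart generator
        self.threshold = threshold    # hypothetical popularity threshold
        self.counts: Counter = Counter()
        self.cache: Dict[str, bytes] = {}

    def fetch(self, url: str) -> bytes:
        # Step 2: apply a hash function to the URL.
        key = hashlib.sha256(url.encode("utf-8")).hexdigest()
        # Step 3: look up the hash. An unseen hash counts as 0, so the
        # increment covers both 3a (set to 1) and 3b (add 1).
        self.counts[key] += 1
        count = self.counts[key]
        # Step 4: compare the count with the popularity threshold.
        if count > self.threshold:
            return self.cache[key]    # retrieve the cached copy
        image = self.render(url)      # generate a fresh graph
        if count == self.threshold:
            self.cache[key] = image   # ... and cache it
        return image
```

With the threshold set to 3, the first three requests for a given URL are rendered afresh and every later one is served from the cache. The table holds a counter for every distinct URL ever seen, popular or not.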

Note that the overhead of hashing and table lookup and counting applies to &lt;em&gt;every&lt;/em&gt; URL received, not just the popular ones. Is it worth the bother? I don't know. Perhaps we could find out by experiment: Try requesting the same graph repeatedly, and see if at some point the response time drops significantly. But note the following "usage policy":

&lt;blockquote&gt;Use of the Google Chart API is subject to a query limit of 50,000 queries per user per day. If you go over this 24-hour limit, the Chart API may stop working for you temporarily. If you continue to exceed this limit, your access to the Chart API may be blocked.&lt;/blockquote&gt;

This suggests that they are at least bothering to keep track of the IP number that requests come from. And of course many of us suspect that Google knows all and keeps &lt;em&gt;everything&lt;/em&gt;.</description>
		<content:encoded><![CDATA[<p>Yes, they could surely cache the most popular graphs. The question I&#8217;m curious about is whether it&#8217;s worthwhile to do so. Obviously there&#8217;s a storage cost for keeping things, but more important there&#8217;s a computational cost. You&#8217;ve got to examine every URL that comes in to see if it matches one of the saved graphs. The routine might go like this:</p>
<p>1. Receive a URL.<br />
2. Apply a hash function.<br />
3. Look up the hash in a table.<br />
&nbsp; &nbsp; &nbsp;  3a. If you&#8217;ve never seen this hash before, set a counter to 1.<br />
&nbsp; &nbsp; &nbsp;  3b. If you&#8217;ve seen the hash before, increment its counter.<br />
4. Compare the count with the popularity threshold:<br />
&nbsp; &nbsp; &nbsp;  (count &lt; threshold) --&gt; generate a fresh graph<br />
&nbsp; &nbsp; &nbsp;  (count = threshold) --&gt; generate a fresh graph and cache it<br />
&nbsp; &nbsp; &nbsp;  (count &gt; threshold) --&gt; retrieve the cached copy</p>
<p>(I&#8217;m ignoring the possibility of hash collisions, which could make matters worse.)</p>
<p>Note that the overhead of hashing and table lookup and counting applies to <em>every</em> URL received, not just the popular ones. Is it worth the bother? I don&#8217;t know. Perhaps we could find out by experiment: Try requesting the same graph repeatedly, and see if at some point the response time drops significantly. But note the following &#8220;usage policy&#8221;:</p>
<blockquote><p>Use of the Google Chart API is subject to a query limit of 50,000 queries per user per day. If you go over this 24-hour limit, the Chart API may stop working for you temporarily. If you continue to exceed this limit, your access to the Chart API may be blocked.</p></blockquote>
<p>This suggests that they are at least bothering to keep track of the IP number that requests come from. And of course many of us suspect that Google knows all and keeps <em>everything</em>.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kurt</title>
		<link>http://bit-player.org/2007/googling-for-graphs#comment-1557</link>
		<dc:creator>Kurt</dc:creator>
		<pubDate>Thu, 13 Dec 2007 01:26:14 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=125#comment-1557</guid>
		<description>&lt;blockquote&gt;If a thousand people read the page, the graph will be recreated a thousand times.&lt;/blockquote&gt;I'm willing to bet that Google has some clever caching strategies so that high-volume web pages get their graphs stored so they don't have to be regenerated for each hit.</description>
		<content:encoded><![CDATA[<blockquote><p>If a thousand people read the page, the graph will be recreated a thousand times.</p></blockquote>
<p>I&#8217;m willing to bet that Google has some clever caching strategies so that high-volume web pages get their graphs stored so they don&#8217;t have to be regenerated for each hit.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

