Recently, Google accused Bing of effectively copying their results by using toolbar data, and data from Internet Explorer if the suggested sites feature is enabled – you can read Google’s side of the story here, and the story of Bing’s response here.
I’m not going to explain it all in too much detail because I think those two articles cover it quite well, but as a quick summary:
1. Google suspected Bing of using some of Google’s data in Bing’s results
2. Google set up a test to prove this – it hand-coded results pages to rank for made-up “synthetic queries” (Googlewhacks), had engineers use IE8 with the Bing bar installed to search for and then visit those pages, and then found Bing returning around 9% of those results a few weeks later
3. Bing very strongly denied “copying” Google’s results once accused
Bing’s description of what’s happening appears to be around the use of “clickstream data” – it sounds like the Bing toolbar (and IE with suggested sites) looks at which pages you’re on and which pages you visit afterwards. This isn’t restricted to Google – this is, apparently, for all pages on the Internet.
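To make that concrete, here is a minimal sketch of what extracting a ranking signal from clickstream data might look like. This is purely illustrative – the function names, the session format, and the logic are my assumptions, not Bing’s actual implementation – but it shows the basic idea: spot a search-results URL in a user’s browsing trail, pull out the query parameter, and pair it with whatever page the user visited next.

```python
from urllib.parse import urlparse, parse_qs

def extract_query(url):
    """Return the search query if the URL looks like a Google results page, else None.
    (Hypothetical helper – real toolbars would handle many more URL shapes.)"""
    parts = urlparse(url)
    if parts.netloc.endswith("google.com") and parts.path == "/search":
        return parse_qs(parts.query).get("q", [None])[0]
    return None

def pairs_from_clickstream(session):
    """Turn an ordered list of visited URLs into (query, clicked_url) pairs.
    Each pair is a tiny vote: 'this page satisfied this query'."""
    pairs = []
    for current, following in zip(session, session[1:]):
        query = extract_query(current)
        if query is not None:
            pairs.append((query, following))
    return pairs

# "hiybbprqag" was one of the synthetic queries from Google's actual test
session = [
    "https://www.google.com/search?q=hiybbprqag",
    "http://example.com/some-page",
]
print(pairs_from_clickstream(session))
# [('hiybbprqag', 'http://example.com/some-page')]
```

Aggregated over millions of users, pairs like these would let any observer learn which results a search engine’s users found useful – without ever crawling the search engine itself.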
I was actually quite surprised by the number of people siding with Bing over this; there’s something about Bing using its toolbar and browser to collect user data from competitors that doesn’t sit quite right with me. Regardless, I was surprised by some of the things that Bing said to defend itself.
Google engaged in a “honeypot” attack to trick Bing. In simple terms, Google’s “experiment” was rigged to manipulate Bing search results through a type of attack also known as “click fraud.” That’s right, the same type of attack employed by spammers on the web to trick consumers.
What Bing is complaining about here is that Google engineers chose to adjust Google’s results for specific terms, searched in Google for those keywords and then clicked on those listings. In Google. That’s not an “attack”, nor is it a “trick”, and it’s definitely not “click fraud”.
Bing also mentions that the clickstream data they’re using is one of 1,000 signals used to determine where a site should rank, and that the honeypot keywords that Google used were noticeable because they were outliers – for queries that rare, the clickstream data was effectively the only signal Bing had to go on.
How much of the clickstream data is actually data from Google?
But this is what I don’t fully understand – the clickstream data itself. Bing says that the clickstream data isn’t just for Google – it’s for all sites on the web. But of course, Google – their biggest competitor – is the second most visited site on the Internet for US users, so it’s fair to say that a very hefty chunk of that clickstream data actually contains data from people searching on Google.
What happens when the clickstream data is scaled?
The other thing I don’t understand is what happens when you scale that clickstream data. We’ve only seen what happens when it’s used on 100 invented terms from Google’s honeypot test, where around 9% of those queries then appeared to affect Bing’s results. Bing implies that this isn’t a lot, and that the effect is much smaller when it’s scaled – but I’m not so sure. I’d actually be quite surprised if, when this was scaled to something the size of the Bing toolbar’s userbase, there wasn’t a very noticeable impact on Bing’s results. This is one of those things that cannot really be proved – we have to take Bing’s word for it.
Is Bing morally right to take Google’s user data?
During the Farsight video, the Bing rep mentioned that they were only using publicly available clickstream data – but of course, that data isn’t publicly available. The data is coming from a toolbar, and the conditions are, let’s face it, buried away somewhere in a EULA which nobody in their right mind ever reads. These users have legally opted in to sharing that data, but I doubt many of them are aware of it.
Regardless of that, though – Bing is taking data from users who are searching on Google and allowing it to influence Bing’s search results. It may be legal, but that doesn’t mean you have to agree with it.
Flickr image from reway2007.