Canonical Link Element Mistakes
Duplicate content can be a proper nightmare. The Canonical Link Element was introduced to help webmasters and site owners stop duplicate URLs from getting indexed in Google.
Matt Cutts talks about the Canonical Link Element in the following video and has blogged about it here.
So how does Google treat Canonical Mistakes?
In Matt’s video he mentions that the presence of the Canonical Link Element in the page’s code is a strong hint as to whether or not a URL should be indexed. If a webmaster accidentally makes a mistake, Matt says ‘we don’t promise we will abide by this 100%’ and Google reserves the ‘right to do what we think best’ for the user. So if Google thinks there’s been a mistake, that a webmaster has accidentally messed up, the page(s) may still be indexed.
So here’s how the 3 engines (Google, Yahoo and Bing) treated a Canonical ‘Shoot yourself in the Foot’ Mistake.
- A website with 30 pages of content. The canonical element pointing to http://www.thewebaddress.com/ was inserted into the header, so it appeared on every page of the site.
- Every page had quality and unique content with images. The site had unique titles and meta descriptions.
- There was a standard navigation and good internal linkage.
- The site had inbound links and PageRank.
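The setup above is easy to catch before a search engine does. Here is a rough sketch of an audit that reads the canonical element out of each page and flags any page whose canonical points somewhere other than itself. The page paths (about.html, contact.html) and the helper names are made up for illustration, and the HTML is supplied as strings rather than fetched, so treat it as a starting point rather than a finished tool.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> found in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonical(html):
    """Return the first canonical URL declared in the HTML, or None."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonicals[0] if parser.canonicals else None


def audit(pages):
    """pages maps each page URL to its HTML source.

    Returns (page_url, canonical_url) pairs where the canonical element
    points at a different URL than the page it sits on.
    """
    mismatches = []
    for url, html in pages.items():
        canonical = find_canonical(html)
        if canonical is not None and canonical.rstrip("/") != url.rstrip("/"):
            mismatches.append((url, canonical))
    return mismatches


# The mistake from this test: one header, same canonical on every page.
# The two inner page paths are hypothetical.
HEADER = '<head><link rel="canonical" href="http://www.thewebaddress.com/" /></head>'
pages = {
    "http://www.thewebaddress.com/": HEADER + "<body>Home</body>",
    "http://www.thewebaddress.com/about.html": HEADER + "<body>About</body>",
    "http://www.thewebaddress.com/contact.html": HEADER + "<body>Contact</body>",
}

mismatches = audit(pages)
for page_url, canonical_url in mismatches:
    print(page_url, "declares canonical", canonical_url)
```

A mismatch is not always an error, since pointing a genuine duplicate at its original is the whole purpose of the element. The warning sign is every page on the site declaring the same canonical URL.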
So what happened?
Google clearly didn’t realise that this was a mistake: it only indexed 1 page, the homepage, which was the URL named in the canonical element. Google continually crawled the site over a 2-3 month period but only indexed the homepage. Yahoo and Bing indexed all 30 or so pages. After a while I got bored. Once I realised Google wasn’t going to figure this out itself, I removed the canonical element and the site’s pages got indexed.
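For contrast with the site-wide element used in this test, the usual pattern is for each page to name its own preferred URL. The sketch below builds the element per page instead of hard-coding it in a shared header. It assumes that query strings and fragments never change what a page shows, which will not hold for every site, and the function names are invented for this example.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def preferred_url(requested_url):
    """Drop the query string and fragment to get the preferred URL.

    Assumption: parameters only create duplicates and never select
    different content. Adjust this rule for sites where that is untrue.
    """
    parts = urlsplit(requested_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/", "", ""))


def canonical_link(requested_url):
    """Build the canonical link element for the page being served."""
    href = escape(preferred_url(requested_url), quote=True)
    return '<link rel="canonical" href="%s" />' % href


# The about.html path and the ref parameter are hypothetical.
print(canonical_link("http://www.thewebaddress.com/about.html?ref=feed"))
```

Generated this way, a duplicate URL with tracking parameters points back at the clean version of the same page, while every distinct page keeps its own canonical URL.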
October 19th, 2009 at 7:49 am
[…] future updates. You can also subscribe to email updates. Michael Wall has written about a short test he ran on a small site using the rel=canonical tag. Very interesting, and certainly one test does not a trend make, but I’d have expected more […]
October 19th, 2009 at 8:21 am
Hi Mr Wall,
I saw the exact same situation a few months back where a webmaster mistakenly placed the canonical element on his home page (site had several hundred pages – not badly ranked either) and the other pages started getting removed from the index rather rapidly.
i.e. he placed “http://his-home-page.com/” as the canonical element on every page (by mistake).
If I recall correctly, about 50 pages (perhaps more; I’m quite foggy on the number and don’t want to exaggerate) were removed from Google’s index completely within a few days.
He had no choice but to simply wait for his site to be recrawled after that error.
Regards,
Ricardo
October 19th, 2009 at 12:06 pm
Ooh, fascinating. I’ve seen reports of the tag failing miserably to de-index duplicate content caused by strings of parameters, but this is the first I’ve heard about it being over-effective when implemented incorrectly. Nice find!
October 19th, 2009 at 7:28 pm
I have the same experience as Jono, I have seen it not be very effective at removing duplicate content. Interesting to see it behaving in this way.
October 22nd, 2009 at 2:00 pm
Guys, thanks for dropping by. It certainly looks like it needs a bit of fine tuning. I’ll maybe test this out again.
February 5th, 2010 at 5:58 pm
[…] given priority and the site was de-indexed. Arguably when there a conflict’s and an obvious indexing mistake has been made Google should ignore both directives and include all pages in it’s […]
December 31st, 2011 at 6:43 am
[…] checked the source code in the browser for non index elements, checked for any canonical mistakes, checked the history of the site to see if the site had been black listed, but couldn’t […]