How to Fix Canonicalization Issues For Better PageRank
The first step on the way to fix canonicalization issues is to find them.
Here is what I do.
- Go to www.google.com
- Click on Advanced Search next to the text box.
- Enter * in the "all these words" text box.
- Select the drop-down for Results per page and set it to 100 results (the
maximum).
- In the "Search within a site or domain" box, enter the name of the site you
want to canonicalize.
- Hit Advanced Search.
- Once you get the results, browse to the bottom of the page and click on
the link which says "repeat the search with the omitted results
included."
Copy as many results as you can and paste them into an Excel
file, or continue working on the Google page. From here on, it's a bit
of manual work as you find out which pages are showing up as duplicates.
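The manual comparison can be partly automated. The following is a minimal Python sketch, not a definitive tool: it groups URLs that differ only in letter case, a leading "www.", a trailing slash, or slash direction. The function names `duplicate_key` and `find_duplicates` are hypothetical, the rules for what counts as "the same page" are assumptions you may need to adjust, and the input is assumed to be full URLs including the `http://` scheme.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def duplicate_key(url: str) -> str:
    """Reduce a URL to a comparison key (assumed rules; adjust for your site)."""
    # Treat backslashes as forward slashes before parsing.
    parts = urlsplit(url.strip().replace("\\", "/"))
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Ignore letter case and a trailing slash in the path.
    path = parts.path.lower().rstrip("/") or "/"
    key = host + path
    if parts.query:
        key += "?" + parts.query
    return key


def find_duplicates(urls):
    """Return lists of URLs that share a comparison key."""
    groups = defaultdict(list)
    for url in urls:
        if url.strip():
            groups[duplicate_key(url)].append(url.strip())
    return [group for group in groups.values() if len(group) > 1]
```

Paste the copied result URLs into a text file, one per line, and pass the lines to `find_duplicates`; each returned group is a set of candidates to check by hand.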
I run a script on my web server that sends me an error message every time Googlebot requests a page that is not found. This helps catch any overcorrection of canonical issues.
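The script itself is not shown here, so the following is only a sketch of one way such a check could work, not the author's method. It assumes access logs in the common "combined" format (IIS logs use a different layout and would need a different pattern), and the name `googlebot_404s` is hypothetical. It only lists the paths; sending the notification is left out.

```python
import re

# Matches the request, status, and user agent in a combined-format log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"$'
)


def googlebot_404s(lines):
    """Yield requested paths that returned 404 to a Googlebot user agent."""
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if match and match["status"] == "404" and "Googlebot" in match["agent"]:
            yield match["path"]
```

Note that the user agent string can be faked, so treat the output as a list of pages to review rather than proof of what Googlebot saw.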
You can also use Virante's duplicate content finder tool. There is a link at the bottom of this article.
Here is another way to find duplications.
Go to Google Webmaster Tools. Click Diagnostics, then click HTML Suggestions.
On this page, you will see a list of error descriptions with the number of pages on the right side.
For example, while writing this article, I found the following error.
If any of the items show 2 or more pages, you have issues.
When I clicked the duplicate meta descriptions link in the above image, I got this.
Why do you think Google thinks there are two pages?
Look closely and you will find that one page has /Articles\ArticleDisplay.aspx?...
while the other has /Articles/ArticleDisplay.aspx?...
I have canonical issues because of a "/" in the wrong direction!
Now, once you have found the pages that are duplicated, here is how you can fix the most common canonicalization issues.
- Remove all duplicate pages but read what you need to do prior to this.
- Identify the pages you are going to keep and add the following "canonical
hint" to their head sections.
<link rel="canonical"
href="http://www.example.com/product.php?item=swedish-fish" />
- Use a 301 redirect in the header. This tells Google that the page has
moved permanently and is available at the new location.
- Do not use both http://www.yourhomepage.com and http://yourhomepage.com. Pick
one and be consistent. I like to use http://www.NobleRiver.com.
- Use all lowercase if possible. I happened to capitalize the first letter
when I started, so I try to stick with it.
- Try to find issues that are causing duplication and get rid of them.
- ASP.NET websites can use the global.asax BeginRequest method to check for
URL inconsistencies.
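- Use the robots.txt file to block all access to the duplicate pages.
The following rules disallow all crawler access to the "oldfiles" folder.
User-agent: *
Disallow: /oldfiles/
To keep an individual page out of the index instead, add a robots meta tag to its head section.
<meta name="robots" content="noindex,nofollow">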
- You can request that your pages be removed from the Google index via the Crawler Access link under Site Configuration in Google Webmaster Tools. Read what you need to do prior to this.
Here is how to get there.
- Log in to Google Webmasters.
- Select your website.
- Click Site Configuration.
- Click Crawler access.
- Click Remove URL.
- Google Webmaster Tools will allow you to adjust parameter settings.
To get to this page,
- Log in to Google Webmasters.
- Select your website.
- Click Site Configuration.
- Click Settings.
- Click Adjust Parameter Settings.
Here, add the parameter you need Google to disregard and choose action = "Ignore". This
allows Googlebot to strip the extra parameters out of the URL, so that all pages leading to the canonical content have the same URL.
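To see what "Ignore" means for a URL, here is a small Python sketch of the same idea applied on your own side. It is an illustration under assumptions, not a description of how Googlebot works internally: the parameter names in `IGNORED_PARAMS` and the function name `strip_ignored_params` are hypothetical, so substitute the parameters that do not change the content on your site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that do not change the page content.
IGNORED_PARAMS = frozenset({"sessionid", "ref"})


def strip_ignored_params(url: str, ignored=IGNORED_PARAMS) -> str:
    """Return the URL with the ignored query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in ignored
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Running your known URLs through a function like this is a quick way to check which ones collapse to the same address once the extra parameters are gone.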
If you are handy with ASP.NET coding and you have an ASP.NET based website,
here's how to point to your canonical page from your duplicate pages.
This needs to go into the Page_Load section of your code-behind page.
' Send a permanent redirect from the duplicate page to the canonical URL.
Response.Clear()
Response.Status = "301 Moved Permanently"
' The Location value must be a quoted, absolute URL (scheme included).
Response.AddHeader("Location", "http://www.yourdomain.com/newpageURL")
Response.Flush()
Response.End()
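If your site does not run on ASP.NET, the same 301 response can be produced in other stacks. The following is a minimal Python WSGI sketch, offered as an illustration only: `PREFERRED_HOST` and the `app` function are assumptions, and it covers just one rule from the list above (pick one host name and redirect the other to it).

```python
PREFERRED_HOST = "www.example.com"  # assumed canonical host; replace with yours


def app(environ, start_response):
    """Minimal WSGI app: 301 to the preferred host, otherwise serve a page."""
    host = environ.get("HTTP_HOST", "")
    path = environ.get("PATH_INFO", "/")
    query = environ.get("QUERY_STRING", "")
    if host.lower() != PREFERRED_HOST:
        # Rebuild the same path and query on the preferred host.
        location = "http://" + PREFERRED_HOST + path
        if query:
            location += "?" + query
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"canonical page\n"]
```

For a local trial, the standard library's `wsgiref.simple_server.make_server` can serve `app`; a request with a Host header other than the preferred one should come back as a 301 with the corrected Location.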
Resources: