To be a valuable search engine you need to have a fresh index with lots of new information in it… all the time. To get this information a search engine needs to crawl the web and then filter this information into usable chunks of data that relate to search queries.
One of the things search engines do to ensure the quality of their data is remove duplicate content from their index. Duplicates fill up their servers with more copies of a page than are needed and create a bad user experience for searchers. Could you imagine getting the same article from different websites in all ten results on a search engine page? Searching for information is about cross-referencing different bits and pieces to make an overall informed judgement about something, someone or somewhere. Search engines know this, which is why they spend so much time making sure the content served up for a search query is relevant, non-spammy and creates a great user experience.
So we have established that search engines don’t like duplicate content, but what exactly is duplicate content? As the name implies, it’s content on your website that is identical (or nearly identical) to content found elsewhere, whether on another page of your own site or on someone else’s. As mentioned before, this is bad for search engines, so making sure you have unique content on your website is a must if you want any chance of ranking well.
What causes duplicate content?
There are many causes of duplicate content on your website. Here are just a few:-
Print pages
If you have a normal web page and then an additional ‘print friendly’ page, you now have two copies of that page on your website. Most search engines run the content of a page through their duplicate content filters, so the two pages will be seen as having exactly the same content and could potentially be penalised or filtered out of the results.
Fix - You could put all your print friendly pages into a directory on your server and then use a robots.txt file to block search engine crawlers from this directory.
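If you want to check that a rule like this does what you expect before you upload it, here is a minimal sketch using Python’s standard robots.txt parser. The /print/ directory name and the URLs are assumptions made up for illustration, so swap in your own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: the /print/ directory name is an
# assumption, use whatever directory holds your print friendly pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def is_crawlable(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "*") -> bool:
    """Return True if the given robots.txt rules allow `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.yourwebsite.com/article.htm",
        "http://www.yourwebsite.com/print/article.htm",
    ):
        print(url, "->", "crawlable" if is_crawlable(url) else "blocked")
```

Keep in mind that robots.txt is only a request to crawlers. Well-behaved search engines honour it, but it is not a guarantee.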
Canonicalization issues
Don’t worry… I thought “what the hell does that word mean?” when I first saw it too. Essentially your website homepage could have multiple URLs pointing to it. For example:-
http://yourwebsite.com
http://www.yourwebsite.com
http://yourwebsite.com/index.htm
http://www.yourwebsite.com/index.htm
https://yourwebsite.com
https://www.yourwebsite.com
https://yourwebsite.com/index.htm
https://www.yourwebsite.com/index.htm
These could all point to your homepage (please note this is an extreme example but still possible all the same). If a search engine crawls and indexes all these versions of your URLs, there could be multiple versions of your homepage in the index… or worse, multiple versions of your entire website. This would be very bad news indeed.
One thing to keep in mind: if your competitors are smart and notice that you haven’t re-directed your URLs correctly, they could point links from other websites or directories at your different URLs. This would prompt a search engine to crawl them (basically a forced crawl of all your different URLs), which could lead to a drop in rankings.
Fix - First you will need to find out if there are any additional versions of your website or homepage in the index. You can do this with the site: operator: put site: before each of your URLs in a search engine’s search box to check whether it’s in the index (e.g. site:http://www.yourwebsite.com). If you have multiple versions of your site in a search engine’s index, you will need to ‘301 re-direct’ (a permanent re-direct) the unwanted URLs to your main URL. (If you want further info on how to do a ‘301 re-direct’, leave a comment and I’ll get back to you.)
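To make the idea of a ‘main URL’ concrete, here is a rough Python sketch that maps each of the variants listed above onto a single address. The choice of https and www as the main version, and the list of index file names, are assumptions for illustration only. It also naively forces every URL onto one host, so treat it as a way of working out where each re-direct should point, not as the re-direct itself.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration: pick whichever version is really your main URL.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.yourwebsite.com"
INDEX_FILES = {"index.htm", "index.html"}


def canonical_url(url: str) -> str:
    """Map a URL variant onto the single main URL it should re-direct to."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Drop a trailing index file so /index.htm and / become the same page.
    head, _, tail = path.rpartition("/")
    if tail in INDEX_FILES:
        path = head + "/"
    # Force the chosen scheme and host, keep the query string, drop fragments.
    return urlunsplit((CANONICAL_SCHEME, CANONICAL_HOST, path, parts.query, ""))


if __name__ == "__main__":
    for variant in (
        "http://yourwebsite.com",
        "http://www.yourwebsite.com/index.htm",
        "https://yourwebsite.com/index.htm",
    ):
        print(variant, "->", canonical_url(variant))
```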
Manufacturers’ product descriptions
If you’re selling a product online through a distributor or manufacturer, chances are the products they provide come with a standard piece of text or a product description that many people use on their sites. If there are a hundred other websites out there with your product, or worse still an entire range of products that you sell, you will have duplicate content issues.
Fix - Really the only way to get around this is to write your own product descriptions, so that the content on your pages is unique and original.
Product pages
If you have a shopping cart, product pages are a hotbed for duplicate content. Most people will add multiple products to their site using the same product description, changing only the colour, size or another minor element to differentiate them. As most of the content is the same, you could have hundreds of pages with duplicate content on them.
Fix - You could re-write all your shopping cart pages; however, if you have a few thousand products this could be a very large job indeed. Another option is to analyse your website, find out which products generate the most revenue for you and filter out the others using a robots.txt file. (This isn’t the best solution; however, you may find that the lift in rankings from fewer duplicate content penalties increases your revenue.)
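Before you decide which product pages to re-write or filter out, it can help to see which ones actually overlap. Here is a small Python sketch that flags pairs of pages whose descriptions are almost the same. The page paths, the sample descriptions and the 0.9 threshold are all made up for illustration, and this is just one simple way of measuring similarity, not how any search engine does it.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(pages: dict[str, str], threshold: float = 0.9) -> list[tuple[str, str]]:
    """Return pairs of page paths whose descriptions are at least `threshold` similar."""
    pairs = []
    for (url_a, text_a), (url_b, text_b) in combinations(pages.items(), 2):
        # Compare word by word, ignoring case, so that a changed colour or
        # size counts as one small difference.
        words_a = text_a.lower().split()
        words_b = text_b.lower().split()
        if SequenceMatcher(None, words_a, words_b).ratio() >= threshold:
            pairs.append((url_a, url_b))
    return pairs


if __name__ == "__main__":
    # Hypothetical product pages and descriptions.
    pages = {
        "/shirt-red": "Classic cotton t-shirt with a crew neck and short sleeves. Colour: red",
        "/shirt-blue": "Classic cotton t-shirt with a crew neck and short sleeves. Colour: blue",
        "/mug": "Ceramic mug that holds 300 ml and is dishwasher safe.",
    }
    for url_a, url_b in near_duplicates(pages):
        print(url_a, "looks like a duplicate of", url_b)
```

Comparing every page against every other page gets slow on a shop with a few thousand products, so you may want to run it one category at a time.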
Stolen content
If others have stolen your content, this could lead to a search engine indexing the wrong version (theirs!). To see if anyone has stolen your content, try using Copyscape.
Multiple domains
If you have multiple domains you will want to ‘301 re-direct’ these to your main domain name. Don’t set up multiple websites using the same content on different domains as this will cause you issues with duplicate content filters.
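How you set up a ‘301 re-direct’ depends on your web server or hosting, but the logic is simple. As a rough sketch, here is a tiny Python WSGI app that sends any request arriving on another domain to the same path on the main domain. The host name, the use of https and the app itself are assumptions for illustration, not a recommendation for how to host your site.

```python
# Assumption for illustration: replace with your real main domain.
MAIN_HOST = "www.yourwebsite.com"


def redirect_app(environ, start_response):
    """WSGI app: 301 re-direct every request that is not for MAIN_HOST."""
    host = environ.get("HTTP_HOST", "").split(":")[0].lower()
    path = environ.get("PATH_INFO", "/") or "/"
    query = environ.get("QUERY_STRING", "")

    if host == MAIN_HOST:
        # Already on the main domain: serve the page as normal (stubbed here).
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"main site\n"]

    # Keep the path and query string so each old URL lands on its matching page.
    location = f"https://{MAIN_HOST}{path}" + (f"?{query}" if query else "")
    start_response("301 Moved Permanently", [("Location", location)])
    return [b""]
```

To try it locally you could serve it with wsgiref.simple_server.make_server from Python’s standard library. On a live site you would normally do the same job in your web server’s configuration.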
The final word
As you can see, there are a few ways your site can produce duplicate content. If you are aware of these issues and take appropriate measures to make sure your site doesn’t suffer from them, you shouldn’t have too many problems.