Duplicate content is a problem that worries many webmasters. Rumor has it that duplicate content can hurt your Google rankings, and that web pages that copy your web site content can harm your position in the search results.
For that reason, Google recently made an official statement about duplicate content.
What is duplicate content and what is not duplicate content?
Duplicate content is substantive blocks of content, within the same domain or across different domains, that are identical or very similar.
Google mentions several things that can lead to duplicate content:
"Forums that generate both regular and stripped-down mobile-targeted pages, store items shown (and -- worse yet -- linked) via multiple distinct URLs, and so on. In some cases, content is duplicated across domains in an attempt to manipulate search engine rankings or garner more traffic via popular or long-tail queries."
If the same article is available in multiple languages (for example English and Spanish) then Google doesn't view that as duplicate content. Occasional snippets such as quotes also won't be flagged as duplicate content.
What does Google do if it finds duplicate content?
Google tries to filter duplicate content from the search results. The reason for that is that Google wants to present a diverse cross-section of unique content in the search results.
"During our crawling and when serving search results, we try hard to index and show pages with distinct information. This filtering means, for instance, that if your site has articles in 'regular' and 'printer' versions and neither set is blocked in robots.txt or via a noindex meta tag, we'll choose one version to list.
In the rare cases in which we perceive that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we'll also make appropriate adjustments in the indexing and ranking of the sites involved.
However, we prefer to focus on filtering rather than ranking adjustments ... so in the vast majority of cases, the worst thing that'll befall webmasters is to see the "less desired" version of a page shown in our index."
That simply means that Google will pick one of the web pages if it finds more than one page with the same content.
How can you avoid duplicate content problems with your web site?
- Tell search engines which pages they should index: If the printer-friendly versions of your pages should not be indexed, block them in your robots.txt file.
- Use 301 redirects: If you restructured your web site, use permanent 301 redirects to send users and search engine spiders to the new URLs.
- Always use the same link to a page on your site: Don't link to /page, /page/ and /page/index.htm if these URLs all display the same web page.
- Use top-level domains for language-specific content: If you have German pages, use a .de domain for these pages.
- Use the preferred domain feature of Google's webmaster tools: Google lets you choose whether you prefer the www version or the non-www version of your URLs.
- Syndicate carefully: Make sure that other web sites link back to your site if they use your content.
- Avoid boilerplate repetition and publishing stubs: If possible, don't include the same lengthy copyright text at the bottom of every page. Use a short version with a link to the full text instead. If you have category pages without any content, don't publish them.
- Understand your content management system (CMS): If you use a content management system, make sure that it doesn't publish the same content in multiple formats.
Duplicate content can lead to problems with search engines. For that reason, follow the tips above so that search engines have as few problems as possible with your site. If you find a web site that copies your original content, you can file a DMCA complaint.
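As a rough sketch of the first two tips, here is what the relevant entries might look like, assuming your printer-friendly pages live under a /print/ directory and your site runs on an Apache server (the paths are examples, not part of any real site):

```
# robots.txt — keep printer-friendly duplicates out of the index
# (assumes the printer versions live under /print/)
User-agent: *
Disallow: /print/
```

```
# .htaccess (Apache mod_alias) — permanent 301 redirect from an old URL
# to its new location after a site restructuring (example paths)
Redirect 301 /old-page.htm /new-page.htm
```

The robots.txt rule tells crawlers not to fetch anything under /print/, while the 301 redirect tells both visitors and search engine spiders that the page has moved permanently, so the new URL inherits the old one's place in the index.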
If you want to make sure that your web pages get high rankings on search engines, make it as easy as possible for search engines to parse your pages. Use IBP's Top 10 Optimizer to make your web pages as search engine friendly as possible.