7 Ways to Find (and Fix!) Duplicate Content

Posted on Jun 05, 2013
By Rhonda Hurwitz

Duplicate content happens.

You may be aware that content theft can create duplicate content and search engine penalties for your website or blog.

But sometimes we inadvertently create duplicate content on our own websites because we don't understand content creation best practices!

Recent search engine updates have made it more important than ever to identify and address duplicate content ... before it impacts your organic search ranking or link popularity.

Here's how to detect and fix both types:

Detect and Fix Onsite Duplicate Content

Webmaster tools provide easy ways to help address duplicate content issues within your site or blog. Here's a game plan:

  • Domain Level: Select Preferred URL Structure

To avoid creating multiple URLs delivering the same content, decide what your preferred URL structure is.

For example, should our website address be http://www.icopyright.com? Or should it be simply http://icopyright.com? In other words, do you want the site to appear with or without the ‘www’? (Strictly speaking, the top level domain is only the ‘.com’ part; the choice here is about the host name in front of it.) It doesn’t matter which you pick, as long as you pick one and use it consistently.

Once decided, this preference can be set within Google’s free Webmaster Tools under ‘Configuration’.
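As a rough illustration of what "pick one and stick with it" means in practice, here is a minimal Python sketch that rewrites either variant of the address to a single preferred host. The choice of the www form and the helper name to_preferred_host are assumptions made for this example; they are not settings taken from Webmaster Tools.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption for this sketch: the "www" form was chosen as the preferred one.
PREFERRED_HOST = "www.icopyright.com"
BARE_HOST = "icopyright.com"


def to_preferred_host(url: str) -> str:
    """Rewrite either host variant of the site to the preferred one.

    URLs on other hosts are returned unchanged. Ports and credentials in the
    URL are not handled; this is only meant to show the idea.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in (PREFERRED_HOST, BARE_HOST):
        parts = parts._replace(netloc=PREFERRED_HOST)
    return urlunsplit(parts)


print(to_preferred_host("http://icopyright.com/about"))
```

Running every internal link through a helper like this before publishing is one way to keep the two forms from drifting apart.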

With the preferred domain sorted, you can now address individual instances of duplicate content.

  • URL Structure at the Page Level: Unique and Consistent

Determine which unique URL you would prefer to use for each piece of content. Flatter URL structures that keep the content closer to the root domain are generally considered better for SEO and may encourage more clicks on calls-to-action.

Here’s a hypothetical example: http://icopyright.com/about/digital-content-copyright-protection.

Once you’ve selected a preferred URL structure, be consistent. Use the preferred URLs throughout site navigation, in anchor text links and in sitemap files.
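One way to check for consistency is to gather the links used in your navigation and sitemap and look for pages that show up under more than one spelling. The sketch below is a simplified take on that idea: it treats URLs that differ only by a leading www or a trailing slash as the same page. Those two rules, and the function names, are assumptions for illustration; your own site may have other variants.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def page_key(url: str) -> tuple[str, str]:
    """Reduce a URL to (host without www, path without trailing slash)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    return host, path


def find_inconsistent_urls(urls: list[str]) -> dict[tuple[str, str], set[str]]:
    """Return pages that are linked under more than one distinct URL."""
    variants = defaultdict(set)
    for url in urls:
        variants[page_key(url)].add(url)
    return {key: seen for key, seen in variants.items() if len(seen) > 1}


links = [
    "http://icopyright.com/about/digital-content-copyright-protection",
    "http://www.icopyright.com/about/digital-content-copyright-protection/",
    "http://icopyright.com/",
]
for key, seen in find_inconsistent_urls(links).items():
    print(key, sorted(seen))
```

Any page reported here is a candidate for cleanup: pick the preferred URL and update the other links to match.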

  • Apply 301 Redirects

A 301 redirect points search engines from a page with an earlier URL structure to the page with the newly structured URL, so that search engines do not perceive duplicate content. It's a fantastic way to reunify duplicate content.

If you find duplication based on earlier URL structures, use a 301 redirect to indicate that the content has permanently moved.
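Most sites configure 301 redirects in the web server or hosting control panel, but the mechanics are easy to see in a few lines of code. Here is a minimal sketch written as a Python WSGI application. The old path in the REDIRECTS table is hypothetical, and a real site would serve its normal pages where this sketch simply answers 404.

```python
# Hypothetical mapping from retired paths to their replacements.
REDIRECTS = {
    "/old/copyright-protection.html": "/about/digital-content-copyright-protection",
}


def app(environ, start_response):
    """Minimal WSGI app: answer retired paths with a permanent redirect."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # "301 Moved Permanently" signals that the content has moved for good.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    # Placeholder for everything else; a real site would serve the page here.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]
```

To try it locally, the app can be handed to the standard library's wsgiref.simple_server.make_server and requested with a browser or curl.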

  • Implement Canonical Tag

301 redirects are server settings.

If you’re not comfortable implementing a server change or your hosting provider does not support the use of 301 redirects, the canonical tag can also be used.

All major search engines, including Google, Bing and Yahoo, currently support the use of the canonical tag (rel=canonical), which can help point search engines to your preferred URLs when inserted into the page code of your site.

The official Google Webmaster Tools explanation, including an example, can be found here.

(PS - Here's an explanation and example of this in use, for commenter Bruce:

There’s a very simple way out of this: the canonical tag. In the <head> of the page, you put the preferred URL in a tag like this (this is a sample from our own blog):

The href should be the URL you prefer for the content. If a search engine sees duplicate content on the site, it’ll treat the canonical URL as the single page for that content; there will be no duplicate content penalty).
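For readers who want to see the general shape of the tag, here is a small Python sketch that builds a canonical link element and reads one back out of a page. It is a generic illustration, not the sample from our blog mentioned above; the helper names canonical_tag and find_canonical are made up for this example, and the URL is the hypothetical one used earlier in this post.

```python
from html import escape
from html.parser import HTMLParser


def canonical_tag(preferred_url: str) -> str:
    """Build the <link> element that belongs in the page's <head>."""
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'


class CanonicalFinder(HTMLParser):
    """Remember the href of the first rel=canonical link in a document."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html_text: str):
    """Return the canonical URL declared in html_text, or None."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


url = "http://icopyright.com/about/digital-content-copyright-protection"
page = "<html><head>" + canonical_tag(url) + "</head><body></body></html>"
print(canonical_tag(url))
print(find_canonical(page))
```

Checking a handful of your own pages with something like find_canonical is a quick way to confirm that every duplicate points at the URL you actually prefer.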

  • URL Parameter Handling Tool

For duplicate content issues that arise due to multiple URLs with query string parameters, consider using the URL parameter handling tool within Google Webmaster Tools.

(A query string is the part of a URL that contains data to be passed to web applications. Query strings contain parameters or variables. Sometimes these parameters impact the content of the page).

To clarify, here's an example:

www.XYZClothing.com/products/women?category=dresses&color=green is a URL whose query string carries parameters for ‘category’ and ‘color’.

Other URL parameters do not impact page content and are solely for tracking (like a session ID) or sorting purposes.

www.XYZClothing.com/products/women?category=dresses&sort=price_ascending delivers the same content as www.XYZClothing.com/products/women?category=dresses.

Use Google's Webmaster Tools to detect and clarify these situations.
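To make the distinction concrete, here is a short Python sketch that strips parameters which do not change the page content, so that URL variants collapse to one address. Which parameters are safe to ignore depends entirely on your site; the names in NON_CONTENT_PARAMS are assumptions based on the sorting and session ID examples above, and the http:// prefix is added only to make the examples complete URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: on this hypothetical site these parameters never change
# what the page shows (sorting and tracking only).
NON_CONTENT_PARAMS = {"sort", "sessionid"}


def strip_non_content_params(url: str) -> str:
    """Drop tracking and sorting parameters, keep the ones that select content."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in NON_CONTENT_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


sorted_url = "http://www.XYZClothing.com/products/women?category=dresses&sort=price_ascending"
plain_url = "http://www.XYZClothing.com/products/women?category=dresses"
print(strip_non_content_params(sorted_url) == plain_url)
```

Parameters such as category and color are left alone, because they do change what the visitor sees.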

  • Reconsider Robots.txt Use

Of note, Google no longer recommends blocking access to duplicate content with a robots.txt file.

(This file lives in the root directory of the website, for example www.XYZClothing.com/robots.txt, and tells search bots which parts of the site they may crawl).

If you are currently managing onsite duplicate content with a robots.txt file, you can read more about Google’s newest recommendations here.
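If you are not sure whether your robots.txt file is hiding duplicate pages from search bots, a quick audit can help. The sketch below uses Python's standard urllib.robotparser module to list which URLs a given set of rules would block. The robots.txt rules, the /print/ path and the URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given robots.txt rules keep the agent away from."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Hypothetical rules that hide printer-friendly duplicates from all bots.
ROBOTS = """\
User-agent: *
Disallow: /print/
"""

pages = [
    "http://www.XYZClothing.com/products/women",
    "http://www.XYZClothing.com/print/women",
]
print(blocked_urls(ROBOTS, pages))
```

Pages that turn up in this list are candidates for a 301 redirect or a canonical tag instead of a robots.txt block.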

Detect and Fix Offsite Duplicate Content

What about offsite issues that create duplicate content?

Guest blogging, article syndication and maliciously pirated or scraped content can all negatively impact your organic search ranking, author rank and, most importantly, control over your own original content.
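If you already suspect that a particular page has copied one of your articles, you can get a rough sense of the overlap yourself. The Python sketch below compares two texts by the runs of five consecutive words they share. This is a simple, common technique chosen for illustration, not how any particular detection tool or search engine works, and the five-word window is an arbitrary assumption.

```python
import re


def shingles(text: str, size: int = 5) -> set[tuple[str, ...]]:
    """Split text into overlapping runs of `size` consecutive lowercase words."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap(original: str, candidate: str, size: int = 5) -> float:
    """Share of the original's word runs that also appear in the candidate."""
    own = shingles(original, size)
    other = shingles(candidate, size)
    return len(own & other) / len(own) if own else 0.0


mine = "You work hard to create unique content. Make sure it appears as it should!"
copy = "Intro text. You work hard to create unique content. Make sure it appears as it should!"
print(round(overlap(mine, copy), 2))
```

A score near 1 suggests the other page reproduces most of your text; a score near 0 suggests the two pages only share a topic.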

Takeaway for bloggers, writers and publishers:

You work hard to create unique content. Make sure it appears both on site and across the web as it should!

  • Use Webmaster tools to identify where the issues exist and fix duplicate content, both on and off your website or blog.
  • Using an advanced duplicate content detection and resolution tool can make this job much easier.

Attending to these 7 suggestions now will pay big search result dividends later!
