The old chestnut that WordPress ‘creates duplicate content issues’ keeps coming up. Someone even raised it with me in an email recently.
Let’s be clear: WordPress does not create duplicate content problems.
What WordPress does is allow the same content, on the same site, to be accessed via a number of different URLs (or permalinks): tags, categories, day archives, month archives, year archives and author archives.
This is not duplicated content. It’s the same content on the same website accessed via different URLs.
Duplicate content is the same content on different websites.
Hear it from Google
At the recent Google Site Clinic in London, this point was addressed head on. Here’s a snippet of the report on this specific topic from the Google Webmaster Central Blog:
Duplicate content within a website is generally not a problem, but can make it more difficult for search engines to properly index your content and serve the right version to users.
There are two common ways to signal what your preferred versions of your content are: By using 301 redirects to point to your preferred versions, or by using the rel="canonical" link element.
You can read the full version here.
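For anyone curious what those two signals look like, here is a minimal sketch in Python, used purely for illustration (on a WordPress site, WordPress itself or your SEO plugin writes the tag for you). The function names `canonical_link` and `redirect_for` and the example.com URLs are hypothetical. The first builds the rel="canonical" element that sits in a page’s head; the second returns a 301 status and Location header whenever a request arrives at a URL other than the preferred one.

```python
from html import escape
from typing import Optional


def canonical_link(preferred_url: str) -> str:
    """Build the <link rel="canonical"> element for a page's <head>."""
    # escape() with quote=True makes the URL safe inside a double-quoted attribute.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}" />'


def redirect_for(requested_url: str, preferred_url: str) -> Optional[tuple[int, dict[str, str]]]:
    """Return (status, headers) for a 301 to the preferred URL, or None if no redirect is needed."""
    # Simplification: URLs are compared as plain strings, with no normalisation.
    if requested_url == preferred_url:
        return None
    return 301, {"Location": preferred_url}


preferred = "https://example.com/my-post/"  # hypothetical preferred permalink
print(canonical_link(preferred))
# → <link rel="canonical" href="https://example.com/my-post/" />
print(redirect_for("https://example.com/?p=42", preferred))
# → (301, {'Location': 'https://example.com/my-post/'})
```

Comparing URLs as plain strings keeps the sketch short; a real site would normalise scheme, host and trailing slashes before deciding whether to redirect.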
As I’ve written before, Google understands how platforms like WordPress operate and does not see the same content on the same website accessed via different URLs as duplicated.
And you have several ways to make sure search engines treat the URL you prefer as the primary version of your page:
The Canonical tag
WordPress introduced the canonical tag as a default feature in version 2.9. The All in One SEO Pack plugin, Platinum SEO and other SEO plugins also support the canonical tag and let you decide whether or not to use it.
With the plugins, all you have to do is tick the option and your posts will be canonicalised.
So duplicate content is not created by WordPress.
Preserving Link Juice
The more valid concern is that you can lose link juice as a result of the same content being indexed via multiple URLs.
So the way to fix this is to noindex each of your archives, or at least those you don’t consider important.
I noindex all the archives on my WordPress sites except the category and tag archives.
That’s because I’ve set up my categories and tags carefully to ensure that related content can be found easily (more details here).
So I use the canonical tag on each article, set index,follow on my category and tag pages, and noindex all the other archives.
This enables me to focus the search engines on the primary version of each article, and it gives them a structured pathway along which to crawl the rest of the site so they can index it both easily and fully.
And it minimizes lost link juice.
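Put together, the set-up described in this section amounts to a small rule per page type. The Python sketch below only illustrates that rule (in practice WordPress and your SEO plugin’s settings produce these tags); the `PageType` enum, the `head_tags` function and the example URL are hypothetical, and using noindex,follow rather than plain noindex for the other archives is an assumption, not something stated above.

```python
from enum import Enum
from html import escape
from typing import Optional


class PageType(Enum):
    """The kinds of WordPress page discussed in this post (names are illustrative)."""
    SINGLE = "single"  # an individual article
    CATEGORY = "category"
    TAG = "tag"
    DAY = "day"
    MONTH = "month"
    YEAR = "year"
    AUTHOR = "author"


# The only archive types this policy leaves open to indexing.
INDEXED_ARCHIVES = {PageType.CATEGORY, PageType.TAG}


def head_tags(page_type: PageType, canonical_url: Optional[str] = None) -> list[str]:
    """Return the SEO tags a page of the given type would carry under this policy."""
    if page_type is PageType.SINGLE:
        if canonical_url is None:
            raise ValueError("an article needs its preferred URL for the canonical tag")
        # Each article points search engines at its primary URL.
        return [f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />']
    if page_type in INDEXED_ARCHIVES:
        # Category and tag archives act as the crawl pathway, so they stay indexed.
        return ['<meta name="robots" content="index,follow" />']
    # Date and author archives are kept out of the index. Assumption: "follow" is
    # retained so crawlers can still reach the posts linked from these pages.
    return ['<meta name="robots" content="noindex,follow" />']


for page in PageType:
    url = "https://example.com/my-post/" if page is PageType.SINGLE else None
    print(page.value, head_tags(page, url))
```

Keeping follow on the noindexed archives means crawlers can still pass through them to the articles; changing that one string to noindex,nofollow would shut those pages off entirely.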