Optimizing your WordPress Blog for Google: Part 1

How to avoid the duplicate content penalty when using a WordPress blog

There’s no denying it: Google is the powerhouse of the Internet, more than capable of showering targeted visitors on your website like a torrential downpour backed by gale-force winds. That is, if you happen to be in Google’s good graces and rank highly for popular keywords.

Bloggers often find themselves in a position to receive great traffic from Google simply because of the way blogs work: they allow for quick inclusion in the major search engines, even with the default setup. As the most popular blogging software on the planet, WordPress is the content management system (CMS) of choice for bloggers, thanks to its ease of use and the large community of supporters who help develop plugins for this open source software.

However, the fact that WordPress blogs help websites get noticed by the major search engines easily doesn’t mean they are inherently search engine friendly. In fact, the default settings for WordPress blogs almost guarantee that, if left untouched, your blog will end up suffering from the duplicate content penalty.

The duplicate content penalty and WordPress blogs

The duplicate content penalty describes what happens when a web page is removed from the primary search results for a certain keyword phrase because identical content exists elsewhere on the Internet. In Google, the lower-ranking websites and individual pages that contain the duplicate content are hidden, and this phrase is displayed instead:

In order to show you the most relevant results, we have omitted some entries very similar to the X (number) already displayed.

This happens primarily in two situations: with article marketing (a.k.a. article directory submissions), and when you post content to an unmodified blog. Since this article is about WordPress blog optimization, we’ll focus on how you can avoid the duplicate content penalty in the latter case.

Unmodified WordPress blogs are search engine “unfriendly”

By using a WordPress blog with its default setup, you are creating an atmosphere where your blog’s content is almost guaranteed to suffer from the duplicate content penalty. The reason for this is simple: by default, your WordPress blog will have the exact same content, word for word, in five places:

  1. On the blog’s main page
  2. Within the blog’s RSS feed
  3. On the category page
  4. On the monthly archive page
  5. On the post’s own page

Having five instances of the exact same content within a single website makes it very likely that your content will be seen as “duplicate” content by Google, and by other major search engines for that matter.

Avoiding the duplicate content penalty with a WordPress blog

There are a few things you can do with your WordPress blog to help keep it in the primary search results:

Limit the text shown on your blog’s pages

The duplicate content penalty comes about when large amounts of substantially identical text are shown on numerous pages throughout the Internet, not just a few characters or even a couple of sentences. WordPress gives bloggers the option to use what is called the “more” tag, which limits the amount of text displayed on the blog’s main page, the category pages and the archive pages. With this tag in place, the full content of a post is displayed only on the post’s own page, which removes up to four of the five duplicate locations listed above and can decrease the chances of being affected by the duplicate content penalty.
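
The effect of the “more” tag can be sketched in a few lines. This is a rough Python illustration, not WordPress’s own PHP code: it assumes the tag is written as `<!--more-->` in the post body, and the function name `listing_excerpt` is hypothetical.

```python
MORE_TAG = "<!--more-->"  # marker an author places in the post body


def listing_excerpt(post_body: str) -> str:
    """Return the text a listing page would show for this post.

    Hypothetical helper: everything before the first "more" marker is
    shown on listing pages; a post without the marker is shown in full.
    """
    before, marker, _rest = post_body.partition(MORE_TAG)
    return before.rstrip() if marker else post_body


post = "Intro paragraph.\n<!--more-->\nThe rest of the article."
print(listing_excerpt(post))
```

Only the introduction ends up on the main, category and archive pages; the rest of the article stays on the post’s own page.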

However, if you have an established blog, editing every existing post would be a cumbersome chore, not to mention that adding the tag is one more step to remember for every future post. Fortunately, there is a WordPress plugin that replicates the effects of the “more” tag without requiring you to add it to your blog posts manually.

Evermore WordPress Plugin

The Evermore WordPress plugin replicates the effects of the “more” tag, limiting the amount of text shown on your blog’s main page, category pages and archive pages. You can specify the character limit, so you can have more or less text displayed in these areas depending on your needs. Installing the plugin is as easy as uploading a file via FTP to your WordPress plugin directory, and its effects can be reversed completely by deactivating it.
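
To give a feel for what a character limit does, here is a minimal Python sketch. It is not the plugin’s code, and the plugin may pick its cut-off point differently; the function name `truncate_at_word` and the 18-character limit are assumptions made purely for illustration.

```python
def truncate_at_word(text: str, limit: int) -> str:
    """Keep at most `limit` characters of text, then append " ...".

    Hypothetical helper for illustration only: it avoids splitting a
    word by backing up to the previous space when the cut lands mid-word.
    """
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # If the cut landed mid-word, back up to the last space (when there is one).
    if not text[limit].isspace() and " " in cut:
        cut = cut[: cut.rfind(" ")]
    return cut.rstrip() + " ..."


print(truncate_at_word("The quick brown fox jumps over the lazy dog", 18))
```

A higher limit shows more of each post on listing pages; a lower one leaves less text to be repeated across them.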

Choose “Summary” for your blog’s RSS feed file

Another way to reduce your chances of becoming the next victim of the duplicate content penalty is to change a setting in your WordPress blog’s administration area.

Once you’re logged in to your blog’s administration area, go to Options >> Reading. Under the sub-heading “Syndication Feeds”, select the “summary” option. Your feed will then carry a summary of each post rather than the full text, which cuts down your chances of suffering from the duplicate content penalty.

Using the robots.txt file to keep Google (and other search engines) away

Now, of course you want search engines to crawl your blog, but in some cases it’s not in your best interest to have all of your pages crawled and indexed by all search engine robots. Another trick for avoiding the duplicate content penalty is to tell the Googlebot to stay away from your RSS feed. Some bloggers even go so far as to disallow the Googlebot from crawling their blog archives and category pages, but that is a personal preference; in all honesty, we have yet to see conclusive evidence as to which approach is best. Here is an example of a set of commands that will help you keep the Googlebot from accessing your blog’s feeds.

Sample robots.txt file to disallow the Googlebot from your RSS feed

(if your WordPress blog is at the root of your domain)


User-agent: Googlebot
Disallow: /feed/$
Disallow: /feed/rss/$
Disallow: /trackback/$

You can check out some advanced commands for your WordPress blog’s robots.txt file on the Ask Apache website. As with any modification of this type, you should implement changes with extreme caution, as improperly formed commands in your robots.txt file may cause your blog to be excluded from search engines altogether!
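
One way to be cautious is to check which paths a set of rules actually covers before uploading the file. The sketch below is a simplified Python illustration of Google-style pattern matching, where `*` matches any run of characters and a trailing `$` anchors the end of the URL path. It is an approximation under those assumptions, not Google’s parser; it handles only `Disallow` lines, and the helper names and test paths are made up for the example.

```python
import re

# The Disallow patterns from the sample robots.txt above.
DISALLOW = ["/feed/$", "/feed/rss/$", "/trackback/$"]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate one robots.txt path pattern into a compiled regex.

    Simplified approximation: '*' matches any characters, a trailing '$'
    anchors the end of the path, and everything else matches literally
    from the start of the path.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_blocked(path: str, patterns=DISALLOW) -> bool:
    """True if any Disallow pattern matches `path` from its first character."""
    return any(pattern_to_regex(p).match(path) for p in patterns)


for path in ["/feed/", "/feed/rss/", "/feed/atom/", "/my-post/feed/"]:
    print(path, "blocked" if is_blocked(path) else "allowed")
```

Under these assumptions, the sample rules cover only the exact paths /feed/, /feed/rss/ and /trackback/. A path such as /feed/atom/ or a per-post feed such as /my-post/feed/ is not matched, so it is worth testing the paths your own blog actually uses.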

Using the tips and tricks outlined here, you can help keep your blog in Google’s primary index and avoid the duplicate content penalty that many other bloggers suffer from. It goes without saying that one of the major contributing factors to the duplicate content penalty is syndicating articles from article directories, so for best results your blog should be made up of unique content rather than syndicated content.

This article is the first in a two-part series on how to optimize your WordPress blog for Google. In our next article, we will show you how to optimize your WordPress blog for ultimate search engine friendliness, how to pull your blog out of Google’s supplemental index, and how to stay out of it.

9 thoughts on “Optimizing your WordPress Blog for Google: Part 1”

  1. It’s not that easy to get done for dup content. You need to copy up to 90% of the text.

  2. … sure but you need to wait until article part2 ;)

  3. Uh oh..! Google reads rss feeds, and I am quietly confident that it also monitors them, so you might want to take that out of your robots.txt file.

    Google has publicly stated that in the event of syndicated content if the article appears on multiple websites and contains a link to the original source that the original source will be rewarded and not the site with the greater page rank. Therefore one can assume that google knows a thing or two about the world of blogging :) and duplicate content not only site-wide but net-wide.

    Keep in mind that Google will see link data for the article in the RSS feed, category page, archive page and the unique page, and therefore easily figure out what the original source is.

    If you could find an example of a poorly SEO’d wordpress blog with search terms returning the archive and not the original post then I would love to see it.

    cheers
    Nick

  4. @Nicholas Mullen,

    Sure, the RSS feed could be helpful for indexing your website, but it is not needed for that purpose. WordPress sends a ping to Google Blog Search, and your new post is included after only a few hours.

    The risk of indexing a blog’s RSS feed is content duplication; the best thing is for only the pages with your posts to be indexed by Google. The other pages only need to be followed.

  5. Thank you for this great advice! I loved your article and I also found it very helpful because I had problems with the duplicate content penalty on one of my blogs. Now I know what to do :)

  6. Thanks for this helpful information. I put the evermore plugin up, I rather like the shortened versions on my category pages and in the feed into my home page. Looks nice thanks. I thought that part of the reason that blogs do so well in the SE’s was because they can access the feed? Why block?

  7. An interesting and useful article, thanks. I have been wondering about SEO for WordPress blogs recently as I have a new one that I have just set up (see my URL).

    I use ExpressionEngine mainly and so I’m used to that, but WordPress is, it seems, a bit of a different kettle of fish when it comes to Google.

    Anyway, I’ve been playing about to see what combination of things gets better posts-only results whilst at the same time not being too brutally restrictive to search bots in general.

    So far, I’m trying an ‘archives and cats in drop-downs only’ approach and using the all-in-one-SEO optimisation plugin as well. Added to that a xml sitemap restricted to posts only.

    Too early to tell what the results will be yet as google already grabbed the archive pages so I need to wait for them to drop out of the index before I can see what’s left.

    I’m optimistic that, hopefully, and with luck — there will be something left. ;)

  8. I don’t like your advice about going to partial feeds. Here’s why. First of all, feed readers don’t want to have to come to your site – that’s why they’re reading the ‘feeds’ instead. Also, I read an article recently from a high profit website that compared the number of hits to their site once they switched to FULL FEEDS – it was dramatic. This article removed the fears of full feeds out of my head. Using full feeds doesn’t mean these readers will never come to your site – I use feed burner and a lot of my readers end up coming to the site. Anyway I’ll shut up about that subject.

    On my blog, at the main index page, I only show an ‘excerpt’ of each page. This limits the duplicate content – and also allows readers to more quickly scan the latest articles to get a feel for what they want to read. Next on my ‘categories’ page – I only list the title. I’m trying to help my readers find what they want and fast before they take a hike!!!

    Also, I removed the ‘archives’ section of the sidebar. From my perspective it’s a complete waste of space and gets longer with each passing month.

    Thank you for the info about robots.txt – I copied your example and updated my file. Anywho – I enjoyed the read and will be back!!!

    Barry
