Category Archives: Wordpress

Reduce Duplicate Content in WordPress Archives with robots.txt

My WordPress blog uses this url structure for posts: /blog/year/month/date/post-title/

Ignoring the fact that a post could also show up in /blog/ and /blog/category/ and other places, the same post could show up in:

/blog/2007/

/blog/2007/07/

/blog/2007/07/21/

/blog/2007/07/21/post-title/

The duplication is not as big of a concern (to me) as the possibly poor user experience when someone encounters one of the first 3 url patterns in a search engine results page. Presumably, they’re interested in reading a particular post. If they get a url that doesn’t go directly to the post, they’d have to scroll through the page to look for it, or use Find. Sometimes, results are irrelevant in those multi-post pages because the keywords are taken from posts that have nothing to do with each other.

I’ve been thinking of using robots.txt to block indexing of the year, month, and date archives, but wasn’t sure how to preserve the posts, since their urls contain the same patterns. Using the robots.txt analysis tool in Google’s Webmaster Tools, I played around with some patterns and tested them against my urls. I came up with a solution that preserves the posts while blocking the archives and the feeds. :)

In robots.txt, add these lines (this assumes you have a user-agent line already):

Allow: /blog/200*/*/*/*/
Disallow: /blog/200

Example results according to Webmaster Console:

URL                                               Googlebot
http://takethu.com/blog/2007/                     Blocked by line 15: Disallow: /blog/200
http://takethu.com/blog/2007/07/                  Blocked by line 15: Disallow: /blog/200
http://takethu.com/blog/2007/07/21/               Blocked by line 15: Disallow: /blog/200
http://takethu.com/blog/2007/07/21/post-title/    Allowed by line 14: Allow: /blog/200*/*/*/*/
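
To see why the two rules produce the results above, here is a minimal sketch of the matching logic in PHP. It assumes the rule Google documents for conflicting lines: the longest matching pattern wins, and Allow wins a tie. This is not Google's code, and the function names are made up for illustration. Other crawlers may resolve conflicts differently.

```php
<?php
// Turn a robots.txt path pattern into a regular expression.
// '*' matches any run of characters; a trailing '$' anchors the end of the url.
function robots_pattern_to_regex($pattern) {
    $anchored = substr($pattern, -1) === '$';
    if ($anchored) {
        $pattern = substr($pattern, 0, -1);
    }
    $pieces = array();
    foreach (explode('*', $pattern) as $piece) {
        $pieces[] = preg_quote($piece, '#');
    }
    return '#^' . implode('.*', $pieces) . ($anchored ? '$' : '') . '#';
}

// Decide whether $path may be crawled under $rules, a list of
// array('allow' | 'disallow', pattern) pairs. The longest matching
// pattern wins; on a tie, Allow wins. No matching rule means allowed.
function robots_is_allowed($path, array $rules) {
    $best_length = -1;
    $allowed = true;
    foreach ($rules as $rule) {
        list($type, $pattern) = $rule;
        if ($pattern === '') {
            continue; // an empty pattern restricts nothing
        }
        if (!preg_match(robots_pattern_to_regex($pattern), $path)) {
            continue;
        }
        $length = strlen($pattern);
        if ($length > $best_length || ($length === $best_length && $type === 'allow')) {
            $best_length = $length;
            $allowed = ($type === 'allow');
        }
    }
    return $allowed;
}

$rules = array(
    array('allow', '/blog/200*/*/*/*/'),
    array('disallow', '/blog/200'),
);
foreach (array('/blog/2007/', '/blog/2007/07/21/', '/blog/2007/07/21/post-title/') as $path) {
    echo $path, ' => ', robots_is_allowed($path, $rules) ? 'allowed' : 'blocked', "\n";
}
```

Under this sketch, any path with four or more slashes after /blog/200 matches the Allow line. That includes /blog/2007/07/21/feed/ and /blog/2007/07/21/post-title/feed/, so it is worth testing those urls on your own site too.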

If you haven’t tried out Webmaster Tools, maybe you can see now how useful it is. I’m recommending it as a webmaster myself, and not because I work at Google. :)

Aside:

I’ve read that this particular pattern (y/m/d/p/) isn’t helpful for ranking, since a search engine might not like that the content is a few directories deep. That’s too bad. I personally find it helpful to see the date in the url because it indicates how timely the content is. It’s also similar to a file structure on a computer: go up a directory and you find the other posts from the same date, and so on. We’re supposed to build sites for users, not search engines. Although there are times when a compromise is made to appease search engines so they can help users find content, this was something I didn’t want to compromise on. This pattern is also useful if I ever want to reuse a post title at a different time without putting a post id in the url, which isn’t as informative as the date structure.

Before applying these changes to your own site, please thoroughly test the patterns from your own site. I’m providing this information as a starting point so bloggers can have an idea of what patterns to use. The responsibility of proper implementation still rests upon the webmaster.

How to Display All Blog Posts

For reasons I will explain in another post about Linked Custom Search Engines, I wanted to generate a page that would include all of my posts in their entirety. There didn’t seem to be a ready-made solution. I found a plugin but it didn’t show all posts the way I wanted. This is different from the usual view, where posts are paginated at a set number of posts per page.

I came across this support page, displaying postbypost yearly archives, which gave me a good starting point. I learned about The Loop. I eventually managed to create a page that shows all posts on one page.

I figured I would share the code to help others in a similar situation. The All Posts Template file is a template file to put in your theme’s folder. Remove the .txt extension that I had to add to the filename in order to upload it to my blog. Then create a new page and choose AllPosts as the template. If you want to minimize duplicate content, add a robots.txt entry for that page. It’s also possible to add url parameters so that only posts from a specific year are displayed, for example 2007 posts.
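
For readers who can’t get to the linked file, here is a rough sketch of what such a page template could look like. It is not the file linked above. It assumes a reasonably current WordPress, and the function name allposts_query_args and the y url parameter are names I made up for illustration.

```php
<?php
/*
Template Name: AllPosts
*/

// Build the query arguments. 'y' is a hypothetical url parameter,
// e.g. /all-posts/?y=2007 to list only the posts from 2007.
function allposts_query_args($params) {
    $args = array(
        'post_type'      => 'post',
        'posts_per_page' => -1, // -1 turns pagination off: fetch every post
    );
    if (isset($params['y']) && is_string($params['y']) && preg_match('/^\d{4}$/', $params['y'])) {
        $args['year'] = (int) $params['y'];
    }
    return $args;
}

// Everything below needs WordPress. The guard lets the file load
// harmlessly when it is run outside of it.
if (class_exists('WP_Query')) {
    get_header();

    $all_posts = new WP_Query(allposts_query_args($_GET));
    while ($all_posts->have_posts()) {
        $all_posts->the_post();
        echo '<h2><a href="', esc_url(get_permalink()), '">';
        the_title();
        echo '</a></h2>';
        the_content(); // full post, not an excerpt
    }
    wp_reset_postdata(); // restore the main query's post data

    get_footer();
}
```

Keeping the argument building in its own function makes the year filter easy to check without loading WordPress. Anything that is not a four-digit year is ignored and all posts are shown.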

How I Create Widgets for WordPress

I haven’t been able to find clear instructions on how to add widgets to WordPress. This widget tutorial for developers was as close as I could get, but it applies to the Widget Plugin before WordPress included Widgets in v 2.2.

I’ve figured out a system that has worked for me. I can’t say that it is the best or only way. It’s just a way for me to accomplish what I want. That’s also why I’m not calling this a how-to.

First, create a new file called widgets.inc.php and place it in the themes folder. Here is an example based on a widget to spread Firefox:

<?php
/* Widgets I added */

// Shows a "Try Firefox" button, but only to Internet Explorer visitors.
function wp_widget_firefox($args) {
	extract($args); // provides $before_widget, $before_title, $after_title, $after_widget
	// Not every request sends a User-Agent header, so default to an empty string.
	$user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
	if (strpos($user_agent, 'MSIE') !== false) {
?>
		<?php echo $before_widget; ?>
		<?php echo $before_title; ?>
		<?php _e('Try Firefox'); ?>
		<?php echo $after_title; ?>
		<ul><li>
			<a href="http://www.spreadfirefox.com/?q=affiliates&id=YOUR_ID&t=210"><img border="0" alt="Firefox 2" title="Firefox 2" src="http://sfx-images.mozilla.org/affiliates/Buttons/firefox2/firefox-spread-btn-1b.png"/></a>
		</li></ul>
		<?php echo $after_widget; ?>
<?php
	}
} // close function
?>

Now edit /wp-includes/widgets.php.

FIND:

add_action('init', 'wp_widgets_init', 1);

BEFORE it, ADD:

$themes_path = '/home/ABSOLUTE/PATH/TO/BLOG/wp-content/themes';
include("$themes_path" . '/widgets.inc.php');

Edit the path accordingly.

In the wp_widgets_init() function, add these lines for each widget you want to add. Using the Firefox widget example:

$class['classname'] = 'widget_firefox';
wp_register_sidebar_widget('firefox', __('Spread Firefox'), 'wp_widget_firefox', $class);
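
Since changes to /wp-includes/widgets.php are lost whenever WordPress is upgraded, one alternative is to do the registration from the theme’s functions.php. The following is only a sketch. It assumes widgets.inc.php sits in the same folder as functions.php (not in wp-content/themes itself, as in the path above) and that your WordPress version fires the widgets_init action. The function name mytheme_register_widgets is made up for illustration.

```php
<?php
// Sketch for the active theme's functions.php.
// mytheme_register_widgets is a hypothetical name.
function mytheme_register_widgets() {
    // Load the widget functions from the same folder as this file.
    require_once dirname(__FILE__) . '/widgets.inc.php';

    // Same registration as in the steps above, one call per widget.
    wp_register_sidebar_widget(
        'firefox',
        __('Spread Firefox'),
        'wp_widget_firefox',
        array('classname' => 'widget_firefox')
    );
}

// Only hook in when WordPress is loaded.
if (function_exists('add_action')) {
    add_action('widgets_init', 'mytheme_register_widgets');
}
```

With this in place, the core file stays untouched and the widget should still show up in the widget admin screen.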

WordPress Widget: Spread Firefox

I coded a WordPress widget to help Spread Firefox. Here is the code:

function wp_widget_firefox($args) {
	extract($args); // provides $before_widget, $before_title, $after_title, $after_widget
	// Not every request sends a User-Agent header, so default to an empty string.
	$user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
	if (strpos($user_agent, 'MSIE') !== false) {
?>
		<?php echo $before_widget; ?>
		<?php echo $before_title; ?>
		<?php _e('Try Firefox'); ?>
		<?php echo $after_title; ?>
		<ul><li>
			<a href="http://www.spreadfirefox.com/?q=affiliates&id=YOUR_ID&t=210"><img border="0" alt="Firefox 2" title="Firefox 2" src="http://sfx-images.mozilla.org/affiliates/Buttons/firefox2/firefox-spread-btn-1b.png"/></a>
		</li></ul>
		<?php echo $after_widget; ?>
<?php
	}
}

I honestly don’t know if this exact code will work for people because I came up with my own system of creating widgets. However, you should be able to use this as a template if you know how to add widgets to your WordPress blog. Note that the code assumes that it will be in a file that contains other widgets. That’s why the php tags are the way they are.

I wrote the code so that it only shows the widget when the user agent/browser is IE. There’s no point in showing it for people who are already using Firefox, or any other browser for that matter. I don’t like to see things that don’t apply to me.
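
The browser check can also be pulled out into a small function of its own, which makes it easier to test. This is a sketch rather than part of the widget above, and the function name is made up. It goes beyond the simple MSIE test in two ways that are my own additions: IE 11 dropped the MSIE token and identifies itself with Trident/ instead, and some older Opera versions included MSIE in their user agent string.

```php
<?php
// Returns true when the user agent string looks like Internet Explorer.
// is_internet_explorer is a hypothetical helper name.
function is_internet_explorer($user_agent) {
    if (!is_string($user_agent) || $user_agent === '') {
        return false; // no User-Agent header was sent
    }
    if (strpos($user_agent, 'Opera') !== false) {
        return false; // Opera pretending to be IE
    }
    // 'MSIE' covers IE 10 and earlier; 'Trident/' covers IE 11.
    return strpos($user_agent, 'MSIE') !== false
        || strpos($user_agent, 'Trident/') !== false;
}

$user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
echo is_internet_explorer($user_agent) ? "show widget\n" : "skip widget\n";
```

User agent strings can be changed by the visitor, so treat the result as a hint and not a guarantee.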

For example, that DirecTV commercial with Pamela Anderson. She looks at the camera and berates the viewer for not watching her on DirecTV. As a matter of fact, I am using DirecTV. Advertising needs to be smarter than that.

BTW, although the link has affiliate info, Spread Firefox Affiliates don’t make money as with other affiliate programs. For me, I just like to know how I am contributing to the effort.

How to Minimize Duplicate Content in WordPress Blog

After reading this post at Google’s Webmaster Group, I was inspired to find out how to stop displaying full content on pages where posts are listed, such as categories and archives. I am not so much concerned about duplicate content as about the experience of someone who clicks a search result for a category page and then has to hunt around the page for the post, or who can’t find it at all because the post has since moved to another page of the category.

I didn’t know how to go about doing it so I searched. This page, Showing full posts on homepage, but snippets elsewhere, was a good start.

In short, the key is to edit the theme’s archive.php:

Change:

<?php the_content(); ?>

to:

<?php the_excerpt(); ?>

A while back, I had copied the archive.php from the Default theme for another purpose, because Ocadia didn’t have its own version of the file. I ended up not using the file for anything important so I didn’t modify it much. Once I edited it to show snippets, or excerpts, on category and archive pages, I noticed that it didn’t look like the homepage listings. For consistency, I copied the code inside <div class="post"> from the theme’s index.php. With that change, pages for categories and archives no longer showed the posts in their entirety.

However, posts listed on “previous pages”, such as the one linked to at the bottom of the home page, continued to be a source of duplicate content. In the Ocadia index.php, the code for the search page is already written to show excerpts. I added a condition so that paged views show excerpts, too, like this:

<?php if (is_search() || is_paged()) { ?>
	<?php the_excerpt(); ?>
<?php } else { ?>
	<?php the_content(); ?>
<?php } ?>

Now, if you go to the deeper pages of the index, category, etc, it shows post excerpts.

I also read recommendations to do the same to the home page. When I made the change to the home page, I did not find it aesthetically pleasing. The excerpted posts on the home page of the blog made it look like a splog that had scraped the content–not a good first impression for visitors. This was a situation where user experience trumped search engine optimization.

I just made these changes tonight. Time will tell if this will help or not.