Thanks to Matt Cutts for the heads up on how to use Google Webmaster Tools to get more information on our websites’ 404 File Not Found errors. As the Google Webmaster Central Blog announced: Webmaster Tools shows Crawl error sources.
For those who have not used the diagnostic tools in Webmaster Central, one of the informative features is a listing of the URLs that resulted in 404 File Not Found errors. Previously, though, it was an exercise in futility, since we did not know where the incorrect link(s) originated. Now that we know where the sources are, we have a better chance of correcting the links.
When I went to find out what caused my site’s 404 errors, it was pretty enlightening. I discovered what appeared to be a hack attempt on my blog: some bad URLs in the form of takethu.com/blog/page/NUM/?ref=BADSITE.COM. Fortunately, my blog was up to date, so those URLs didn’t do anything malicious, nor did the pages contain anything bad. I checked Google’s cache to confirm that there was no spam. However, those results did show up in a Google site: search of my blog, so I needed to do something to get rid of them. This was what I added to my robots.txt to tell search engines to drop those URLs from their indices:
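(The original snippet isn’t preserved in this copy of the post, but a wildcard Disallow along these lines would match URLs of that shape — the page number and the `?ref=` value vary, so the `*` wildcard does the work. The exact path prefix is an assumption based on the example URL above.)

```
User-agent: *
Disallow: /blog/page/*?ref=
```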
I love being able to use wildcards in robots.txt. Another nifty tool in Webmaster Tools is “Analyze robots.txt”, which lets you test your robots.txt disallow/allow patterns against actual URLs to see whether Googlebot will handle them correctly.
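If you want to sanity-check a wildcard pattern yourself before relying on the “Analyze robots.txt” tool, the matching can be sketched in a few lines. This is an illustration of how Google’s `*` wildcard and `$` end-anchor extensions behave, not Google’s actual implementation; the `/blog/page/*?ref=` rule below is a hypothetical pattern modeled on the bad URLs described above.

```python
import re

def googlebot_style_match(pattern: str, path: str) -> bool:
    """Return True if a robots.txt Disallow pattern matches a URL path.

    Supports Google's extensions: '*' matches any run of characters,
    and a trailing '$' anchors the pattern to the end of the path.
    Plain patterns match as prefixes, per the original protocol.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Translate the pattern into a regex: '*' -> '.*', escape the rest.
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    regex = "^" + regex + ("$" if anchored else "")
    return re.match(regex, path) is not None

# Hypothetical rule similar to the one described in the post:
rule = "/blog/page/*?ref="
print(googlebot_style_match(rule, "/blog/page/3/?ref=example.com"))  # True
print(googlebot_style_match(rule, "/blog/page/3/"))                  # False
```

Real crawlers also apply longest-match precedence between Allow and Disallow rules, so this sketch only answers whether a single pattern matches a given path.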
Another thing I found was a broken file path for a Flash file on my site. Once I saw which page it occurred on, I was able to come up with a solution and fix it.
Thanks to the Google Webmaster Tools team for giving us webmasters such a useful tool.