I spent some time analysing my Google Webmaster Tools crawl errors this week and tidied up some of those pesky 404s that keep showing up.
There was a time when I had zero 404 errors on this site, and very proud of it I was too.
But my pride was misplaced.
As this site has grown and evolved I've taken pages down, I've rationalised tags and I've changed its configuration (which does lead to 404 errors, believe it or not).
As a result I was returning well over five hundred 404 errors at one point.
How you create 404 errors
Was this a bad thing? Not necessarily, but how did I create so many 404 errors?
Well, the largest contributor was rationalising my tags. At one point I had over 1,100 tags on this site, many with only one article in them, some with none. Today I have 36.
I didn't remove them all in one sitting, but every tag I removed created a 404 error.
I've also changed the number of posts showing on the front page, at one point going as high as 16.
So how did that create 404 errors?
If I have 20 posts on the site, with the front page set to show 5 posts per page, I'll have 4 pages of posts if you keep clicking the 'Previous entries' link at the bottom of the page.
But if I change my configuration to show 10 posts per page you'll only find 2 pages of posts by clicking the 'Previous entries' link. Pages 3 and 4 no longer exist, so any link to them now fails: that's two 404 errors right there.
The same thing then happens on every category and tag page with enough posts to paginate, since category and tag pages, by default, display the same number of posts per page as your posts page does.
So you can begin to see how the 404 errors ramp up pretty quickly when you change your site configuration.
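To put some numbers on that, here's a rough sketch in Python of the arithmetic. It isn't anything my site actually runs. The function names and the example archives with their post counts are made up for illustration, and it assumes an archive always has at least one page.

```python
import math


def page_count(total_posts: int, per_page: int) -> int:
    """Number of paginated pages in an archive (an empty archive still has page 1)."""
    return max(1, math.ceil(total_posts / per_page))


def orphaned_pages(total_posts: int, old_per_page: int, new_per_page: int) -> list[int]:
    """Page numbers that existed under the old setting but not under the new one."""
    old_pages = page_count(total_posts, old_per_page)
    new_pages = page_count(total_posts, new_per_page)
    return list(range(new_pages + 1, old_pages + 1))


# The example from the text: 20 posts, going from 5 to 10 posts per page.
print(orphaned_pages(20, 5, 10))  # [3, 4]

# Hypothetical archives and post counts, purely for illustration.
archives = {"front page": 20, "tag: seo": 12, "category: blogging": 7}
total_404s = sum(len(orphaned_pages(n, 5, 10)) for n in archives.values())
print(total_404s)  # 4
```

Each archive loses pages according to its own post count, so the total is a sum across archives. Small archives that never paginated in the first place contribute nothing.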
Different approaches to dealing with 404s
In this article on the SEOMoz blog, Rand Fishkin refers to two different views on 404 errors:
- You should have none, so you should 301 redirect any and all that occur, either to a related page or to your home page
- Don't worry about it - just let any erroneous link go to a 404 page.
He also refers to a half-way house: leave some 404 errors in place but selectively redirect others to related pages on your site.
That's the approach I've taken, and here's why:
The third highest generator of 404 pages on this site is articles that I've removed.
They were poor quality articles, written when I was just starting out. Some of them were inaccurate, some referred to a business model I now actively dislike and some were just badly written.
In any event, I no longer wanted them on the site and I didn't want them in the search engines' indices either.
And, as Matt Cutts advised in this article, if you want a page to be removed from Google's index just make it return a 404 error and wait for Google to re-crawl (and de-index) it.
So most of those articles that I trashed, I left to return a 404 error.
But there were some that had a high number of incoming links from external sites, and for which there was some related information on this site, so I redirected those to a relevant tag page, category page or article.
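If it helps to see that rule written down, here's a minimal sketch in Python. I made these decisions by hand, so treat it as an illustration only. The function name, the `Response` type and the `min_links` threshold of 10 are all assumptions of mine, and "a high number of incoming links" is a judgement call that you'd tune for your own site.

```python
from http import HTTPStatus
from typing import NamedTuple, Optional


class Response(NamedTuple):
    status: HTTPStatus
    location: Optional[str]  # redirect target, or None for a plain 404


def respond_to_removed_url(
    inbound_links: int,
    related_url: Optional[str],
    min_links: int = 10,  # hypothetical cut-off for "a high number" of links
) -> Response:
    """Redirect only when the old page is well linked AND a relevant page exists."""
    if related_url is not None and inbound_links >= min_links:
        return Response(HTTPStatus.MOVED_PERMANENTLY, related_url)
    # Otherwise leave it as a 404 so the search engines drop it in time.
    return Response(HTTPStatus.NOT_FOUND, None)


# A well-linked article with a relevant tag page gets a 301 (example path is made up).
print(respond_to_removed_url(42, "/tag/example-tag/"))
# A poorly linked article, or one with nothing relevant to point at, stays a 404.
print(respond_to_removed_url(2, None))
```

Both conditions have to hold. Plenty of incoming links with nowhere relevant to send them still ends in a 404, which keeps visitors from being redirected to pages that have nothing to do with what they were looking for.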
All the tags I trashed have been left to return a 404 error. (When I rationalised my tags I re-tagged articles whose original tags were being removed, so all the tags I removed were empty).
I've now reduced the number of articles on post-pages on this site to 8, which has effectively removed the 404 errors that were created when I increased the number per page originally.
So now the number of 404 errors this site returns is dropping - from around 500 it's down to 97, and it will continue to drop steadily as Google de-indexes the pages and tags that I've removed and not redirected.
So are 404 errors bad?
As I said at the beginning: not necessarily.
They do, of course, create a frustrating user experience.
But if you use a custom 404 page, as I've done on this site, you can minimise your visitors' frustration by offering an apology and giving them some options. Here's my 404 page.
Then make sure you selectively set up 301 redirects to other pages on your site, as long as they have information that's relevant to the page that's returning the 404.
Finally, as per Matt Cutts' advice, if you really want a page removed from the search engines' indices then leave it as a 404 until they re-crawl and subsequently de-index it.
Thoughts? Questions? Leave a comment.