Continuing with our Search Engine Optimization (SEO) discussion, it’s necessary to understand just how search engines see our WordPress sites. While the computation and algorithms are complicated and proprietary to each search engine, an understanding of the basics gives us the tools needed to optimize our sites.
A brief introduction to web crawlers
Hopefully, I’m not ruining any fantasies here, but there isn’t a room full of people at Google looking through web pages and indexing the contents of different sites. That would require way too many people and cost way too much. Instead, large search engines generally implement web crawlers: programs which quickly read web page data. This process, called crawling or spidering, generally performs the following actions:
- Read the prominent* page data on the current page
- Read all hyper-links to other web pages and add them to a list of future pages to crawl
- Move on to the next web page
* What a particular crawler deems to be “prominent” data depends entirely on the purpose of the crawler
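The three steps above can be sketched in a few lines of Python. This is only a rough illustration, not how any real search engine works: the `crawl` function, the `LinkCollector` class, and the in-memory `PAGES` dictionary (which stands in for real HTTP requests) are all made up for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_pages=100):
    """Visit pages breadth-first and return the URLs in visit order.

    `fetch` is a hypothetical callable that takes a URL and returns the
    page's HTML as a string, or None if the page cannot be read.
    """
    to_visit = deque([start_url])  # the list of future pages to crawl
    seen = {start_url}
    visited = []

    while to_visit and len(visited) < max_pages:
        url = to_visit.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)  # a real crawler would read and index the page here

        # Read all hyperlinks and queue the ones we have not seen yet.
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                to_visit.append(absolute)

    return visited


# A tiny in-memory "web" standing in for real HTTP requests (assumption).
PAGES = {
    "http://example.com/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "http://example.com/about": '<a href="/">Home</a>',
    "http://example.com/blog": '<a href="/about">About</a>',
}

print(crawl("http://example.com/", PAGES.get))
```

Notice that the crawler only ever discovers pages through links, which is why the links on your own pages matter so much.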
There are two important things to note here:
- Once a crawler leaves your site, it may not come back for a while!
- Since a crawler follows the links on your page, you had better make sure that the links count (and link to more of your content)
Thankfully for us, there is a set of standards which are used as guidelines for these web crawlers (also called robots).
What are these meta robots?
Spend any time in a web design forum or browsing through an HTML book, and you’re bound to come across the concept of meta tags. As defined by Wikipedia, meta elements are metadata which can be embedded in HTML web pages to provide structured information about the current page. These elements can include such things as keywords, location data, and content type.
The meta tag we’re interested in right now is the robots meta tag. Using this tag carefully, we can specify three important directives for a page on our site**.
- noindex – A directive suggesting that the robot not add the page to its index (or search engine as the case may be)
- nofollow – A directive suggesting that the robot not add the links on the page to its list of future pages to crawl.
- noarchive – A directive suggesting that the robot not store a cached copy of your page.
**There are more directives that are search engine specific, but I’m only covering the major ones here
You will recognize a couple of things right away. First, I was very careful to use the word “suggest”. The author of the robot can choose whether or not to take these suggestions to heart. Google seems to do a good job of following these suggestions, but I can’t speak for other search engines. Second, the malicious user will notice that you can keep the robot from discovering other people’s sites through the links on your page.
Placing this meta tag in the head section of a particular page applies it to the whole page. The syntax is as follows:
<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">
For more information, check out this helpful link.
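To get a feel for how a robot might read the page-level tag shown above, here is a small Python sketch. It is an illustration only: the `RobotsMetaParser` class and the `robots_directives` function are names invented for this example, and real crawlers have their own (usually unpublished) parsing rules.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the values listed in <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Compare case-insensitively, so "ROBOTS" and "robots" both match.
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for token in content.split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html):
    """Return the set of robots directives found in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW"></head></html>'
directives = robots_directives(page)

if "noindex" in directives:
    print("A polite robot would leave this page out of its index")
if "nofollow" in directives:
    print("A polite robot would not queue the links on this page")
```

A page with no robots meta tag simply yields an empty set, which a crawler would treat as permission to index and follow as usual.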
The downside to this method is that you are specifying these properties for the whole page. While this may be desirable in some instances, more often than not it’s desirable to control this on a link by link basis. We can do that by adding the “rel” attribute to the standard link syntax. Note that only nofollow is recognized at the link level; noindex describes a whole page, so it belongs in the meta tag rather than on a link. The syntax is as follows:
<a rel="nofollow" href="http://google.com">Google</a>
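A crawler that honors the link-level hint might sort a page’s links roughly like this. Again, this is just a sketch under my own assumptions: the `FollowableLinks` class and the sample markup are invented for the example, and the code tolerates commas between “rel” values even though the values are normally separated by spaces.

```python
from html.parser import HTMLParser


class FollowableLinks(HTMLParser):
    """Splits a page's links into those to follow and those marked nofollow."""

    def __init__(self):
        super().__init__()
        self.follow = []
        self.skipped = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel is a list of tokens; normalize case and accept stray commas.
        rel_tokens = (attr.get("rel") or "").lower().replace(",", " ").split()
        if "nofollow" in rel_tokens:
            self.skipped.append(href)
        else:
            self.follow.append(href)


# Hypothetical sidebar markup: one feed link marked nofollow, one normal link.
sidebar = (
    '<a rel="nofollow" href="/feed">RSS</a> '
    '<a href="/photos">Photos</a>'
)

links = FollowableLinks()
links.feed(sidebar)
print("Would crawl:", links.follow)
print("Would skip:", links.skipped)
```

Only the second link ends up in the list of future pages to crawl, which is exactly the per-link control the page-level meta tag cannot give you.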
How to use meta robots for good?
So here’s a true story. This may be a bit embarrassing for yours truly, but in sharing, I hope that I can help others avoid the same mistake. Earlier this year, I was looking through my Google Webmaster Tools account to see which keywords I was hitting with this site. (For more information on how to set up Google Webmaster Tools and how to find keywords, see my post entitled What is SEO and why do I need it?) I was a bit surprised to notice that my number one keyword was “RSS”. Clearly, this being a Photography / Blogging site, this wasn’t good! Thankfully, if you click on a keyword in Google Webmaster Tools, you’ll be given a list of the pages this keyword appears on. I was shocked to find out that the keyword “RSS” appeared numerous times on every page of my site!
Digging a little deeper, I found that everywhere I had a link to my RSS feed (sidebar, post metadata, etc…), the link text was being indexed. And since text that is also a link seems to be given a higher ranking by Google, the keyword “RSS” was doing very well!
So, I went and changed the links to my feed to add the nofollow and noindex attributes, and waited. One thing you will learn about SEO is that changes don’t happen overnight! After a few weeks, the top keywords on my site were back to useful keywords.
Keywords are very important since they are how new visitors can find your site. By assisting web crawlers and leading them to the correct content, you can make your site appear more readily when people search for the chosen keywords. However, do keep in mind that nothing replaces content. Just drawing visitors to your site and having them leave disappointed is not the way to go. It’s also important to realize that some search engines will penalize your site if they feel you are incorrectly guiding their web crawler. So I guess I would caution you, the designer, to ensure that you only use these techniques to help your site appear correctly in searches, or more importantly, to not appear in searches that are completely irrelevant. RSS Feed anyone?