The NoIndex tag

Published 09 March 07 09:58 AM | Search and the Art of Website Maintenance

One interesting thing about web content nowadays is that almost all of it is served by content management systems (CMSs). Most companies, and even people with personal sites (like my own), use a CMS such as Sitecore, DotNetNuke, Microsoft CMS, or EPiServer (my personal favourite) to manage their information and publish it to the web.

The problem for most search engines, both local and global, is that pages in these systems are built from templates that repeat the same menus and information on every page. Usually only a small section in the middle of the page holds the content the page is actually about. The template sidebars often carry news items or advertisements as well. Much of this recurring content also matches the organization's most important concepts closely. As a result, a search for a general concept that you would expect to find on the site often returns every page, which produces a lot of noise in the search results. Many of my customers ask how they can avoid this.

 

Some other vendors (e.g. Microsoft SharePoint) have suggested returning a different version of the site to the search engine when it crawls: the site recognizes the crawler by its user agent identifier and returns only the content parts of the page. This causes a lot of hassle and requires programmatic intervention, much like a browser check.
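To make the hassle concrete, here is a minimal sketch of what that kind of user agent check might look like on the server side. It is only an illustration: the crawler token `examplesearchbot` and the helper names `is_search_crawler` and `render_page` are made up for this example and do not come from SharePoint or any other product.

```python
# Hypothetical substrings that identify the search crawler's user agent.
# Real values depend entirely on the search engine you are serving.
CRAWLER_TOKENS = ("examplesearchbot",)


def is_search_crawler(user_agent: str) -> bool:
    """Return True if the user agent string looks like a known crawler."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def render_page(user_agent: str, content: str, menu: str, sidebar: str) -> str:
    """Serve only the content to crawlers and the full template to everyone else."""
    if is_search_crawler(user_agent):
        return f"<html><body>{content}</body></html>"
    return f"<html><body>{menu}{content}{sidebar}</body></html>"
```

Every template has to route through a check like this, and the token list has to be kept up to date, which is where the extra maintenance comes from.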

Many years ago, before I started with Mondosoft, we had already solved the problem by inventing a special tag pair that can easily be placed around the sections of the template (or around user controls) that you don't want indexed. This tag pair was originally <noindex></noindex> but has since been changed to <!-- noindex --> <!-- /noindex -->. The change puts the markers inside HTML comments, so other crawlers and browsers are not confused by them and the pages remain HTML standards compliant.
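As an illustration of the idea, the sketch below wraps a template's menu and sidebar in the comment pair and shows one way an indexer could drop those sections before indexing. The regular expression, the `strip_noindex` helper, and the sample page are assumptions made for this example; this is not Mondosoft's actual crawler code.

```python
import re

# Matches everything from <!-- noindex --> to the next <!-- /noindex -->,
# tolerating extra whitespace in the comments and sections that span lines.
NOINDEX_SECTION = re.compile(
    r"<!--\s*noindex\s*-->.*?<!--\s*/noindex\s*-->",
    re.DOTALL | re.IGNORECASE,
)


def strip_noindex(html: str) -> str:
    """Return the page with every noindex-wrapped section removed."""
    return NOINDEX_SECTION.sub("", html)


# A sample template: only the middle div is the page's real content.
PAGE = """<html><body>
<!-- noindex -->
<ul id="menu"><li>Products</li><li>News</li></ul>
<!-- /noindex -->
<div id="content">How to tune site search</div>
<!-- noindex -->
<div id="sidebar">Latest news: new product released</div>
<!-- /noindex -->
</body></html>"""

indexable = strip_noindex(PAGE)
```

Because the markers are ordinary HTML comments, browsers and crawlers that do not know about them simply ignore them and render the full page.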

I know that the World Wide Web Consortium did look into this issue but didn't come up with a way to exclude specific content from crawlers. The best suggestion I could see them coming up with was using noindex as a class value on tags. This, however, would interfere with your design and formatting if you were using cascading style sheets (CSS).

I recommend that all our customers use this tag pair if they can - you will see an immediate improvement in your search results!

Read the complete post at http://blog.mondosoft.com/art-of-search/archive/2007/03/09/18.aspx