SHAREPOINTSearch.com

How to avoid indexing menu-part of the webpage, but still crawl it?

Last post 07-14-2008 9:53 AM by lars. 1 reply.
  • 07-11-2008 10:41 AM

    • jns
    • Top 10 Contributor
    • Joined on 01-03-2008
    • Posts 7

    How to avoid indexing menu-part of the webpage, but still crawl it?

    I have a setup with Search Server 2008 Express up and running on a website. The content being crawled and searched is a website with a menu on the left.

    The problem is the search results: some results are irrelevant because the search term appears only in the menu of the result page, not in its "content" part. Is there any way to tell the crawler that this type of content (the menu) should not be indexed, but should of course still be crawled, so that the links in it are followed?

    I have a workaround of my own, but if there is an easier way to set this up in the administration tool, I would prefer that!

    My own solution to the problem would be:
    1) When the user agent is the SharePoint crawler, I will replace the menu items with new menu items whose text is always a dummy text such as "MyDummyMenuItemForSharepoint".
    2) When the crawler crawls through the menu, it will find the word "MyDummyMenuItemForSharepoint" very often, but this does not matter, since nobody will ever search for MyDummyMenuItemForSharepoint.
    3) As a result, pages that contain the search term only in the menu will no longer show up in the search results.
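    A minimal sketch of steps 1 and 2, assuming a Python back end that builds the menu HTML itself. The helpers `render_menu` and `is_search_crawler` and the `MS Search` marker are hypothetical: the marker assumes the crawler's User-Agent header contains that text, so check the web server logs for the exact string the crawler sends before relying on it. Only the link text is swapped; the `href` values stay the same, so the crawler still follows every menu link.

```python
import html

# Hypothetical marker: assumes the crawler's User-Agent header contains this
# text. Check your web server logs for the exact string your crawler sends.
CRAWLER_MARKER = "MS Search"
# Dummy label from the post; nobody will ever search for it.
DUMMY_TEXT = "MyDummyMenuItemForSharepoint"


def is_search_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent looks like the search crawler."""
    return CRAWLER_MARKER.lower() in (user_agent or "").lower()


def render_menu(items: list[tuple[str, str]], user_agent: str) -> str:
    """Render the left-hand menu as an HTML list.

    items holds (label, url) pairs. For the crawler every label is replaced
    by DUMMY_TEXT, while the href is kept so the links are still followed.
    """
    crawler = is_search_crawler(user_agent)
    entries = []
    for label, url in items:
        text = DUMMY_TEXT if crawler else label
        entries.append(
            f'<li><a href="{html.escape(url)}">{html.escape(text)}</a></li>'
        )
    return "<ul>" + "".join(entries) + "</ul>"


if __name__ == "__main__":
    menu = [("Tax", "/tax.aspx"), ("Pensions", "/pensions.aspx")]
    # Example User-Agent strings (assumed, for illustration only).
    print(render_menu(menu, "Mozilla/4.0 (compatible; MS Search 5.0 Robot)"))
    print(render_menu(menu, "Mozilla/5.0 (Windows NT 6.0)"))
```

    A browser gets the real labels ("Tax", "Pensions"), while the crawler gets the same two links labelled with the dummy text, so a page that merely links to the tax page no longer contains the word "tax" in the index.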

    Example:
    You search for the word "tax" and get a set of results where "tax" appears on the page. The lowest-ranked results also contain the word "tax", but only in the menu, which makes them irrelevant: those pages contain no information about tax apart from a link to a page called "tax". I would like to exclude these pages.

    Thanks in advance for any help!

  • 07-14-2008 9:53 AM In reply to

    • lars
    • Top 10 Contributor
    • Joined on 06-04-2007
    • Posts 17

    Re: How to avoid indexing menu-part of the webpage, but still crawl it?

    No, unfortunately it is not possible to configure a crawl rule that ignores content in certain regions of a Web page. I think the solution you propose is the best way to go, since you control the Web server being crawled. The problem would have been trickier if you needed to crawl an external Web site where you have no access to reconfigure the Web server. In that case you would need to consider solutions like:

    • Developing your own Protocol Handler
    • Crawling through a custom proxy that filters out unwanted content.

    But these solutions are certainly harder to implement than the one you propose.
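    For the proxy option, the content filter itself might look roughly like the sketch below; the proxy plumbing (receiving the crawler's request, fetching the page, returning the result) is not shown. It uses only Python's standard `html.parser` module and assumes the menu is the element with the hypothetical id `menu` and that its markup is well formed. The class `MenuTextStripper` and the function `strip_menu_text` are illustrative names, not part of any SharePoint API.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class MenuTextStripper(HTMLParser):
    """Copy an HTML page, dropping the text inside the element whose id is menu_id.

    Tags inside the menu (including <a href="...">) are kept, so a crawler can
    still follow the links; only the visible text disappears.
    Assumes the menu markup is well formed (every non-void tag explicitly closed).
    Comments and the doctype are not copied.
    """

    def __init__(self, menu_id: str):
        # convert_charrefs=False delivers entities such as &amp; as separate
        # events, so text can be copied through verbatim without re-escaping.
        super().__init__(convert_charrefs=False)
        self.menu_id = menu_id
        self.depth = 0  # nesting depth inside the menu element; 0 means outside
        self.out = []

    def handle_starttag(self, tag, attrs):
        if self.depth and tag not in VOID_TAGS:
            self.depth += 1
        elif not self.depth and dict(attrs).get("id") == self.menu_id:
            self.depth = 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the depth.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.depth:
            self.out.append(f"&#{name};")


def strip_menu_text(page: str, menu_id: str = "menu") -> str:
    """Return the page with the menu's text removed but its links intact."""
    parser = MenuTextStripper(menu_id)
    parser.feed(page)
    parser.close()
    return "".join(parser.out)
```

    Links in the menu survive with empty link text, so the crawler can still follow them, while menu words such as "tax" never reach the index. Text outside the menu, including entities like `&amp;`, is passed through unchanged.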

    Cheers
    Lars Fastrup

SHAREPOINTSearch.com is not affiliated with or endorsed by the Microsoft Corporation.
SharePoint is a trademark of the Microsoft Corporation.