This is going to be a quick little post where I confess to doing something stupid. I should have known better.

We are currently prototyping a MOSS 2007 Enterprise solution for a government organization. This customer has a fairly large public website and a similarly large intranet site. While demonstrating how to configure search scopes, I created a scope pointing at the external site and changed its settings to "unlimited" without really thinking about what that meant.

Well, what it means is that it's a good way to fill up your server's hard drive, and a good way to annoy other departments of the government. It turns out that in "unlimited" mode, SharePoint crawls every link, then keeps crawling the links found on the destination pages, branching out until, I assume, it has indexed the entire Internet.

I also learned that SharePoint doesn't handle ROBOTS.TXT files correctly, so sensitive information could be crawled along with everything else. I couldn't believe this until I found a post here that confirms it: SharePoint only looks at the root of the site for ROBOTS.TXT files and ignores them wherever else they may be:

Observations

During our testing we discovered the following.

1. robots.txt file is cached for 24 hours following its first request by the crawler. The implication of this is that changes to robots.txt require either a restart of the Office Search Service or a delay of up to 24 hours before they are respected by the gatherer....
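To get a feel for why that 24-hour cache matters, here is a minimal Python sketch of a crawler-side robots.txt cache. This is not SharePoint's code, and I have no idea how the gatherer is actually implemented. The class name `CachedRobots`, the injected `fetch` and `clock` callables, and the host name used below are all made up for illustration. The only real library piece is Python's standard `urllib.robotparser`.

```python
import time
from urllib.robotparser import RobotFileParser

# The 24-hour window from the quoted observation (an assumption of this sketch,
# not a value read from any SharePoint setting).
CACHE_TTL_SECONDS = 24 * 60 * 60


class CachedRobots:
    """Hypothetical crawler helper: caches parsed robots.txt rules per host."""

    def __init__(self, fetch, clock=time.time, ttl=CACHE_TTL_SECONDS):
        self._fetch = fetch    # callable: host -> robots.txt text (hypothetical helper)
        self._clock = clock    # injectable clock so the expiry can be simulated
        self._ttl = ttl
        self._cache = {}       # host -> (fetched_at, parser)

    def allowed(self, host, path, agent="*"):
        now = self._clock()
        entry = self._cache.get(host)
        # Re-fetch only when nothing is cached or the cached copy has expired.
        if entry is None or now - entry[0] >= self._ttl:
            parser = RobotFileParser()
            parser.parse(self._fetch(host).splitlines())
            entry = (now, parser)
            self._cache[host] = entry
        return entry[1].can_fetch(agent, path)
```

If you feed it a permissive robots.txt, then change the file to disallow `/private/`, `allowed("example.gov", "/private/report")` keeps answering from the stale copy until the clock passes the 24-hour mark. That is the gap the quoted post describes: a rule you add today may not protect anything until tomorrow, unless the cache is thrown away (in SharePoint's case, by restarting the Office Search Service).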