Search engines work by sending out small programs called spiders to crawl the Internet and gather information about pages to be included in the search engine's index. The goal of the search engines is to compile the haystack and then, through the sophisticated algorithms employed in their search technology, help you find the needle in it. It's great to be found when you want to be found, but rather mortifying when someone finds the humorous birthday site a friend built for you, complete with your Baby Hippo nickname and embarrassing old pictures of bad hairstyles from your distant past. More importantly, however, some information can be dangerous in the wrong hands, as noted by the National Infrastructure Protection Center (NIPC) in its recent warning, "Considering The Unintended Audience."
One method you can use to conceal nonpublic pages is the robots exclusion standard. Its best-known form is a plain-text file named robots.txt, placed in the root directory of your Web site, whose purpose is to instruct search engine spiders not to crawl certain portions of your site; the same request can also be made for an individual page with a robots Meta tag in that page's HTML. Here are a few resources to help you understand how the standard works and how to deploy it (a brief example follows the list):
- A Standard for Robot Exclusion
- Robots Exclusion
- Keeping Robots Out of Your Corner of the Net
- Webmaster Resources: No Robots
- Spiders and Robots Exclusion
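As a rough illustration, here is a minimal robots.txt sketch. The directory names are hypothetical placeholders; you would substitute the portions of your own site you want spiders to skip:

```
# robots.txt - must live in the site's root directory (e.g., www.example.com/robots.txt)
# "User-agent: *" addresses all spiders; each Disallow line names a path to avoid
User-agent: *
Disallow: /internal/
Disallow: /drafts/
```

For a single page, the equivalent request is a robots Meta tag placed in the page's head section, such as `<meta name="robots" content="noindex, nofollow">`, which asks spiders not to index the page or follow its links.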
Since the robots exclusion standard depends on voluntary compliance (well-behaved spiders honor it, but a rogue crawler can simply ignore it), highly sensitive data should either be left off the Internet or placed behind password protection.
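One common way to add password protection on Apache Web servers is HTTP Basic Authentication via an .htaccess file. The sketch below is a minimal example, not a definitive setup; the prompt text and the .htpasswd path are hypothetical and would need to match your own server:

```
# .htaccess - place this file in the directory to be protected
# The AuthName label and the AuthUserFile path below are hypothetical examples
AuthType Basic
AuthName "Restricted Area"
AuthUserFile /home/site/.htpasswd
Require valid-user
```

Keep in mind that Basic Authentication transmits credentials in an easily decoded form, so it deters casual visitors and spiders rather than determined attackers.

First published on BankersOnline.com 4/01/02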