robots.txt disallow filetype txt
Google Hacks, or more aptly Google Dorks, are a handy tool for anyone who enjoys not only SEO but searching in general. [1]
If you use the ‘disallow’ directive, you can keep search engines from crawling parts of your site. [2]
A book about how to use Google and other search engines to find hidden information. [3]
Google helps the attacker by allowing a search for the “disallow” keyword. [4]
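The query in the title can be written out as an ordinary search URL. The sketch below is only an illustration: the function name build_search_url is hypothetical, and it assumes the standard `filetype:` search operator and the public google.com/search endpoint with its `q` parameter.

```python
from urllib.parse import urlencode


def build_search_url(terms, filetype):
    """Join free-text terms with a filetype: operator and URL-encode the result.

    build_search_url is a hypothetical helper, not part of any library.
    """
    query = " ".join(list(terms) + [f"filetype:{filetype}"])
    # urlencode turns spaces into '+' and ':' into '%3A'.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(build_search_url(["robots.txt", "disallow"], "txt"))
```

The function only builds a string; it sends no request.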
Most sites nowadays block direct access to such folders with server-side security and return errors like “Access Denied”, which makes it a problem to view the pages directly. [...] On the internet there are a lot of site owners who hide some of their site’s pages, or even the entire site, from the search engines. [2]
You can add additional keywords to that query in order to search for specific directory names that people don’t want to be found, but I’ll leave that to the Google hacker forums. [...] It struck me as a funny irony that the files which exist in order to tell search engines what not to index are themselves included and easily searchable. [5]
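The irony is easy to see from the site owner's side: anything listed under Disallow is readable by whoever opens the file. A minimal sketch for auditing your own robots.txt follows; the helper name disallowed_paths and the sample content are hypothetical, and the parsing is deliberately simplified (it ignores which User-agent group a rule belongs to).

```python
from typing import List

# Hypothetical robots.txt content, not taken from any real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/   # a path the owner would rather keep quiet
Disallow: /backup/
Disallow:
Allow: /public/
"""


def disallowed_paths(robots_txt: str) -> List[str]:
    """Return every non-empty Disallow value found in the given text."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        # Field names are case-insensitive; an empty Disallow blocks nothing.
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    for path in disallowed_paths(SAMPLE_ROBOTS_TXT):
        print(path)
```

Running it on your own file shows exactly what a curious visitor would learn from it.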
Of course no human being will ever read what you write, but their bots and grepping algos will do it for the owners of the “free” email services (or of the “free” search engines), presenting them nice tables built on your private data as a result. [3]
Webmasters wanting to exclude search engine robots from certain parts of their site often choose to use a robots.txt file in the root of the server. [4]
Robots.txt is a file that is by convention placed in the main folder of a web site and provides some information to the search engines (the robots) that visit. [5]
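How a well-behaved robot reads that information can be sketched with Python's standard urllib.robotparser module. The sample rules, the bot name "ExampleBot", the example.com URLs, and the helper is_allowed are all hypothetical; only RobotFileParser and its parse and can_fetch methods come from the standard library.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not taken from any real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/private/notes.html"):
        url = "https://example.com" + path
        print(path, is_allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", url))
```

Note that the file is only a convention: the check runs inside the robot, so nothing stops a client that chooses to ignore it.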
There should be no reason for me not to play with them and entertain you, titillate and hopefully even educate. [1]
Sources:
[1] Google Hacks for Dorks and SEO prowlers
[2] How To Find Secret Sites And Articles
[3] How to Find Anything on the Web
[4] Google Hacking Database
[5] On Web Archives and White Houses
[6] Hacker’s Favorite Search Queries 2