What is Robots.txt?
The robots exclusion protocol (REP), better known as robots.txt, is a text file webmasters create to instruct robots (typically search engine crawlers) how to crawl and index pages on their website.
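Well-behaved crawlers fetch this file before requesting other pages and honor its directives. As a rough sketch of that check, the snippet below uses Python's standard urllib.robotparser; the domain and page path are placeholders.

from urllib.robotparser import RobotFileParser

# Fetch and parse the site's live robots.txt (example.com is a placeholder).
parser = RobotFileParser()
parser.set_url("http://www.example.com/robots.txt")
parser.read()

# A polite crawler checks permission for its user agent before requesting a page.
allowed = parser.can_fetch("Googlebot", "http://www.example.com/some-page.html")
print("Allowed" if allowed else "Blocked by robots.txt")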
Cheat Sheet
Block all web crawlers from all content
User-agent: *
Disallow: /
Block a specific web crawler from a specific folder
User-agent: Googlebot
Disallow: /no-google/
Block a specific web crawler from a specific web page
User-agent: Googlebot
Disallow: /no-google/blocked-page.html
Sitemap Parameter
User-agent: *
Disallow:
Sitemap: http://www.example.com/non-standard-location/sitemap.xml
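Before deploying a rule set, its behavior can be checked offline. This is a minimal sketch using Python's standard urllib.robotparser, with the user agent, folder, and sitemap URL taken from the cheat sheet above; the second crawler name (Bingbot) is just an example of any other agent.

from urllib.robotparser import RobotFileParser

# Rules from the cheat sheet, supplied as raw lines rather than fetched over HTTP.
rules = """\
User-agent: Googlebot
Disallow: /no-google/

Sitemap: http://www.example.com/non-standard-location/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot is blocked from the folder; crawlers with no matching rule are not.
print(parser.can_fetch("Googlebot", "/no-google/blocked-page.html"))  # False
print(parser.can_fetch("Bingbot", "/no-google/blocked-page.html"))    # True

# The declared sitemap location is also exposed (Python 3.8+).
print(parser.site_maps())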
Optimal Format
Robots.txt must be placed in the top-level directory of a web server, because crawlers request it only from that location. Example: http://www.example.com/robots.txt
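To illustrate why the root location matters: crawlers derive the robots.txt URL from the scheme and host alone, so a copy placed in a subdirectory is never requested. A rough sketch of that derivation, with a placeholder page URL:

from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    # Keep only the scheme and host; the path is always "/robots.txt".
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://www.example.com/blog/some-post.html"))
# -> http://www.example.com/robots.txt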