robots.txt best practice guide

The robots.txt file is an often overlooked, and sometimes forgotten, part of a website and of SEO.

Nonetheless, a robots.txt file is an important part of any SEO’s toolset, whether you are just starting out in the industry or you are a seasoned SEO veteran.

What is a robots.txt file?

A robots.txt file can be used for a variety of things: letting search engines know where to locate your site’s sitemap, telling them which pages to crawl and which not to crawl, and managing your site’s crawl budget.
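To make that concrete, here is a minimal robots.txt sketch; the domain, path, and sitemap location below are placeholder values rather than recommendations for any particular site:

```
# Minimal robots.txt sketch -- example.com and /private/ are placeholders
User-agent: *
Disallow: /private/

# The Sitemap directive must use a fully qualified URL
Sitemap: https://www.example.com/sitemap.xml
```

The User-agent: * line applies the rule to every crawler that honors robots.txt, and the Sitemap line can appear anywhere in the file.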

You might be asking yourself, “Wait a minute, what is crawl budget?” Crawl budget is, in effect, the amount of attention Google allocates to crawling and indexing your site’s pages[1]. As big as Google is, it still has only a limited amount of resources available to crawl and index your site’s content.

If your site only has a few hundred URLs, then Google should be able to crawl and index your site’s pages easily.

However, if your site is big, like an ecommerce site for example[2], with thousands of pages and lots of auto-generated URLs, then Google might not crawl all of those pages, and you will miss out on lots of potential traffic and visibility.

This is where prioritizing what, when, and how much to crawl becomes important.

Google has stated that “having many low-value-add URLs can negatively affect a site’s crawling and indexing.” This is where a robots.txt file can help with the factors affecting your site’s crawl budget.
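For example, on a large ecommerce site the low-value URLs are often internal search results and parameter-driven filter pages. The sketch below shows how such URLs might be disallowed; the paths and parameter names are hypothetical, so any real rules should be checked against how your own site actually generates URLs:

```
# Hypothetical rules -- adjust /search and the parameter names to your site
User-agent: *

# Block internal search result pages
Disallow: /search

# Block auto-generated filter and session URLs
Disallow: /*?sort=
Disallow: /*?sessionid=
```

Wildcards such as * are supported by Google’s robots.txt matching, though support can vary across other crawlers, and disallowing a URL stops it from being crawled rather than guaranteeing it stays out of the index.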

You can use the robots.txt file to keep search engines away from these low-value URLs, saving your crawl budget for the pages that matter.