Yesterday while I was having a blast reading “The Anatomy of a Large-Scale Hypertextual Web Search Engine[1],” I happened across some fun facts.

We got into some of the more technical goods from the paper yesterday[2], but figured these would also be a worthwhile (or at least more enjoyable) read. Friday and all.

1. “Wow, you looked at a lot of pages from my web site. How did you like it?” – people encountering a crawler for the first time

They note that they received almost daily emails from people either concerned about copyright issues or asking if they liked the site after looking at it. For many people with web pages, this was one of the first crawlers they had seen.

“It turns out that running a crawler which connects to more than half a million servers, and generates tens of millions of log entries generates a fair amount of email and phone calls. Because of the vast number of people coming on line, there are always those who do not know what a crawler is, because this is the first one they have seen. Almost daily, we receive an email something like, ‘Wow, you looked at a lot of pages from my web site. How did you like it?’ There are also some people who do not know about the robots exclusion protocol[3], and think their page should be protected from indexing by a statement like, ‘This page is copyrighted and should not be indexed.’”

More innocent times.
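For anyone who has never seen it in action: the robots exclusion protocol works through a plain-text robots.txt file served from the site root, not through wording inside a page. Below is a minimal sketch of how a polite crawler might check it, using Python's standard-library `urllib.robotparser`. The robots.txt rules, the example.com URLs, and the crawler name `ExampleBot` are all made up for illustration; this is not how the paper's crawler was actually written.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for an example site (not from the paper).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the rules in ROBOTS_TXT let user_agent fetch url."""
    parser = RobotFileParser()
    # Parse the rules from a string instead of fetching them over the network.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

# Only the robots.txt rules are consulted; text inside the page body is never read.
print(allowed("https://example.com/index.html"))          # True
print(allowed("https://example.com/private/notes.html"))  # False
```

A sentence like “This page is copyrighted and should not be indexed” sitting in the page itself never enters this check, which is exactly why it didn't keep the crawler out.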

2. A billion web documents predicted by 2000

“It is foreseeable that by the year 2000, a comprehensive index of the Web will contain over a billion documents.”
