Well done, Google.
Now there are issues, largely because of its success.
The post "Our approach to content" on the Google Blog explains how Google works with content owners and its desire to respect their rights.
The case is well put by Danny Sullivan at Search Engine Watch. He says:
In terms of copyright, Google stresses that it generally sticks to what's known as fair use, though the post doesn't use those words. The idea is that it shows very short summaries of stories, pages, thumbnails of images but doesn't reprint this material, requiring people to clickthrough to the actual material from places like Google News.
Of course, in the case of cached pages, many including myself would argue that Google goes beyond fair use. Cached pages are an example where content can be viewed without clicking through to the original site, and the opt-out approach for that doesn't feel appropriate at all.
Google also notes there are cases when it wants to go beyond fair use, to make broader use of content where permission would be required. The deal with the Associated Press is cited as one of several examples here.
To me, this is also a way for Google to help defuse the idea that some publications have, such as the Belgian newspapers recently, that Google can be bought off to avoid lawsuits. To me, this is Google stressing that it will do content deals in some cases, but that these content deals aren't necessarily being done to avoid lawsuits, especially when it feels it is acting within fair use guidelines. That's my speculation and take on this, of course. Google didn't comment when I asked if this was the reason for raising the AP deals.
Moving past Google saying it respects copyright, it then stresses that it allows people to opt-out, even if it feels it has fair use rights. In general, I agree with this method, which Google along with the other major search engines generally follow. Trying to get permission from each web site to index it would be an impossible task, and one that's not necessarily even legally required. Opt-out through things like robots.txt is an effective way to protect rights holders plus benefit the public as a whole. I do hope they'll change cached pages to opt-in, however.
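For readers who have not seen the opt-out mechanism mentioned above, here is a minimal sketch of how a crawler might consult robots.txt before indexing a page. It uses Python's standard `urllib.robotparser` module. The rules, the example.org URLs and the "OtherBot" crawler name are made up for illustration, and this is not a description of how Google's own crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site (not a real file).
# Googlebot may crawl everything except /private/; all other crawlers
# are asked to stay out entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def allowed(agent: str, url: str) -> bool:
    """Return True if the sample rules permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(allowed("Googlebot", "https://example.org/news/story.html"))
    print(allowed("Googlebot", "https://example.org/private/report.html"))
    print(allowed("OtherBot", "https://example.org/news/story.html"))
```

The point of the sketch is that opting out is the publisher's choice and costs a few lines of text. A well-behaved crawler checks the file and stays away from whatever it is asked to avoid.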
I have commented before that blocking a search engine is a way of stopping people from finding your organisation. It shrinks the organisation's digital footprint for most people and, for everyone except the enthusiast, denies the organisation the benefits available from its online assets.
Of course, the fact that Google does not cache a page does not mean the page is not cached somewhere. It is, and it can be found. The Internet has a collective memory, which means all content can be recovered anyway.
Eight years of Google have brought great benefits to us all. If some organisations want to throw that away, they will regret it.