Crawling the Catalogue! Why not?
When you go looking for a book, you probably start with Google. You might search for a particular author or for something on the New York Times Best Seller list, and most likely somewhere in your results there will be the Amazon page for the item you're hunting down. Great! Even if you don't know what Amazon is, you can stumble upon it by random searching and buy the book you're looking for.
Now what about the Library, does it have the book? You can't spend all your money on books, right? Well, for starters you need to be in the know: you have to realize that your city has something called an Online Public Access Catalogue (OPAC) and that you should go check it specifically for the book. Wouldn't it be simpler if you could find these resources directly on Google without firing up the Catalogue? Of course it would, but artificial barriers have been put up so you can't do that.
Robots.txt and the Catalogue
Anyone with web development experience has probably come across something called robots.txt. Basically it is a text file that sits at the root of your website and tells spiders, including the Google indexer, which parts of your site they may or may not crawl. Once a page has been crawled, the next step is for it to be included in Google Search results. Normally you'd use robots.txt to stop the indexing of private portions of your site, or of automatically generated parts that probably wouldn't make sense to have in Google.
However that is not always the case. You can use a robots.txt to stop the indexing of any content you choose. Take for example:
A couple of Library catalogues that disallow crawling. The short of it is that you'll never get a page from these catalogues in a Google results screen.
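To make the mechanics concrete, here is a minimal sketch in Python using the standard library's urllib.robotparser. The robots.txt contents, the library.example.org address, and the /catalogue/ path are all made up for illustration and are not taken from any real library's site; the can_crawl helper is likewise just a name chosen for this sketch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary catalogue site.
# It asks every crawler to stay out of anything under /catalogue/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalogue/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A catalogue record (hypothetical URL) is off limits to a well-behaved crawler...
print(can_crawl(ROBOTS_TXT, "Googlebot",
                "https://library.example.org/catalogue/record/123"))
# ...while a page outside /catalogue/ can still be crawled.
print(can_crawl(ROBOTS_TXT, "Googlebot",
                "https://library.example.org/hours"))
```

With rules like these, a crawler that honours robots.txt skips every record page, which is how a whole catalogue ends up missing from search results.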
What's up with that?
Bailing out on Google
The idea of bailing on Google is nothing new. Making news recently is Rupert Murdoch and his plan to remove News Corp content from the Google index. That would include such sources as Fox News and the venerable Wall Street Journal. The justification behind the move is that Google is giving away too much for free, and that this somehow devalues the content since people then expect it to be free. That of course is no way to run a business. It has to be said, however, that Google is a source of a lot of traffic, and blocking it will cut off a large number of eyeballs from News Corp sites. It has been estimated that WSJ.com could lose potentially 25% of its traffic. What is even more interesting is that Microsoft and Bing might end up paying News Corp to de-list from Google.
I understand that WSJ.com is trying to make money, so giving out free access would make that tricky, but the parallel in the Library doesn't make sense. Libraries are free from the start, and whether public or academic, their mandate is to serve their users in the best way possible. 25% more traffic (ok, just a guess) from allowing a Google crawl of your catalogue sounds like something worth doing.
Still finding results from your library
There is hope. Not all of the Library's catalogues and holdings are hidden behind OPACs available only to those in the know. Some initiatives are under way that try to provide local library results for the books you look up. WorldCat is probably the best working example, but it is far from complete. That also supposes, of course, that you know WorldCat exists and that you can use it to find local copies of the book you want.
Robots.txt is also not the only method of allowing or stopping the Google indexer. Looking into it deeper, I'm sure it is possible to find examples of Libraries and other web services that are working hard at exposing Library holdings to search engines. That is probably a good use of time considering how prevalent the public search engine is.
Library books in Google. There should be an app for that.
