Reviving Dead Web Pages

Learning how to retrieve web pages that no longer exist from a search engine's cache may turn out to be one of the most important skills you learn. Reviving a page that a search engine still has cached can help you bridge potentially important gaps in information and research.

The easiest way by far to do this is to use the Wayback Machine at the Internet Archive (link).
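If you look up a lot of pages, it can help to build the Wayback Machine address yourself instead of typing each one into the search box. The snippet below is a minimal sketch, assuming the commonly seen address pattern of `web.archive.org/web/`, then a timestamp, then the original address. The function name `wayback_url` is made up for this example, and the code only builds the link; it does not check that a copy actually exists.

```python
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(page_url, timestamp=""):
    """Build a Wayback Machine link for page_url.

    timestamp is a string of up to 14 digits (YYYYMMDDhhmmss); a partial
    value such as "2005" is allowed. With no timestamp, "*" is used, which
    is assumed to open the listing of all captures for that address.
    """
    if timestamp and not (timestamp.isdigit() and len(timestamp) <= 14):
        raise ValueError("timestamp must be 1 to 14 digits, e.g. 2005 or 20050315")
    stamp = timestamp or "*"
    return f"{WAYBACK_PREFIX}/{stamp}/{page_url}"


if __name__ == "__main__":
    # Link to the capture closest to some time in 2005
    print(wayback_url("http://example.com/", "2005"))
    # Link to the list of every capture on record
    print(wayback_url("example.com"))
```

Paste the printed link into your browser to see whether the archive has the page.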

However, the Wayback Machine *does not* have a copy of every web page ever uploaded to the web. Some site owners specifically request that their sites not be crawled for inclusion in the archive, while other webmasters include small lines of code that block the crawlers (better known as robots). So once that "blocked" page disappears, it is gone from the archive for good.
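Those "small lines of code" usually live in a file called robots.txt at the top level of a site. As a rough illustration, the sketch below uses Python's standard `urllib.robotparser` module to test whether a given robots.txt would turn a crawler away. The robots.txt text is an invented example, and the crawler name `ia_archiver` is an assumption based on the name historically associated with the Internet Archive's crawls; `is_blocked` is a helper written just for this post.

```python
from urllib.robotparser import RobotFileParser

# Invented example of a robots.txt that blocks one crawler from the whole site
EXAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_blocked(robots_txt, user_agent, url):
    """Return True if robots_txt tells user_agent not to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/page.html"
    print(is_blocked(EXAMPLE_ROBOTS_TXT, "ia_archiver", page))
    print(is_blocked(EXAMPLE_ROBOTS_TXT, "SomeOtherBot", page))
```

If a site you care about has a rule like this, there is a good chance the archive never kept a copy, and a search engine cache may be your only option.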

But you still might be able to find other pages that are just as important, and here's how:



From the Google Help Center (link):


Advanced Operators

Google supports several advanced operators, which are query words that have special meaning to Google. Typically these operators modify the search in some way, or even tell Google to do a totally different type of search. For instance, "link:" is a special operator, and the query [link:www.google.com] doesn't do a normal search but instead finds all web pages that have links to www.google.com.

Several of the more common operators use punctuation instead of words, or do not require a colon. Among these operators are OR, "" (the quote operator), - (the minus operator), and + (the plus operator). More information on these types of operators is available on the Basics of Search page. Many of these special operators are accessible from the Advanced Search page, but some are not. Below is a list of all the special operators Google supports.


and the cache search operator:



The query [cache:] will show the version of the web page that Google has in its cache. For instance, [cache:www.google.com] will show Google's cache of the Google homepage. Note there can be no space between the "cache:" and the web page url.

If you include other words in the query, Google will highlight those words within the cached document. For instance, [cache:www.google.com web] will show the cached content with the word "web" highlighted.

This functionality is also accessible by clicking on the "Cached" link on Google's main results page.
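The rules quoted above (no space after "cache:", extra words get highlighted) are easy to get wrong when typing by hand. Here is a small sketch that puts the query together for you. The function names `cache_query` and `search_link` are made up for this example, and the search address used as the default is an assumption. The code only builds the text; it does not check whether the search engine still has a cached copy.

```python
from urllib.parse import quote_plus


def cache_query(page_url, *highlight_words):
    """Build a cache: query, e.g. 'cache:www.google.com web'."""
    url = page_url.strip()
    # The operator only works when the URL follows "cache:" with no space
    if not url or any(ch.isspace() for ch in url):
        raise ValueError("the URL must be non-empty and contain no spaces")
    return " ".join([f"cache:{url}", *highlight_words])


def search_link(query, base="https://www.google.com/search"):
    """Turn a query into a link you can paste into a browser."""
    return f"{base}?q={quote_plus(query)}"


if __name__ == "__main__":
    query = cache_query("www.google.com", "web")
    print(query)
    print(search_link(query))
```

Any extra words you pass after the address are added to the query as the terms to highlight.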


So now you can look for deleted web pages on your own and fill in those research gaps.


Sites that contain more than one copy of a web page:

Archive-It
http://www.archive-it.org/public/all_collections

Wayback Machine
http://www.archive.org/web/web.php


WebCite (Mostly health-related pages)

http://www.webcitation.org/query

Sites that keep only a single copy of a web page:

Gigablast (Goes Back 1 year from the current calendar date)

http://www.gigablast.com/

Exalead (Goes Back 6 months from the current date)

http://www.exalead.com/


Family-source (Goes back to 2005)

http://www.family-source.com




The sites below contain further technical explanation, tools, and guides on cache searching:


http://www.searchengineshowdown.com/others/archive.shtml


http://www.robotstxt.org/faq.html

http://www.onlinemag.net/mar02/OnTheNet.htm

http://web.archive.org/collections/web/advanced.html

http://www.google.com/help/features.html


http://www.google.com/help/operators.html

http://www.googleguide.com/cached_pages.html

http://en.wikipedia.org/wiki/Page_cache

http://help.yahoo.com/l/us/yahoo/search/basics/basics-09.html


http://www.pagefactor.com/

http://urlcut.com/ManyFacesGoogle

http://blog.searchenginewatch.com/blog/060118-165021

http://www.webuildpages.com/cache/cachetoolpublic.pl

http://www.web-caching.com/

http://www.web-cache.com/

http://squidbook.org/index-two.html

http://www.caching.com/

http://www.ircache.net/

http://pages.cs.wisc.edu/~cao/links.html

http://www.w3.org/Propagation/

http://www.mnot.net/cache_docs/

http://vancouver-webpages.com/CacheNow/

http://forskningsnett.uninett.no/arkiv/desire/

http://www-sor.inria.fr/projects/relais/reading-list.html

http://excalibur.usc.edu/

http://www.research.rutgers.edu/~davison/web-caching/bibliography.html


http://www.networkworld.com/netresources/caching.html


http://www.w3.org/Daemon/


