Deep web

The deep web (also called the invisible web or hidden web) is the name given to pages on the World Wide Web that are not part of the surface web indexed by common search engines. It consists largely of pages that other pages do not link to, such as dynamic web pages. Dynamic web pages are typically generated by searchable databases in response to a query, and they contain information stored in tables managed by database software such as Microsoft Access, Oracle, or other SQL databases. The deep web also includes sites that require registration or otherwise limit access to their pages, preventing search engines from browsing them and creating cached copies.

Non-textual files such as multimedia (image) files, Usenet archives, and documents in non-HTML file formats such as PDF and DOC used to form part of the deep web, but are now more easily accessible to search engines, especially Google.

The deep web should not be confused with the dark web or dark internet, terms that refer to machines or network segments not connected to the Internet. While deep web content is accessible to people online but not visible to conventional search engines, dark internet content is not accessible online to either people or search engines.

=Surface web=

To better understand the invisible web, consider how conventional search engines construct the database that defines the surface web. Programs called spiders or web crawlers start by reading pages from an initial list of websites. Each page they read is indexed and added to the search engine's database. Any hyperlinks to new pages are added to the list of pages to be indexed. Eventually all reachable pages have been indexed, or the search engine runs out of time or disk space. These reachable pages are the surface web. Pages that have no chain of links leading to them from a page in the spider's initial list are invisible to that spider and are not part of the surface web it defines.
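The crawling process described above can be sketched as a breadth-first traversal of a link graph. This is only an illustration under simplifying assumptions: the `LINKS` dictionary, the page names, and the `max_pages` limit are hypothetical stand-ins for real web pages and for a crawler's time or disk budget.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "home": ["about", "news"],
    "about": ["contact"],
    "news": ["home"],
    "contact": [],
    "orphan": ["home"],  # links out, but no page links to it
}

def crawl(seeds, links, max_pages=1000):
    """Return the set of pages reachable from the seeds by following links."""
    indexed = set()
    frontier = deque(seeds)
    while frontier and len(indexed) < max_pages:
        page = frontier.popleft()
        if page in indexed or page not in links:
            continue
        indexed.add(page)             # "index" the page
        frontier.extend(links[page])  # queue its hyperlinks for later
    return indexed

surface = crawl(["home"], LINKS)
invisible = set(LINKS) - surface
print(sorted(surface))
print(sorted(invisible))
```

In this toy graph the page named `orphan` exists but is never indexed, because no chain of links leads to it from the seed list.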

In opposition to the surface web is the deep web. The great majority of the deep web is composed of searchable databases. To understand why these databases are invisible to spiders (and their search engines), consider the following:

:Imagine someone has collected a great amount of information (books, texts, articles, images, etc.) and put it together online in a website, creating a database reachable only via a search field. This database, like most databases, would work like this:
#In a search field, the user types the keywords he or she wants.
#The search facility looks inside the database and retrieves the relevant content.
#A page of results is presented, containing links to every relevant topic related to the user's query.

Once a conventional search engine's web crawler reaches this site, it will capture the text contained in the main page and in the pages to which hyperlinks can be found (usually "about us", "contact us", "privacy policy", etc.). But the great majority of the information (books, texts, articles or images), which is reachable only by querying the search field, cannot be reached by the web crawler. The robot cannot predict which words it should type into the search field. Thus the data is invisible to the search engine.
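The scenario above can be illustrated with a small simulation. Everything here is hypothetical: `STATIC_PAGES`, `DATABASE`, `search` and `crawl_links` are invented names for a toy site, not part of any real search engine. The point is only that a crawler which follows hyperlinks, and never submits queries, does not see the database records.

```python
# Hypothetical site: a few linked static pages plus a database whose
# records are reachable only through a search function.
STATIC_PAGES = {
    "/": ["/about", "/contact"],
    "/about": ["/"],
    "/contact": ["/"],
}

DATABASE = {
    "/article/1": "history of the printing press",
    "/article/2": "printing techniques in asia",
    "/article/3": "deep sea fishing",
}

def search(query):
    """Return URLs of records whose text contains the query keyword."""
    q = query.lower()
    return sorted(url for url, text in DATABASE.items() if q in text)

def crawl_links(start):
    """Follow hyperlinks only; this crawler never calls search()."""
    seen, stack = set(), [start]
    while stack:
        url = stack.pop()
        if url in seen:
            continue
        seen.add(url)
        stack.extend(STATIC_PAGES.get(url, []))
    return seen

crawled = crawl_links("/")           # what the crawler can index
hidden = set(DATABASE) - crawled     # what stays invisible to it
user_results = search("printing")    # what a human user can still find
```

A user who types "printing" gets two articles back, while the crawler's index contains only the three static pages.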

=Accessing the deep web=

As noted above, search engines use web crawlers that follow hyperlinks. Such crawlers typically do not submit queries to databases, because the number of possible queries to a single database is potentially infinite. It has been noted that this can be (partially) overcome by publishing links to query results, which lets crawlers reach those result pages and increases Google-style PageRank results for a member of the deep web.
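As a rough sketch of the "links to query results" idea, a site could publish a browse page containing ordinary hyperlinks that encode selected queries. The base URL `http://example.com/search`, the parameter name `q`, and the helper names below are assumptions made for illustration only; a real site would use its own URL scheme.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def query_url(base, keyword):
    """Build a plain hyperlink that encodes one search query."""
    return f"{base}?{urlencode({'q': keyword})}"

def browse_page(base, keywords):
    """Links a site could publish so crawlers can reach some result pages."""
    return [query_url(base, k) for k in keywords]

# Hypothetical base URL and keyword list.
links = browse_page("http://example.com/search", ["printing press", "deep web"])

# A crawler following such a link lands on the same page a user would
# get by typing the keyword into the search field.
recovered = parse_qs(urlparse(links[0]).query)["q"][0]
```

Only the queries chosen by the site become crawlable this way, which is why the approach overcomes the problem only partially.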

In 2005, Yahoo! made a small part of the deep web searchable by releasing Yahoo! Subscriptions. This search engine searches through a few subscription-only web sites.

Some search tools are being designed to retrieve information from the deep web. Their crawlers are designed to identify searchable databases and interact with them, aiming to provide access to deep web content. Some examples are InvisibleWeb.com, LexiBot, Lycos Invisible Web Catalog and Incywincy.

Another option is to access the searchable databases directly. They represent the invisible web itself. A good example is Find Articles (exclusive articles). There are some catalogs listing the major specialized databases, as well as some alternative search engines that focus on finding specialty search engines and databases, such as GoshMe and Topic Hunter.

=Size of the deep web=

In a 2000 study by the search company [http://brightplanet.com BrightPlanet], the inaccessible part of the web was estimated to be about 500 times larger, in terms of number of documents, than what search engines already provide access to. Any such figures must be taken with caution, however, due to the difficulty of distinguishing between genuinely different documents and documents that merely represent different database views of the same content.

=References=

  • Gary Price & Chris Sherman. The Invisible Web: Uncovering Information Sources Search Engines Can't See. CyberAge Books, July 2001. ISBN 091096551X
  • Joe Barker. Invisible Web: What it is, Why it exists, How to find it, and Its inherent ambiguity. UC Berkeley - Teaching Library Internet Workshops, January 2004. Last seen online July 2005 at [http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html http://www.lib.berkeley.edu]
  • Michael K. Bergman. The Deep Web: Surfacing Hidden Value. The Journal of Electronic Publishing. August, 2001. Volume 7, Issue 1. [http://www.press.umich.edu/jep/07-01/bergman.html http://www.press.umich.edu/jep/07-01/bergman.html]
  • Alex Wright. In Search of the Deep Web. Salon.com, March 2004. [http://www.salon.com/tech/feature/2004/03/09/deep_web/index.html http://www.salon.com/tech/feature/2004/03/09/deep_web/index.html]

=External links=

  • [http://qprober.cs.columbia.edu/ QProber: Classifying and Searching Hidden-Web Databases]
  • [http://metaquerier.cs.uiuc.edu/ MetaQuerier: Exploring and Integrating the Deep Web]
  • [http://library.albany.edu/internet/deepweb.html Deep Web from the library of SUNY-Albany]
  • [http://library.rider.edu/scholarly/rlackie/Invisible/Inv_Web_Main.html The Invisible Web Revealed by Robert J. Lackie of Rider University]
  • [http://www.invisible-web.net/ Invisible-web.net]
  • [http://www.brightplanet.com/technology/deepweb.asp BrightPlanet - Deep Web White Paper]
  • [http://searchenginewatch.com/links/article.php/2156181 SearchEngineWatch - Invisible Web & Database Search Engines]
  • [http://www.deepwebresearch.info/ A blog about deep web search]
  • [http://www.techdeepweb.com/ A How-To Guide to the Deep Web for IT Professionals]

=Some deep web related search engines=

In addition, the following sites claim to search the invisible web:
  • [http://www.findarticles.com Find Articles]
  • [http://www.goshme.com GoshMe]
  • [http://www.hon.ch Health on The Net Foundation]
  • [http://www.profusion.com/index.htm InvisibleWeb.com]
  • [http://www.incywincy.com Incywincy]
  • [http://www.lexibot.com LexiBot]
  • [http://www.scirus.com Scirus]
  • [http://www.topichunter.com Topic Hunter]
  • [http://search.yahoo.com/subscriptions Yahoo! Subscriptions]
  • [http://lib.nmsu.edu/instruction/specialtysearch.htm Specialty Search Engines]
  • [http://searchpdf.adobe.com Adobe PDF search]
  • [http://www.search.com/subjects SearchIQ]
  • [http://www.completeplanet.com Complete Planet]
  • [http://www.invisibleweb.com Invisible Web]
  • [http://www.freepint.com/gary/direct.htm Direct Search]