Using Metasearching
Most surfers have used one or another of the Web search engines. These
resources collect Web pages and create databases for people to browse or
search. They can be quite comprehensive or fairly specialized. Each has
its own content, its own interface, its own rules for searching, and its
own way of displaying search results. To search exhaustively, one often
has to use several of them and be familiar with their different interfaces
and searching rules.
A metasearch tool is a central place with a uniform interface where a
query can be entered once, the search can be conducted simultaneously in
as many search engines and directories as necessary, and the search results
can be brought back and displayed in a consistent format. Tools with these
features have come to be called metasearch engines.
Unlike the individual search engines
and directories, metasearch engines do not have their own databases; they
do not collect web pages; they do not accept URL additions; and they do
not classify or review web sites. Instead, they send queries simultaneously
to multiple Web search engines and/or Web directories. Many of the metasearch
engines integrate search results: duplicate findings are merged into one
entry; some rank the results according to various criteria; some allow
selection of search engines to be searched.
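To make the idea of integration concrete, here is a minimal Python sketch of merging duplicate findings into one entry and ranking the merged list. It is only an illustration under stated assumptions: the `normalize` and `merge_results` names, the URL-normalization rule, and the ranking rule (more engines first, then best position) are hypothetical and are not taken from any particular metasearch engine.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a comparison key: lowercase host, no trailing slash.

    Assumption: two hits are 'duplicates' when host and path match.
    """
    parts = urlsplit(url)
    return parts.netloc.lower() + parts.path.rstrip("/")


def merge_results(results_by_engine):
    """Merge per-engine hit lists into one list with one entry per page.

    results_by_engine maps an engine name to its hits in ranked order.
    Pages reported by more engines sort first; ties go to the page with
    the best (lowest) position in any engine.
    """
    merged = {}
    for engine, urls in results_by_engine.items():
        for position, url in enumerate(urls, start=1):
            entry = merged.setdefault(
                normalize(url), {"url": url, "engines": [], "best": position}
            )
            if engine not in entry["engines"]:
                entry["engines"].append(engine)
            entry["best"] = min(entry["best"], position)
    return sorted(merged.values(), key=lambda e: (-len(e["engines"]), e["best"]))


# Made-up hit lists; the engine names are labels only.
sample = {
    "AltaVista": ["http://www.example.com/", "http://example.org/a"],
    "Excite": ["http://example.org/a", "http://WWW.EXAMPLE.COM"],
    "Lycos": ["http://example.net/b", "http://example.org/a"],
}
merged = merge_results(sample)
for entry in merged:
    print(entry["url"], entry["engines"])
```

Ranking by the number of engines that report a page is only one of the "various criteria" an engine might use; it is chosen here because it is easy to follow.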
Before conducting a metasearch engine
search, it is important to find out which search engines are included in
your search. Most metasearch engines default to the major search engines,
such as AltaVista, Excite, Lycos, and Infoseek. Others will also include
Usenet searches, and other specialized databases. Negotiations between
the metasearch engine companies and the individual search engine companies
may also result in a major search engine being excluded from a metasearch
engine. For example, Northern Light would not allow any of the metasearch
engines to robotically search its index since this process drains its resources.
Development of metasearch engines lags behind development of search engines,
and some metasearch engines still include defunct search engines.
Successful use of a metasearch engine
depends on the status of each of the individual search engines used. Some
may be heavily loaded at the time; some may be unreachable. The added features
mentioned above require further resources from the metasearch engines,
resulting in slower response time, a serious problem with many of the metasearch
engines. Many of them, therefore, have a timeout period, so that attempts
to work with a particular search engine can be abandoned if no response
comes from it within a set period of time.
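The timeout behaviour described above can be sketched in a few lines of Python. This is a simplified illustration, not how any real metasearch engine is built: the engines are simulated with `time.sleep`, and the names `make_fake_engine` and `metasearch` are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def make_fake_engine(name, delay, hits):
    """Return a stand-in for a real engine: it sleeps, then returns canned hits."""
    def search(query):
        time.sleep(delay)
        return [f"{name}: {hit} ({query})" for hit in hits]
    return search


def metasearch(query, engines, timeout):
    """Query every engine in parallel and abandon those that miss the deadline.

    Returns (results, skipped): results maps engine name to its hits,
    skipped lists engines that timed out or raised an error.
    """
    pool = ThreadPoolExecutor(max_workers=len(engines))
    futures = {pool.submit(search, query): name for name, search in engines.items()}
    done, _not_done = wait(futures, timeout=timeout)
    results, skipped = {}, []
    for future, name in futures.items():
        if future in done and future.exception() is None:
            results[name] = future.result()
        else:
            skipped.append(name)
    pool.shutdown(wait=False)  # do not block on engines that are still busy
    return results, skipped


engines = {
    "FastEngine": make_fake_engine("FastEngine", 0.0, ["page 1", "page 2"]),
    "SlowEngine": make_fake_engine("SlowEngine", 1.0, ["page 3"]),
    "AlsoFast": make_fake_engine("AlsoFast", 0.0, ["page 4"]),
}
results, skipped = metasearch("metasearch", engines, timeout=0.3)
print(sorted(results), skipped)
```

The trade-off is the one noted in the text: a short timeout keeps response time down, at the cost of silently losing results from engines that happen to be heavily loaded.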
Remember too that a query submitted
to a metasearch engine, with its uniform search interface and syntax, is
to be applied against the diversity of individual search engines. It is
therefore impossible for metasearch engines to take advantage of all the
features of the individual search engines. Boolean searches, for example,
may produce varied results. Phrase searches may not be supported. Other
features, such as query refinement, are sacrificed.
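The following Python sketch shows why a single uniform query cannot exploit every engine's features. It translates one query into two made-up syntaxes and reports what was lost along the way. The `Query`, `Dialect` and `translate` names and both dialects are assumptions for illustration; they do not describe the actual syntax of any engine named in this article.

```python
from dataclasses import dataclass


@dataclass
class Query:
    required: list    # terms that must appear
    excluded: list    # terms that must not appear
    phrase: str = ""  # exact phrase, if any


@dataclass
class Dialect:
    name: str
    style: str             # "plusminus" (+term -term) or "keyword" (AND / NOT)
    supports_phrase: bool


def translate(query, dialect):
    """Render query in the given dialect.

    Returns (query string, list of features that had to be degraded).
    """
    lost = []
    required = list(query.required)
    if query.phrase:
        if dialect.supports_phrase:
            required.append(f'"{query.phrase}"')
        else:
            # Fall back to requiring each word separately.
            required.extend(query.phrase.split())
            lost.append("phrase")
    if dialect.style == "plusminus":
        tokens = [f"+{t}" for t in required] + [f"-{t}" for t in query.excluded]
        return " ".join(tokens), lost
    clauses = required + [f"NOT {t}" for t in query.excluded]
    return " AND ".join(clauses), lost


q = Query(required=["metasearch"], excluded=["usenet"], phrase="search engine")
engine_a = Dialect("EngineA", "plusminus", supports_phrase=True)
engine_b = Dialect("EngineB", "keyword", supports_phrase=False)

print(translate(q, engine_a))
print(translate(q, engine_b))
```

For the second dialect the phrase degrades into two separately required words, which is one way a phrase search can end up producing varied results across engines.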
Moreover, metasearch engines generally
do not conduct exhaustive searches: they do not bring back all the pages
from each of the individual search engines. They only make use of the top
10 to 100 hits from each of them. While this is sufficient for most searches,
individual search engines must be consulted if one needs to go beyond the
top hits as determined by the metasearch engines. Some metasearch engines
meet this need by providing query links back to the individual search
engines.
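A rough Python sketch of this behaviour follows: keep only the top hits from one engine and build a link that reruns the same query on that engine directly. The search URLs, the `q` parameter name, and the function names are placeholders invented for the example, not the real addresses or parameters of any search engine.

```python
from urllib.parse import urlencode

# Hypothetical search endpoints, for illustration only.
ENGINE_SEARCH_URLS = {
    "EngineA": "http://engine-a.example/search",
    "EngineB": "http://engine-b.example/find",
}


def link_back(engine, query, param="q"):
    """Build a URL that repeats the query on the individual engine."""
    return ENGINE_SEARCH_URLS[engine] + "?" + urlencode({param: query})


def present(engine, query, hits, limit=10):
    """Keep the top `limit` hits and say how to reach the rest."""
    kept = hits[:limit]
    return {
        "engine": engine,
        "hits": kept,
        "omitted": len(hits) - len(kept),
        "more": link_back(engine, query),
    }


all_hits = [f"http://example.org/page{i}" for i in range(1, 26)]
summary = present("EngineA", "metasearch engines", all_hits)
print(summary["omitted"], summary["more"])
```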
The following metasearch engines
are among the major ones currently available.
Ask Jeeves
http://www.askjeeves.com/
Simple syntax; results presented
in pull-down menus; number of matches reported from each search engine;
no integration; no ranking; interesting design; fairly good response time;
limited number of search engines used.
Debriefing
http://www.debriefing.com
A new contender; searches AltaVista, Yahoo, Infoseek, Excite, WebCrawler,
Lycos and HotBot in the English version; its French version searches Yahoo
France, PagesWeb, Ecila, Infoseek France, Excite France and Lokace; it
supports Boolean (+ -) and phrase searches (" "); collates the results,
ranks them and removes duplicates; provides the most significant domain
name for a search; in the advanced search mode, it allows for searches
within a particular site (no need to provide a complete URL).
Dogpile
http://www.dogpile.com/
Relatively new; searches Web sites, Usenet, FTP sites and newswires (25
in all). First-time users should start with "Custom Search", where one
can set the order and the number of the 25 search engines so that results
from one's favorite sites return first, and/or skip certain sites altogether,
a very handy feature. The timeout can be set from ten to 60 seconds. Dogpile
searches three sites at a time: if there are enough results (ten hits),
the search stops; otherwise it continues on to the next three sites. Ten
records from each of the three sites are displayed. Further hits from those
sites can be retrieved with a click, and the next three sites can be searched
with a click as well. Search results are displayed with summaries; the number
of hits from each site is reported; Boolean searches are supported; response
time is very good; there is no integration of results. See also MetaFind below.
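The three-at-a-time strategy can be sketched in Python as follows. This is a guess at the general shape of the approach, not Dogpile's actual code: the function names are hypothetical, the engines are simulated, and the sketch assumes that "enough results" means a running total of ten hits across all sites searched so far.

```python
def fake_engine(hit_count):
    """Return a stand-in engine that always yields hit_count made-up hits."""
    return lambda query: [f"{query}-hit-{i}" for i in range(hit_count)]


def batched_search(query, engines, batch_size=3, enough=10, per_engine=10):
    """Search engines in order, batch_size at a time, until enough hits arrive.

    engines is an ordered list of (name, search_function) pairs.
    Returns (results, not_searched): hits per engine searched, plus the
    names of engines that were never contacted.
    """
    results = {}
    total = 0
    for start in range(0, len(engines), batch_size):
        for name, search in engines[start:start + batch_size]:
            hits = search(query)[:per_engine]
            results[name] = hits
            total += len(hits)
        if total >= enough:
            not_searched = [name for name, _ in engines[start + batch_size:]]
            return results, not_searched
    return results, []


# Seven simulated engines, in the user's preferred order.
hit_counts = [2, 3, 1, 4, 0, 5, 9]
engines = [(name, fake_engine(n)) for name, n in zip("ABCDEFG", hit_counts)]
results, not_searched = batched_search("metasearch", engines)
print(list(results), not_searched)
```

In this run the first three sites yield only six hits, so a second batch is searched; the seventh site is never contacted, which is why the order set in "Custom Search" matters.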
Highway 61
http://www.highway61.com/
Searches only Yahoo, Lycos, WebCrawler, Infoseek and Excite (it used to
search AltaVista as well); AND and OR searches; number of hits from each
site is reported; results displayed with summaries; sites returned by the
most search engines are ranked higher; interesting way of presenting options:
the timeout period is presented as "Your patience level." I like the
developer's sense of humour in admitting that "this is not an exact science"
when referring to how many hits a search should return. Response time leaves
much room for improvement.
Internet Sleuth
http://www.isleuth.com/
One of the largest collections of
searchable sites, divided into several major categories: Web search engines
and directories, reviewed sites, news, business and finance, software and
Usenet; very flexible selection of search engines to be included (Hold
Ctrl key to select multiple databases, Shift key to select a range). Maximum
search time can be set between ten seconds and two minutes (used to be
five); no integration of results; display of search results can be customized
to show titles only or titles with summaries; number of results from each
site can range from ten to 100; convenient arrangements for retrieving
more records from individual search engines; response time is moderate.
Mamma
http://www.mamma.com/
Searches the Web, Usenet, news,
stock symbols, company names, MP3 files, pictures and sound; supports optional
phrase searches and searches limited to titles only; optionally shows summaries;
Boolean operators can be used (+ and -). It claims to present results in
a uniform format by relevance and source. A limited number of search engines
is supported: AltaVista, Excite, Infoseek, Lycos, WebCrawler, and Yahoo.
No arrangement for further searches in the individual search engines. Response
time is moderate.
MetaCrawler
http://www.go2net.com/search.html
One of the earliest metasearch engines, purchased by go2net from the
University of Washington. It is best to customize it before using: set
the default interface (regular, power, or low bandwidth); select the default
Boolean operators to be used (OR, AND, or as a phrase); optionally limit
results to Web pages from North America, Europe, Asia, Australia, South
America, Africa, Antarctica, or U.S. educational, commercial or government
sites; set the timeout period and the number of results from each source;
or start with power search, where all the options can be set before searching.
Results are displayed with summaries, integrated and ranked; response time
is fairly good; Web search includes only the major search engines: Lycos,
Infoseek, WebCrawler, Excite, AltaVista, and Yahoo. Many other types of
databases have been added recently: computer products, Usenet, files, stock
quotes.
ProFusion
http://www.profusion.com/
Excellent options in search engine
selection: one can choose the best three, the fastest three, all or any
of the available search engines; Boolean and phrase searches are supported;
searches the Web or Usenet; search results can be displayed with summaries
or without; one can have up to 50 links of search results checked to make
sure they are live. Results are integrated and the number of hits from each
search engine is reported; search terms can be saved for future reruns
(this feature seems to have disappeared). ProFusion used to be very slow
in response time, but since its recent address change, speed has improved
dramatically.
Search.com
http://www.search.com/
Searches Google, Ask Jeeves, LookSmart and dozens of other leading search
engines.
Verio Metasearch
http://search.verio.net/
The advanced query interface has
a very powerful scoring feature, allowing one to decide which individual
search engine's results carry more weight than others; maximum delay time
can be arbitrarily set; number of search results can range from ten to
all; returns the most meta-information about a site, including relevance
rank and score, and number of search engines ranking a site in its top
ten hits. Slow response time.
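A scoring feature of this kind, where the user decides how much weight each engine's results carry, might look roughly like the Python sketch below. The formula is an assumption made for the example (position points multiplied by a per-engine weight); Verio's real scoring method is not described here, and the `weighted_rank` name is hypothetical.

```python
def weighted_rank(results_by_engine, weights, top_n=10):
    """Rank pages by a weighted sum of their positions in each engine.

    A page in position 1 earns top_n points, position 2 earns top_n - 1,
    and so on; each engine's points are multiplied by its weight
    (default 1.0). Returns tuples of
    (rank, url, score, number of engines listing the page in their top_n).
    """
    scores = {}
    top_counts = {}
    for engine, urls in results_by_engine.items():
        weight = weights.get(engine, 1.0)
        for position, url in enumerate(urls[:top_n], start=1):
            scores[url] = scores.get(url, 0.0) + weight * (top_n - position + 1)
            top_counts[url] = top_counts.get(url, 0) + 1
    ordered = sorted(scores, key=lambda u: (-scores[u], u))
    return [(rank, u, scores[u], top_counts[u]) for rank, u in enumerate(ordered, 1)]


hits = {
    "EngineA": ["http://example.org/x", "http://example.org/y"],
    "EngineB": ["http://example.org/y", "http://example.org/z"],
}
favour_a = weighted_rank(hits, {"EngineA": 2.0, "EngineB": 1.0})
favour_b = weighted_rank(hits, {"EngineA": 1.0, "EngineB": 3.0})
for row in favour_a:
    print(row)
```

Changing the weights reorders the lower-ranked pages in this example, while the page found by both engines stays on top.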