From the book Search User Interfaces, published by Cambridge University Press. Copyright © 2009 by Marti A. Hearst.

Ch. 4: Query Specification

In the query specification part of the information access process, the searcher expresses an information need by converting their internalized, abstract concepts into language, and then converting that expression of language into a query format that the search system can make use of. This chapter discusses the mechanisms by which information needs are expressed. The two main dimensions for the query specification process are:

  • [1] The kind of information the searcher supplies. Query specification input spans a spectrum from full natural language sentences, to keywords and key phrases, to syntax-heavy command language-based queries.

  • [2] The interface mechanism the user interacts with to supply this information. These include command line interfaces, graphical entry form-based interfaces, and interfaces for navigating links.

Each is discussed in the sections below.

4.1: Textual Query Specification

Queries over collections of textual information usually take on a textual form (querying against multimedia is discussed in Chapter 12). The next subsections discuss different kinds of textual input for query specification.

4.1.1: Search Over Surrogates vs. Full Text

In older bibliographic search systems, users could only search over metadata that hinted at the underlying contents. To look for a book on a topic in an online library catalog, the searcher was restricted to the text in the title or the few subject labels that the librarian who had catalogued the book had used to describe it (Borgman, 1996b). A book that mentions an interesting idea, but only in a secondary manner, would most likely not be indexed with that term (Cousins, 1992). Using a standard library catalog, a searcher who was interested in discussions of narwhals in literature would not have been able to discover that Moby Dick contains several discussions of these unicorn-like whales. Search over full text gives users direct access to the words used by the author, rather than being restricted to matching against a document surrogate. Full-text search has become the norm for most kinds of search over textual content.

4.1.2: Keyword Queries

With the rise of the Web came the dominance of keywords as the primary query input type. Keyword queries consist of a list of one or more words or phrases -- rather than full natural language statements -- whose intention is to find documents containing those words that are likely to be relevant to the user's information need. Example English keyword queries (from Google Trends and Dogpile SearchSpy, November 1, 2008) include flip cam, fresh chilli paste recipes, and video game addiction. Some keyword queries consist of lists of different words and phrases, which together suggest a topic (e.g., early voting florida). Many others are noun compounds and proper nouns (thanksgiving wallpaper, jedi mind tricks, sherlock holmes, and daylight savings time change). Less frequently, keyword queries contain syntactic fragments including prepositions (tots to teens, matron of honor speech) and verbs (watch pokeman), and in some cases, full syntactic phrases (history of fish farming in nigeria, construction plans for a trebuchet).

Statistics on query length and composition have been recorded since early in the Web's existence. These measures have shown that the average query length has grown over time, and the percentage of one-word queries has shrunk. Average query lengths were approximately 2.4 words in 1997 and 1998 (Silverstein et al., 1999, Jansen et al., 1998). One-word queries accounted for 31% of the queries in a study using data from 1997 (Jansen et al., 1998) and 26% in a study using data from 1998 (Silverstein et al., 1999). Jansen and Spink, 2006 found inconclusive evidence of a slight decline in the percentage of one-word queries. By contrast, another study by Jansen et al., 2005 found the percentage of three-word queries increasing from nearly 28% in 1998 to 49% in 2002. More recently, Jansen et al., 2007b conducted a study using 1.5M queries gathered in May 2005 from the Dogpile.com search engine. This study found that the mean length of the queries was 2.8 terms, with only 18.5% being one-word queries. Table 4.1 shows the distribution of query lengths for this dataset.

Length   Occurrences   Percent
  1        281,000      18.5
  2        491,000      32.3
  3        373,000      24.5
  4        194,000      12.7
  5         95,000       6.3
  6         45,000       3.0
  7         22,000       1.5
  8         12,000       0.8
  9          6,000       0.4

Table 4.1 Frequencies of query lengths (in words), rounded to the nearest thousand. The longest query was of length 25. Data from Jansen et al., 2007b, based on query logs from Dogpile.com on May 6, 2005.

In the early days of the Web, search engines used statistical ranking functions, but these did not work well with the short keyword queries that Web searchers were writing. Furthermore, searchers were not usually interested in high recall for the kinds of information available on the Web. Searchers also found statistical ranking on short queries to be confusing, not understanding why, say, the highest ranked documents might contain only one query term out of three (Lake, 1998).

The meaning of search term conjunction is more easily understood than statistical ranking (Muramatsu and Pratt, 2001). However, when only a small collection of documents is available, a ranking algorithm that automatically requires all terms in a query to be present in the retrieved documents will end up with empty result sets (and very frustrated users) much of the time. With a larger document pool, a query on any combination of three or fewer terms is likely to return a result, even if all the terms are conjoined. Because the number of (English) pages on the Web is so enormous, conjoining query terms is likely to return many results, and because conjunctions are more transparent to the user than a weighted term score, conjunction-based queries became the default for Web search.
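The AND semantics described above can be made concrete with a small sketch. The following is a minimal illustration (not any engine's actual implementation) of conjunctive matching over a toy inverted index; the documents and queries are invented for the example.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def conjunctive_search(index, query):
    """Return ids of documents containing ALL query terms (AND semantics)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    return set.intersection(*postings)

docs = {
    1: "early voting lines in florida",
    2: "florida beach vacation",
    3: "early voting results",
}
index = build_index(docs)
print(conjunctive_search(index, "early voting florida"))  # {1}
```

With only three documents, a three-term conjunction already returns a single result; a query containing one unindexed term (like the misspelling arooba mentioned below) returns nothing at all, which illustrates why conjunction is risky over small collections.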

Conjunction-based queries, combined with giving higher priority to results in which keywords occur close together, displaying query keywords in context in the document summary (Clarke et al., 2007), and giving higher ranking to pages from higher-quality Web sites, made search engine ranking of keyword queries more successful and more understandable than in its early days.

As discussed above, in Web search engines, keyword specification and matching has become the norm. An alternative to keyword queries is to allow users to enter full sentence questions or search statements, and have the system attempt to find relevant information and/or supply answers to questions. Evidence from the literature suggests that asking questions using natural language is a more intuitive input method, and that people who are new to using search engines tend to assume that asking a question is the right way to start. This is not surprising, since asking questions is how we seek information from other people.

In an early study from 1997, Pollock and Hockley, 1997 found that some novices try to enter natural language queries to Web search engines. Studies of the behavior of children who have not yet learned the limitations of search engines show that for many their natural inclination is to simply ask the search system a question. Bilal, 2000 found, in a study of 22 seventh-grade children searching for factual information, that 35% attempted to begin with a natural language question. The search engine under study produced empty result sets with this kind of query, and so the children had to unlearn the question-asking approach. In another study by Schacter et al., 1998, 63% of 32 fifth and sixth-grade children used full-sentence queries on Web search engines. In a pre-web study using an electronic encyclopedia, Marchionini, 1989 found that many younger searchers entered full sentence questions into the system. Finally, in the early days of the Web, the search engine AskJeeves (now Ask) purported to allow people to ask natural language questions. Although it never really supported this functionality in a robust and scalable way, many people were attracted to the engine because they thought it would be able to answer questions.

Pollock and Hockley, 1997 also found that, for novice searchers, the notion of iterative searching was unfamiliar. They assumed that if their first attempt failed then either they were incapable of searching or the system did not contain information relevant to their interest. Thus, novice searchers must learn to expect that they must scan a search results list, navigate through Web sites, and read through Web pages to try to find the information they seek. A novice searcher who starts with a question like “For NFL players who have suffered depression after sustaining multiple concussions, which positions did they play?” quickly learns that the search engine does not respond by answering this question. Searchers must learn that search engines work by finding articles that match the words that they typed in, and that they must guess which combinations of words are likely to be found in documents that contain relevant information, e.g., NFL depression concussions.

Aula et al., 2005a note one expert explicitly articulating a strategy to reflect the unnatural nature of keyword querying, stating “I choose search terms based not specifically on the information I want, but rather on how I could imagine someone wording a site that contains that information.” Reflecting on the disconnect between keyword search and what might be a more natural search strategy, a lead engineer for ranking at Google was recently quoted as saying “Search over the last few years has moved from `Give me what I typed' to `Give me what I want.”' (Hansell, 2007).

4.1.3: Automated Question Answering

The most natural use of natural language queries is in the asking of questions. The main reason search engines did not originally support automated question answering is that the technology did not exist to do so in a robust, scalable manner. (There were also early efforts to supply natural language interfaces to database systems, which did not catch on for similar reasons (Androutsopoulos et al., 1995).)

There has recently been an upsurge of research in full-sentence question answering, propelled in large part by an increase in government funding and by a TREC-based question answering task and competition (Voorhees, 1999, Voorhees, 2003). For several years starting in 1999, the track focused on factoid questions: fact-based short-answer questions such as What Spanish explorer discovered the Mississippi? and When did World War I end? Subsequent competitions added the tasks of finding a list of answers to the question, e.g., Who were the members of the Oakland A's starting lineup in 1976?, definition questions such as What is a golden parachute?, and general biographical questions such as Who is Marilyn Monroe? that require systems to combine information from multiple documents, and to synthesize a more general answer. In 2005 the task was made still harder, requiring systems to provide answers for a series of cascading questions. For example, When was Amway founded?, Where is it headquartered?, Name the officials of the company., and What is the name `AMWAY' short for? The corresponding research has resulted in systems that, for factoid-type questions especially, can enjoy great success.

Rather than generating an answer from scratch, question answering systems attempt to link a natural language query to the most pertinent sentence, paragraph, or page of information that has already been written. They differ from standard search engines in that they make use of the structure of the question and of the text from which the answers are drawn. An early example of an automated general-topic question answering system was the Murax system (Kupiec, 1993), which determined from the syntax of a question if the user was asking for a person, place, or date. It then attempted to find sentences within encyclopedia articles that contain noun phrases that appear in the question, since these sentences are likely to contain the answer to the question. For example, given the question “Who was the Pulitzer Prize-winning novelist that ran for mayor of New York City?”, the system extracted the noun phrases “Pulitzer Prize”, “winning novelist”, “mayor”, and “New York City”. It then looked for proper nouns representing people's names (since this is a “who” question) and found, among others, the following sentences:

The Armies of the Night (1968), a personal narrative of the 1967 peace march on the Pentagon, won Mailer the Pulitzer Prize and the National Book Award. In 1969 Mailer ran unsuccessfully as an independent candidate for mayor of New York City.

Thus the two sentences link together the relevant noun phrases and the system hypothesized (correctly) from the title of the article in which the sentences appear that Norman Mailer was the answer.
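The matching step Murax performed can be sketched roughly as follows. This is a toy illustration only: the real system extracted noun phrases syntactically and checked the expected answer type, whereas here the phrases are supplied by hand and matching is plain substring containment.

```python
def score_sentences(phrases, sentences):
    """Rank sentences by how many of the question's noun phrases they contain."""
    scored = [
        (sum(1 for p in phrases if p.lower() in s.lower()), s)
        for s in sentences
    ]
    return sorted(scored, key=lambda hs: -hs[0])

# Noun phrases from the question in the text above, supplied by hand.
phrases = ["Pulitzer Prize", "mayor", "New York City"]
sentences = [
    "The Armies of the Night won Mailer the Pulitzer Prize.",
    "In 1969 Mailer ran for mayor of New York City.",
    "The novel was published in 1968.",
]
best_hits, best_sentence = score_sentences(phrases, sentences)[0]
print(best_hits, best_sentence)
```

The highest-scoring sentences are the ones most likely to contain the answer, mirroring the way Murax hypothesized Norman Mailer from sentences that linked the question's noun phrases together.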

Another early approach to automated question-answering was the FAQ Finder system (Burke et al., 1997) which matched question-style queries against question-answer pairs on various topics. The system used a standard IR search to find the most likely FAQ (frequently asked questions) files for the question and then matched the terms in the question against the question portion of the question-answer pairs.
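FAQ Finder's second stage can be approximated by a simple term-overlap score between the user's question and the question half of each pair. The FAQ entries below are invented, and the actual system also used semantic similarity measures beyond plain term matching.

```python
def term_overlap(query, faq_question):
    """Fraction of the user's question terms found in a stored FAQ question."""
    q = set(query.lower().split())
    f = set(faq_question.lower().split())
    return len(q & f) / len(q) if q else 0.0

# Hypothetical (question, answer) pairs from an FAQ file.
faq = [
    ("how do i remove mold from walls", "Scrub with a diluted bleach solution."),
    ("what causes mold to grow", "Moisture and poor ventilation."),
]

def best_answer(query):
    """Return the answer whose stored question best matches the query."""
    question, answer = max(faq, key=lambda pair: term_overlap(query, pair[0]))
    return answer

print(best_answer("how to remove mold"))
```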

Numerous more recent approaches use variations of the Murax approach, adding statistics that utilize the vast collection of the Web to successfully answer many kinds of questions (Dumais et al., 2002, Harabagiu et al., 2000, Ittycheriah et al., 2001, Ramakrishnan and Paranjpe, 2004, Ravichandran and Hovy, 2001). A number of startup companies build on this research and attempt to support natural language querying and question answering using scalable natural language parsing and other computational linguistics technologies. For example, the startup Powerset (recently acquired by Microsoft) successfully responds to questions like “What did Steve Jobs say about the iPod?” with sentences like “Steve Jobs has stated that Apple makes little profit from song sales, although Apple uses the store to promote iPod sales.” and “Steve Jobs has argued that the iPod nano was a necessary risk ...”, and to questions such as “origins of the word Halloween?” with sentences like “The term Halloween is shortened from All Hallows' Even.” This works in part by expanding words like “say” to “has argued” and “word” to “term”.

Major Web search engines currently do not support full-sentence queries or question answering, most likely because the technology still is not reliable enough to present in mature products. However, as discussed in Chapter 1, some are becoming more adept at responding to long queries. Natural language queries are also likely to become an important supplement for organizations' web sites and for software products, to more effectively handle customer questions.

An interesting aspect of question answering that differentiates it from standard information retrieval is that the answer of interest contains words that are not in the query. Thus boldfacing the query terms in answer to a question is not the right thing to do -- rather, it is the answer that should be highlighted. A question also arises as to what kind of and how much context to show surrounding the answer. An advantage of question answering is that the user can receive a simple, terse answer, such as 30 feet to the question How deep is it safe to dive without worrying about getting the bends? But a contextless answer from a machine-based source is not necessarily the best interface, as a reasonable user would want the system to justify, explain, or at least show the context supporting the answer. Lin et al., 2003 conducted a usability study with 32 computer science graduate students comparing four types of answer context: exact answer, answer-in-sentence, answer-in-paragraph, and answer-in-document. To remove effects of incorrect answers, they used a system that produced only correct answers, drawn from an online encyclopedia. They found that most participants (53%) preferred paragraph-sized chunks, noting that a sentence did not supply much more information beyond the exact answer, and a full document was oftentimes too long. That said, 23% preferred full documents, 20% preferred sentences, and one participant preferred the exact answer. Kaisser et al., 2008 performed a follow-up study which showed that for many queries, the ideal answer length and type can be predicted by a person by seeing only the query. They also found that people judging relevance prefer results of different lengths, depending on the type of query. For a fact-based query, phrase or sentence-length results were preferred. For queries seeking advice or general information about a topic, longer answers were preferred.

4.1.4: Paragraph-length Textual Queries

Another form of text-based querying is one in which the user requests that the system find documents that are substantially similar to a very long query or a user-supplied document. This kind of query was much studied in the information retrieval research literature in the 1980s-1990s. The emphasis was on producing retrieval results with high recall, since finding all relevant documents can be quite useful for scholars trying to do thorough literature reviews, legal professionals trying to find all information about a case (Blair and Maron, 1985), and intelligence analysts trying to assess a situation.

This desideratum of high recall was embodied in the TREC competition ad hoc ranking task (TREC is discussed in Chapter 2), where the goal was to retrieve 1000 relevant documents for every query. In the TREC scoring, what mattered was whether or not a document had been judged relevant, not how relevant it was, nor how much new information it brought beyond what had already been seen. (A system would not be penalized if the first k retrieved documents all contained redundant information.) It was found that long (paragraph-length) queries produced higher recall than short queries.

For long queries, it is unlikely that every term will be found in any one document within a collection, and so such queries are best served by statistical ranking algorithms, which apply empirically determined weights to each query term and combine them statistically to provide a ranking.
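A minimal sketch of such a weighted ranking follows, using TF×IDF weights as a stand-in for the empirically determined weights mentioned above; the documents are invented. Note that a document scores well without containing every query term.

```python
import math
from collections import defaultdict

def rank(docs, query):
    """Score documents by summed TF x IDF weight of the query terms they contain."""
    n = len(docs)
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    df = defaultdict(int)                  # document frequency of each term
    for terms in tokenized.values():
        for t in set(terms):
            df[t] += 1
    scores = defaultdict(float)
    for t in query.lower().split():
        if df[t] == 0:                     # term absent from the collection
            continue
        idf = math.log(n / df[t])          # rarer terms weigh more
        for d, terms in tokenized.items():
            if t in terms:
                scores[d] += terms.count(t) * idf
    return sorted(scores.items(), key=lambda kv: -kv[1])

docs = {
    1: "fish farming in nigeria",
    2: "fish recipes",
    3: "nigeria travel guide",
}
print(rank(docs, "history of fish farming in nigeria"))
```

Document 1 ranks first even though it lacks the terms history and of, which is precisely the behavior that a strict conjunctive match cannot provide.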

It is unclear how realistic it is to expect people to search for documents that are similar to other documents. (A startup company named DeepDyve is currently promoting search using large paragraphs of text as the query, and research documents as the collection, to improve search as a research tool.) However, text similarity measures are very useful for specialized tasks, such as automatic text categorization (Sebastiani, 2002), matching reviewers to submitted manuscripts (Dumais and Nielsen, 1992), and clustering similar news articles for news aggregation services (Das et al., 2007).

4.1.5: Automated Transformations of Textual Queries

Search engines may manipulate or transform textual queries in several ways. Some systems automatically normalize morphological variants of words, that is, convert a query on dogs to one on (dog OR dogs) or convert building to (build OR building) (the latter may lead to poor results, because the noun building is conflated with the verb build). This practice is sometimes known as stemming. Other transformations include ignoring common function words (stopwords), and expanding terms with synonyms or related forms (e.g., converting mold to mould).
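Query-time expansion of this kind can be sketched as follows. The variant table here is hand-built purely for illustration; real systems derive variants with a stemming algorithm (such as Porter's) and may fold in spelling variants and synonyms as well.

```python
# Hand-built variant table standing in for a real stemmer plus synonym list.
VARIANTS = {
    "dog": ["dog", "dogs"],
    "dogs": ["dog", "dogs"],
    "mold": ["mold", "mould"],
    "mould": ["mold", "mould"],
}

def expand_query(query):
    """Rewrite each term as an OR-group of its known variants."""
    parts = []
    for term in query.lower().split():
        forms = VARIANTS.get(term, [term])
        if len(forms) > 1:
            parts.append("(" + " OR ".join(forms) + ")")
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("mold removal"))  # (mold OR mould) removal
```

The rewritten query is then handed to the matching engine, so the user's original wording still matches documents that use a different surface form.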

Search engine ranking algorithms must walk a fine line between making transformations that are hidden but helpful and making the user feel that the system is inscrutable. As mentioned in Chapter 1, Muramatsu and Pratt, 2001 studied users' understanding of the transformations that search engines apply to query terms. Fourteen participants were asked to issue one pre-specified query across two search engines for each of four transformation types: Boolean operators, stopword removal, suffix expansion, and term order variation. Queries and search engines were carefully chosen to exaggerate and highlight the effect of the query transformation, and participants were asked to first state what they thought would happen before seeing the results. In detail, the transformations investigated and the participants' responses to them were:

  • Application of Boolean operators, comparing a search engine that automatically OR'd all terms versus one that AND'd terms. The query was scuba snorkel arooba; the misspelling of “aruba” was used to force the AND'ed engine to produce an empty results listing. Eight out of 14 participants expected the system to return many results related to both scuba and snorkeling but few if any mentions of Aruba. After seeing the results, 5 participants could not come up with a meaningful explanation of the results; 6 were close to accurately describing the difference, but 3 required the side-by-side comparison to do so. The last 3 participants could explain the behavior for the AND query but not the OR.

  • Stopword removal, comparing a system that removed stopwords versus one that did not, on the query to be or not to be (all stopwords). Before the query was issued, 10 out of 14 expected to see results related to Shakespeare and/or Hamlet. After seeing the empty results for one engine, 3 participants assumed the search engine did not contain literature in its collection or had a different collection, 6 expressed low expectations for good results because of the commonality of the words, but only 3 expressed the notion of the search engine dropping the common words. 9 out of 14 were unable to explain the results in a way that remotely resembled removal of stopwords.

  • Suffix term expansion, comparing a search engine that does the expansion versus one that does not (in this case the query was run, which would be expanded to running and runner). Participants were surprised at the lack of results pertaining to athletics for the engine that did not do term expansion, and only 3 participants were able to explain the results in a manner related to suffix expansion.

  • Term order sensitivity, comparing a search engine that changes results based on term order and proximity versus one that does not consider either property. In this case, two queries were compared: fire boat and boat fire. Only 5 out of 14 participants expected the results for the two queries to differ, and only 5 participants had a vaguely accurate explanation for the difference, with one person stating that one engine “separates the two words and searches for the meaning” while the other “understands the meaning of the phrase”. In scanning the bolded query term hits in the document surrogates, some participants noticed the difference in proximity for ranking.

These results suggest that stemming is useful, but removing stopwords can be hazardous from a usability perspective. They also suggest that certain features, such as taking order and proximity into account, can improve the results without confusing users, despite the fact that users may not be aware of or understand those transformations. Other studies suggest that it is better to show potential transformations to users and let them decide whether or not to use them than to apply them automatically; however, if automatic transformations are unlikely to lead to misconceptions, they can be beneficial.

Another way search systems use query transformations is to show them as suggestions for revising the query. This topic is discussed extensively in Chapter 6.

4.2: Query Specification via Entry Form Interfaces

Today, the standard interface for query specification is an entry form and an activation button. The label on the button ranges from Go to Search to Fetch, and Microsoft's search engine dispenses with the textual label entirely, instead showing only a magnifying glass icon (see Figure 4.1a). (Just prior to completion of this book, Microsoft renamed their search engine Bing and made major changes to its interface.) Execution of the search action usually can also be triggered by striking the keyboard Return key when the entry form is selected. Today, search forms often include drop-down menus that show previously issued queries whose prefix matches what has been typed so far (see an example from Microsoft's search in Figure 4.1b). On Web sites whose content is divided into different categories, some search engines use a combination of a drop-down menu to select a category of content to search within, alongside the entry form. This is sometimes called scoped search, and an example from eBay is shown in Figure 4.2. The default is usually set to search across the entire site or collection.

(a)

(b)

Figure 4.1: Standard query specification forms. (a) Microsoft's Live search does not include a textual label on the activation button. (b) An illustration of the increasingly common interaction mechanism in which a search engine query form provides a drop-down list showing queries that the user has issued in the past that match the prefix typed so far. From search.live.com. (Microsoft product screen shot reprinted with permission from Microsoft Corporation.)

Figure 4.2: eBay.com's search form with subject-oriented drop-downs for “scoped” search.

Some interfaces allow the searcher to cut-and-paste long swaths of text into the search box, whereas others present short entry forms and limit the number of terms that the system will accept for processing. Researchers have speculated that short query forms lead to short queries. To test this hypothesis, Franzen and Karlgren, 2000 designed two different query entry form boxes. The first showed one empty line and only 18 visible characters (but would accept up to 200 characters) and the second showed 6 lines of 80 characters each, which allowed arbitrarily long queries to be entered. The authors asked 19 linguistics students to use one of the two interfaces and to find relevant documents for three queries. There was a statistically significant difference between the two conditions, with those using the short box using 2.81 words on average, versus 3.43 for those using the larger box.

(a)

(b)

Figure 4.3: Interfaces used in an experiment in which the message for query specification was varied. (a) The sparse version, and (b) the more verbose version, from Belkin et al., 2003.

The wording above the text box can influence the kind of information that searchers type in. Belkin et al., 2003 presented 32 participants with a large query box (5 lines of 40 characters each) and varied the message shown (see Figure 4.3). In one case searchers saw a heading of “Query Terms” above the search box, and in another they saw “Information Problem” above the box, and within the box, the message: “Please describe your information problem in detail: (The more you say, the better the results are likely to be.)” They found on average that participants entered longer queries with the more verbose interface (6.02 words versus 4.19) and performed significantly fewer iterations (2.09 versus 2.64). However, they did not find a relationship between correctness of results and query length.

After many years of small search forms, it has recently become fashionable on many Web sites to make the font in the search box entry form very large and colorful, thus drawing attention to the search facility. It has also become common to put a hint into the search box to indicate what kind of search the user should do (see Figure 4.4, and Figure 1.7).

Figure 4.4: An example of a modern style entry form, with a large font size for the text of the query and a grayed-out hint indicating what kind of information to enter. From powerset.com. (Microsoft product screen shot reprinted with permission from Microsoft Corporation.)

Studies and query logs show that people often confuse the web browser's address bar with a search entry form (Hargittai, 2002). In recent years, Web browsers have implemented support for Web search directly into the address bar, and the Chrome browser has gone so far as to eliminate the distinction between an address bar and the query form altogether. In that browser, everything is assumed to be a query to a Web search engine unless URL syntax is used explicitly.

Figure 4.5: Microsoft desktop search start dialogue box, which requires the user to select a type of information to search over before making a search entry form available. (From Microsoft Windows XP Professional Version 2002. Microsoft product screen shot reprinted with permission from Microsoft Corporation.)

When presenting a query interface, it is important not to force the user to make selections before offering a search box. For instance, the search dialogue box for Microsoft Windows XP forces the user to decide which type of information, in terms of file format, they want to search over before seeing a search box (see Figure 4.5). An entry form should immediately be offered that defaults to searching all file types, with an option to refine. Alternatively, refinement can be offered after the initial query, as shown in Microsoft Research's Phlat interface (Cutrell et al., 2006b) (see Figure 8.3 in Chapter 8).

4.3: Dynamic Term Suggestions During Query Specification

Chapter 6 discusses interfaces for suggesting terms to augment the user's query after they have received results. More recently, interfaces have appeared that suggest query terms dynamically, as the user enters them. In some cases, these dynamic term suggestions appear before the searcher has seen any retrieval results, and in others, the system dynamically shows documents that match the characters typed so far, adjusting the results list as more characters are typed. Dynamic query term suggestions (sometimes referred to as auto-suggest, autosuggest, or search-as-you-type) are a promising intermediate solution between requiring the user to think of terms of interest (and how to spell them) and navigating a long list of term suggestions.

Figure 4.6: One view of dynamic query term suggestions. An example from Microsoft in which only the first word in the query is matched against past queries.

Some dynamic term suggestion systems show only query suggestions whose prefix matches what has been typed so far. Figure 4.6 shows an example from Microsoft's dynamic query suggestions interface, which shows frequent queries whose first words begin with the prefix that has been typed so far, canc, including cancer, cancun weather, and cancel. Dynamic query suggestions are not restricted to matching the prefix of the query alone. For instance, at eBay, typing the letter d into the query form shows suggestions such as doorbusters, digital cameras, and ds lite. Continuing with do shows suggestions like doorbusters, dooney bourke, and doll. Web search engines today provide similar functionality in their toolbars.
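Prefix-based suggestion over a query log can be sketched in a few lines. The log and its counts below are invented, and production systems use trie-like indexes rather than the linear scan shown here.

```python
from collections import Counter

# Hypothetical log of past queries with their issue counts.
query_log = Counter({
    "cancer": 120,
    "camera reviews": 80,
    "cancun weather": 45,
    "cancel": 30,
})

def suggest(prefix, k=3):
    """Return the k most frequent past queries starting with the typed prefix."""
    matches = [(q, n) for q, n in query_log.items() if q.startswith(prefix)]
    return [q for q, _ in sorted(matches, key=lambda qn: -qn[1])[:k]]

print(suggest("canc"))  # ['cancer', 'cancun weather', 'cancel']
```

As each new character arrives, the suggestion list is recomputed for the longer prefix, so typing can narrows the list and typing canc narrows it further.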

The dynamic query suggestion approach falls within guidelines for dynamic queries by Shneiderman, 1994. Although no usability studies have been done for this kind of interface, a large log study by Anick and Kantamneni, 2008, which measured 100,000 users on four distinct days over a period of 17 weeks, found that users clicked on the dynamic suggestions in the Yahoo Search Assist tool in 30--37% of the sessions (see Figure 1.4 in Chapter 1). The rapid spread of this facility suggests that dynamic real-time term suggestions are becoming the norm.

White and Marchionini, 2007 performed a study on a similar interaction method, on what they call real time query expansion (see Figure 4.7). After the user types a word and presses the keyboard space bar, the system queries a Web search engine and extracts terms from the surrogates for the 10 top-ranked documents. The top 10 term suggestions are shown after the first term is typed. The user can select one or more of the suggested terms by double-clicking it, or ignore the suggestions. This process continues with the system suggesting additional terms after each word is entered, until the query is completed (by pressing the Return key). Thus, the idea is similar to dynamic term suggestions, but less interactive, and responsive at the word level only, as opposed to at the character prefix level.

Figure 4.7: Experimental dynamic query suggestion form, from White and Marchionini, 2007. New terms are suggested only after the space key is pressed.

White and Marchionini, 2007 compared this approach to a baseline system with no feedback (using Google Web search with identifying information removed) and another version of their system in which term suggestions are shown alongside the search results, after the query is entered (standard term suggestions). The study consisted of 36 students who compared the interfaces in a within-participants design. Using pre-defined queries, the study distinguished between known-item searches and open-ended exploratory searches, hypothesizing that term expansion would be more effective for the latter. When comparing time taken and quality of results, there were no significant differences among the systems, although the numbers trended towards the real time query expansion being more effective. The quality of the search results was assessed by two judges, and the precision was found to be higher in the exploratory task for the dynamic term suggestions than for the post-retrieval suggestions, and both were higher than for the baseline. No quality differences were found for the known-item tasks.

Satisfaction scores revealed that participants found the baseline to be more effective and more usable, but found the dynamic suggestions to be more engaging and more enjoyable. Post-study questionnaires suggested that if the response time for the query suggestions had been faster, participants would have found them more useful. Many commented negatively on the delay (1.8 seconds on average) between hitting the space bar and seeing the suggestions. Since modern term suggestion interfaces respond much more quickly, it is likely that users find them more useful. Participants also made positive comments about the post-query suggestions, indicating that they were often helpful when the first query was unsuccessful.

White and Marchionini, 2007 point out the potential danger in showing query term suggestions before retrieval results are seen, as the suggestions can lead the searcher down an erroneous path. They cite as an example the high prevalence of the suggested term ride for the query Who was the first female astronaut in space?. The correct answer is Soviet cosmonaut Valentina Tereshkova, but mention of Sally Ride, the first American woman in space, is frequent in the retrieved document summaries. This, compounded with the fact that the verb ride is semantically related to space travel, caused some participants to erroneously augment their query with this term. White and Marchionini, 2007 note that if users see search results first, they are less likely to make this kind of mistake.

4.4: Query Specification using Boolean and other Operators

Before the rise of the Web, most commercial full-text systems and most bibliographic systems supported only Boolean and command-based queries. In many systems, users could query only over document surrogates, and not the text of the documents themselves as they were not available in electronic format (Cousins, 1992). When full text was available, as seen in newswire and case law, Boolean queries were used as well, in part because they are more efficient to compute than statistical ranking (Rose, 2006). A typical example, drawn from commands from the Dialog Pocket Guide 2006 (Dialog, Inc, 2006) is shown here:

(PCR OR POLYMERASE(W)CHAIN(W)REACTION? OR DNA(W)SEQUENC?) AND (CANCER? OR PRECANCER? OR NEOPLASM? OR CARCINO?)

The question mark indicates a request for stemming match and the (W) notation requests that the terms be located adjacent to each other and in the order specified. The sequence of terms separated by OR's are to be treated as a disjunction; two long disjunctions are connected by an AND, or conjunction operator. Thus, this is a complex query requesting documents that discuss cancer and DNA sequencing. It is richer than a standard keyword query in that it suggests alternative wordings for each key concept.
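The semantics of such a query, a conjunction of disjunctions, can be sketched as follows, treating each document as a set of terms and ignoring the stemming and adjacency operators (the terms here are simplified stand-ins for the Dialog example):

```python
def matches(doc_terms, query):
    """query is a conjunction (list) of disjunctions (sets of terms):
    a document matches if it contains at least one term from every disjunct."""
    return all(disjunct & doc_terms for disjunct in query)

# Simplified form of the Dialog example: two disjuncts joined by AND.
query = [{"pcr", "dna"}, {"cancer", "neoplasm", "carcinoma"}]
print(matches({"dna", "sequencing", "cancer"}, query))   # → True
print(matches({"dna", "sequencing", "weather"}, query))  # → False
```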

Most people who used Boolean command-line systems had extensive training on their use, and often were willing to spend time carefully formulating their queries, especially if they were charged by the query or by the minute. Thus, the users of these systems may not have had problems with the complexity of this syntax. Unfortunately, however, studies have shown time and again that most users have difficulty specifying queries in Boolean format and often misjudge what the results will be (Boyle et al., 1984, Hildreth, 1989, Greene et al., 1990, Michard, 1982, Young and Shneiderman, 1993, Hertzum and Frokjaer, 1996, Dinet et al., 2004).

One problem with Boolean queries is that their strict interpretation tends to yield result sets that are either too large, because the user includes many terms in a disjunct, or empty, because the user conjoins terms in an effort to reduce the result set. This problem occurs in large part because the user does not know the contents of the collection or the role of terms within the collection.

Boolean queries are also problematic because most people find the basic semantics counter-intuitive. Many English-speaking users assume everyday meanings are associated with Boolean operators when expressed using the English words AND and OR, rather than their logical equivalents. To inexperienced users, using AND implies the widening of the scope of the query, because more kinds of information are requested in such a query. For instance, in English, the phrase “dogs and cats” may imply a request for documents about dogs and documents about cats, rather than documents about both topics at once. Similarly, “tea or coffee” usually implies a mutually exclusive choice in everyday language, as opposed to the union of the concepts as dictated by Boolean semantics. In addition, most query languages that incorporate Boolean operators also require the user to specify rigid syntax for other kinds of connectors and for descriptive metadata. Most users are not familiar with the use of parentheses for nested evaluation, nor with the notions associated with operator precedence.

Another problem with pure Boolean systems is they do not rank the retrieved documents according to similarity to the query. In the pure Boolean framework a document either satisfies the query or it does not. Commercial systems with Boolean queries usually order documents according to some kind of descriptive metadata, usually reverse chronological order. (Since these systems usually index time-sensitive data such as news wires, date of publication is often one of the most salient features of the document. To some degree, this is still true for specialized search, such as blog search, where ordering by currency or by popularity is a common ranking mechanism.)

Despite the generally poor usability of Boolean operators, most search engines support notation using AND, OR, and NOT, as well as characters for wildcarding, stemming, and range specification for dates. For example, in the Dialog system (Dialog, Inc, 2006) the operator ? appended to a sequence of characters allows for matching of all words that begin with those characters, e.g., biome? would match biometric, biometrics, biomedical, etc. In the AOL web search engine, ? is a wildcard for one character only. The Google search engine provides a wildcard operator *, which allows for one or more unspecified words to appear between the two specified words; multiple stars allow for multiple unspecified words. For example, “president * lincoln” will match pages that contain the former president's full name. Google also supports number ranges; the query abraham lincoln 1860..1863 brings up hits that mention the former president along with one or more years in that range.
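The two wildcard styles just described can be approximated as follows; these are illustrative sketches of the matching behavior, not the engines' actual implementations:

```python
import re

def dialog_stem(pattern, word):
    """Dialog-style trailing '?': match any word that begins with the stem."""
    return word.startswith(pattern.rstrip("?"))

def google_star(phrase, text):
    """Google-style '*': each star stands in for one or more unspecified
    words between the fixed parts of the phrase."""
    parts = [re.escape(p) for p in phrase.split(" * ")]
    regex = r"\b" + r"\s+(?:\w+\s+)+".join(parts) + r"\b"
    return re.search(regex, text) is not None

print(dialog_stem("biome?", "biometrics"))                              # → True
print(google_star("president * lincoln", "president abraham lincoln"))  # → True
print(google_star("president * lincoln", "president lincoln"))          # → False
```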

4.4.1: Term Proximity in Boolean Queries

In general, proximity information can be quite effective at improving precision of searches (Hearst, 1996, Clarke et al., 1996, Tao and Zhai, 2007). The most commonly used operator on the Web is the double-quote operator "", used to surround adjacent words, as in "San Francisco", which signifies that the enclosed terms must be found directly adjacent in the retrieved text. The disadvantage of exact match of phrases is that it is often the case (in English) that one or a few words fall between the terms of interest (as in big black dog when searching on big dog). Another consideration is whether or not stemming is performed on the terms included in the phrase. The AOL adjacency operator is similar to double quotes except that it allows for morphological variants, and can be combined with other operators, e.g., (abraham OR abe) ADJ lincoln will find matches for both abe lincoln and abraham lincoln. The AOL search engine also supports proximity queries in the form of a NEAR operator, e.g., dogs NEAR/3 cats means find the word dogs within 3 words of cats, in either order. In some cases the best solution is to allow users to specify exact phrases but treat them as if they indicated small proximity ranges, with perhaps an exponential fall-off in weight according to the distance between the terms. This has been shown to be a successful strategy in non-interactive ranking algorithms (Clarke et al., 1996). Some web search engines provide behavior related to this in select circumstances. For example, Google often ignores middle initials when doing matches against people's names, even when those names are enclosed in double quotes.
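A NEAR/k check of this kind reduces to comparing word positions. A minimal sketch, assuming a positional index that records each term's word offsets within a document:

```python
def near(positions_a, positions_b, k):
    """NEAR/k: true if some occurrence of term A lies within k words of
    some occurrence of term B, in either order."""
    return any(abs(a - b) <= k for a in positions_a for b in positions_b)

# Word offsets in a hypothetical document "big black dogs chase cats":
# 'dogs' occurs at position 2, 'cats' at position 4.
print(near([2], [4], 3))   # → True: the terms are 2 words apart
print(near([2], [10], 3))  # → False
```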

4.4.2: Post-Coordinate and Faceted Boolean Queries

One technique for imposing an ordering on the results of Boolean queries is post-coordinate or quorum-level ranking (Salton, 1989, Ch. 8). In this approach, documents are ranked according to the number of distinct query terms they contain. So given a query consisting of cats dogs fish mice, the system would rank a document containing at least one instance each of cats, dogs, and fish higher than a document containing 30 occurrences of cats but no occurrences of the other terms. eBay introduced an interesting way of expressing search results according to coordinate or quorum-level ranking. When no results are found for a query such as gap ultra low rise 8r, the searcher is shown a view indicating how many results would be brought back if only k out of n terms were included in the query, as illustrated in Figure 4.8.
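Quorum-level ranking follows directly from this description: order documents by how many distinct query terms they contain, ignoring repeated occurrences. A minimal sketch with invented documents:

```python
def quorum_rank(docs, query_terms):
    """Rank documents (lists of words) by the number of distinct query terms
    each contains; occurrence counts beyond the first do not matter."""
    query = set(query_terms)
    return sorted(docs, key=lambda words: len(query & set(words)), reverse=True)

docs = [["cats"] * 30,                     # 30 hits, but only one distinct term
        ["cats", "dogs", "fish", "bird"]]  # three distinct query terms
ranked = quorum_rank(docs, ["cats", "dogs", "fish", "mice"])
print(ranked[0])  # the document covering three query terms ranks first
```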

Figure 4.8: eBay quorum ranking suggestions, indicating the number of hits that would be returned if each of the terms shown in strikeout font were removed from the query.

Another approach to improving results with Boolean queries is to have the searcher break their query up into different facets, that is, different topics, and specify each facet with a set of terms combined into a disjunction. The entire query is then combined into one conjunction, in effect indicating that at least one term from each concept should be present in the retrieved documents (Meadow et al., 1989a). Combining faceted queries with quorum ranking yields a situation intermediate between full Boolean syntax and free-form natural language queries.

An interface for specifying this kind of interaction can consist of a list of entry lines. The user enters one topic per entry line, where each topic consists of a list of semantically related terms that are combined in a disjunct. (This sort of query works naturally with the interface shown in Figure 4.9.) Documents that contain at least one term from each facet are ranked higher than documents containing terms from only one or a few facets. Hearst, 1996 showed that when at least one representative term from each facet is required to be in close proximity with one another, the resulting precision is very high. However, this kind of query specification is not commonly used in practice. Instead, facets are now widely used in navigation interfaces for collections (see Chapter 8).
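Faceted matching combined with quorum-style scoring can be sketched by counting how many facets a document covers; the facet contents below are invented for illustration:

```python
def facet_score(doc_terms, facets):
    """Each facet is a disjunction (set) of related terms; the score is the
    number of facets from which the document contains at least one term."""
    return sum(1 for facet in facets if facet & doc_terms)

# Two facets: a disease topic and a sequencing-technique topic.
facets = [{"cancer", "neoplasm"}, {"dna", "pcr", "sequencing"}]
print(facet_score({"cancer", "sequencing"}, facets))  # → 2 (covers both facets)
print(facet_score({"cancer", "weather"}, facets))     # → 1
```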

4.4.3: Web-based Improvements to Boolean Query Specification

In command-line based interfaces, Boolean operators and their operands are typically typed on one line, and combined using parentheses (see Figure 4.10a). Although Web search engines support use of operators directly in the query box, they usually also supply a graphical form-based method for using filters and restrictions. Figure 4.9 from the Education Resources Information Center (ERIC) shows an example.

Because Web search engines serve a massive audience possessing little query-specification experience, their designers devised what were intended to be more intuitive approaches to Boolean query specification. Early versions of Web search interfaces used drop-down menus and Web forms that allowed users to choose from a selection of simple ways of combining query terms, including “all the words” (place all terms in a conjunction) and “any of the words” (place all terms in a disjunction). These kinds of options are still available in the “Advanced Search” option of many Web search engines (see Figure 4.11).

Another early web-based solution was to allow syntactically based query specification, but to provide a simpler or more intuitive syntax. In the mid-1990s, the AltaVista Web search engine introduced the mandatory operator, indicated with a + as a prefix before the required word, meaning that the results retrieved must include that query term. At that time, Web search results were ranked using statistical algorithms, and so the top results for a three-keyword query might not contain all of the query terms (if, for example, the algorithm chose to rank a document with 100 occurrences of the first keyword ahead of a document with one occurrence of all three keywords). The mandatory operator gave users more control over how the search engine treated their keywords. Unfortunately, users sometimes mistakenly thought the + acted as an infix AND rather than a prefix mandatory operator, assuming that cat + dog would retrieve only articles containing both terms (when in fact this query requires dog but allows cat to be optional). The need for this operator was obviated when search engines began returning only those documents that contain every keyword.

Figure 4.9: An example of a Web forms-based interface from the Education Resources Information Center (ERIC) search system for specifying complex combinations of Boolean operators over keywords and other fields.

Researchers have developed many clever interface designs to improve Boolean query specification via visual interfaces. These attempts at visualizing Boolean queries are discussed in Chapter 10, but have not been widely adopted in practice. The one exception is for database query specification, an area in which graphical command building applications have become popular. However, searching over structured information is inherently different than searching over unstructured text.

4.4.4: Query Operator Usage Statistics

As discussed above, the major search engines offer a number of operators that can be applied to queries to make them more focused and exact. However, Web log statistics show that only a small fraction of queries take advantage of these operators. It is difficult to obtain accurate statistics from current Web search engines, so reported below is a set of statistics that researchers have gleaned from the few available open Web log resources. The actual numbers most likely fall somewhere in the middle of the reported ranges.

A meta-analysis by Jansen and Spink, 2006, which analyzed seven query log studies between 1997 and 2002, found that the percentage of search operator use has remained steady over time. Spink et al., 2002 showed a Boolean usage rate of about 10% for Excite users. Jansen et al., 2005 reported a Boolean usage of about 6% and a usage of other query operators at approximately 15%. The most recent study by Jansen et al., 2007b, conducted over 1.5M queries, found that 2.1% contained Boolean operators and 7.6% contained other query syntax, primarily double-quotation marks for phrases. White and Morris, 2007 studied interaction logs of nearly 600,000 users issuing millions of queries over a period of 13 weeks in 2006. They found that 1.1% of the queries contained at least one of the four main Web operators (double quotes, +, -, and site:) and 8.7% of the users used an operator at least one time.

A study by Hargittai, 2004 showed that even among the small fraction of users who did attempt to use operators, several completely misunderstood their meaning. None of the participants used the negation operator -, although at least one put a space before the hyphen in what was intended to be a hyphenated term: lactose intolerant -recipes, thus ensuring that no relevant results were returned. Sixteen percent of the participants used double quotation marks, but a few of them used them incorrectly or superfluously. One participant put quotation marks around almost every individual word, and another, apparently having misunderstood some advice from a friend on how to improve her results, put them around all terms in her query, thus often yielding empty result sets.

It may be the case that operators are only used by advanced searchers. For the White and Morris, 2007 study described above, they partitioned users into experts and non-experts based on whether or not they had used operators in their queries. They found significant differences in search behaviors between these two groups; the experts had different behavior when navigating search results, and they were more successful at eliciting and clicking on relevant documents. Aula and Siirtola, 2005 also found evidence that expert searchers were those more likely to use operators, based on questionnaire data filled out by 236 experienced Web users. Those with information seeking professions were more likely to say they use Boolean and other operators.

There is evidence that Web search engine ranking algorithms have been successful at compensating for the fact that their users do not use query operators. Eastman and Jansen, 2003 studied the effects of removing search operators from 100 queries selected from query logs and compared the resulting documents to those retrieved by the advanced query. They found the use of most query operators had no significant effect on coverage, relative precision, or ranking, although the impact did vary depending on the search engine.

4.5: Query Specification Using Command Languages

Most systems that support Boolean logic allow it to be embedded within a command language, meaning a syntax that usually includes commands (sometimes referred to as “verbs”) followed by arguments. For instance, in the old University of California Melvyl system (Lynch, 1992), the syntax would look like:

COMMAND ATTRIBUTE value {BOOLEAN-OPERATOR ATTRIBUTE value}

e.g.,

FIND PA darwin AND TW species OR TW descent

or

FIND TW Mt St. Helens AND DATE 1981

The user must remember the commands and attribute names, which are easily forgotten between usages of the system (Meadow et al., 1989b). Compounding this problem, the command languages of the two main online bibliographic systems at UC Berkeley were very similar but not identical in syntax: after more than 10 years, one of the systems still reported an error if the author field was specified as PA instead of PN, as is done in the other system. This lack of flexibility in the syntax was characteristic of older interfaces designed to suit the system rather than its users.

The functionality of command languages allows selection of collections to search over, resources to use (including thesauri), fields to search over, and format of results display. Some of these languages rival database query languages in their expressiveness. However, they are also complex and difficult to use, requiring the user to remember and accurately type cryptic commands. Most older search systems required special training and expertise, and many users relied on skilled intermediaries such as corporate or university librarians to issue queries.

(a)

(b)

Figure 4.10: Web-based interfaces for the Dialog command-based search system, originating in 1998. (a) Query specification form, including search history, from Dialog, Inc, 2002a. (b) Source and field selection form, from Dialog, Inc, 2002b. Images published with permission of Dialog LLC. Further reproduction prohibited without permission.

A common strategy for dealing with this problem, employed in early systems with command-line based interfaces, and in web-based library catalogs, was for the system to encourage the user to create a series of short queries, show the number of documents returned for each, and allow the user to combine those queries that produce a reasonable number of results (Dialog, Inc, 2002a, Dialog, Inc, 2002b). For example, in the early Dialog system, each query produced a resulting set of documents that was assigned an identifying name. Rather than returning a list of titles themselves, Dialog showed the result set number with a listing of the number of matched documents. Titles could be shown by specifying the set number and issuing a command to show the titles. Document sets that were not empty could be referred to by a set name and combined with AND operations to produce new sets. If this set in turn was too small, the user could back up and try a different combination of sets, and this process was repeated in pursuit of producing a reasonably sized document set.
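The set-naming-and-combining style of interaction can be sketched as follows; the tiny inverted index and the AND-only combination are illustrative simplifications, not Dialog's actual behavior:

```python
class SearchSession:
    """Dialog-style session: each query produces a numbered result set
    (S1, S2, ...) that later commands can refer to and combine."""

    def __init__(self, index):
        self.index = index  # term -> set of matching document ids
        self.sets = []      # result sets, in order of creation

    def select(self, term):
        """Run a one-term query; report the new set's name and its size."""
        result = self.index.get(term, set())
        self.sets.append(result)
        return f"S{len(self.sets)}", len(result)

    def combine(self, *names):
        """Intersect previously named sets (an AND over set names)."""
        ids = [self.sets[int(name[1:]) - 1] for name in names]
        result = set.intersection(*ids)
        self.sets.append(result)
        return f"S{len(self.sets)}", len(result)

# Hypothetical inverted index over five documents.
session = SearchSession({"biometric": {1, 2, 3, 5}, "security": {2, 3, 4}})
print(session.select("biometric"))  # → ('S1', 4)
print(session.select("security"))   # → ('S2', 3)
print(session.combine("S1", "S2"))  # → ('S3', 2)
```

If a combined set comes back too small, the user simply refers back to the earlier set names and tries a different combination, mirroring the iterative narrowing process described above.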

To use these older systems to their fullest, users also needed to know the names of the different fields to query against, and the names of the sources they wanted to search over. The following more complex example, also derived from the Dialog Pocket Guide 2006 (Dialog, Inc, 2006) illustrates this:

?s (biometric? and security)/TI,LP

1621 BIOMETRIC?/TI,LP

40268 SECURITY/TI,LP

S1 678 S (biometric? and security)/TI,LP

?s S1 and CS=(HARVARD AND MEDIC?)

S2 52 S1 and CS=(HARVARD AND MEDIC?)

?T S2/3/1-10

The first line begins with s, a shorthand for the command select. The question mark indicates a request for stemming match and the /TI,LP notation indicates that the search should be done over titles and lead paragraph fields only. The number of results found for each part of the query is shown on the following three lines, along with an automatically named results set S1. The next command restricts the results to only those whose corporate source contains the term HARVARD and a word that begins with MEDIC. The final command requests the system to show the first 10 results from the preceding results set, in format number 3.

Needless to say, non-specialist users are not able to specify queries of this complexity. In the late 1990s, systems like Dialog and online library catalogs upgraded to web-based forms interfaces, thus lessening the memory burden. Figure 4.10 shows versions of the query specification form and results modification forms, which still bear a great deal of resemblance to their TTY-based counterparts.

Collections
  book         Search full text of books
  define:      Show a definition for the given word
  phonebook:   Show (residential) phone book listings
  movie:       Find reviews and showtimes for films
  stocks:      Show stock information for a given ticker symbol
  weather      Show weather forecasts for a given location

Web search
  site:        Search within the specified Web site or domain
  allinanchor: Search within anchor text of links to page
  intitle:     Search within page titles
  source:      Search within the given source in the news collection

Table 4.2 Sample Google commands. Each is meant to be followed by an argument.

Figure 4.11: Google Web search engine advanced search form that supports search commands, from July 2007.

Search engines typically supply a forms-based interface to aid with command specification, in addition to supporting free typing of the commands within the main entry form. Figure 4.11 shows an example of such a form.

Interestingly, Web search engines have recently been building command languages into their query specification abilities. These are not required for use but can provide a savvy searcher with considerable power. These allow both for traditional uses of search commands, such as restricting the query to search over a particular field, as well as for more innovative commands that map to very specific information needs. The Google search engine has a wide range of these (see Table 4.2), including commands such as SWA 49 (what is the status of Southwest Airlines flight 49?) and stocks:NOK (what is the current stock performance of Nokia?).

These commands are associated with what is known as search shortcuts: specialized searches for commonly sought types of information such as weather, stock quotes, and airplane flight status information. The results of these shortcuts are also shown, when appropriate, in some search results listings. For example, maps and weather may be shown in response to a search on a city's name. Google also allows users to enter calculations (e.g., 298 * 23.4/23) and unit conversions (e.g., 30 dollars in euros) directly into the search box.

Google's mobile search takes the command language notion still further (see Chapter 12). The problem with command languages like these, of course, is that the user must remember a command correctly in order to use it, although a current research trend is to support ever-more flexible variations on command languages.

4.6: Conclusions

The query is the bridge between the user's current understanding of their information need and the information access system. Query specification today is primarily accomplished by the user typing keyword queries into an entry form or following hyperlinks at a known Web site (the use of links as a form of query specification is discussed in detail in Chapter 8).

This chapter discussed two main aspects of query specification: the kind of information supplied by the searcher and the style of interface used for expressing that information. Query specification was described as an activity in which the user specifies:

  • Keyword queries,
  • Natural language queries, including queries as questions,
  • Paragraphs of text,
  • Queries containing Boolean operators, and
  • Queries with command-based syntax.

The chapter discussed how textual queries are modified by the underlying search system, including stemming and stopword removal. The chapter also described graphical entry forms for query specification, and discussed the increasing importance and popularity of dynamic term suggestions that appear as the user types their query. Other query modifications such as spelling suggestions and term expansions are discussed in detail in Chapter 6.

The decline of syntax-driven command languages (except for use by specialists) provides an instructive example of how interfaces that were originally designed to be easy to implement in a computer are being replaced by interfaces that are intuitive for users. It is impressive that today people respond in surveys saying that Web search engines are easy to use; for the first few years of their existence, people complained that their behavior was confusing. Not only did the algorithms change, but also the assumptions behind them changed to have a focus on intuitiveness for the everyday user.

In future it is likely that query specification by spoken language will become increasingly popular; this is discussed in the section on mobile search in Chapter 12. It is also likely that long queries, including queries structured as questions, will continue to increase in usage as the algorithms responding to them improve.
