Project Callisto Search Quality Review Module 4 - Version 3

Page Quality and Needs Met
Lesson 1 of 10
Section 14: The Relationship between Page Quality and Needs Met
 The Needs Met rating is based on both the query and the result. You must carefully think
about the query and user intent when assigning a Needs Met rating.
 The Page Quality rating slider does not depend on the query. Do not think about the query
when assigning a Page Quality rating to the LP.
 Some results don’t have a Page Quality slider. If a result block has no Page Quality rating
slider, you do not have to give a Page Quality rating. If there is a Page Quality slider, please
assign a Page Quality rating based on the landing page.
 Useless results should always be rated FailsM, even if the landing page has a high Page
Quality rating.
 On-topic, helpful, but low Page Quality results should get lower Needs Met ratings than
on-topic, helpful, and high Page Quality results. The Needs Met scale encompasses all
aspects of "helpfulness", and many users find low Page Quality results less helpful than
high Page Quality results.
 The HM rating should be given to helpful, high Page Quality pages that are a good fit for
the query. The HM rating may also be used for results that are very helpful, medium
quality, and have other very desirable characteristics, such as very recent information.
 The HM rating may not be appropriate if a page has low Page Quality or has any other
undesirable characteristics, such as outdated or inaccurate information, or if it is a poor fit
for the query. Have very high standards when rating the HM rating.
 SM is often an appropriate rating for low quality but on-topic pages. However, a page can
have such low Page Quality that it is useless for nearly all queries. Gibberish pages are a
good example of pages with low Page Quality that should be rated FailsM. An exception to
this is queries with clear website intent, where the target website should be rated FullyM
even if the page has low Page Quality.
 Remember that if a page lacks a beneficial purpose, it should always be rated Lowest Page
Quality - regardless of the page’s Needs Met rating or how well-designed the page may
be. Please review Section 4.1 for a summary of other types of Lowest Page Quality pages.
Examples of Page Quality and Needs Met
Porn, Foreign Language, and Did Not Load Results

Lesson 2 of 10
Section 15: Rating Porn, Foreign Language, and Did Not Load Results
You will be asked to assign Porn, Foreign Language, and Did Not Load flags to result blocks when
appropriate. Some rating tasks may also ask you to identify Upsetting-Offensive and/or Not-for-
Everyone results. All flags should be assigned based on the result alone and do not depend on
the query.
The rating flags will look like this:
Click on the flag name to select it. The flag will turn red and change the "No" to "Yes. For example,
this result block shows when the Foreign Language flag should be used.
Rating With The Porn Flag
The Porn flag should be assigned to all porn pages, regardless of the query or user intent. It
indicates pornographic content, including images, links, text, pop-ups, and ads. The flag is not
based on the query's intent.
 If user intent is not porn-seeking, landing pages with porn content should be rated Fails
to Meet. Uninvited porn can be unhelpful or useless. Non-porn intent queries include girls,
wives, mature women, gay people, kissing, and cheerleaders.
 Some queries have both non-porn and porn interpretations. For example, the following
English (US) queries have both a non-porn and an erotic or porn interpretation: [breast],
[sex]. We will call these queries "possible porn intent" queries.
 For "possible porn intent" queries, please rate as if the non-porn interpretation were
dominant, even though some or many users may be looking for porn. For example,
please rate the English (US) query [breast] assuming a dominant health or anatomy
information intent.
 To rate porn landing pages for clear queries, assign a high Needs Met rating based on
user helpfulness. Even with porn intent, assign a Porn flag. Pages with poor user
experience, like downloading malicious software, should receive low ratings.
 Please report images containing child pornography (URLs only). See section 15.3, of the
guidelines, for a definition of child pornography.
Needs Met Rating for Clear Porn Intent Query Examples
For very clear porn intent queries, assign a rating to the porn landing page based on how helpful it
is for the user. Even though there is porn intent, the page should still be assigned a Porn flag.
 Do not simply rate all porn pages for porn queries as MM or HM. Even though the query is
porn and the result is porn, the page must fit the query and be helpful to get a high Needs
Met rating.
 Pages that provide a poor user experience, such as pages that try to download malicious
software, should also receive low ratings, even if they have some images appropriate for
the query.
Needs Met Rating for Possible Porn Intent Queries
Some queries have both non-porn and porn interpretations. For example, the following English
(US) queries have both a non-porn and an erotic or porn interpretation: [breast], [sex]. We will call
these queries "possible porn intent" queries.
 For "possible porn intent" queries, please rate as if the non-porn interpretation were
dominant, even though some or many users may be looking for porn. For example, please
rate the English (US) query [breast] assuming a dominant health or anatomy information
intent.
Needs Met Rating for Clear Non-Porn Intent Queries
If the user intent is clearly not porn-seeking, a landing page that has porn for its MC should be
rated Fails to Meet.
 When the user intent is clearly not porn, a porn result should be considered unhelpful or
useless. Uninvited porn is a very bad experience for many users.
Foreign Language Flag
The Foreign Language Flag should be assigned based on the landing page's language rather than
the result block's appearance. It is important to flag all foreign pages with the flag, even if most
users in your locale do not expect a foreign language page.
Do not assign the Foreign Language flag when the language on the landing page is one of the
following:
 The task language. Example: The language you are working in is French and you see a
landing page in French. You would not flag the landing page Foreign Language.
 A language that is commonly used by a significant percentage of the population in the task
location.
 English. Even if the language you are working with is different, if the landing page appears
in English, you do not flag the landing page as Foreign Language.
 Please assign the Foreign Language flag even if you personally understand the language,
but most users in your locale do not.
 Please assign the Foreign Language flag based on the language of the landing page, not
the appearance of the result block.
 Please remember to flag all foreign pages with the Foreign Language flag, even if most
users in your locale would expect or want a foreign language page for the query.
 Sometimes it is difficult to determine what language the landing page is in. The LP may
have multiple languages or no words at all. In these cases, try to represent users in your
locale. Does it feel like a foreign language page? When in doubt, don’t use the Foreign
Language flag.
Needs Met Rating for Foreign Language Results
You must assign a Needs Met rating for all result blocks in your task, even if the result blocks have a
foreign language landing page.
To assign a Needs Met rating for all result blocks in a task, even if they have a foreign
language landing page, flag them as FailsM as they are usually useless. If users can read the
language, don't use the Foreign Language flag. If the query indicates most users would expect or
want a Foreign language result, assign the FullyM rating.
Note: If you are unable to evaluate the Page Quality rating of a Foreign Language result, you do not
need to assign a Page Quality rating and can leave the slider at N/A.
Foreign Language Examples

Videos are often an example where foreign language pages are helpful and desired. Think about
user intent and what pages are good for users. If the query "asks" for a foreign language song,
band, film, sporting event, etc., then a video of the song, band, film, sporting event, etc. is helpful
since it can probably be understood or enjoyed even though it is in a foreign language. For these
types of queries, foreign language results are often expected.
Did Not Load Flag
Did Not Load is used to indicate technical problems with the webpage that prevent users from
viewing any Landing Page content.
Use the Did Not Load flag when:
 The Main Content (MC) of the landing page is a web server or web application error
message and there is no other content on the page: no navigation links, no home link, no
Supplementary Content (SC), and no Ads. See here(opens in a new tab) for a Wikipedia
 The landing page is completely blank: no MC, no SC, and no Ads.
 Assign the Did Not Load flag based on the landing page, not the result block.
Do Not Use the Did Not Load flag when:
 Malware warnings, such as “Warning - visiting this site may harm your computer!”.
 Pages that have removed or expired MC (e.g., expired classified listing, removed social
media post, products or services unavailable).
 Pages that are inaccessible because you need a subscription to view the MC.
Example of a Did Not Load landing page. You cannot tell that the landing page doesn’t load by
looking at the result block.
Needs Met and the Did Not Load Flag
To rate a result block, assign a Needs Met rating based on its usefulness for the query. If the
landing page doesn't load, assign the Did Not Load flag and rate the page FailsM. If the page has
an error message, assign a Page Quality rating.
 Sometimes the page partially loads or has an error message. Give Needs Met ratings based
on how helpful the result is for the query. Error messages can be customized by the
website owner and are part of a well-functioning website. Sometimes these pages are
helpful for the query.
 If you are unable to evaluate the Page Quality rating of a Did Not Load result, you do not
need to assign a Page Quality rating and can leave the slider at N/A.
 Rating tasks may require identifying Upsetting-Offensive and Not-for-

Everyone results. Upsetting-Offensive content may offend users of all ages, genders, races,
religions, and political affiliations, while Not-for-Everyone content may be uncomfortable
or inappropriate for certain settings.
o Mark content that may be unpleasant or uncomfortable for some people in your
locale (e.g., content that may not be appropriate in a public space, professional
environment, or school) as Not-for-Everyone.
Needs Met and Did Not Load Flag Examples

Queries with Multiple Interpretations and Intents
Lesson 3 of 10
Section 16: Rating Queries with Multiple Interpretations and Intents
Some queries really only have one meaning. Consider the query [iphone], English (US). There may
be different user intents for this query (research iPhones, buy an iPhone, go to the iPhone page on
Apple’s website), but all users are basically referring to the same thing: the phone made by Apple,
Inc.
Some queries truly have different possible meanings. Consider the query [apple], English (US).
Some users may want to find more information on the computer brand or the fruit. We refer to
these different meanings as query interpretations.
When giving Needs Met ratings for results involving different query interpretations, think about
how likely the query interpretation is and how helpful the result is:
Dominant Interpretation
A very helpful result for a dominant interpretation should be rated Highly Meets , because it is very
helpful for many or most users. Some queries with a dominant interpretation have a FullyM result.
Common Interpretation
A very helpful result for a common interpretation may be Highly Meets or Moderately Meets ,
depending on how likely the interpretation is.
Minor Interpretation
A very helpful result for a very minor interpretation may be Slightly Meets or lower because few
users may be interested in that interpretation.
No Chance Interpretations
There are some interpretations that are so unlikely that results should be rated FailsM . We call
these "No Chance" Interpretations.
Rating Queries with Both Website and Visit-in-Person Intent
Some queries have two possible strong intents:
1. Go to the website intent: in order to, for example, find out information, buy something
online, make a reservation, schedule an appointment, interact with customer support, or
fulfill some other need that can be satisfied online.
2. Visit-in-person intent: user wants to visit the store, business, etc. in person
We know the user intent is to accomplish one or the other, but it is unclear which one the user
wants. For these queries, result blocks that only satisfy one intent should NOT get a Fully Meets
rating.
Both Website and Visit-in-Person Intent Examples
Specificity of Queries and Landing Pages
Lesson 4 of 10
Section 17: Specificity of Queries and Landing Pages
Some queries are general, some are specific:
General Specific Even More Specific
[library] [harvard library] [harvard anthropology library]
[takeout chinese restaurants in downtown

[restaurants] [chinese restaurants]
Austin]
[practice interview questions used for

[interview questions] [interview questions for teachers]
teach for america]
 Always rate how helpful the result is, not specificity for it
 For broad categories (restaurants) popular (prominent) examples are considered helpful
o When the query is a broad category, such as [cafes] [restaurants] [hotels] [books]
[tourist attractions in Paris] etc., popular and prominent examples may be
considered very helpful
 It may be difficult to rate general queries
 Please do web research to help you understand what is popular and prominent in different
locations
Results for specific queries are easier to rate on the Needs Met scale because we know more about
what the user is looking for. Giving a Needs Met rating for results for general queries can be
difficult. As always, your rating is based on how helpful the result is for the query, not the
specificity fit.
Specificity of Query Examples
Needs Met Rating and Freshness
Lesson 5 of 10
Section 18: Needs Met Rating and Freshness
Some queries demand very recent or "fresh" information. Users may be looking for "breaking
news" such as an important event or natural disaster happening right now. Here are different types
of queries demanding current/recent results.
Query: "Breaking News" Stories
Examples: [tornado], [tsunami]
Assume users need the information right away. Imagine someone who needs immediate weather
information because a big storm is coming. Information about last year's weather would not be
helpful.
Queries: Recurring event queries, such as elections, sports events, TV shows, conferences, etc.
Examples: [olympics], [american idol], [redsox schedule], [tax forms], [elections]
Assume users are looking for the most recent or current information about the event. For example,
if the Olympics are happening right now, users searching [olympics] want information about the
current Olympics, not results from years ago. If the next Olympics are a few months away, users
are probably interested in the upcoming Olympics.
Query: Current information queries
Examples: [population of paris], [amount of u.s. debt], [airfare from ny to sfo], [next federal
holiday]
Assume users are looking for the most current information, such as prices or airfare.
 When a query demands recent content, only pages with current, recent, or updated
content should get high Needs Met ratings. For these queries, pages about past events,
old product models and prices, outdated information, etc. are not helpful. They should
be considered "stale (obsolete)" and given low Needs Met ratings. In some cases, stale
results are useless and should be rated FailsM.
 For some queries, there may be "newsy" or recent information user intent, as well as more
"timeless" information user intent. Users issuing queries for celebrities or politicians may
be interested in biographical information, or users may be looking for the latest news or
gossip.
 Freshness is generally less of a concern for Page Quality rating.
o "Stale" pages can have high Page Quality ratings. For example, some highly
reputable news websites maintain "archival" content.
o Unmaintained/abandoned "old" websites or unmaintained and
inaccurate/misleading content is a reason for a low Page Quality rating.
Note : The date the page was created may be different from when the content was last updated
or modified. When content is updated, the page will sometimes show the date of the update, not
the date the page was created.
Additional Flags in Some Rating Tasks
 Mark content that may be upsetting or offensive from the perspective of a typical user in
your locale as Upsetting-Offensive, keeping in mind that people of all ages, genders, races,
religions, and political affiliations use the Internet to understand the world and other
points of views.
 Mark content that may be unpleasant or uncomfortable for some people in your locale
(e.g., content that may not be appropriate in a public space, professional environment, or
school) as Not-for-Everyone.
Misspelled and Mistyped Queries and Results

Lesson 6 of 10
Section 19: Misspelled and Mistyped Queries
You will notice that some queries are misspelled or mistyped. Here are some examples of queries
that are obviously misspelled:
Misspelled Query Query Interpretation
The only reasonable query interpretation is the company named

[federal expres], English (US)
Federal Express.
[new england patroits], English US) The only reasonable interpretation is the NFL football team.
The only reasonable interpretation is the famous singer/actress,

[byonce knowles], English (US)
Beyonce Knowles.
 Some misspelled or mistyped queries are more difficult to interpret. Use your judgment
and do query research
 For obviously misspelled or mistyped queries, you should base your rating on user intent,
not necessarily on exactly how the query has been spelled or typed by the user
 For queries that are not obviously misspelled or mistyped, you should respect the query
as written, and assume users are looking for results for the query as it is spelled
Name Queries
Consider the query [john stuart], English (US). There is a very famous Jon Stewart, the comedian
and former host of a popular U.S. television show. However, we should not assume that the query
[john stuart] has been misspelled. There are many people named John Stuart. We will respect the
query as written and assume the user is looking for someone named "John Stuart".
Non-Fully Meets Results for URL Queries & Product Queries
Lesson 7 of 10
Section 20: Non-Fully Meets Results for URL Queries
Raters sometimes ask the question, "For a well-formed working URL query, are the only acceptable
Needs Met ratings for a result either Fully Meets or Fails to Meet?" The answer is no. There can
be other helpful results for URL queries.
Issuing a URL query to find information about a website, such as reviews or recent news, can
provide results that give reviews and reputation information that is helpful for a URL query. We
recommended this to you as one method of reputation research in the PQ guidelines.
Example:
Query: ‘potterybarn.com’
Section 21: Product Queries: Importance of Browsing and Researching
Some product queries, such as [ipad reviews], have a clear information-seeking (Know) intent.
Other product queries, such as [buy ipad], have a clear purchase (Do) intent. And some product
queries, such as [ipad store.apple.com], have a clear navigation (Website) intent. However, most
product queries don’t obviously specify one type of intent.
Users may not always plan to buy products online that they are browsing and researching, for
example, cars or major appliances. Even though the ultimate goal may be to purchase a product,
many other activities may take place first: researching the product (reviews, technical
specifications), understanding the options that are available (brands, models, pricing), viewing and
considering various options (browsing), etc.
Note - Page Quality ratings for product results need extra care and attention.
Product Queries Examples
Rating Visit-in-Person Intent Queries

Lesson 8 of 10
Section 22: Rating Visit-in-Person Intent Queries
When there is a user location for a visit-in-person intent query and a location has not been
specified in the query itself, such as [chinese restaurants] with a user location of Boston, MA ,
results in or near the user location are the most helpful. However, how close is "near"?
The type of business and/or entity should be taken into consideration when deciding if the
distance of the visit-in-person result is too far. For example, most people are not willing to travel
very far for a gas station, coffee shop, supermarket, etc. Those are types of businesses that most
users expect to find nearby.
 Users might be willing to travel a little farther for certain kinds of visit-in-person results:
doctors’ offices, libraries, specific types of restaurants, public facilities like swimming pools,
hiking trails in open spaces, etc. Sometimes users may accept results that are even farther
away, such as a very specialized medical clinic.
 When we say users are looking for results "nearby", "nearby" can mean different distances
for different queries.
Always use your best judgement for these query cases!

Examples Where User Location Does (and Does Not) Matter
The user location may not always change our understanding of the query and user intent.
Rating English Language Results in Non-English Locales
Lesson 9 of 10
Section 23: Rating English Language Results in Non-English Locales
The following rating guidance is for raters in non-English locales (such as English (US), English
(IN), English (CA))
Your Needs Met ratings should reflect how helpful the result is for users in your locale.
 Ratings can be more difficult when the query includes English names, words, etc., or when
it's unclear whether English results would be satisfying for a particular query.
 Keep in mind that every locale will have unique considerations regarding the number and
variety of languages (such as official languages, regional languages, local dialects, etc.),
writing systems, and keyboard input languages commonly in use.
o While these guidelines may not include examples for your locale, it
is important that you represent users in your task location and culture in order to
interpret the query and rate results. When in doubt, please assume that users
would prefer results in the task language unless the query clearly indicates
otherwise
Examples of English (and Non-English) Results in Non-English Locales
These examples use Hindi (IN) and Korean (KR) as the locales. In both cases, we cannot assume
that users in these locales4i.e., Hindi-speaking users in India, or Korean-speaking users in Korea
able to read English. Unless most users in the locale would be satisfied by English results for the
query, we will consider them unhelpful or even useless (FailsM).
Here are two examples where the query includes proper nouns typed in Latin script, such as
famous people, places, titles of books or films, etc. For these queries, users would prefer to see
results in the language of their locale.
Examples of Queries Satisfied by English Results
For queries about global businesses and organizations, users may expect or want to visit the
English language version of the business/organization's official website in some locales. Similarly,
for queries seeking technical information such as manufacturer part numbers, product specs,
scientific or chemical formulas, etc., the answer to the query may be typically expressed in the
English language in some locales.
For these queries, users may expect or want to see English results in order to satisfy their need.
Please use your judgment and knowledge of your locale to determine the appropriate rating.
Rating Dictionary and Encyclopedia Results for Different Queries
Lesson 10 of 10
Section 24: Rating Dictionary and Encyclopedia Results for Different Queries
When assigning Needs Met ratings for dictionary and encyclopedia results, careful attention must
be paid to the user intent.
Like all results, the helpfulness of dictionary and encyclopedia results depend on the query and
user intent. Dictionary and encyclopedia results may be topically relevant for many searches, but
often these results are not helpful for common words that most people in your rating locale
already understand.
Reserve high Needs Met ratings for dictionary and encyclopedia results when the user intent for
the query is likely "what it is" or "what does it mean" and the result is helpful for users seeking that
type of information.
Think very carefully about the helpfulness of dictionary and encyclopedia results for ordinary
words and common items. If few users would benefit from a dictionary or encyclopedia result
for a common word, a Slightly Meets rating may be appropriate. If very few or no users would
benefit, then Fails to Meet is appropriate.
Examples of Rating Dictionary and Encyclopedia Results

Project Callisto Search Quality Review Module 4 - Version 3

Uploaded by

Project Callisto Search Quality Review Module 4 - Version 3

Uploaded by

Page Quality and Needs Met

Porn, Foreign Language, and Did Not Load Results

The rating flags will look like this:

Needs Met Rating for Clear Porn Intent Query Examples

Needs Met Rating for Clear Non-Porn Intent Queries

Foreign Language Flag

Needs Met Rating for Foreign Language Results

Foreign Language Examples

Did Not Load Flag

Use the Did Not Load flag when:

 The landing page is completely blank: no MC, no SC, and no Ads.

Do Not Use the Did Not Load flag when:

Needs Met and the Did Not Load Flag

 Rating tasks may require identifying Upsetting-Offensive and Not-for-

Needs Met and Did Not Load Flag Examples

Section 16: Rating Queries with Multiple Interpretations and Intents

Rating Queries with Both Website and Visit-in-Person Intent

Some queries have two possible strong intents:

Section 17: Specificity of Queries and Landing Pages

Some queries are general, some are specific:

General Specific Even More Specific

[library] [harvard library] [harvard anthropology library]

[takeout chinese restaurants in downtown

[practice interview questions used for

 It may be difficult to rate general queries

Section 18: Needs Met Rating and Freshness

Query: "Breaking News" Stories

Examples: [tornado], [tsunami]

Examples: [olympics], [american idol], [redsox schedule], [tax forms], [elections]

Query: Current information queries

 Freshness is generally less of a concern for Page Quality rating.

Additional Flags in Some Rating Tasks

Misspelled and Mistyped Queries and Results

Section 19: Misspelled and Mistyped Queries

Misspelled Query Query Interpretation

The only reasonable query interpretation is the company named

The only reasonable interpretation is the famous singer/actress,

Section 20: Non-Fully Meets Results for URL Queries

Section 21: Product Queries: Importance of Browsing and Researching

Rating Visit-in-Person Intent Queries

Section 22: Rating Visit-in-Person Intent Queries

Always use your best judgement for these query cases!

Section 23: Rating English Language Results in Non-English Locales

You might also like

zproxy.org