The Canary Database
Yale Occupational and Environmental Medicine Program
135 College St
Room 366
New Haven, CT, USA
06510-2283




How to Search the Canary Database

The Canary Database search interface uses several techniques to make searching "just work." Our goal is that anyone familiar with basic PubMed or Google searching will feel right at home, and that more experienced searchers will be able to reuse familiar syntax and techniques.

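To learn more about how to search, see:

    Basic Searching
    Query Syntax
    Searchable Fields
    Finding Related Records
    Still Under Development
    How It Works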

Basic Searching

To search the Canary Database, enter any search terms in the search box at the top of the screen. All search terms you enter will be required for a record to match. By default, search terms will be matched against all available data. See "Query Syntax" and "Searchable Fields" to learn how to construct complex searches, and to search specific fields.
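As a rough illustration of the "all terms required" rule, here is a minimal Python sketch. It is not the Canary Database's own code: it assumes a record is a dictionary of hypothetical field values and accepts the record only when every query term appears in at least one of its fields.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_all_terms(query: str, record: dict[str, str]) -> bool:
    """Return True only if every query term occurs in some field of the record.

    All fields are searched, mirroring the default "all fields" behavior.
    """
    record_terms: set[str] = set()
    for value in record.values():
        record_terms.update(tokenize(value))
    return all(term in record_terms for term in tokenize(query))


# Hypothetical toy records, for illustration only.
records = [
    {"title": "Avian influenza in wild ducks", "location": "Canada"},
    {"title": "Avian influenza surveillance in poultry", "location": "Mexico"},
]

hits = [r["title"] for r in records if matches_all_terms("avian influenza canada", r)]
print(hits)  # ['Avian influenza in wild ducks']
```

Both records contain "avian" and "influenza", but only the first also contains "canada", so only the first is returned.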

The Canary Database uses several vocabularies to match different names of equivalent concepts. For example, searches for "dog", "dogs", or "canis familiaris" should all match any record that has been curated with the species "Dogs". Similarly, searches for either "SARS" or "severe acute respiratory syndrome" should both match studies curated with the exposure "SARS Virus" or the outcome "Severe Acute Respiratory Syndrome." See "How It Works" for more information about the vocabularies we use.
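The idea of mapping many names onto one curated concept can be sketched as a lookup table. The small vocabulary below is hypothetical: it only hard-codes the examples in the paragraph above, and it assumes (as an illustration) that "SARS" maps to both the exposure and the outcome concept. The real vocabularies described in "How It Works" are far larger.

```python
# Hypothetical vocabulary table: each normalized variant maps to the curated
# concept(s) it should match. Only the examples from the text are included.
VOCABULARY: dict[str, set[str]] = {
    "dog": {"Dogs"},
    "dogs": {"Dogs"},
    "canis familiaris": {"Dogs"},
    "sars": {"SARS Virus", "Severe Acute Respiratory Syndrome"},
    "severe acute respiratory syndrome": {"SARS Virus", "Severe Acute Respiratory Syndrome"},
}


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def concepts_for(query: str) -> set[str]:
    """Return the curated concept names that a search phrase maps to."""
    return VOCABULARY.get(normalize(query), set())


def record_matches(query: str, curated_values: set[str]) -> bool:
    """True if the query maps to at least one of the record's curated values."""
    return bool(concepts_for(query) & curated_values)


print(record_matches("Canis  familiaris", {"Dogs"}))                        # True
print(record_matches("severe acute respiratory syndrome", {"SARS Virus"}))  # True
print(record_matches("cats", {"Dogs"}))                                     # False
```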

The search process will rank all matching records according to their relevance to your search terms, and the matches will be listed in order, with the "most relevant" matches at top.
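Classic Lucene scoring is a variant of TF-IDF: terms that occur often in a record raise its score, and rare terms count for more than common ones. The Python sketch below is a deliberately simplified TF-IDF ranking, not Lucene's exact formula (which also includes field-length normalization, boosts, and other factors), and the sample documents are hypothetical. It keeps only documents containing every query term, as in basic searching.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query: str, docs: list[str]) -> list[str]:
    """Return documents containing every query term, best TF-IDF score first."""
    q_terms = tokenize(query)
    doc_terms = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for counts in doc_terms for term in counts)

    scored = []
    for i, counts in enumerate(doc_terms):
        if not all(counts[t] for t in q_terms):
            continue  # every term is required
        score = 0.0
        for t in q_terms:
            tf = math.sqrt(counts[t])             # damped term frequency
            idf = 1.0 + math.log(n_docs / df[t])  # rarer terms weigh more
            score += tf * idf
        scored.append((score, i))

    # Highest score first; ties keep the original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [docs[i] for _, i in scored]


# Hypothetical toy documents, for illustration only.
docs = [
    "Lead poisoning in dogs",
    "Dogs and cats as sentinels",
    "Lead exposure and lead poisoning in cattle",
]
print(rank("lead poisoning", docs))
# ['Lead exposure and lead poisoning in cattle', 'Lead poisoning in dogs']
```

The third document ranks first because "lead" occurs in it twice; a real engine's length normalization could change that ordering.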


Query Syntax

Several standard search techniques are available in the Canary Database: Boolean, wildcard, fielded, phrase, and proximity searching. Many of these techniques can also be combined. Examples of each follow:

Each type of search is listed below with example queries, followed by notes.
Booleans

    (mosquitofish or trout) and australia
    canada not toronto
    +tularemia -anthrax

Use parentheses to specify logical grouping and precedence.

"and" requires both terms to match, and "or" requires either term to match. "not" excludes records that match the term that follows it.

Adding "+" to the left of a search term requires that term to match; adding "-" to the left of a search term excludes records that match it.
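One common way to evaluate these operators is to look up the set of records matching each term and then combine the sets. The Python sketch below assumes a simple in-memory inverted index over hypothetical toy documents; it is not the Canary Database's or Lucene's code, and it evaluates the three example queries by hand rather than parsing them.

```python
import re
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Build an inverted index: each lowercase term -> ids of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return index


# Hypothetical toy documents keyed by id.
docs = {
    1: "Mosquitofish in Australia",
    2: "Trout in Australia",
    3: "Trout in Canada",
    4: "Toronto, Canada",
    5: "Tularemia in rabbits",
    6: "Tularemia and anthrax",
}
index = build_index(docs)


def ids(term: str) -> set[int]:
    """Ids of documents containing term (empty set if none)."""
    return index.get(term, set())


# (mosquitofish or trout) and australia: union, then intersection
q1 = (ids("mosquitofish") | ids("trout")) & ids("australia")
# canada not toronto: set difference
q2 = ids("canada") - ids("toronto")
# +tularemia -anthrax: required term minus prohibited term
q3 = ids("tularemia") - ids("anthrax")

print(sorted(q1), sorted(q2), sorted(q3))  # [1, 2] [3] [5]
```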

Wildcards

    terror*
    sm?th?

"*" added to the right of a search term matches zero or more characters at the end of the term (right truncation). In this example, it matches "terror", "terrorism", and "terrorist".

"?" will match zero or any one character. In this example, either "Smith" or "Smythe" will match.

Note that wildcards do not work at the left of a search term.
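The wildcard rules above can be expressed as a translation into regular expressions. This Python sketch follows the rules exactly as stated on this page (including "?" matching zero or one character) and rejects leading wildcards; it does not reproduce Lucene's own wildcard implementation, and the list of indexed terms is hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Compile a wildcard term into a regex, following the rules on this page.

    '*' matches zero or more characters and '?' matches zero or one character.
    Wildcards at the start (left) of a term are rejected.
    """
    if pattern[:1] in ("*", "?"):
        raise ValueError("wildcards are not allowed at the left of a search term")
    parts = []
    for ch in pattern.lower():
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".?")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


# Hypothetical indexed terms.
terms = ["terror", "terrorism", "terrorist", "error", "smith", "smythe", "smart"]

for query in ("terror*", "sm?th?"):
    regex = wildcard_to_regex(query)
    print(query, [t for t in terms if regex.fullmatch(t)])
# terror* ['terror', 'terrorism', 'terrorist']
# sm?th? ['smith', 'smythe']
```

`fullmatch` is used so that the whole term, not just a prefix of it, must fit the pattern.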

Fields

    smith [au]
    smith.au.
    "beluga whale" [spec]
    species:"beluga whale"

Several familiar ways to specify a search field are available.

value [field] is "PubMed-style" and works for all fields.

value.field. is "BRS-style" and also works for all fields.

field:value, as in species:"beluga whale", is the field syntax used by Lucene's query parser.

See the list of Searchable Fields below for complete details.
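All three styles reduce to a (field, value) pair. The hypothetical Python parser below (not the Canary Database's own) shows one way to recognize each style with regular expressions. It handles a single fielded term or quoted phrase, not a full Boolean query, and it assumes that queries without a field fall back to the "all" field.

```python
import re

# One pattern per fielded-search style described above.
_STYLES = [
    re.compile(r'^(?P<value>.+?)\s*\[(?P<field>\w+)\]$'),  # value [field]  (PubMed-style)
    re.compile(r'^(?P<value>.+?)\.(?P<field>\w+)\.$'),     # value.field.   (BRS-style)
    re.compile(r'^(?P<field>\w+):(?P<value>.+)$'),          # field:value    (Lucene-style)
]


def parse_fielded(query: str) -> tuple[str, str]:
    """Split one fielded search into (field, value), dropping surrounding quotes.

    Queries that match no style default to the "all" field.
    """
    query = query.strip()
    for style in _STYLES:
        m = style.match(query)
        if m:
            return m.group("field").lower(), m.group("value").strip().strip('"')
    return "all", query.strip('"')


for q in ('smith [au]', 'smith.au.', '"beluga whale" [spec]',
          'species:"beluga whale"', '"zelikoff jt" [AU]', 'daszak'):
    print(q, "->", parse_fielded(q))
```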

Phrase

    "endocrine disruptors"
    "sars virus" [exp]
    "sleeping giant state park".loc.

Use double quotes to search for a specific multi-word value as a phrase, rather than as separate terms.

Phrases also work with fielded search. Quotes are optional for single-word values and can be left out.
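A phrase differs from the same words typed without quotes: the words must occur next to each other, in order. The minimal Python sketch below checks that on tokenized text; it is an illustration only (Lucene answers phrase queries from term positions stored in its index), and the sample titles are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(text: str, phrase: str) -> bool:
    """True if the phrase's terms occur consecutively, in order, in text."""
    words, target = tokenize(text), tokenize(phrase)
    n = len(target)
    if n == 0:
        return False
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


print(contains_phrase("Effects of endocrine disruptors on fish", "endocrine disruptors"))  # True
print(contains_phrase("Disruptors of endocrine function", "endocrine disruptors"))         # False
```

The second title contains both words, so it would match the unquoted query endocrine disruptors, but it does not contain the phrase.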

Proximity

    Lappivara~ [au]
    "ebola chimpanzees"~5

Using "~" after a single-word search term matches spelling variations of that word (fuzzy matching, based on "edit distance").

Using "~5" after a quoted multi-word phrase matches records in which those terms occur within five words of each other (proximity searching).
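Both ideas can be sketched in a few lines of Python. The sketch uses Levenshtein edit distance for fuzzy matching, with a hypothetical threshold of one edit, and treats "within five words" as term positions differing by at most five. Both are simplifying assumptions: Lucene's fuzzy threshold depends on its version and settings, and its phrase "slop" is counted as term moves (for example, swapping two words costs two moves), so results can differ in detail.

```python
import re
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_match(query: str, term: str, max_edits: int = 1) -> bool:
    """Fuzzy match: term is within max_edits edits of query (hypothetical threshold)."""
    return edit_distance(query.lower(), term.lower()) <= max_edits


def within_window(text: str, terms: list[str], window: int) -> bool:
    """Proximity match: some occurrence of every term lies within `window`
    positions of the others."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions = [[i for i, w in enumerate(words) if w == t.lower()] for t in terms]
    if not terms or not all(positions):
        return False  # no terms given, or a term is missing entirely
    return any(max(combo) - min(combo) <= window for combo in product(*positions))


print(edit_distance("kitten", "sitting"))      # 3
print(fuzzy_match("Lappivara", "lappivaara"))  # True
print(within_window("Ebola virus kills wild chimpanzees", ["ebola", "chimpanzees"], 5))  # True
print(within_window("Ebola virus kills wild chimpanzees", ["ebola", "chimpanzees"], 2))  # False
```

The spelling "lappivaara" is a hypothetical variant chosen to show a one-edit difference.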


Searchable Fields

Many fields are indexed and available for searching. To find specific values for any particular field, specify the field name in a query like this (for an author search):

    daszak.au.
    daszak [au]
    daszak [author]
    author:daszak
    "zelikoff jt".au.
    "zelikoff jt" [AU]
    author:"zelikoff jt"

All of the above search for the specified value in the author field. To search for an author by last name and initials, enclose the last name and initials together in double quotes for best results.

All available search fields are listed below. For any field, the abbreviated and complete field names are interchangeable and yield equivalent results.

    Abbreviation(s)          Field Name
    1au                      First author (matches only the first author)
    ab, abstract             Abstract
    af, affiliation          Affiliation
    all                      All fields (the default)
    au, author               Author
    exp, exposure            Exposures
    gn, grantnum             Grantnum
    issn                     Issn
    is, issue                Issue
    jn, journal              Journal
    kw, word, keyword        Keyword
    loc, location            Location
    me, meth, methodology    Methodology (study type)
    out, outcome             Outcomes
    pg, page, pages          Pages
    pd, date, year           Publication date
    rn, registrynum          Registry number
    rf, risk_factor          Risk factors
    spec, species            Species
    mh, sh, subject          Subject
    ti, title                Title
    ui, uid                  Unique identifier
    vol, volume              Volume
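Because every name in a row is equivalent, resolving a field name amounts to a table lookup. The Python sketch below transcribes the field list into such a table. It is a hypothetical illustration, not the Canary Database's code, and it assumes field names are case-insensitive (suggested by the [au] and [AU] examples, though not stated as a rule).

```python
# Alias table transcribed from the list of searchable fields.
FIELDS = [
    (("1au",), "First author"),
    (("ab", "abstract"), "Abstract"),
    (("af", "affiliation"), "Affiliation"),
    (("all",), "All fields"),
    (("au", "author"), "Author"),
    (("exp", "exposure"), "Exposures"),
    (("gn", "grantnum"), "Grantnum"),
    (("issn",), "Issn"),
    (("is", "issue"), "Issue"),
    (("jn", "journal"), "Journal"),
    (("kw", "word", "keyword"), "Keyword"),
    (("loc", "location"), "Location"),
    (("me", "meth", "methodology"), "Methodology"),
    (("out", "outcome"), "Outcomes"),
    (("pg", "page", "pages"), "Pages"),
    (("pd", "date", "year"), "Publication date"),
    (("rn", "registrynum"), "Registry number"),
    (("rf", "risk_factor"), "Risk factors"),
    (("spec", "species"), "Species"),
    (("mh", "sh", "subject"), "Subject"),
    (("ti", "title"), "Title"),
    (("ui", "uid"), "Unique identifier"),
    (("vol", "volume"), "Volume"),
]

# Flatten into one lookup: every alias -> its field name.
ALIAS_TO_FIELD = {alias: name for aliases, name in FIELDS for alias in aliases}


def resolve_field(alias: str) -> str:
    """Map any abbreviated or complete field name (case-insensitive) to its field."""
    try:
        return ALIAS_TO_FIELD[alias.lower()]
    except KeyError:
        raise ValueError(f"unknown search field: {alias!r}") from None


print(resolve_field("AU"), resolve_field("species"), resolve_field("kw"))  # Author Species Keyword
```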


Finding Related Records

When you find a record that interests you, click the "Related" tab for links that search the database for similar records. These links search for more records based on study information, such as author or journal names, and on curated data, such as species and exposures.

Currently, each link searches for a single similar value (for example, "more from this author" or "more about this species"). We are working to add other approaches, including an advanced search screen where multiple similar values can be specified, and algorithmic methods for finding "studies like this one."
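Each single-value "Related" link amounts to a fielded search for one value. As a hypothetical sketch (the record layout and function name are assumptions, not the Canary Database's code), such links could be generated from a record's curated values using the PubMed-style value [field] syntax described in "Query Syntax".

```python
def related_queries(record: dict[str, list[str]]) -> list[str]:
    """Build one '"value" [field]' query per curated value in a record.

    The record maps a field abbreviation (e.g. "au", "spec") to its values.
    Double quotes inside a value are dropped so the phrase stays well-formed.
    """
    queries = []
    for field, values in record.items():
        for value in values:
            cleaned = value.replace('"', "")
            queries.append(f'"{cleaned}" [{field}]')
    return queries


# Hypothetical record, for illustration only.
record = {"au": ["Zelikoff JT"], "spec": ["Beluga whale"], "exp": ["SARS Virus"]}
for q in related_queries(record):
    print(q)
```

Quoting every value keeps multi-word values such as author names with initials together, as recommended under "Searchable Fields".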


Still Under Development

We are continually working to improve the database model and indexing strategies for the Canary Database. Because the studies we curate include information from a variety of abstracting and indexing sources, we are exploring additional ways to make advanced searching capabilities easy to use across these source records. We expect to offer a flexible advanced search feature soon.


How It Works

The Canary Database search interface uses PyLucene, a Python extension for the Java Lucene information retrieval library. Lucene is flexible and fast, and it allows us to index and search a wide variety of fields in curated studies.

We use the UMLS Metathesaurus to match subject headings and species names from MeSH and the NCBI Taxonomy, and we also cross-reference species names from ITIS. We use the USGS GNIS and NIMA GNS gazetteers to curate study locations with over seven million geographic feature names.

We're very grateful to the developers of Lucene and PyLucene for making such an excellent and sophisticated suite of tools available as Free Software, and also to the publishers of the vocabularies and related tools mentioned above, for their excellent, free-of-charge products.

