Advanced Searching with QUOSA
This note describes the main advanced searching capabilities provided by QUOSA Information
Manager. It does not cover the initial retrieval of full text articles from a PubMed or similar search,
the reviewing of those articles, links with citation managers, or folder management.
The subjects covered are:
· Search alerts or automated updating of previous searches
· Secondary searches within existing search lists or folders
· Concepts4Clustering
· Batch queries of PubMed
· Terms4Clustering
· Using Terms4Clustering with Genes and Aliases search lists
Search alerts
QUOSA can be set up to update automatically any search you have made with new
additions to the search results returned by PubMed. At present search alerts work only when
QUOSA is running, although we hope to remove this constraint shortly. Search alerts will work
with any search gateway where you do not need to log on, including “General Web” with Google.
Note that PubMed ranks returns to a search according to the date of accession. This means that
new entries to PubMed will be at the top of the list of returns, so that if you had asked for the top 50
returns, a new search will differ from an old search by including the new PubMed entries. QUOSA
notes this difference and adds the new entries to your existing list. Optionally, QUOSA can be
configured to send you an email alerting you to a new addition.
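The update step described above can be pictured with a short Python sketch. This is only an illustration of the idea, not QUOSA's actual code: the function name merge_alert_results and the article identifiers are hypothetical. It assumes each search returns an ordered list of identifiers with the newest first.

```python
def merge_alert_results(existing_ids, latest_ids):
    """Return (updated_list, new_ids); new entries are placed ahead of the existing list."""
    known = set(existing_ids)
    # Anything returned by the re-run search that is not in the stored list counts as new.
    new_ids = [article_id for article_id in latest_ids if article_id not in known]
    return new_ids + list(existing_ids), new_ids


existing = ["1003", "1002", "1001"]   # stored result of the earlier "top 3" search
latest = ["1005", "1004", "1003"]     # the same search re-run at the alert time
updated, added = merge_alert_results(existing, latest)
print(updated)  # ['1005', '1004', '1003', '1002', '1001']
print(added)    # ['1005', '1004']
```

Because the comparison is made against whatever is currently in the stored list, an item that is still returned by the search but missing from the list is treated as new and comes back at the next update.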
You can configure a search alert with the Set Alert icon (Express View) or hammer tool icon
(Advanced View) on the Results pane. First select the search you want to be updated by clicking
on the primary search name in My Searches and Folders. If you then click on the Set Alert or
hammer icon, you will either be taken directly to Configure Search Alert or be able to click on it. You
can then set the time and frequency at which you want searches conducted. You can also set
up an e-mail alert and/or a “Destination Folder” (either local or on a Virtual Library) where you
wish to save the cumulative result of all alerts.
Once you have configured a Search Alert, the search list will move to My Alerts in My Searches
and Folders. It is otherwise unchanged. You can still select items from within the search results
and extract them to EndNote or copy them to folders. Note that if you delete an item from a
search alert list, QUOSA will notice its absence at the next update and restore the item as part
of that update.
Remember that QUOSA needs to be running at the designated update time in order for Search
Alert to work.
Secondary searches within existing search lists or folders
QUOSA offers powerful secondary search capabilities on either primary searches you have
conducted on PubMed (or similar gateways) or on folders of full text articles which you have built
up. Note that your primary searches will typically have been made on information only available
in abstracts. Now that you have downloaded the full text articles with the help of QUOSA, you
have your first opportunity to search the full text of all those articles.
QUOSA's analysis, automation, and speed enable you to find what you need from a primary
search quickly. This also makes it practical to retrieve large sets of full
text articles from primary searches and then to use QUOSA's search tools to
identify the articles of interest to you quickly and effectively.
Boolean queries
The simplest such tool is Search in Results. This conducts a secondary search with a Boolean
query of your choice on any search list or folder available under My Searches and Folders. Help
with formulating Boolean queries is provided from our website at http://www.quosa.com/download/BooleanHelp.html
The Search in Results tool is accessible from the magnifying glass icon on the Results pane
toolbar. You need to have a search list or folder selected before using this, which will be the case
if you have files listed in the Results pane. Clicking the magnifying glass icon will bring up a
dialog box for your Boolean search query on the full text. It also provides the option to add
search terms to be applied to metadata, such as author or journal. When you click “Search” in
the dialog box, the secondary search is performed very rapidly. The dialog box stays open in
case you wish to conduct a second search. A new search list is created in My
Searches and Folders, positioned as a sub search to the primary search, and this new
search list becomes the one currently selected. If you proceed to do a further search under the current
set, that will be a search on the secondary list and not on the primary search. Reselect the
primary search to search on it.
Note that in addition to being able to search on the currently selected list (the current set), you
have the choice to search on all search lists, all folders, or both all folders and all search lists.
Having conducted a secondary search, you may wish to highlight the results with the search term
used. To do this, click the highlight icon. Having done this, you'll be able to use Document
Summary as you would with the primary search to review the key text in each article in turn.
Highlighting also has the effect of analyzing the concepts within a text and showing them in the
Concepts pane of the Document Summary.
Regular Expression and Left Truncation Queries
The Left Truncation option can be selected from the Search in Results dialog box. It allows you
to find words that end in certain letters, by starting the search term with an
asterisk and following that with the desired ending (e.g. “*ology” for words ending in “ology”). This
type of search only works for the currently selected folder or search result.
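As a rough illustration of what a left truncation query matches, the sketch below translates a term such as *ology into an equivalent pattern for Python's re module. The helper name left_truncation_to_regex is hypothetical, and case-insensitive matching is an assumption; QUOSA's own matching rules may differ.

```python
import re


def left_truncation_to_regex(term):
    """Turn a left-truncated term such as '*ology' into a compiled whole-word pattern."""
    if not term.startswith("*") or len(term) < 2:
        raise ValueError("a left truncation term must be '*' followed by an ending")
    ending = re.escape(term[1:])  # treat the ending as literal text
    # Any run of word characters, then the required ending, then a word boundary.
    return re.compile(r"\b\w*" + ending + r"\b", re.IGNORECASE)


pattern = left_truncation_to_regex("*ology")
text = "Advances in immunology and pathology; see the Methodology section."
print(pattern.findall(text))  # ['immunology', 'pathology', 'Methodology']
```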
Regular Expression Queries (also selected from the Search in Results dialog box) are searches
oriented to sets of characters rather than sets of words. Help with formulating Regular Expression
Queries is provided from our website at http://www.quosa.com/download/RegExpHelp.html
Like Search in Results, Regular Expression Queries create a sub folder with their results
appended to the primary search list or folder. Note that in this case relevant text in the results will
be automatically highlighted and reproduced in the document summary for each article. If you
choose to re-highlight the secondary search with other text, you should bear in mind that the
highlighter operates like a Boolean query.
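To show what "sets of characters" means in practice, here is a small sketch using Python's re module. The patterns S.h and a[0-9]+ b[0-9]+ c[0-9]+ are taken from the batch query example later in this note; the article texts and the function regex_search are hypothetical, and QUOSA's regular expression dialect may differ in detail from Python's.

```python
import re

# Hypothetical article texts standing in for a selected search list.
articles = {
    "article_1": "Sonic hedgehog (Shh) signalling was measured in each embryo.",
    "article_2": "Samples a12 b7 c301 were excluded from the analysis.",
    "article_3": "No relevant passage.",
}


def regex_search(pattern, docs):
    """Return {article name: matched strings} for every article with at least one match."""
    compiled = re.compile(pattern)
    hits = {}
    for name, text in docs.items():
        matches = [m.group(0) for m in compiled.finditer(text)]
        if matches:
            hits[name] = matches
    return hits


print(regex_search(r"S.h", articles))                      # {'article_1': ['Shh']}
print(regex_search(r"a[0-9]+ b[0-9]+ c[0-9]+", articles))  # {'article_2': ['a12 b7 c301']}
```

The matched strings are the kind of text that would be highlighted and reproduced in the document summary.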
Concepts4Clustering
Concepts4Clustering is a method for automatically extracting from a body of articles the concepts
which occur in more than one article. It can be applied to any search list or folder within My
Searches and Folders. It is useful both for prompting the user with concepts that could be
relevant to his or her inquiry and for generating a list of concepts for further searches of the types
described in later sections.
To extract concepts from a list, first select the list and then click the Concepts4Clustering tab.
This tab is only visible in the Advanced View and is grouped with the Document Summary, My
Article Organizer, and Terms4Clustering tabs on the Organizer. The Organizer will then switch to
the Concepts4Clustering view. It is possible that the results need to be “analysed” before the
clustering is available. If so, you will be warned and asked to perform this by “highlighting” the
results with your preferred term for the focus of the enquiry. If no concepts are found, this will be
indicated. Otherwise a list of concepts is presented, with the most numerous at the top. For each
concept listed, the number of articles with significant instances of that concept is given. Each
concept is expandable to show the number of articles with significant instances of variants of the
main concept.
When the user clicks on a concept, the articles in the Results pane are reordered so that those
including the concept are brought to the top of the list. A new field called “In Cluster” is shown in
the Results pane, with a cluster icon against all articles including the relevant concept. This
reordering occurs whenever a new concept is selected in the Organizer pane.
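The reordering can be thought of as a stable sort on an “In Cluster” flag. The sketch below is only a model of that behaviour: the function reorder_by_cluster and the article titles are hypothetical, and it assumes that articles keep their previous relative order within each group, which this note does not state.

```python
def reorder_by_cluster(results, in_cluster):
    """Return (title, in_cluster_flag) pairs with cluster members moved to the top."""
    flagged = [(title, title in in_cluster) for title in results]
    # sorted() is stable, so the original order is kept within each group.
    return sorted(flagged, key=lambda item: not item[1])


results = ["Article A", "Article B", "Article C", "Article D"]
cluster = {"Article B", "Article D"}   # articles containing the selected concept
for title, flag in reorder_by_cluster(results, cluster):
    print(("[cluster] " if flag else "          ") + title)
```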
If you right click on a concept in the Organizer pane, you are presented with a number of
choices, including reordering the concepts alphabetically rather than by cluster size and saving
concepts to a file. Saving concepts to a file allows one to keep a record of generated concepts. It
also allows one to use a set of concepts relevant to one body of articles for searches on other
lists of articles, such as with the batch queries and Terms4Clustering described below.
Import Batch Query
This tool is only available in Platinum versions of the QUOSA Information Manager.
Import Batch Query allows the user to automate a series of primary PubMed searches along with
a further series of secondary searches on the full text articles retrieved in each of those primary
searches. QUOSA can automate the whole batch of searches, saving time and making possible
searches that would otherwise be impracticable.
The tool is accessed from the File | Import | Batch Query File menu at the top of the QUOSA
window. This offers the choice of:
· A batch query on the Author field in PubMed from a list of authors provided by the user
· A batch query on the full abstract in PubMed for a list of search terms
· A general (default search settings) query on any supported online conference
The author and normal batch queries work in the same way. An example query in Excel, as
given in the Help note, is as follows:
|   | A             | B           | C               | D                          |
| 1 | {pthrp}       | expr        | [R]S.h          | [W]diff*                   |
| 2 | {heat stroke} | "effect of" | [W]bi*r results | [R]g.*th                   |
| 3 | {asthma}      | pl*te       | [B]"Annexin A8" | [R]a[0-9]+ b[0-9]+ c[0-9]+ |
If you use this list, QUOSA will retrieve a number of files under each search term in column A.
Note that this search term is in braces and is formatted as a PubMed search expression. You
can copy and paste PubMed search expressions for this purpose.
After each such primary search, QUOSA will perform a number of secondary searches on the
results of the primary search, according to the contents of columns B onward in the same row
as the primary search expression.
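The row layout described above (a primary expression in column A, secondary expressions in the columns that follow) can be checked before import with a short script. This is a minimal sketch under stated assumptions: parse_batch_rows is a hypothetical helper, the CSV text is a cut-down copy of the example rows, and bracketed prefixes such as [R] are kept as opaque tags because this note does not define them.

```python
import csv
import io
import re

TAG = re.compile(r"^\[([A-Z])\](.*)$")  # e.g. "[R]S.h" -> tag "R", query "S.h"


def parse_batch_rows(rows):
    """Return a list of (primary_query, [(tag_or_None, secondary_query), ...])."""
    batch = []
    for row in rows:
        cells = [cell.strip() for cell in row if cell.strip()]
        if not cells:
            continue  # skip blank rows
        primary = cells[0]
        if not (primary.startswith("{") and primary.endswith("}")):
            raise ValueError(f"primary query is not in braces: {primary!r}")
        secondary = []
        for cell in cells[1:]:
            match = TAG.match(cell)
            secondary.append((match.group(1), match.group(2)) if match else (None, cell))
        batch.append((primary[1:-1], secondary))
    return batch


csv_text = '''{pthrp},expr,[R]S.h,[W]diff*
{asthma},pl*te,"[B]""Annexin A8""",[R]a[0-9]+ b[0-9]+ c[0-9]+
'''
batch = parse_batch_rows(csv.reader(io.StringIO(csv_text)))
for primary, secondary in batch:
    print(primary, secondary)
```

Note that a cell containing double quotes has to be quoted and have its quotes doubled in a CSV file, as in the [B] cell above.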
In the example below, you see a row for which the primary search expression is {asthma} and
then a number of secondary expressions, each with its terms in quotes (to signify that a hit with
all the terms is required) and a number which is a measure of the desired proximity of those terms.
| {asthma} | “alias big” ~4 | “candidate gene responsible” ~4 | “detectable nr4a3 fusion” ~4 |
Search lists like these can be generated by saving to a file the concepts produced by
Concepts4Clustering, as described in the earlier section. You can take any such list and copy
it into an Excel spreadsheet to get an automated batch query. In fact, the transformations needed
to do this in Excel can themselves be automated in an Excel macro.
CSV files can also be used for this purpose.
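As an alternative to an Excel macro, the same transformation can be sketched in Python, producing one CSV row from a saved concepts list. The helper concepts_to_batch_row is hypothetical, the proximity value 4 simply mirrors the example row, and plain double quotes are assumed in place of the typographic quotes shown in that example.

```python
import csv
import io


def concepts_to_batch_row(primary, concepts, proximity=4):
    """Build one batch query row: {primary} followed by quoted concepts with ~proximity."""
    return ["{" + primary + "}"] + [f'"{concept}" ~{proximity}' for concept in concepts]


# Stand-in for the contents of a saved concepts file, one concept per line.
concepts_text = """alias big
candidate gene responsible
detectable nr4a3 fusion
"""
concepts = [line.strip() for line in concepts_text.splitlines() if line.strip()]
row = concepts_to_batch_row("asthma", concepts)

buffer = io.StringIO()
csv.writer(buffer).writerow(row)  # the csv module escapes the embedded quotes
print(row)
print(buffer.getvalue())
```

In practice the buffer would be replaced by a file opened with newline="" so that the result can be selected in the Import Batch Query dialog.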
The process of performing an Import Batch Query consists of first setting the size of each of the
primary searches (e.g. 100 articles) and then selecting the Excel or CSV file containing the
primary and secondary search terms.
When QUOSA has finished the batch query, each primary search will be shown in My Searches
and Folders, with each of the secondary searches shown as a sub-search to the relevant primary
search. It is exactly as if a series of manual Searches in Results had been performed on each of
the primary searches.
Terms4Clustering
This is sometimes informally referred to as a dictionary query. It allows the user to see how many
articles in any search list or folder contain any instances of each of a list of search terms, i.e. to
see the cluster size for each search term. The results are presented like Concepts4Clustering
results: the relevant search terms (in this case) ranked in descending cluster size. (The resulting
search terms may also be ranked alphabetically. Terms not found in any article are not displayed
in the results.)
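The cluster size calculation can be modelled in a few lines of Python. This is a simplified sketch, not QUOSA's algorithm: cluster_sizes is a hypothetical function, the article texts are invented, a term is counted only when it appears as an exact phrase (ignoring case), and ties are broken alphabetically as an assumption.

```python
def cluster_sizes(articles, terms):
    """Return [(term, number_of_articles)] in descending cluster size, omitting empty clusters."""
    sizes = {}
    for term in terms:
        needle = term.lower()
        count = sum(1 for text in articles.values() if needle in text.lower())
        if count:  # terms not found in any article are left out
            sizes[term] = count
    return sorted(sizes.items(), key=lambda item: (-item[1], item[0]))


articles = {
    "a1": "A gene mutation was found near the promoter.",
    "a2": "The gene mutation alters the differentiation potential of the cells.",
    "a3": "Unrelated text.",
}
terms = ["differentiation potential", "gene mutation", "polymorphism gene"]
print(cluster_sizes(articles, terms))
# [('gene mutation', 2), ('differentiation potential', 1)]
```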
As with Concepts4Clustering, when the user clicks on a search term the articles in the Results
pane are reordered so that those including the search term are brought to the top of the list. The
In Cluster field is shown in the Results pane, with a cluster icon against all articles which include
the relevant search term.
You can use any list stored on your computer as a txt, csv, or xls file. An example would be a text
file created by saving to file a list of concepts generated by Concepts4Clustering, such as the
following, saved as concepts.txt:
alias big
candidate gene responsible
detectable nr4a3 fusion genes
differential gene
differentiation potential
gene mutation
human neuroblastoma cells
mutation alg3 gene causing
net gene promoter polymorphic
nr4a3 fusion genes
polymorphism gene
rara gene fusion products
sert gene promoter polymorphism
tp53 gene mutations
To apply this list to Terms4Clustering, go to the “Tools” menu at the very top of the QUOSA window
and select Tools/Terms4Clustering/Configure Clustering. Then browse for concepts.txt (in this
example) and click OK.
Terms4Clustering will continue to use this list until another is selected. So if one list is used
repeatedly, there is no need to configure the list each time. There is also an option to edit the list.
Once the right list is loaded, select the folder to search, make sure files from it are showing in the
Results pane, and then click the Terms4Clustering tab on the left (only visible in the Advanced
View and stacked with the Document Summary, Concepts4Clustering, and My Article
Organizer tabs). The Terms4Clustering view will open and display the name of the search
dictionary and the number of hits for each search term.
As mentioned earlier, the articles in the Results pane will be re-ordered when any search term in
the Terms4Clustering pane is selected, with the Cluster icon showing against those articles
including the selected search term. The user can highlight the relevant text with the highlighter
tool and then find it via the Document Summary tab, which shows the passages with hits for all
words in the search expression and also those with only a partial hit.
(Note that if one does not highlight the articles, they may be highlighted with a search term used
earlier for that folder. Also, if for any reason you lose the Cluster-ordered view in the Results
pane, you can retrieve it by clicking first on another tab (e.g. Document Summary) and then
clicking back to Terms4Clustering and then the relevant search term. This won’t affect the
highlighting.)
Highlighting will highlight all articles in the search list. One advantage of this is that it reveals the
incidence of hits which are not selected as being in the cluster. This includes articles with only
partial hits and articles that have passages containing all search words but spaced further apart
than the default proximity in Terms4Clustering.
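The distinction between cluster members, partial hits, and hits spaced too far apart can be illustrated as follows. This sketch rests on assumptions that this note does not confirm: classify_hit is a hypothetical function, the proximity value is taken to be the maximum number of other words allowed inside the smallest window containing all the search words, and 4 is used as the default only because it appears in the batch query example.

```python
import itertools
import re


def classify_hit(text, term, proximity=4):
    """Return 'cluster', 'spread', 'partial' or 'none' for one article and one search term."""
    words = term.lower().split()
    tokens = re.findall(r"\w+", text.lower())
    positions = [[i for i, tok in enumerate(tokens) if tok == w] for w in words]
    found = sum(1 for p in positions if p)
    if found == 0:
        return "none"
    if found < len(words):
        return "partial"  # some, but not all, search words occur
    # Smallest window that holds one occurrence of every search word.
    span = min(max(combo) - min(combo) for combo in itertools.product(*positions))
    extra_words = span + 1 - len(words)
    return "cluster" if extra_words <= proximity else "spread"


term = "tp53 gene mutations"
print(classify_hit("The tp53 gene mutations were frequent.", term))              # cluster
print(classify_hit("Mutations were rare, although in a later cohort the tp53 "
                   "status of each gene differed.", term))                       # spread
print(classify_hit("The gene was sequenced.", term))                             # partial
```

Under this model, highlighting would mark text in all three kinds of article, while only the first would carry the cluster icon.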
Searching with Genes and aliases
QUOSA can perform a Terms4Clustering search with a search term list automatically generated
from a query on NCBI’s Gene database. So – as a hypothetical example – one could get a gene
term list from an “aging” query on the Gene database and quickly see the hits for those gene
terms in a large group of articles on asthma.
To do this, go to Tools/Terms4Clustering/Create Gene Dictionary. A dialog box appears with an
explanation of the next steps. Click OK, and Entrez Gene appears in the browser pane. In this
example, enter “aging” into the search term field. The return page lists the first 20 returns. In the
toolbar at the top of the browser pane, set the “Get” number to 29 and click on the Sigma
button.
A dialog box then asks for a list file name and location and offers a choice between a
dictionary with grouping and a flat list. The grouped dictionary will result in a hierarchical search
and tiered search returns. A flat list will treat all search terms as equal and primary. Once this
choice is made, QUOSA starts to compile the list. This may take some time. If a dialog box warns
of an NCBI Internal Error, repeat the process from the point of clicking on “Go” in Entrez Gene to
retrieve genes for your search term.
The result is either a flat or a hierarchical list of genes and aliases, drawn from the search on
Entrez Gene and saved in a CSV file, which can be configured as the selected list for Terms4Clustering.
When a hierarchical list is used in Terms4Clustering, the results are first shown at the top level of
the hierarchy and are expandable to show the results for subsidiary terms.
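The difference between the two list types can be modelled with simple data structures. Everything in this sketch is an assumption made for illustration: the gene and alias names are invented placeholders, the layout of QUOSA's CSV file is not reproduced, and the top-level count is taken to be the number of articles matching the gene or any of its aliases.

```python
import re

# Hypothetical grouped dictionary: top-level gene name -> list of aliases.
grouped = {"GENEA": ["GA1", "GA2"], "GENEB": ["GB1"]}

articles = {
    "a1": "GENEA expression rose.",
    "a2": "The alias GA1 was also detected.",
    "a3": "GENEB and GENEA were both silent.",
}


def flat_terms(grouped_dict):
    """Flatten the grouped dictionary so that every name is an equal, primary term."""
    flat = []
    for gene, aliases in grouped_dict.items():
        flat.append(gene)
        flat.extend(aliases)
    return flat


def tiered_counts(grouped_dict, docs):
    """Return {gene: (articles matching gene or any alias, {name: articles matching name})}."""
    tokens = {doc_id: set(re.findall(r"\w+", text.lower())) for doc_id, text in docs.items()}
    result = {}
    for gene, aliases in grouped_dict.items():
        per_name, members = {}, set()
        for name in [gene] + aliases:
            hits = {doc_id for doc_id, words in tokens.items() if name.lower() in words}
            per_name[name] = len(hits)
            members |= hits
        result[gene] = (len(members), per_name)
    return result


print(flat_terms(grouped))
print(tiered_counts(grouped, articles))
```

Here the top-level entry for GENEA covers three articles, and expanding it shows two for GENEA itself, one for GA1, and none for GA2.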
July 08