Run a Word Frequency query

 


You can use Word Frequency queries to list the most frequently occurring words or concepts in your sources.



Understand Word Frequency queries

You could use a Word Frequency query to:

  • Identify possible themes, particularly in the early stages of a project

  • Analyze the most frequently used words in a particular demographic. For example, to analyze the most common words used by farmers, you could run a coding query to gather all content coded at case nodes with the attribute farmer, then select the result node as the criteria for the Word Frequency query.

If you are using NVivo Pro or NVivo Plus, you can also use a Word Frequency query to:

  • Look for exact words, or broaden your search to find the most frequently occurring concepts. For example, if you look for the most frequent words in a dataset survey, you might find that water, health, and harmful are the most frequently occurring words. However, if you group similar words together, you might find that the concept of pollution (including pollutants, pollution, polluted, and pollutes) occurs most frequently.
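The difference between exact matching and grouping can be illustrated with a minimal Python sketch. It is not NVivo's implementation: the STEMS table, the function names and the sample words are hypothetical, chosen only to mirror the pollution example above.

```python
from collections import Counter

# Hypothetical grouping table for illustration only; NVivo's real
# stemming and similarity rules are not described in this topic.
STEMS = {
    "pollutants": "pollution",
    "polluted": "pollution",
    "pollutes": "pollution",
}

def count_exact(words):
    """Count each distinct word separately (exact matches)."""
    return Counter(w.lower() for w in words)

def count_grouped(words):
    """Count words after mapping each one to its group label."""
    return Counter(STEMS.get(w.lower(), w.lower()) for w in words)

# Made-up sample content.
words = ("water health water harmful pollutants pollution "
         "polluted pollutes water health").split()

exact = count_exact(words)
grouped = count_grouped(words)

print(exact.most_common(1))    # most frequent exact word
print(grouped.most_common(1))  # most frequent group
```

With exact matching, water is the most frequent word in this sample. Once the four pollution-related forms are grouped, their combined count overtakes it.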

Before you run a Word Frequency query, make sure the text content language is set to the language of your source materials—refer to Set the text content language and stop words for more information.

If you run a Word Frequency query created in a different edition of NVivo, it may search sources that aren't supported in your edition. You will see accurate results, but you won't be able to open or view some references. Refer to About queries (Working with queries across editions) for more information.


Create a Word Frequency query using the Wizard

  1. On the Query tab, in the Create group, click Query Wizard.

The Query Wizard opens. Follow the steps on the Wizard.

 

Wizard step: Choose the query you want to run.

  • Click Identify frequently occurring terms in content.

Wizard step: Specify the terms you want to search for.

  • In the Display words box, specify the number of words displayed in the results. For example, show only the top 20 words.

  • In the Minimum word length box, type the number of characters in the shortest word you want to include. For example, a minimum word length of 4 excludes words of three or fewer characters from the results.

  • Select a Grouping option. Choose to find exact matches or to group words with the same stem together. For example, you can search for sport and find sporting. If you have NVivo Pro or NVivo Plus, you can adjust the slider to broaden your search to find similar concepts. For example, find sport, play and recreation. Refer to Understand text match settings for more information.

Wizard step: Choose where you want to count words.

  • Choose whether you want to count words in all your sources, or restrict the count to words in selected items or folders.

Wizard step: Choose whether to add the query to your project.

  • You can run the query once or choose to add it to your project (and run it).

  • If you choose to add it to your project, you must enter a name. You can optionally enter a description.

  2. Click Run.

The query is executed and the results are displayed in Detail View.

NOTE  If you want to use Word Frequency query features that are not available via the Wizard—for example, only count words in sources created by specific users—you can add the query to your project and update it later. If you are familiar with NVivo queries, you may prefer to create the query outside the Wizard.


Create a Word Frequency query outside the Wizard

If you are not familiar with NVivo queries, you may want to create your Word Frequency query using the Wizard—the Wizard guides you through the process of setting your query criteria. However, not all query features are available in the Wizard, so you may sometimes want to create your Word Frequency queries outside the Wizard.

  1. On the Query tab, in the Create group, click Word Frequency.

  2. Choose where you want to search for matching text:

  • All sources: search for content in all the sources in your project, including externals and memos
  • Selected Items: restrict your search to selected items (for example, a set containing interview transcripts)
  • Selected Folders: restrict your search to content in selected folders (for example, a folder of interview transcripts)

  3. Specify how many words you want to display:

  • <number> most frequent: include a specific number of words. For example, you could display the 100 most frequently occurring words.
  • All: include all words found in the selected project items.

  4. (Optional) Enter a minimum word length to exclude short words from the results. For example, enter 5 to display only words with five or more letters.

  5. Select a Grouping option. Choose to find exact matches or to group words with the same stem together. For example, search for sport and find sporting. If you have NVivo Pro or NVivo Plus, you can adjust the slider to broaden your search to find similar concepts. For example, find sport, play and recreation. Refer to Understand text match settings for more information.

  6. Choose whether you want to look for coded content in all your sources, or restrict the search to selected items or folders. Click the Select button to choose specific project items.

  7. Click the Run Query button at the top of Detail View.

NOTE To save the Word Frequency query, click the Add to Project button and enter the name and description (optional) in the General tab.
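The effect of the display and minimum word length settings can be sketched in a few lines of Python. This is an illustration under assumptions, not how NVivo runs the query: the word_frequency function and its display_words and min_length parameters are hypothetical names, and the tokenization is deliberately simplistic.

```python
import re
from collections import Counter

def word_frequency(text, display_words=None, min_length=1):
    """Return (word, count) pairs, most frequent first.

    display_words: how many words to return; None returns all words.
    min_length: words shorter than this many letters are skipped.
    """
    # Simplistic tokenization: runs of ASCII letters only.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_length)
    # most_common(None) returns every word, similar to the All option.
    return counts.most_common(display_words)

# Made-up sample content.
sample = "The river water is clean but the lake water is not clean at all"

# Two most frequent words with five or more letters.
print(word_frequency(sample, display_words=2, min_length=5))

# Every word, with no length restriction.
print(word_frequency(sample))
```

Raising the minimum length removes short words such as "is" and "the" before the top words are chosen, which is why it changes which words appear in the results and not just how many.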


Understand the results

When you run a Word Frequency query the results are displayed in Detail View. Depending on your edition, there are up to four tabs displayed on the right—the Summary, Word Cloud, Tree Map and Cluster Analysis tabs. You can change which tab is displayed by default—refer to the display options in Set application options for more information.

Summary tab

1  The most frequently occurring words excluding any stop words. If you adjusted the slider to return similar words, the most frequently occurring word from the group is displayed in this column.

2  Length—the number of letters or characters in the word.

3  Count—the number of times that the word occurs within the project items searched. If you adjusted the slider to include similar words, this count is the total for all the similar words.

4  Weighted Percentage—the frequency of the word relative to the total words counted. If you adjusted the slider to include similar words, a word may be part of more than one group of similar words. The weighted percentage assigns a portion of the word's frequency to each group so that the overall total does not exceed 100%.

5  Similar Words—other words that have been included as a result of including stemmed or similar words—for example, if you include words with the same stem, then pollutants, pollution, and polluted would be grouped together. This column is not available if you use 'Exact match only'.
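The idea behind the weighted percentage can be sketched as follows. This topic does not state the exact formula NVivo uses, so the sketch assumes the simplest possible rule: a word that belongs to several groups has its count split equally between them. The function name, the equal split and the sample data are all assumptions made for illustration.

```python
def weighted_percentages(counts, groups):
    """Return group label -> percentage of all counted words.

    counts: word -> number of occurrences
    groups: group label -> list of member words
    Assumption: a word in several groups is split equally among them.
    """
    total = sum(counts.values())

    # Count how many groups each word belongs to.
    membership = {}
    for members in groups.values():
        for word in members:
            membership[word] = membership.get(word, 0) + 1

    result = {}
    for label, members in groups.items():
        # Each word contributes only its share to this group.
        share = sum(counts[w] / membership[w] for w in members)
        result[label] = 100 * share / total
    return result

# Made-up data: "game" is similar to both "sport" and "play".
counts = {"sport": 4, "play": 4, "game": 2}
groups = {"sport": ["sport", "game"], "play": ["play", "game"]}

percentages = weighted_percentages(counts, groups)
print(percentages)
```

If the shared word were counted in full for both groups, the two groups would add up to more than 100% of the words counted. Splitting its count keeps the overall total at 100%.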

Word Cloud tab

This tab displays up to 100 words in varying font sizes, where frequently occurring words are in larger fonts.

When you view the results as a Word Cloud, you can change the style: on the Word Cloud tab (ribbon), choose from a gallery of styles.

Tree Map tab

This feature is available in NVivo Pro and NVivo Plus.

The Tree Map tab displays up to 100 words as a series of rectangles, where frequently occurring words are in larger rectangles.

Cluster Analysis tab

This feature is available in NVivo Pro and NVivo Plus.

The Cluster Analysis tab displays up to 100 words as a horizontal dendrogram, where words that co-occur are clustered together.

When you click on the cluster analysis diagram, the Cluster Analysis tab (on the ribbon) becomes available. You can use the commands on this ribbon tab to:

  • Change the diagram type—you can show the data as a horizontal or vertical dendrogram, a circle graph, or a 2D or 3D cluster map

  • In 2D or 3D cluster maps, select the Word Frequency check box if you want to use word frequency to determine the size of the bubbles in the cluster map.

For more information, refer to Change the appearance or content of a cluster analysis diagram.


See all the references for a selected word

When you run a Word Frequency query, a preview node is created for each word so that you can see all references to the word. To open a preview node, double-click the word you want to explore.

In the preview node, you see each occurrence of the selected keyword in context:

The context (the text around the word) is displayed in gray—by default it is a 'narrow' context. To expand the context for a selected reference, on the Node tab, in the View group, click Coding Context and choose the coding context.

NOTE  For other types of nodes, the name of the ribbon tab is different. For example, if you are currently working in a case node, you will access the above commands on the Case tab.
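Showing a word with its surrounding text is often called keyword in context. The Python sketch below illustrates the general idea only. The keyword_in_context function and its width parameter are hypothetical, and the number of words NVivo includes in a 'narrow' context is not stated in this topic.

```python
def keyword_in_context(text, keyword, width=3):
    """Return each occurrence of keyword with up to `width` words
    of surrounding text on either side."""
    words = text.split()
    hits = []
    for i, word in enumerate(words):
        # Ignore trailing or leading punctuation when comparing.
        if word.strip(".,;:!?").lower() == keyword.lower():
            start = max(0, i - width)
            hits.append(" ".join(words[start:i + width + 1]))
    return hits

# Made-up sample content.
text = ("Farmers say the water is harmful. "
        "Clean water matters to every farmer here.")

for hit in keyword_in_context(text, "water", width=2):
    print(hit)
```

A larger width corresponds to a broader context around each reference.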


When determining the frequency of words, NVivo applies the following rules:

  • Words containing punctuation (such as hyphens, periods and other symbols) are divided into separate words. For example, part-time will be counted as part and time.

  • Words containing apostrophes (such as o'clock and d'accord) are treated as one word but if the apostrophe is followed by an 's then the s is not included (Tom's would be counted as Tom).

  • In audio and video transcripts, only words in the Content field (column) are counted—any words in custom transcript fields are ignored.

  • In datasets, only words in codable fields (columns) are counted—any words in classifying fields are ignored.

  • When searching text in selected nodes, if a word is coded against multiple nodes, it is counted once for each node. Similarly, if a word has been coded by multiple users to the same node, it is counted once for each user.

  • Word Frequency queries do not include 'stop words'—refer to Exclude particular words when running Word Frequency queries for more information.

  • A Word Frequency query does not search text in framework matrix summaries.

  • Word Frequency queries do not search text within images. PDFs created by scanning paper documents may contain only images—each page is a single image. If you want to use Word Frequency queries to explore the text in these PDFs, then you should consider using optical character recognition (OCR) to convert the scanned images to text (before you import the PDF files into NVivo).

  • If the text content language is Japanese, the 'base form' is listed in the query results, but the count includes any alternate forms of the word—refer to Working with Japanese text in queries for more information.
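The punctuation and apostrophe rules above can be approximated with a short Python sketch. It is a rough illustration, not NVivo's tokenizer: the tokenize function and its regular expressions are assumptions, and they cover only the first two rules in the list.

```python
import re

def tokenize(text):
    """Split text into words, roughly following the rules above."""
    # A word is a run of letters or digits, optionally joined by
    # apostrophes (o'clock, d'accord). Hyphens, periods and other
    # symbols end the word, so part-time becomes part and time.
    raw = re.findall(r"[^\W_]+(?:['’][^\W_]+)*", text)
    tokens = []
    for word in raw:
        # Drop a trailing 's so that Tom's is counted as Tom.
        tokens.append(re.sub(r"['’]s$", "", word, flags=re.IGNORECASE))
    return tokens

print(tokenize("Tom's part-time shift ends at 5 o'clock."))
```

Note that this sketch removes every trailing 's, so a contraction such as it's is reduced to it, which matches the rule as stated above.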


Exclude particular words when running Word Frequency queries

Word Frequency queries do not include 'stop words'. By default, these are less significant words, such as conjunctions or prepositions, that may not be meaningful to your analysis. You can view and edit the list of stop words. Refer to Set the text content language and stop words for more information.

You can add a word displayed in your query results to the stop words list—select the word you want to exclude from the query results, then click Add to Stop Words List, in the Actions group on the Query tab. The words you add to the stop word list will be excluded the next time you run a Word Frequency or Text Search query.

NOTE  In server projects, only Project Owners can add words to the stop word list—refer to Collaborate in a server project for more information.
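The effect of the stop words list on the results can be sketched in Python. The stop_words set, the count_words function and the sample sentence are hypothetical. NVivo's actual stop words list depends on the text content language and is edited through the application, not through code.

```python
from collections import Counter

# Hypothetical stop words list for illustration only.
stop_words = {"the", "and", "of", "in"}

def count_words(words, stop_words):
    """Count words, skipping any that are in the stop words list."""
    return Counter(w for w in words if w not in stop_words)

# Made-up sample content.
words = "the farmers said the water in the river said it all".split()

before = count_words(words, stop_words)
print(before.most_common(3))

# Adding a word to the list excludes it the next time the count runs.
stop_words.add("said")
after = count_words(words, stop_words)
print(after.most_common(3))
```

The first count is not changed by the addition. The word is only excluded when the count is run again, which mirrors the behaviour described above.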


You can create a node that includes all the references to a word you select in the Word Frequency query results.

  1. In the query results, select the word you want to use to create a node.

  2. On the Create tab, in the Items group, click Create As Node.

The Select Location dialog box opens.

  3. Select a location and name the node.

  4. Click OK.

NOTE  If the text content language is Japanese, the node will include references to the base form or any alternate forms of the word—refer to Working with Japanese text in queries for more information.


You can run a Text Search query for a selected word in the Word Frequency query results.

  1. On the Query tab, in the Actions group, click Other Actions, and then click Run Text Search Query.

The Text Search Query dialog box opens.

  2. (Optional) Change the Text Search Criteria or Query Options. Refer to Run a Text Search query for more information.

  3. Click Run.

NOTE  If the text content language is Japanese, the Text Search query will find all occurrences of the base form or any alternate forms of the word—refer to Working with Japanese text in queries for more information.
