How auto coding themes works

This feature is available in the NVivo Plus edition.


This topic explains how NVivo analyzes sources to determine which themes to code. Text analytics is a complex process, and manual coding will always be more accurate.

We aim to continually enrich this functionality and we invite you to share your suggestions with us on the NVivo forum.

The process

Select multiple sources, nodes or cases and use the Auto Code Wizard to produce results. A node matrix is created, and content is coded to theme nodes.

NVivo analyzes your material using a language pack. Themes are identified by analyzing the content and the sentence structure within it. NVivo assigns significance to some themes over others based on how frequently each theme occurs in the material being analyzed.
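Frequency-based significance can be pictured as ranking candidate themes by how often they are mentioned. The sketch below is only an illustration under stated assumptions, not NVivo's implementation: it assumes themes have already been extracted as plain strings, and `rank_themes` is a hypothetical helper.

```python
from collections import Counter


def rank_themes(theme_mentions: list[str]) -> list[tuple[str, int]]:
    """Rank candidate themes by mention count, most frequent first.

    Hypothetical helper: ties keep the order in which themes were first seen.
    """
    return Counter(theme_mentions).most_common()


mentions = [
    "water quality", "storm damage", "water quality",
    "water table", "water quality", "storm damage",
]
print(rank_themes(mentions))
```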

The themes are combined into groups and results are presented as a node for each broad idea, with child nodes for each theme within that group.

The relevant content is coded to the theme nodes that are created. The results are summarized in a node matrix which shows the nodes for each broad idea, and the number of coding references from each source.
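
A node matrix summary can be thought of as a table of reference counts, with one row per broad-idea node and one column per source. The sketch below assumes each coding reference is recorded as a hypothetical (node, source) pair; the function name and data layout are illustrative assumptions, not NVivo's internal format.

```python
from collections import defaultdict


def build_node_matrix(references, sources):
    """Count coding references for each (broad-idea node, source) cell.

    Hypothetical layout: `references` is a list of (node, source) pairs and
    every source must appear in `sources`, so that empty cells show as 0.
    """
    matrix = defaultdict(lambda: {source: 0 for source in sources})
    for node, source in references:
        matrix[node][source] += 1
    return dict(matrix)


sources = ["Interview A", "Interview B"]
refs = [
    ("water", "Interview A"),
    ("water", "Interview A"),
    ("water", "Interview B"),
    ("storm", "Interview B"),
]
matrix = build_node_matrix(refs, sources)
for node, row in matrix.items():
    print(node, row)
```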

What makes a theme?

The process detects significant noun phrases (for example, real estate development) to identify the most frequently occurring themes.

The process collects the themes and counts their mentions across all sources in the set being processed.
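Counting across a set of sources involves two numbers per theme: how many times it is mentioned in total, and how many sources mention it at all. The following is a minimal sketch, assuming each source has already been reduced to a list of theme mentions; `count_theme_mentions` is a hypothetical name, not an NVivo function.

```python
from collections import Counter


def count_theme_mentions(sources: dict[str, list[str]]):
    """Return (total mentions per theme, number of sources mentioning each theme)."""
    total = Counter()
    source_count = Counter()
    for themes in sources.values():
        total.update(themes)              # every mention counts
        source_count.update(set(themes))  # each source counts once per theme
    return total, source_count


sources = {
    "A": ["water quality", "storm damage"],
    "B": ["water quality"],
    "C": ["water quality", "water quality"],
}
total, source_count = count_theme_mentions(sources)
print(total["water quality"], source_count["water quality"])
```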

NVivo filters the themes so that only the most relevant ones are presented in the results. At the end of the process, you can choose which themes to create as nodes.

The automated insights process may produce different results in different languages. For example, if you have a translated version of the same source, analyzing the French version in French and the English version in English may produce different results.

Theme grouping

NVivo groups themes by comparing words with the same stem, for example house, houses and housing. It then filters the themes and excludes groups that represent a much smaller proportion of your content than the others.
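Stem-based grouping followed by filtering can be sketched as below. This is an assumption-laden illustration: `crude_stem` is a hypothetical English-only suffix stripper standing in for the language-specific stemming a language pack would provide, and the 10% `min_share` cutoff is an invented threshold, not a value NVivo documents.

```python
from collections import defaultdict


def crude_stem(word: str) -> str:
    """Very rough suffix stripping (hypothetical stand-in for a real stemmer)."""
    for suffix in ("ing", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(word_counts: dict[str, int], min_share: float = 0.1):
    """Group words by stem, then drop groups below `min_share` of all mentions."""
    groups = defaultdict(dict)
    for word, count in word_counts.items():
        groups[crude_stem(word)][word] = count
    total = sum(word_counts.values())
    if total == 0:
        return {}
    return {
        stem: words
        for stem, words in groups.items()
        if sum(words.values()) / total >= min_share
    }


counts = {"house": 5, "houses": 3, "housing": 4, "river": 1, "storm": 6}
groups = group_by_stem(counts)
print(groups)
```

Here house, houses and housing all reduce to the same stem and form one large group, while river accounts for too small a share of the mentions and is dropped.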

For each group, NVivo uses the most frequently shared phrase or word as the name of the node.
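Picking a group's name can be pictured as choosing its most frequent member. A minimal sketch, assuming each group is held as a mapping from member word or phrase to mention count; `name_group` is hypothetical.

```python
def name_group(word_counts: dict[str, int]) -> str:
    """Return the most frequent member as the node name (first one wins on ties)."""
    return max(word_counts, key=word_counts.get)


print(name_group({"house": 5, "houses": 3, "housing": 4}))
```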

Themes can belong to more than one group and have the same coding references in each. For example, storm water runoff may be grouped under both the theme water and the theme storm:

  • Nodes
      • Autocoded Themes
          • water
              • water quality
              • water table
              • clean water
              • storm water runoff
          • change
              • environmental change
              • area change
              • huge change
          • storm
              • storm water runoff
              • storm damage
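A hierarchy like this can be reproduced by placing each theme under every group word it contains, which is how one theme ends up in two groups. This is a sketch under assumptions: the group words are supplied by hand here, whereas NVivo derives its groups through its own analysis, and `group_themes` is a hypothetical helper.

```python
def group_themes(themes: list[str], group_words: list[str]) -> dict[str, list[str]]:
    """Place each theme under every group word it contains.

    A theme such as 'storm water runoff' contains both 'storm' and 'water',
    so it appears in both groups and keeps the same coding references.
    """
    tree = {group: [] for group in group_words}
    for theme in themes:
        words = theme.split()
        for group in group_words:
            if group in words:
                tree[group].append(theme)
    return tree


themes = [
    "water quality", "water table", "clean water", "storm water runoff",
    "environmental change", "area change", "huge change", "storm damage",
]
tree = group_themes(themes, ["water", "change", "storm"])
for group, children in tree.items():
    print(group)
    for child in children:
        print("    " + child)
```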

Collections of material

You can analyze sources, nodes or cases individually or as a combination of items.

If you analyze each item individually, you may get different results from analyzing a group of items together. For example, suppose items A, B, C and D are analyzed as one group:

1. Items A, B and C each mention water quality once. Even though each item mentions it only once, the process identifies water as a concept mentioned in the majority of the items being analyzed, so water quality is suggested as a theme node in the results. If item A, B or C were analyzed individually, water quality might not be identified as a theme. For example, if you are reviewing submissions on the same issue and want to see which overall themes are detected, wait until you receive all the submissions and then process them together as a group.

2. Item D mentions tourism industry several times, but it will not be suggested as a theme node in the results, because it appears in only one item in the group being analyzed. If you analyzed Item D individually, tourism industry might be identified as a theme.
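The two cases can be modeled with a simple majority rule: a theme is suggested only if it appears in more than half of the items analyzed together, and repeated mentions within one item do not add to that count. The "more than half" threshold and the `suggested_themes` helper are assumptions made for illustration; NVivo's actual selection criteria are not documented here.

```python
from collections import Counter


def suggested_themes(items: dict[str, list[str]]) -> set[str]:
    """Suggest themes that appear in more than half of the items.

    Each item counts at most once per theme, however often it repeats it.
    """
    presence = Counter(theme for themes in items.values() for theme in set(themes))
    return {theme for theme, n in presence.items() if n > len(items) / 2}


items = {
    "A": ["water quality"],
    "B": ["water quality"],
    "C": ["water quality"],
    "D": ["tourism industry", "tourism industry", "tourism industry"],
}
print(suggested_themes(items))               # analyzed together
print(suggested_themes({"D": items["D"]}))   # Item D analyzed on its own
```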
