A Journalistic AI to Build Quality Assessment Dataset

July 12, 2023

The Analysis and Response Toolkit for Trust (ARTT) project, led by Hacks/Hackers and funded by the National Science Foundation’s (NSF) Convergence Accelerator, has published a dataset developed with the Overtone lens. The ARTT and Overtone teams worked together to build this dataset, a unique contribution to the field of automated quality assessment of online news articles and other content.

About the Dataset

The dataset is a repository of 1,000 vaccine-related articles collected from a variety of news media sources. Articles in the dataset were scored solely on their journalistic quality by a natural language processing (NLP) algorithm provided by Overtone. Overtone’s algorithm rates text according to the presence or absence of journalistic signals, such as original reporting, good sourcing, meaningful analysis, and exploration of ideas.


Scores range from “1”, which represents less depth or low informational value-add, to “5”, which represents high depth or high informational value-add. The articles in the dataset were sourced from traditional journalism outlets (news and news-leaning websites), as well as from non-journalistic sources of vaccine information, such as governmental websites, healthcare and NGO websites, and medical journals.

Overtone’s algorithm makes assessments based solely on textual content, as opposed to other types of common assessment metrics such as outlet, author, or level of engagement. This analytical focus on content, combined with the inclusion of a diverse set of article types, allowed ARTT and Overtone to examine how different styles of vaccine-related content measured against traditional journalistic quality standards.

“As someone with a journalistic background, I know that the editorial distinctions between articles matter, particularly when readers make decisions. Our lens provides insight into the intention of the text itself. We are happy that our work adds to the ARTT project by giving context into how people pay attention to news,” Overtone’s Chief Product Officer Christopher Brennan said.

Dataset Insights

Besides providing an opportunity to observe how the Overtone algorithm performs, the dataset offers insights into assessing vaccine-related content. First, it offers initial guidance on which types of vaccine-related news articles might be worth recommending. Many datasets in the field of quality and credibility rate articles with binary “reliable” or “unreliable” veracity labels, generally at the source level. However, logical fallacies, incorrect representation of claims, and other low-quality signals can appear even in articles from highly reputable news sources. The dataset demonstrates the need to go beyond a source-level indicator of quality.
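To make the source-level versus article-level distinction concrete, here is a minimal Python sketch. It assumes a hypothetical layout of (source, score) pairs on the 1 to 5 scale; the outlet names and scores are invented for illustration and are not taken from the published dataset, whose actual schema may differ.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (source, article score on the 1-5 scale).
# These values are invented for illustration only; they are NOT
# drawn from the ARTT/Overtone dataset.
ARTICLES = [
    ("outlet_a", 5),
    ("outlet_a", 2),
    ("outlet_a", 4),
    ("outlet_b", 3),
    ("outlet_b", 3),
]


def summarize_by_source(articles):
    """Return {source: (mean_score, min_score, max_score)}."""
    scores = defaultdict(list)
    for source, score in articles:
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range: {score}")
        scores[source].append(score)
    return {s: (mean(v), min(v), max(v)) for s, v in scores.items()}


summary = summarize_by_source(ARTICLES)
for source, (avg, low, high) in sorted(summary.items()):
    # A single per-source average hides the spread between articles.
    print(f"{source}: mean={avg:.2f} range={low}-{high}")
```

In this toy example the two outlets have similar averages, yet one of them spans scores from 2 to 5. A single source-level label would hide exactly that kind of article-to-article variation.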

The dataset also affirms that quality evaluation for general journalism involves different considerations than quality evaluation for scientific or health communications. In this dataset, strictly informative pieces from authoritative sources, such as content created by member organizations of Vaccine Safety Net, do not score as well under a journalism-based algorithm because they are evaluated against the standards of traditional news articles. This suggests that, in some forms of journalism, the source or outlet may be a worthwhile metric to consider when evaluating quality.

“I hope this dataset can be useful to other researchers and tools, on health information and other subjects, going forward,” said Brennan. “We also appreciate the National Science Foundation’s Convergence Accelerator enabling such important work to be explored.”

The abundance of information available online, and the speed at which it is produced, means that the need for automated quality assessment will continue to grow. This dataset, and its application of quality assessment to vaccine-related content, aims to contribute to that research.

About ARTT

The Analysis and Response Toolkit for Trust (ARTT) project is focused on helping people engage in trust-building ways when discussing vaccine efficacy and other topics online. In October 2022, Hacks/Hackers, the Paul G. Allen School of Computer Science & Engineering, and partner organizations received a new $5 million award from the National Science Foundation’s Convergence Accelerator. The award will support Phase II development of ARTT, a suite of expert-informed resources intended to provide guidance and encouragement to individuals and communities as they address contentious or difficult topics online.

This article was originally published on the News Quality Initiative (NewsQ) website on October 24, 2022.