TREC 2024 Lateral Reading Track Guidelines #

Important Announcement: The Overview Paper for the 2024 track has been released. This track has now concluded, and its successor—the TREC 2025 DRAGUN Track—will continue in 2025. DRAGUN stands for Detection, Retrieval, and Augmented Generation for Understanding News.

Overview #

Welcome to the TREC 2024 Lateral Reading Track. This track is for researchers interested in supporting readers in assessing the trustworthiness of search and online content. The current web landscape requires the ability to make judgments about the trustworthiness of information, which is a difficult task for most people.

Wineburg and McGrew (2019) discovered that professional information analysts follow a process of Lateral Reading that involves asking questions about a document’s source and the support for its claims, and seeking answers to these questions via search engines in order to establish the document’s trustworthiness. In the first year of this track, our goal is to explore NLP and IR technologies to support the tested practice of lateral reading during people’s trustworthiness evaluation of online news, given the civic importance of trustworthy news and a decline of trust in news over the years (Brenan, 2022). As such, it will not require a definition of what is true or false, and thus the track can address trustworthiness beyond the relatively narrow focus of traditional claim verification.

While teaching people how to do lateral reading can be an effective means of helping people better evaluate the trustworthiness of information (McGrew et al., 2019), this training cannot easily reach the millions of people who use the Internet and are finished with their schooling. As such, an opportunity exists for systems to assist users with lateral reading by helping them understand what they should question about a document and helping them find other documents that can answer those questions. For example, imagine a “Lateral Reading Copilot” that could assist or nudge people towards lateral reading behaviors when reading web pages.

In the first year, this track has two tasks with separate deadlines. For Task 1, participants need to suggest questions that a reader of an online target news article should ask to determine its trustworthiness. After the deadline for Task 1, we pool submitted questions as input to Task 2, where participants need to retrieve documents from the specified web collection to answer those questions.

Data #

Web Collection: This track will use the English subset ClueWeb22-B-English of the new ClueWeb22 Category B dataset as the document collection, which contains about 87 million popular web documents of roughly 200 GB of data (size of plaintext). This web collection was collected around February 2022. Please refer to their website for how to obtain the dataset. The ClueWeb22-B-English subset is found in cw22-b/txt/en for plaintext and cw22-b/html/en for WARC format, etc. Considering the size of ClueWeb22-B-English, we suggest obtain the collection as soon as possible.
News Articles: trec-2024-lateral-reading-task1-articles.txt contains the the ClueWeb22-IDs of 50 selected target news articles (or “topics”), each about a different event, published in 2021 and 2022 from various sources. If participants have not yet obtained the collection, they can obtain the 50 documents (plaintext version) from CMU for free once they have signed a licensing agreement with them. To obtain this 50-document subset, please refer to the How to Get It page and follow the instructions to request the TREC-LR-2024-T1 subset.

Tasks #

Assume there is a (general public) reader who is looking through online news. This track has the following two tasks with separate submission due dates.

Task 1: Question Generation #

For each of the 50 topics (i.e., target news articles), participants need to produce 10 questions that the reader should ask to evaluate its trustworthiness, ranked from the most important to the least important to ask. Those questions should meet the following requirements.

Should be self-contained and explain the full context, i.e., one can understand this question without reference to the article.
Should be at most 120 characters long.
Should be reasonably expected to be answered by a single web page.
Compound questions should be avoided, e.g. who is X and when did Y happen? In general, each question should focus on a single topic.

Participants should put all the questions for those 50 articles into a single file, using the format below, and submit it to NIST via Evalbase.

It should be a tab-separated file.
It should be encoded in UTF-8.
Each line consists of the following tab-separated fields in this order: topic_id, run_tag, rank, question.
- topic_id: ClueWeb22-ID of the target news article.
- run_tag: A tag that uniquely identifies your group and the method you used to produce the run. Each run should have a different tag. Run tags for runs submitted by one group must all share a common prefix to identify the group across runs.
- rank: Your rank for the question, starting from 1.
- question: Question in plaintext, with no tabs or newlines.

Submissions can be either manual (involving human intervention to generate questions, e.g., hiring people to produce questions or manually selecting questions from a candidate list of questions produced by algorithms or other human involvement in the question generation process) or automatic (automatic systems that produce questions without the need of human input beyond the construction of the systems). Teams submitting automatic runs should make a good faith effort to not read or study the 50 articles.

Task 2: Document Retrieval #

We have pooled 12 questions for each topic (i.e., target news article) from those submitted by participants in Task 1: trec-2024-lateral-reading-task2-questions.txt. In Task 2, for each question, participants need to retrieve 10 documents from the web collection ClueWeb22-B-English that are useful for answering those questions in the context of the corresponding news article, ranked by their usefulness. This task is similar to a traditional ad-hoc retrieval task. As we provide questions, to participate in Task 2 does not require your participation in Task 1. The question file is a tab-separated file with the following fields:

question_id: Unique identifier for the question. No two questions will have the same identifier, even across different topics.
topic_id: ClueWeb22-ID of the target news article, associated with the question.
question: Question in plaintext, with no tabs or newlines.

Note that the pooled questions may not satisfy the requirements for Task 1, e.g., some questions may not be self-contained or understandable without the context of the news article. Those questions might be discarded during the evaluation of Task 2 submissions. Meanwhile, participants are free to use the target news articles in addition to the pooled questions during their retrieval processes.

Runs can be either full rank or rerank. We have prepared a BM25-RM3 baseline run ( Organizers-Baseline-BM25RM3) and participants can rerank the top 100 retrieved results using BM25 (k1=0.9, b=0.4) with RM3 (fb_terms=10, fb_docs=10, original_query_weight=0.5) as implemented in Pyserini, without the hassle to index the full collection. Participants who want to rerank the baseline run can request the plaintext version of those retrieved documents (about 1 GB in JSONL format) from CMU after signing a licensing agreement with them. Please refer to the How to Get It page and follow the instructions to request the TREC-LR-2024-T2 subset.

Similar to Task 1, runs may be either automatic or manual. Participants should follow the standard TREC run format below and submit their runs to NIST via Evalbase.

It should be a space-separated file.
It should be encoded in ASCII.
Each line consists of the following space-separated fields in this order: question_id, Q0, doc_id, rank, score, run_tag.
- question_id: Unique question id from the question file (above).
- Q0: Unused column, whose value should always be Q0.
- doc_id: ClueWeb22-ID of the retrieved document answering the question in the context of the target news article.
- rank: Rank of the document, starting from 1.
- score: Score (integer or floating point) of the document, in non-increasing order. trec_eval sorts documents by the scores instead of the ranks.
- run_tag: A tag that uniquely identifies your group and the method you used to produce the run. Each run should have a different tag. Run tags for runs submitted by one group must all share a common prefix to identify the group across runs.

Schedule #

Task 1 Article Released: Early May
Task 1 Question Generation Submission Due: ~~June 30~~ July 26
Task 2 Question Released: August 7
Task 2 Document Ranking Submission Due: ~~August 30~~ September 8
Result Release: ~~Late September~~ November 5
Notebook Paper Due: ~~Late October~~ November 10
TREC 2024 Conference: November 18-22 at NIST in Gaithersburg, MD, USA

Evaluation #

NIST assessors judged the helpfulness of generated questions from Task 1 and the usefulness of retrieved documents to answer the corresponding question in the context of the news article from Task 2. Here are the assessing instructions: instructions.pdf. Evaluation results (including assessor questions, question assessment, and document assessment) can be found on this page. Please refer to our track overview paper for detailed result analysis.

Q&A #

How do we register to participate in this track?
Please follow the TREC registration guidelines from their Call for Participation.
Why can’t we join the Slack channel?
#lateral-reading-2024 is in the TREC workspace. Please join the workspace first by following the instructions in the TREC 2024 welcome email after registration.
Is there a limit on how many runs each group can submit?
Participating groups will be allowed to submit as many runs as they like, but they need authorization from the track organizers before submitting more than 10 runs per task. Not all runs are likely to be used for pooling and groups will need to specify a preference ordering for pooling purposes.

Organizers #

	Dake Zhang University of Waterloo Waterloo, Ontario, Canada Website LinkedIn X
	Mark D. Smucker University of Waterloo Waterloo, Ontario, Canada Website LinkedIn
	Charles L. A. Clarke University of Waterloo Waterloo, Ontario, Canada Website LinkedIn X

Resources #

Civic Online Reasoning - Teaching Lateral Reading