IARPA BETTER (Better Extraction from Text Towards Enhanced Retrieval) information extraction and information retrieval datasets.

Published by National Institute of Standards and Technology | Metadata Last Checked: August 02, 2025 | Last Modified: 2023-02-24 00:00:00

Cross-language information extraction (IE) and information retrieval (IR) datasets developed for the evaluation of the IARPA BETTER program. The documents come from CommonCrawl. The IE annotations, in three schemas, were produced by MITRE and ARLIS. The IR queries and relevance judgments were produced at NIST, and IARPA asked NIST to distribute the data in its final form. All tasks are cross-language, from English into one of Arabic, Farsi, Russian, Chinese, or Korean.

Complete Metadata

@type	dcat:Dataset
accessLevel	public
accrualPeriodicity	irregular
bureauCode	[ "006:55" ]
contactPoint	{ "fn": "Ian Soboroff", "hasEmail": "mailto:ian.soboroff@nist.gov" }
description	Cross-language information extraction and retrieval datasets developed for the evaluation of the IARPA BETTER program. The documents come from CommonCrawl. The IE annotations in three schemas are by MITRE and ARLIS. The IR queries and relevance judgments were done at NIST, and NIST was asked by IARPA to distribute the data in its final form. The tasks are all cross-language from English into one of Arabic, Farsi, Russian, Chinese, and Korean
distribution	[ { "title": "The BETTER datasets", "mediaType": "application/octet-stream", "downloadURL": "https://ir.nist.gov/better/" } ]
identifier	ark:/88434/mds2-2946
issued	2024-04-22
keyword	[ "information extraction; information retrieval; cross-language information retrieval" ]
landingPage	https://ir.nist.gov/better/
language	[ "en" ]
license	https://www.nist.gov/open/license
modified	2023-02-24 00:00:00
programCode	[ "006:045" ]
publisher	{ "name": "National Institute of Standards and Technology", "@type": "org:Organization" }
theme	[ "Information Technology:Data and informatics" ]
title	IARPA BETTER (Better Extraction from Text Towards Enhanced Retrieval) information extraction and information retrieval datasets.
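
The record above mixes plain-string values with JSON-encoded ones, and its keyword field holds a single semicolon-delimited string rather than separate list entries. The following is a minimal sketch of how such rows might be read, assuming each row is a key and a value separated by a tab. The helper names `parse_rows` and `split_keywords` are hypothetical and are not part of any NIST or catalog tooling; the sample rows are copied from the record above.

```python
import json

# A few rows copied from the record above; each row is assumed to be "key<TAB>value".
RAW_ROWS = """\
accessLevel\tpublic
contactPoint\t{ "fn": "Ian Soboroff", "hasEmail": "mailto:ian.soboroff@nist.gov" }
keyword\t[ "information extraction; information retrieval; cross-language information retrieval" ]
landingPage\thttps://ir.nist.gov/better/
"""


def parse_rows(text):
    """Hypothetical helper: build a dict from tab-separated rows.

    Values that parse as JSON (objects, lists) are decoded;
    everything else is kept as a plain string.
    """
    record = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition("\t")
        value = value.strip()
        try:
            record[key] = json.loads(value)
        except json.JSONDecodeError:
            record[key] = value
    return record


def split_keywords(record):
    """Hypothetical helper: split semicolon-delimited keyword entries into separate tags."""
    keywords = []
    for entry in record.get("keyword", []):
        keywords.extend(part.strip() for part in entry.split(";") if part.strip())
    return keywords


record = parse_rows(RAW_ROWS)
keywords = split_keywords(record)

# Strip the "mailto:" scheme from the contact address, if present.
contact_email = record["contactPoint"]["hasEmail"]
if contact_email.startswith("mailto:"):
    contact_email = contact_email[len("mailto:"):]
```

Decoding with a fallback to the raw string keeps URLs and other plain values untouched. Note that a bare numeric value would be decoded as a number under this approach, which may or may not be what a consumer of the record wants.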

1 resource available