NIST test dataset for assessing baseline nucleic acid sequence screening
This repository contains the dataset used in the manuscript "Inter-tool analysis of a NIST dataset for assessing baseline nucleic acid sequence screening". NIST constructed the test dataset based on the current screening recommendations from HHS. The dataset is a FASTA-formatted file with blinded numerical sequence headers. It was sent to developers of sequence screening tools for initial testing and to obtain feedback on its utility for assessing baseline sequence screening. An additional metadata file provides the NIST-assigned label for each sequence, along with a more detailed description derived from the source database.
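As a rough illustration of how the two files described above might be used together, the sketch below reads FASTA records keyed by their header and pairs each one with a label from a metadata table. It is a minimal sketch under stated assumptions, not the method used in the manuscript: the metadata is assumed to be CSV, the column names `id` and `label` are hypothetical, and the sequences and labels shown are made-up toy values rather than entries from the dataset.

```python
import csv
import io


def read_fasta(handle):
    """Return {header: sequence} from a FASTA text stream."""
    records = {}
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; collect and join them.
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def attach_labels(records, metadata_handle, id_col="id", label_col="label"):
    """Pair each sequence with its label.

    id_col and label_col are hypothetical column names; adjust them to
    match the actual metadata file.
    """
    reader = csv.DictReader(metadata_handle)
    labels = {row[id_col]: row[label_col] for row in reader}
    # Sequences without a metadata row get None as their label.
    return {h: (seq, labels.get(h)) for h, seq in records.items()}


# Toy, made-up inputs standing in for the FASTA and metadata files.
fasta_text = ">1\nACGT\nACGT\n>2\nTTGA\n"
metadata_text = "id,label\n1,example_label_a\n2,example_label_b\n"

records = read_fasta(io.StringIO(fasta_text))
labelled = attach_labels(records, io.StringIO(metadata_text))
```

To run this against files on disk, replace the `io.StringIO` objects with handles from `open(...)`. Because the headers are blinded, a screening tool's output can be scored against the metadata labels only after the screening run is complete.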
Complete Metadata
| bureauCode | [ "006:55" ] |
|---|---|
| identifier | ark:/88434/mds2-3787 |
| issued | 2025-05-21 |
| landingPage | https://data.nist.gov/od/id/mds2-3787 |
| language | [ "en" ] |
| programCode | [ "006:045" ] |
| theme | [ "Bioscience:Biomaterials", "Bioscience:Engineering/synthetic biology", "Public Safety:Chemical/Biological/Radiological/Nuclear/Explosives (CBRNE)" ] |