Data and code from: A nitric oxide reductase is a key enzyme target for eliminating fungal emissions of nitrous oxide
Brief description of study aims

Fungal denitrification is a biogeochemical process that releases nitrous oxide (N2O), a potent greenhouse gas. The NOR1 gene is part of the denitrification pathway in Fusarium. The study has five parts. (1) The N2O comparative experiment compares denitrification rates, as measured by N2O production, of a variety of Fusarium spp. strains with and without the NOR1 gene. (2) The N2O substrate experiment compares denitrification rates of selected strains on different growth media (substrates). For parts 1 and 2, linear models are fit to compare N2O production between strains and/or substrates. (3) The Bioscreen growth assay tests whether the NOR1 gene has a pleiotropic effect. In this part of the analysis, growth curves are fit to assess differences in growth rate and carrying capacity between selected strains with and without the NOR1 gene. (4) A phylogenetic analysis of 198 FvNOR1-like proteins and predicted orthologs was performed to better understand p450nor sequence conservation in the fungal kingdom. (5) Motif conservation was examined to determine whether a hypothetical NOR1 inhibitor would have broad- or narrow-spectrum efficacy. We evaluated which of the 198 sequences had a cytochrome P450 motif; 159 of the 198 sequences were predicted to have the motif signature. We extracted the motif from the MAFFT alignment and created a sequence logo to interpret sequence conservation at and around the active site.

Code

The code is contained in R scripts, bash scripts, and RMarkdown notebooks. There are four components to the analysis: the denitrification analysis (parts 1 and 2 described above), the Bioscreen growth analysis (part 3), the phylogenetic analysis (part 4), and the analysis of motif conservation results (part 5).
The scripts for each are listed and described below.

Analysis of results of denitrification experiments (parts 1 and 2)

- NOR1_denitrification_analysis.Rmd: All R code for analyzing the experimental data on nitrous oxide emissions is contained in this single RMarkdown notebook, which analyzes the results of both the comparative study and the substrate study.
- n2o_subgroup_figures.R: R script that creates additional figures from the output of the RMarkdown notebook.

Analysis of results of Bioscreen growth assay (part 3)

- bioscreen_analysis.Rmd: This RMarkdown notebook contains all R code needed to analyze the results of the Bioscreen assay comparing growth of the different strains. It can be run as is; however, the model-fitting portion was run on a high-performance computing cluster with the following scripts:
  - bioscreen_fit_simpler.R: R script containing only the model-fitting portion of the Bioscreen analysis, fit using the Stan modeling language interfaced with R through the brms and cmdstanr packages.
  - job_bssimple.sh: Job submission shell script used to submit the model-fitting R job to the USDA SCINet high-performance computing cluster.

Analysis of results of NOR1 phylogeny (part 4)

- Run_MAFFT_NOR1_B6_Final.sh: This shell (.sh) script was used to submit a job on the Georgia Advanced Computing Resource Center (GACRC) Sapelo2 Cluster to run the MAFFT (v7.487) multiple sequence alignment tool on 198 sequences for phylogenetic analyses. All shell scripts described below were run on the GACRC Sapelo2 Cluster; no jobs were run locally.
- Run_IQTREE_NOR1_Geneious_ModelFinder_v2_198.sh: This shell (.sh) script uses the output of Data S5 (i.e., the MAFFT alignment, Data S6) to run the phylogenetic software IQ-TREE (v1.6.12). This script uses the option “-m MFP”, which tells IQ-TREE to perform extended model selection followed by tree inference. The “-nt AUTO” option is also used to automatically detect the number of cores/threads.
The .treefile produced as output by this run was not used in subsequent analyses.
- IQTREE_NOR1_Phylogeny_JTTDCMut_R5_100_v2.sh: This shell (.sh) script instructs IQ-TREE to use the MAFFT sequence alignment (Data S6) to build a maximum-likelihood (ML) phylogenetic tree. It uses the option “-m JTTDCMut+R5”, which tells IQ-TREE to use the JTTDCMut+R5 substitution model to evaluate relationships between sequences. The “-b 100” option sets the number of replicates (100) for standard non-parametric bootstrap estimation.
- IQTREE_NOR1_Phylogeny_JTTDCMut_R5_Ultrafast_ALRT_BNNI_1000_v2.sh: This shell (.sh) script instructs IQ-TREE to use the MAFFT sequence alignment (Data S6) to build an ML phylogenetic tree. The alignment file it reads is named “NOR1_B6_Alignment_Ultrafast.fasta”, but it is the exact same file as Data S6. As in Data S8, this script uses the option “-m JTTDCMut+R5”. The “-bb 1000” option sets the number of ultrafast bootstrap (UFBoot) replicates, and the “-alrt 1000” option sets the number of replicates for the SH-like approximate likelihood ratio test (SH-aLRT), a single-branch test.
- IQTREE_NOR1_Phylogeny_JTTDCMut_R5_UFBoot_SH-aLRT_10000_v1.sh: This shell (.sh) script is almost the same as Data S9. The only difference between the two scripts is that this script uses 10,000 replicates for the -bb and -alrt options.

Analysis of results of motif conservation (part 5)

No scripts are included for this part of the analysis.

Also note that the file gtstyle.css (a stylesheet for formatting the tables in the notebooks) is included.

Data

All data required to run the analysis scripts are archived in this dataset; strain_lookup.csv, a lookup table of strain abbreviations and full names, is not required and is included only for convenience. The data files should be placed in a folder (or a symbolic link to a folder) called project within the directory where the code is run.
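
As a minimal sketch of this setup (not part of the archived code), the Python function below extracts the four data archives from the distribution list into a project folder. It assumes the archives have already been downloaded into one local folder. The name unpack_into_project is hypothetical, and because the internal layout of each archive is not restated here, the sketch simply extracts every archive as-is into project; zip_archive_contents_description.pdf should be checked to confirm that this matches the paths the scripts expect.

```python
import zipfile
from pathlib import Path

# Data archives listed in this dataset's distribution (strain_lookup.csv is optional).
ARCHIVES = (
    "N2O_data_2022-08-03.zip",
    "Outliers_NOR1_2022.zip",
    "clean_data.zip",
    "phylo.zip",
)


def unpack_into_project(download_dir, work_dir=".", archives=ARCHIVES):
    """Hypothetical helper: extract each archive into <work_dir>/project.

    Assumes all archives sit together in download_dir. Each archive is
    extracted as-is, so its internal folder structure is preserved. If
    project already exists (a folder or a symbolic link to one), the files
    are extracted into it.
    """
    project = Path(work_dir) / "project"
    project.mkdir(exist_ok=True)
    for name in archives:
        archive = Path(download_dir) / name
        if not archive.is_file():
            raise FileNotFoundError(f"missing archive: {archive}")
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(project)
    return project
```

For example, unpack_into_project("downloads") would create project in the current directory. If the data are kept elsewhere, the symbolic-link option from the description applies instead; in Python, Path("project").symlink_to(data_dir, target_is_directory=True) creates such a link, where data_dir is a placeholder for the real location.
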
The data are contained in a series of .zip archives.

Due to length constraints, the contents of each .zip archive are described in detail in the file zip_archive_contents_description.pdf.

In addition, one file is included that is used for part 5 of the analysis (analysis of results of motif conservation):

- Extraction of 159 annotations from Protein_alignment_v7.fasta: This FAST-All (FASTA) file contains the NCBI accession ID, protein annotation, and species name for the 159 aligned sequences used in the motif analysis and sequence logo creation. The sequence data it contains correspond to the cytochrome P450 cysteine heme-iron ligand signature (CYTOCHROME P450; PROSITE PS00086). The consensus pattern for this motif signature is: [FW]-[SGNH]-x-[GD]-{F}-[RKHPT]-{P}-C-[LIVMFAP]-[GAD].
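
The consensus pattern is written in PROSITE syntax: elements are separated by hyphens, x matches any residue, square brackets list the residues allowed at a position, and curly braces list the residues excluded there. No scripts are archived for part 5, so the following is only an illustrative sketch, not the authors' procedure: it translates a basic PROSITE pattern into a Python regular expression and reports the first PS00086 match in a sequence. The function names are hypothetical, and the sketch assumes that alignment gaps in the FASTA file are written as '-' and can be removed before matching.

```python
import re

# PROSITE PS00086 (cytochrome P450 cysteine heme-iron ligand signature).
PS00086 = "[FW]-[SGNH]-x-[GD]-{F}-[RKHPT]-{P}-C-[LIVMFAP]-[GAD]"

# One PROSITE element: x, a single residue, [allowed] or {excluded},
# optionally followed by a repeat count such as (3) or (2,4).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            token = "."
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"
        else:
            token = core  # a residue letter or [..] class is already valid regex
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + token + suffix)
    return "".join(parts)


P450_MOTIF = re.compile(prosite_to_regex(PS00086))


def read_fasta(path):
    """Return (header, sequence) pairs from a FASTA file."""
    records, header, chunks = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def find_p450_motif(sequence):
    """Return the first PS00086 match in an (optionally gapped) sequence, or None."""
    ungapped = sequence.replace("-", "").upper()
    match = P450_MOTIF.search(ungapped)
    return match.group(0) if match else None
```

For example, iterating over read_fasta("Extraction of 159 annotations from Protein_alignment_v7.fasta") and calling find_p450_motif on each sequence would list which records carry a full match to the signature under these assumptions. Translating to a regular expression keeps the pattern readable and lets the standard re module do the matching.
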
Complete Metadata
| @type | dcat:Dataset |
|---|---|
| accessLevel | public |
| bureauCode |
[
"005:18"
]
|
| contactPoint |
{
"fn": "Read, Quentin D.",
"hasEmail": "mailto:quentin.read@usda.gov"
}
|
| description | <h3>Brief description of study aims</h3><p dir="ltr">Fungal denitrification is a biogeochemical process that releases nitrous oxide (N<sub>2</sub>O), a potent greenhouse gas. The NOR1 gene is part of the denitrification pathway in <i>Fusarium</i>. The study has five parts. (1) The N<sub>2</sub>O comparative experiment compares denitrification rates, as measured by N<sub>2</sub>O production, of a variety of <i>Fusarium</i> spp. strains with and without the NOR1 gene. (2) The N<sub>2</sub>O substrate experiment compares denitrification rates of selected strains on different growth media (substrates). For parts 1 and 2, linear models are fit to compare N<sub>2</sub>O production between strains and/or substrates. (3) The Bioscreen growth assay tests whether the NOR1 gene has a pleiotropic effect. In this part of the analysis, growth curves are fit to assess differences in growth rate and carrying capacity between selected strains with and without the NOR1 gene. (4) A phylogenetic analysis of 198 FvNOR1-like proteins and predicted orthologs was performed to better understand p450nor sequence conservation in the fungal kingdom. (5) Motif conservation was examined to determine whether a hypothetical NOR1 inhibitor would have broad- or narrow-spectrum efficacy. We evaluated which of the 198 sequences had a cytochrome P450 motif; 159 of the 198 sequences were predicted to have the motif signature. We extracted the motif from the MAFFT alignment and created a sequence logo to interpret sequence conservation at and around the active site.</p><h3>Code</h3><p dir="ltr">The code is contained in R scripts, bash scripts, and RMarkdown notebooks. There are four components to the analysis: the denitrification analysis (parts 1 and 2 described above), the Bioscreen growth analysis (part 3), the phylogenetic analysis (part 4), and the analysis of motif conservation results (part 5).
The scripts for each are listed and described below.</p><h4>Analysis of results of denitrification experiments (parts 1 and 2)</h4><ul><li><b>NOR1_denitrification_analysis.Rmd</b>: All R code for analyzing the experimental data on nitrous oxide emissions is contained in this single RMarkdown notebook, which analyzes the results of both the comparative study and the substrate study.</li><li><b>n2o_subgroup_figures.R</b>: R script that creates additional figures from the output of the RMarkdown notebook.</li></ul><h4>Analysis of results of Bioscreen growth assay (part 3)</h4><ul><li><b>bioscreen_analysis.Rmd</b>: This RMarkdown notebook contains all R code needed to analyze the results of the Bioscreen assay comparing growth of the different strains. It can be run as is; however, the model-fitting portion was run on a high-performance computing cluster with the following scripts:</li><li><ul><li><b>bioscreen_fit_simpler.R</b>: R script containing only the model-fitting portion of the Bioscreen analysis, fit using the Stan modeling language interfaced with R through the brms and cmdstanr packages.</li><li><b>job_bssimple.sh</b>: Job submission shell script used to submit the model-fitting R job to the USDA SCINet high-performance computing cluster.</li></ul></li></ul><h4>Analysis of results of NOR1 phylogeny (part 4)</h4><ul><li><b>Run_MAFFT_NOR1_B6_Final.sh</b>: This shell (.sh) script was used to submit a job on the Georgia Advanced Computing Resource Center (GACRC) Sapelo2 Cluster to run the MAFFT (v7.487) multiple sequence alignment tool on 198 sequences for phylogenetic analyses. All shell scripts described below were run on the GACRC Sapelo2 Cluster; no jobs were run locally.</li><li><b>Run_IQTREE_NOR1_Geneious_ModelFinder_v2_198.sh</b>: This shell (.sh) script uses the output of Data S5 (i.e., the MAFFT alignment, Data S6) to run the phylogenetic software IQ-TREE (v1.6.12).
This script uses the option “-m MFP”, which tells IQ-TREE to perform extended model selection followed by tree inference. The “-nt AUTO” option is also used to automatically detect the number of cores/threads. The .treefile produced as output by this run was not used in subsequent analyses.</li><li><b>IQTREE_NOR1_Phylogeny_JTTDCMut_R5_100_v2.sh</b>: This shell (.sh) script instructs IQ-TREE to use the MAFFT sequence alignment (Data S6) to build a maximum-likelihood (ML) phylogenetic tree. This script uses the option “-m JTTDCMut+R5”, which tells IQ-TREE to use the JTTDCMut+R5 substitution model to evaluate relationships between sequences. The “-b 100” option sets the number of replicates (100) for standard non-parametric bootstrap estimation.</li><li><b>IQTREE_NOR1_Phylogeny_JTTDCMut_R5_Ultrafast_ALRT_BNNI_1000_v2.sh</b>: This shell (.sh) script instructs IQ-TREE to use the MAFFT sequence alignment (Data S6) to build an ML phylogenetic tree. The alignment file it reads is named “NOR1_B6_Alignment_Ultrafast.fasta”, but it is the exact same file as Data S6. As in Data S8, this script uses the option “-m JTTDCMut+R5”. The “-bb 1000” option sets the number of ultrafast bootstrap (UFBoot) replicates, and the “-alrt 1000” option sets the number of replicates for the SH-like approximate likelihood ratio test (SH-aLRT), a single-branch test.</li><li><b>IQTREE_NOR1_Phylogeny_JTTDCMut_R5_UFBoot_SH-aLRT_10000_v1.sh</b>: This shell (.sh) script is almost the same as Data S9.
The only difference between the two scripts is that this script uses 10,000 replicates for the -bb and -alrt options.</li></ul><h4>Analysis of results of motif conservation (part 5)</h4><p dir="ltr">No scripts are included for this part of the analysis.</p><p dir="ltr">Also note that the file <b>gtstyle.css</b> (a stylesheet for formatting the tables in the notebooks) is included.</p><h3>Data</h3><p dir="ltr">All data required to run the analysis scripts are archived in this dataset; <b>strain_lookup.csv</b>, a lookup table of strain abbreviations and full names, is not required and is included only for convenience. The data files should be placed in a folder (or a symbolic link to a folder) called <b>project</b> within the directory where the code is run. The data are contained in a series of .zip archives.</p><p dir="ltr">Due to length constraints, the contents of each .zip archive are described in detail in the file <b>zip_archive_contents_description.pdf</b>.</p><p dir="ltr">In addition, one file is included that is used for part 5 of the analysis (analysis of results of motif conservation):</p><ul><li><b>Extraction of 159 annotations from Protein_alignment_v7.fasta</b>: This FAST-All (FASTA) file contains the NCBI accession ID, protein annotation, and species name for the 159 aligned sequences used in the motif analysis and sequence logo creation. The sequence data it contains correspond to the cytochrome P450 cysteine heme-iron ligand signature (CYTOCHROME P450; PROSITE PS00086). The consensus pattern for this motif signature is: [FW]-[SGNH]-x-[GD]-{F}-[RKHPT]-{P}-C-[LIVMFAP]-[GAD].</li></ul> |
| distribution |
[
{
"@type": "dcat:Distribution",
"title": "N2O_data_2022-08-03.zip",
"format": "zip",
"mediaType": "application/zip",
"downloadURL": "https://ndownloader.figshare.com/files/58269820"
},
{
"@type": "dcat:Distribution",
"title": "Outliers_NOR1_2022.zip",
"format": "zip",
"mediaType": "application/zip",
"downloadURL": "https://ndownloader.figshare.com/files/58269823"
},
{
"@type": "dcat:Distribution",
"title": "clean_data.zip",
"format": "zip",
"mediaType": "application/zip",
"downloadURL": "https://ndownloader.figshare.com/files/58269826"
},
{
"@type": "dcat:Distribution",
"title": "phylo.zip",
"format": "zip",
"mediaType": "application/zip",
"downloadURL": "https://ndownloader.figshare.com/files/58269832"
},
{
"@type": "dcat:Distribution",
"title": "zip_archive_contents_description.pdf",
"format": "pdf",
"mediaType": "application/pdf",
"downloadURL": "https://ndownloader.figshare.com/files/58269856"
},
{
"@type": "dcat:Distribution",
"title": "NOR1_denitrification_analysis.Rmd",
"format": "Rmd",
"mediaType": "text/plain",
"downloadURL": "https://ndownloader.figshare.com/files/58269931"
},
{
"@type": "dcat:Distribution",
"title": "n2o_subgroup_figures.R",
"format": "R",
"mediaType": "text/plain",
"downloadURL": "https://ndownloader.figshare.com/files/58269934"
},
{
"@type": "dcat:Distribution",
"title": "bioscreen_analysis.Rmd",
"format": "Rmd",
"mediaType": "text/plain",
"downloadURL": "https://ndownloader.figshare.com/files/58269937"
},
{
"@type": "dcat:Distribution",
"title": "bioscreen_fit_simpler.R",
"format": "R",
"mediaType": "text/plain",
"downloadURL": "https://ndownloader.figshare.com/files/58269940"
},
{
"@type": "dcat:Distribution",
"title": "job_bssimple.sh",
"format": "sh",
"mediaType": "text/x-shellscript",
"downloadURL": "https://ndownloader.figshare.com/files/58269943"
},
{
"@type": "dcat:Distribution",
"title": "gtstyle.css",
"format": "css",
"mediaType": "text/plain",
"downloadURL": "https://ndownloader.figshare.com/files/58269946"
},
{
"@type": "dcat:Distribution",
"title": "strain_lookup.csv",
"format": "csv",
"mediaType": "text/csv",
"downloadURL": "https://ndownloader.figshare.com/files/58269949"
},
{
"@type": "dcat:Distribution",
"title": "Run_MAFFT_NOR1_B6_Final.sh",
"format": "sh",
"mediaType": "text/x-shellscript",
"downloadURL": "https://ndownloader.figshare.com/files/58269952"
},
{
"@type": "dcat:Distribution",
"title": "Run_IQTREE_NOR1_Geneious_ModelFinder_v2_198.sh",
"format": "sh",
"mediaType": "text/x-shellscript",
"downloadURL": "https://ndownloader.figshare.com/files/58269955"
},
{
"@type": "dcat:Distribution",
"title": "IQTREE_NOR1_Phylogeny_JTTDCMut_R5_UFBoot_SH-aLRT_10000_v1.sh",
"format": "sh",
"mediaType": "text/x-shellscript",
"downloadURL": "https://ndownloader.figshare.com/files/58269973"
},
{
"@type": "dcat:Distribution",
"title": "IQTREE_NOR1_Phylogeny_JTTDCMut_R5_Ultrafast_ALRT_BNNI_1000_v2.sh",
"format": "sh",
"mediaType": "text/x-shellscript",
"downloadURL": "https://ndownloader.figshare.com/files/58269976"
},
{
"@type": "dcat:Distribution",
"title": "IQTREE_NOR1_Phylogeny_JTTDCMut_R5_100_v2.sh",
"format": "sh",
"mediaType": "text/x-shellscript",
"downloadURL": "https://ndownloader.figshare.com/files/58269979"
},
{
"@type": "dcat:Distribution",
"title": "Extraction of 159 annotations from Protein_alignment_v7.fasta",
"format": "fasta",
"mediaType": "text/plain",
"downloadURL": "https://ndownloader.figshare.com/files/58269985"
}
]
|
| identifier | 10.15482/USDA.ADC/30210112.v1 |
| keyword |
[
"Fusarium graminearum",
"Fusarium oxysporum",
"Fusarium verticillioides",
"denitrification",
"greenhouse gas emissions",
"nitrous oxide",
"source code"
]
|
| license | https://creativecommons.org/publicdomain/zero/1.0/ |
| modified | 2025-11-20 |
| programCode |
[
"005:040"
]
|
| publisher |
{
"name": "Agricultural Research Service",
"@type": "org:Organization"
}
|
| temporal | 2021-12-08/2022-03-08 |
| title | Data and code from: A nitric oxide reductase is a key enzyme target for eliminating fungal emissions of nitrous oxide |