Code used to produce terms list in the work "NLP-Driven Electron Microscopy Ontology Development"
This is a collection of code, written by Maurice Curran, that was used to process the Microscopy and Microanalysis conference proceedings corpus into the word products described in the publication "NLP-Driven Electron Microscopy Ontology Development". The scripts are written in Python and are meant to be run in the following order:

1. SettingUpTextFiles.py and CopyingText.py, to get the raw text files;
2. SentenceConversion.py;
3. reference_remover.py;
4. testing.py and testingavg.py;
5. SentenceCreator.py;
6. matscholar_model.py, to get the Matscholar tags;
7. training_model_gensim.py, to get the gensim model;
8. word2vecscript.py and gensim_visual.py.
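The run order above can be captured in a small driver script. This is a minimal sketch, not part of the published archive: it assumes all scripts sit in one directory, use paths relative to that directory, and run without command-line arguments (none of which the record states). `PIPELINE`, `run_script`, and `run_pipeline` are hypothetical names, and scripts that share a step are simply run in the order they are listed.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, List, Sequence

# Run order from the dataset description. Scripts that share a step are
# run left to right; that ordering within a step is an assumption.
PIPELINE: List[List[str]] = [
    ["SettingUpTextFiles.py", "CopyingText.py"],  # 1. raw text files
    ["SentenceConversion.py"],                    # 2.
    ["reference_remover.py"],                     # 3.
    ["testing.py", "testingavg.py"],              # 4.
    ["SentenceCreator.py"],                       # 5.
    ["matscholar_model.py"],                      # 6. Matscholar tags
    ["training_model_gensim.py"],                 # 7. gensim model
    ["word2vecscript.py", "gensim_visual.py"],    # 8.
]


def run_script(path: Path) -> None:
    """Run one script with the current interpreter, stopping on failure."""
    if not path.is_file():
        raise FileNotFoundError(path)
    # Assumption: each script takes no arguments and uses relative paths,
    # so it is run from inside its own directory.
    subprocess.run([sys.executable, str(path)], cwd=path.parent, check=True)


def run_pipeline(
    workdir: Path,
    steps: Sequence[Sequence[str]] = PIPELINE,
    runner: Callable[[Path], None] = run_script,
) -> List[str]:
    """Run every script in order and return the names that were run."""
    executed: List[str] = []
    for step_no, scripts in enumerate(steps, start=1):
        for name in scripts:
            print(f"step {step_no}: {name}")
            runner(workdir / name)
            executed.append(name)
    return executed
```

Calling `run_pipeline(Path("..."))` with the directory that holds the unzipped scripts runs all eight steps and stops at the first script that fails, since `check=True` raises `subprocess.CalledProcessError`. The `runner` parameter exists so the ordering can be checked without executing anything.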
Complete Metadata
| Field | Value |
|---|---|
| @type | dcat:Dataset |
| accessLevel | public |
| accrualPeriodicity | irregular |
| bureauCode | ["006:55"] |
| contactPoint | {"fn": "June W. Lau", "hasEmail": "mailto:june.lau@nist.gov"} |
| description | This is a collection of code, written by Maurice Curran, that was used to process the Microscopy and Microanalysis conference proceedings corpus into the word products described in the publication "NLP-Driven Electron Microscopy Ontology Development". The scripts are written in Python and are meant to be run in the following order: 1. SettingUpTextFiles.py and CopyingText.py, to get the raw text files; 2. SentenceConversion.py; 3. reference_remover.py; 4. testing.py and testingavg.py; 5. SentenceCreator.py; 6. matscholar_model.py, to get the Matscholar tags; 7. training_model_gensim.py, to get the gensim model; 8. word2vecscript.py and gensim_visual.py. |
| distribution | [{"title": "NLP code to produce words about electron microscopy", "mediaType": "application/zip", "description": "This zip file contains a set of scripts that extract frequently occurring words from the Microscopy & Microanalysis conference proceedings from 2002 to 2019.", "downloadURL": "https://data.nist.gov/od/ds/ark:/88434/mds2-3198/PythonFiles_Maurice_clean.zip"}] |
| identifier | ark:/88434/mds2-3198 |
| issued | 2024-09-05 |
| keyword | ["NLP", "Natural language processing", "controlled vocabulary", "electron microscopy", "ontology"] |
| landingPage | https://data.nist.gov/od/id/mds2-3198 |
| language | ["en"] |
| license | https://www.nist.gov/open/license |
| modified | 2021-12-31 00:00:00 |
| programCode | ["006:045"] |
| publisher | {"name": "National Institute of Standards and Technology", "@type": "org:Organization"} |
| references | ["https://doi.org/10.1007/s40192-024-00378-y"] |
| theme | ["Information Technology:Data and informatics", "Materials:Materials characterization", "Materials:Modeling and computational material science"] |
| title | Code used to produce terms list in the work "NLP-Driven Electron Microscopy Ontology Development" |
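To obtain the scripts, the archive listed under `distribution` can be fetched from its `downloadURL` and unpacked with the Python standard library. This is a minimal sketch: `fetch_archive`, `python_members`, and `unpack` are hypothetical helper names, and because the record does not describe the folder layout inside the zip, the code only lists whatever `.py` members it finds rather than assuming fixed paths.

```python
import io
import urllib.request
import zipfile
from pathlib import Path
from typing import List

# downloadURL taken from the record's "distribution" entry.
DOWNLOAD_URL = (
    "https://data.nist.gov/od/ds/ark:/88434/mds2-3198/"
    "PythonFiles_Maurice_clean.zip"
)


def fetch_archive(url: str = DOWNLOAD_URL, timeout: float = 60.0) -> bytes:
    """Download the zip archive and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def python_members(archive: bytes) -> List[str]:
    """List the .py files inside an in-memory zip archive, sorted by path."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return sorted(n for n in zf.namelist() if n.endswith(".py"))


def unpack(archive: bytes, dest: Path) -> None:
    """Extract every member of the archive under dest."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        zf.extractall(dest)
```

For example, `archive = fetch_archive()` followed by `print(python_members(archive))` shows which scripts were shipped, and `unpack(archive, Path("mds2-3198"))` writes them to disk; the destination folder name is arbitrary. Keeping the download separate from the zip handling lets the archive be inspected before anything is extracted.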