To foster research into the COVID-19 virus, we have made the COVID-19 Open Research Dataset, created by the Allen Institute for Artificial Intelligence, available in a simplified beta version of AVOBMAT.
It can mainly help with:
Critical and interactive analysis of bibliographic metadata AND texts with data-driven and NLP methods supported by AI techniques in a number of languages
AVOBMAT offers the following types of search:
TIP: To start a new search you should reload the page (F5).
You can specify the Date or Date range of the publications by using On, Before, After & Between.
Please note that the original COVID-19 database does not provide the month and day of publication for every article.
Proximity search helps you find words that occur within a specified distance of each other.
The number (N) specifies that one word must occur within N words of the other in the documents.
Example: if the proximity value is set to 10 and you enter vaccine immunity in the Entire document field, the search returns documents in which the terms vaccine and immunity appear within 10 words of each other.
If the Order option remains ticked, vaccine must precede immunity in the matching documents.
TIP: You can read an explanation of each function by hovering your mouse over its information icon.
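The proximity rule described above can be illustrated with a minimal Python sketch. It is not AVOBMAT's own code: the function name `within_proximity`, the simple tokenisation, and the reading of "within N words" as a difference in word positions are all assumptions made for illustration.

```python
import re

def within_proximity(text, first, second, distance, ordered=True):
    """Return True if `first` and `second` occur within `distance` words.

    Assumption: distance is the difference between word positions.
    With ordered=True, `first` must appear before `second`.
    """
    words = re.findall(r"\w+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    for i in first_positions:
        for j in second_positions:
            gap = j - i
            if ordered and 0 < gap <= distance:
                return True
            if not ordered and 0 < abs(gap) <= distance:
                return True
    return False

doc = "The vaccine produced lasting immunity in most participants."
print(within_proximity(doc, "vaccine", "immunity", 10))
print(within_proximity(doc, "immunity", "vaccine", 10))
print(within_proximity(doc, "immunity", "vaccine", 10, ordered=False))
```

The second call fails only because the order is enforced; dropping the order requirement in the third call makes the same pair of words match again.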
You can formulate complex queries by using the Lucene syntax.
It includes the use of wildcards (e.g. *, ?) and regular expressions.
You can use the abbreviated versions of the metadata fields in the queries, as listed in the table below.
Example:
(YR:[2017 TO 2020]) AND (FT:chloroquine OR FT:ivermectin) AND AB:coronavirus*
Metadata field | Abbreviation |
Authors | AU |
Publication Year | YR |
Title | TI |
Journal | PUB |
Abstract Note | AB |
Entire document | FT |
Detected language | DLA |
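If you build queries programmatically, the abbreviations in the table can be combined as in the following Python sketch. The helper names (`field_term`, `any_of`, `year_range`) are hypothetical and only assemble a query string; they do not call AVOBMAT or Lucene.

```python
# Field abbreviations copied from the table above.
FIELDS = {
    "Authors": "AU",
    "Publication Year": "YR",
    "Title": "TI",
    "Journal": "PUB",
    "Abstract Note": "AB",
    "Entire document": "FT",
    "Detected language": "DLA",
}

def field_term(field_name, term):
    """Build a single `ABBREVIATION:term` clause."""
    return f"{FIELDS[field_name]}:{term}"

def any_of(*clauses):
    """Join clauses with OR and wrap them in parentheses."""
    return "(" + " OR ".join(clauses) + ")"

def year_range(start, end):
    """Build an inclusive publication-year range clause."""
    return f"({FIELDS['Publication Year']}:[{start} TO {end}])"

query = " AND ".join([
    year_range(2017, 2020),
    any_of(
        field_term("Entire document", "chloroquine"),
        field_term("Entire document", "ivermectin"),
    ),
    field_term("Abstract Note", "coronavirus*"),
])
print(query)
```

The printed string is the example query shown above, which you can paste into the search field.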
TIP: If needed, click Auto-format to display the full text in a more reader-friendly format.
TIP: You can sort the results by Publication year (ascending / descending) and Authors (alphabetical).
You can perform three different types of analysis:
TIP: The results can also be displayed as bar charts.
To do this, click “Bar chart” in the top left corner.
The significant text analysis & visualization highlights the terms most strongly related to a specific query.
If you filter the COVID-19 database, for example, by a keyword search in the Abstracts, this tool highlights the words that are most strongly related to this selected subset of documents compared to the entire COVID-19 database: what are the unique words characteristic of this subset?
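The idea of comparing a filtered subset with the entire database can be sketched in Python as follows. The scoring used here (the share of subset documents containing a word divided by the share of all documents containing it) is an assumption chosen for simplicity; this help page does not state which measure AVOBMAT uses, and the four sample documents are invented.

```python
from collections import Counter

def document_frequency(docs):
    """Count in how many documents each word appears."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.lower().split()))
    return counts

def significance_scores(subset, collection):
    """Score each word in the subset against the whole collection.

    Assumes `subset` is drawn from `collection`, so every subset word
    also occurs in the collection. Scores above 1 mean the word is
    more common in the subset than in the collection as a whole.
    """
    subset_df = document_frequency(subset)
    collection_df = document_frequency(collection)
    scores = {}
    for word, count in subset_df.items():
        subset_share = count / len(subset)
        collection_share = collection_df[word] / len(collection)
        scores[word] = subset_share / collection_share
    return scores

# Invented toy collection; the subset is a keyword filter on "chloroquine".
collection = [
    "chloroquine inhibits virus replication",
    "chloroquine treatment of malaria",
    "virus transmission in bats",
    "vaccine trial for virus infection",
]
subset = [doc for doc in collection if "chloroquine" in doc.split()]
scores = significance_scores(subset, collection)
for word in sorted(scores, key=lambda w: (-scores[w], w)):
    print(f"{word}: {scores[word]:.2f}")
```

In this toy data, virus scores below 1 because it is spread across the whole collection, whereas words such as malaria are concentrated in the filtered subset.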
You can set the following parameters:
Example:
Bar chart view of significant text:
TIP: you can export the data and the wordclouds.
Example:
This wordcloud shows the words occurring within a 3-word distance of the term chloroquine.
TIP: you can switch to “Bar chart” and export the results.
N.B. In future releases stopwords (e.g. the, and) will be removed.
This visualization shows the most frequent words in your filtered documents.
Example:
Most frequent words in 493 articles mentioning the word chloroquine.
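A word-frequency count of this kind can be sketched in a few lines of Python. The function name `most_frequent_words` and the sample sentences are illustrative assumptions, and the tokenisation is deliberately naive (lowercase ASCII letters only).

```python
from collections import Counter
import re

def most_frequent_words(documents, top_n=10):
    """Return the `top_n` most common words across all documents."""
    counts = Counter()
    for text in documents:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts.most_common(top_n)

# Invented sample texts standing in for a filtered set of articles.
sample = [
    "Chloroquine and the virus",
    "The virus and the host",
]
for word, count in most_frequent_words(sample, top_n=3):
    print(word, count)
```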
Example:
TIP: If you move your mouse to a particular point on the curve, it will display the yearly count of your search term, or the percentage in the normalized view.
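The difference between the yearly count and the percentage can be sketched in Python. This assumes that the normalized view divides the number of matching documents in a year by the total number of documents published in that year; this help page does not spell out the formula, and the years below are invented.

```python
from collections import Counter

def yearly_counts(match_years):
    """Aggregated view: number of matching documents per year."""
    return dict(sorted(Counter(match_years).items()))

def normalized_view(match_years, all_years):
    """Normalized view (assumed): matches as a percentage of all
    documents published in the same year."""
    matches = Counter(match_years)
    totals = Counter(all_years)
    return {year: 100 * matches[year] / totals[year] for year in sorted(totals)}

# Invented publication years: one entry per document.
all_years = [2018, 2019, 2019, 2020, 2020, 2020, 2020]
match_years = [2019, 2020, 2020]

print(yearly_counts(match_years))
print(normalized_view(match_years, all_years))
```

Under this assumption, a year with more matches does not necessarily have a higher percentage, because the total number of publications in that year also changes.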
Example of normalised view:
TIP: You can export the image and the data and move back to aggregated view:
The parameters include:
Example:
Search query:
Topic modeling parameters:
Here are the results in the form of 20 topic clusters:
[0] antibody cell protein parasite sequence blood bind red igg gene
[1] sample method dna pcr time parasite positive detection lamp detect
[2] protein serum figure analysis control malaria expression perform identify table
[3] patient malaria infection day diagnosis treatment severe fever test blood
[4] vaccine disease infection case country cause death vaccination health immunization
[5] disease water increase human change mosquito climate affect cholera cause
[6] china health africa aids international global hiv development program chinese
[7] clinical blood product development study trial use technology potential need
[8] health public disease country surveillance system datum information control include
[9] vaccine response cell antigen vector immune adjuvant protein induce specific
[10] have stillbirth road ebola network municipality dengue node community epidemic
[11] malaria case report control area study datum high population numb
[12] compound acid activity hcv fig peptide enzyme active structure amino
[13] cell virus chloroquine infection treatment viral effect host infect drug
[14] disease study research infectious identify pathogen analysis country laboratory publish
[15] child risk high age low year associate fever rate health
[16] mouse fig lung parasite level study show day live animal
[17] model disease numb human population infect individual infection transmission mosquito
[18] travel infection traveler fever case return disease traveller include dengue
[19] species host include occur range common parasite infection tissue blood
Please note that the WORDS in the original articles are LEMMATIZED (reduced to their dictionary form). Example: patients becomes patient.
Interpretations of some topic clusters:
[1] sample method dna pcr time parasite positive detection lamp detect
This topic refers to methods such as PCR and LAMP to detect malaria by DNA amplification.
[3] patient malaria infection day diagnosis treatment severe fever test blood
Articles in this cluster are related to the treatment and diagnosis of malaria among patients having fever.
You can see the list of articles belonging to each topic cluster by clicking on Topic documents.
The percentage in square brackets shows the probability of the selected topic (bold) in the article.
In this example several articles are related to the LAMP detection method.
Option: removing unnecessary words
Very frequent words are of no use for topic models. You can remove them interactively by clicking on the Vocabulary icon.
TIP: Do NOT forget to re-run the iterations after removing the stopwords.
Topics that occur together more than expected are blue, topics that occur together less than expected are red.
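One way to read "more than expected" is to compare how often two topics appear in the same document with how often they would if they were independent. The Python sketch below follows that reading; it is an assumption, not a description of how AVOBMAT computes the colours, and the function name `topic_cooccurrence` and the sample data are invented.

```python
from itertools import combinations

def topic_cooccurrence(doc_topics, num_topics):
    """Label each topic pair by comparing observed and expected co-occurrence.

    `doc_topics` holds one set of topic ids per document. The expected
    count assumes the two topics are assigned independently.
    """
    n_docs = len(doc_topics)
    single = [0] * num_topics
    pair = {}
    for topics in doc_topics:
        for topic in topics:
            single[topic] += 1
        for a, b in combinations(sorted(topics), 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1

    labels = {}
    for a, b in combinations(range(num_topics), 2):
        expected = single[a] * single[b] / n_docs
        if expected == 0:
            continue  # at least one topic never occurs; nothing to compare
        observed = pair.get((a, b), 0)
        if observed > expected:
            labels[(a, b)] = "blue"
        elif observed < expected:
            labels[(a, b)] = "red"
        else:
            labels[(a, b)] = "neutral"
    return labels

# Invented example: topics 0 and 1 always appear together, topic 2 alone.
docs = [{0, 1}, {0, 1}, {2}, {2}, {0, 1}, {2}]
print(topic_cooccurrence(docs, num_topics=3))
```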
You can visualize the distribution of topics over time.
Aggregated:
Tip: You can interactively remove a topic by clicking on its colour.
Normalised:
You can display the results in normalized mode.
TIP: If you click on a point on the curve, the topic is displayed together with its probability value.
Click on “Metadata visualization” in the menu bar.
Example:
Search term: chloroquine in the Abstracts
Network of chloroquine-related publications:
TIP: You can download the network and the data.
Top five authors in the COVID-19 dataset in 2020 and the journals where they published their papers:
N.B. Missing values in the Authors and Journal fields were excluded.
You have reached the end of the AVOBMAT help page. If you have any more questions, don’t hesitate to contact us.