About

The Computer Science Ontology (CSO) is a large-scale ontology of research areas that was automatically generated using the Klink-2 algorithm [1] on the Rexplore dataset [2], which consists of about 16 million publications, mainly in the field of Computer Science. The Klink-2 algorithm combines semantic technologies, machine learning, and knowledge from external sources to automatically generate a fully populated ontology of research areas. Some relationships were also revised manually by experts during the preparation of two ontology-assisted surveys in the fields of Semantic Web and Software Architecture. The main root of CSO is Computer Science; however, the ontology also includes a few secondary roots, such as Linguistics, Geometry, and Semantics.

CSO presents two main advantages over manually crafted categorisations used in Computer Science (e.g., 2012 ACM Classification, Microsoft Academic Search Classification). First, it can characterise higher-level research areas by means of hundreds of sub-topics and related terms, which makes it possible to map very specific terms to higher-level research areas. Second, it can be easily updated by running Klink-2 on a set of new publications. A more comprehensive discussion of the advantages of adopting an automatically generated ontology in the scholarly domain can be found in [3].

Data Model

The CSO model is an extension of SKOS. It includes eight semantic relations:

relatedEquivalent, which indicates that two topics can be treated as equivalent for the purpose of exploring research data (e.g., Ontology Matching and Ontology Mapping). This predicate is referred to as "alternative label of".
superTopicOf, which indicates that a topic is a super-area of another one (e.g., Semantic Web is a super-area of Linked Data). This predicate is referred to as "parent of".
contributesTo, which indicates that the research outputs of one topic contribute to another. For instance, research in Ontology Engineering contributes to the Semantic Web, but arguably Ontology Engineering is not a sub-area of the Semantic Web; that is, there is plenty of research in Ontology Engineering outside the Semantic Web area. This predicate is not visible in the explorer.
preferentialEquivalent, which states the main label for topics belonging to a cluster of relatedEquivalent topics. For instance, the topics ontology and ontologies will both have their preferentialEquivalent set to ontology. This predicate is not visible in the explorer.
rdf:type, which states that a resource is an instance of a class. For example, a resource in our ontology is an instance of topic. This predicate is not visible in the explorer.
rdfs:label, which provides a human-readable version of a resource's name. This predicate is not visible in the explorer.
owl:sameAs, which lists entities from other knowledge graphs from the Linked Open Data Cloud (DBpedia, Freebase, Wikidata, YAGO, and Cyc) that refer to the same concepts.
schema:relatedLink, which links CSO concepts to related web pages that either describe the research topics (Wikipedia articles) or provide additional information about the research domains (Microsoft Academic).
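
As a rough illustration of how these relations can be used together, the sketch below stores a handful of triples in memory, resolves a topic to its preferentialEquivalent label, and then follows superTopicOf links upwards to collect the higher-level areas. The triples, the plain-string topic names, and the helper functions preferred_label and ancestors are assumptions made for this example; they are not taken from the CSO dump, and the real hierarchy between these topics may differ.

```python
from collections import defaultdict

# Hypothetical sample triples (subject, predicate, object).
# Topic names follow the examples in the text; the hierarchy is illustrative only.
TRIPLES = [
    ("computer science", "superTopicOf", "semantic web"),
    ("semantic web", "superTopicOf", "linked data"),
    ("semantic web", "superTopicOf", "ontology matching"),
    ("ontology mapping", "relatedEquivalent", "ontology matching"),
    ("ontology matching", "relatedEquivalent", "ontology mapping"),
    ("ontology mapping", "preferentialEquivalent", "ontology matching"),
    ("ontology matching", "preferentialEquivalent", "ontology matching"),
    ("ontology engineering", "contributesTo", "semantic web"),
]


def preferred_label(topic, triples):
    """Return the main label of the topic's equivalence cluster, if one is stated."""
    for subject, predicate, obj in triples:
        if subject == topic and predicate == "preferentialEquivalent":
            return obj
    return topic


def ancestors(topic, triples):
    """Collect every topic reachable by walking superTopicOf links upwards."""
    parents = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "superTopicOf":
            parents[obj].add(subject)  # subject is the parent of obj

    seen = set()
    stack = [preferred_label(topic, triples)]
    while stack:
        current = stack.pop()
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("ontology mapping", TRIPLES)))
```

Note that contributesTo is deliberately not followed here, so a topic that only contributes to an area is not reported as one of its sub-areas.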

Resource Exploration

Each resource is available at its own URI. For instance, the resource 'semantic web' is browsable at the URI https://cso.kmi.open.ac.uk/topics/semantic_web.

The CSO Portal supports content negotiation, serving different representations of the same resource (URI) in the following formats:

HTML. The resource is explorable via a browser.
RDF/XML. Specify the Accept header application/rdf+xml, or add rdf or xml as an extension to the resource, e.g., semantic_web.rdf
Turtle. Specify the Accept header text/turtle, or add ttl as an extension to the resource, e.g., semantic_web.ttl
JSON-LD. Specify the Accept header application/json or application/ld+json, or add json or jsonld as an extension to the resource, e.g., semantic_web.json
N-Triples. Specify the Accept header application/n-triples, or add nt as an extension to the resource, e.g., semantic_web.nt

Details:

Format	Accept header	Resource
HTML	-	semantic_web
RDF/XML	application/rdf+xml	semantic_web.rdf or semantic_web.xml
Turtle	text/turtle	semantic_web.ttl
JSON-LD	application/json or application/ld+json	semantic_web.json or semantic_web.jsonld
N-Triples	application/n-triples	semantic_web.nt
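
The two retrieval options described above (an Accept header, or a file extension appended to the resource URI) could be exercised from Python roughly as follows. This is a minimal sketch using only the standard library. The helper names topic_uri, request_with_header, uri_with_extension and fetch are hypothetical, and the rule that a topic label maps to its URI by lower-casing it and replacing spaces with underscores is an assumption generalised from the single semantic_web example.

```python
import urllib.request

BASE = "https://cso.kmi.open.ac.uk/topics/"

# Accept header and file extension for each format, as listed above.
FORMATS = {
    "rdfxml": ("application/rdf+xml", "rdf"),
    "turtle": ("text/turtle", "ttl"),
    "jsonld": ("application/ld+json", "jsonld"),
    "ntriples": ("application/n-triples", "nt"),
}


def topic_uri(label):
    """Build a resource URI from a topic label (assumed rule: spaces -> underscores)."""
    return BASE + label.strip().lower().replace(" ", "_")


def request_with_header(label, fmt):
    """Option 1: keep the plain URI and ask for a format via the Accept header."""
    accept, _extension = FORMATS[fmt]
    return urllib.request.Request(topic_uri(label), headers={"Accept": accept})


def uri_with_extension(label, fmt):
    """Option 2: append the format's extension to the resource URI."""
    _accept, extension = FORMATS[fmt]
    return f"{topic_uri(label)}.{extension}"


def fetch(request_or_url, timeout=10):
    """Download the representation as text (requires network access)."""
    with urllib.request.urlopen(request_or_url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, fetch(request_with_header("semantic web", "turtle")) and fetch(uri_with_extension("semantic web", "turtle")) would be expected to return the same Turtle document, provided the portal is reachable.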

CSO Uptake

CSO was officially released in 2019 and has already been adopted by several major organizations, including Springer Nature.
In the last two years, CSO has supported the creation of many innovative applications and technologies, including ontology-driven topic models (e.g., CoCoNoW (Beck et al., 2020)), recommender systems for articles (e.g., SBR (Thanapalasingam et al., 2018)) and video lessons (Borges & dos Reis, 2019), visualisation frameworks (e.g., ScholarLensViz (Loffler et al., 2020), ConceptScope (Zhang et al., 2021)), temporal knowledge graphs (e.g., TGK (Rossanez et al., 2020)), NLP frameworks for entity extraction (Dessi et al., 2021), tools for identifying domain experts (e.g., VeTo (Vergoulis et al., 2020)), and systems for predicting academic impact (e.g., ArtSim (Chatzopoulos et al., 2020)).
It has also been used for several large-scale analyses of the literature (e.g., Cloud Computing (Lula et al., 2021), Software Engineering (Chicaiza & Reategui, 2020), Ecuadorian publications (Chicaiza & Reategui, 2020)).

References

Beck, M., Rizvi, S. T. R., Dengel, A., & Ahmed, S. (2020). From automatic keyword detection to ontology-based topic modeling. In International workshop on document analysis systems (pp. 451–465). doi: 10.1007/978-3-030-57058-3_32
Thanapalasingam, T., Osborne, F., Birukou, A., & Motta, E. (2018). Ontology-based recommendation of editorial products. In D. Vrandecic et al. (Eds.), The semantic web – ISWC 2018 (pp. 341–358). Cham: Springer Int. Publishing.
Borges, M. V. M., & dos Reis, J. C. (2019). Semantic-enhanced recommendation of video lectures. In 2019 IEEE 19th international conference on advanced learning technologies (ICALT) (Vol. 2161, pp. 42–46). doi: 10.1109/ICALT.2019.00013
Loffler, F., Wesp, V., Babalou, S., Kahn, P., Lachmann, R., Sateli, B., Konig-Ries, B. (2020). ScholarLensViz: A visualization framework for transparency in semantic user profiles. In K. Taylor, R. Goncalves, F. Lecue, & J. Yan (Eds.), Proceedings of the ISWC 2020 demos and industry tracks: From novel ideas to industrial practice co-located with 19th international semantic web conference (ISWC 2020), globally online, November 1-6, 2020 (UTC).
Zhang, X., Chandrasegaran, S., & Ma, K.-L. (2021). ConceptScope: Organizing and visualizing knowledge in documents based on domain ontology. In Proceedings of the 2021 CHI conference on human factors in computing systems (pp. 1–13).
Rossanez, A., dos Reis, J. C., & da Silva Torres, R. (2020). Representing scientific literature evolution via temporal knowledge graphs.
Dessi, D., Osborne, F., Recupero, D. R., Buscaldi, D., & Motta, E. (2021). Generating knowledge graphs by employing natural language processing and machine learning techniques within the scholarly domain. Future Generation Computer Systems, 116, 253–264. doi: 10.1016/j.future.2020.10.026
Vergoulis, T., Chatzopoulos, S., Dalamagas, T., & Tryfonopoulos, C. (2020a). VeTo: Expert set expansion in academia. In M. Hall, T. Mercun, T. Risse, & F. Duchateau (Eds.), Digital libraries for open knowledge (pp. 48–61). Cham: Springer International Publishing. doi: 10.1007/978-3-030-54956-5_4
Chatzopoulos, S., Vergoulis, T., Kanellos, I., Dalamagas, T., & Tryfonopoulos, C. (2020a). ArtSim: Improved estimation of current impact for recent articles. In ADBIS, TPDL and EDA 2020 common workshops and doctoral consortium (pp. 323–334). doi: 10.1007/978-3-030-55814-7_27
Lula, P., Dospinescu, O., Homocianu, D., & Sireteanu, N.-A. (2021). An advanced analysis of cloud computing concepts based on the computer science ontology. Computers, Materials & Continua, 66(3), 2425–2443. doi: 10.32604/cmc.2021.013771
Chicaiza, J., & Reategui, R. (2020). Using domain ontologies for text classification. A use case to classify computer science papers. In Iberoamerican knowledge graphs and semantic web conference (pp. 166–180). doi: 10.1007/978-3-030-65384-2_13

Applications

Smart Topic Miner. The Smart Topic Miner (STM) [4] is a tool which uses semantic web technologies to classify scholarly publications on the basis of a very large automatically generated ontology of research areas. It was developed to support the Springer Nature Computer Science editorial team in classifying proceedings. A demo of the system is available at http://stm-demo.kmi.open.ac.uk/.

Smart Book Recommender. The Smart Book Recommender (SBR) [5] is a semantic application designed to support the Springer Nature editorial team in promoting their publications at Computer Science venues. It takes as input the proceedings of a conference and suggests books, journals, and other conference proceedings which are likely to be relevant to the attendees of the conference in question. A demo of the system is available at http://rexplore.kmi.open.ac.uk/SBR_demo/.

Rexplore. Rexplore [2] is a system which leverages novel solutions in large-scale data mining, semantic technologies and visual analytics, to provide an innovative environment for exploring and making sense of scholarly data.

EDAM methodology. EDAM [6] is a novel expert-driven automatic methodology for creating Systematic Reviews that keeps human experts in the loop, but does not require them to check all papers included in the analysis.

Research Communities Map Builder. Temporal Semantic Topic-Based Clustering (TST) [7, 8] is an approach for detecting research communities by clustering researchers according to their research trajectories, defined as distributions of topics over time.

The CSO Classifier. The CSO Classifier [9] is an unsupervised approach for automatically classifying research papers according to the Computer Science Ontology. The classifier takes as input the metadata of a research paper (usually title, abstract, and keywords) and returns a set of research topics drawn from the ontology.
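
To make the input and output shape concrete, the toy sketch below takes paper metadata as a dictionary and returns the set of topic labels that literally appear in it. It is not the CSO Classifier and does not reproduce the approach described in [9]; it is a naive exact-match baseline written only for illustration. The function names tokenize and classify, the dictionary keys, and the sample labels are all assumptions.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def classify(paper, topic_labels, max_ngram=3):
    """Return the topic labels that occur verbatim in the paper's metadata.

    paper: dict with optional string fields "title", "abstract", "keywords".
    topic_labels: set of lower-case topic labels (hypothetical sample here).
    """
    text = " ".join(paper.get(field, "") for field in ("title", "abstract", "keywords"))
    tokens = tokenize(text)

    found = set()
    # Slide windows of 1..max_ngram tokens over the text and look each one up.
    for size in range(1, max_ngram + 1):
        for start in range(len(tokens) - size + 1):
            candidate = " ".join(tokens[start:start + size])
            if candidate in topic_labels:
                found.add(candidate)
    return found


if __name__ == "__main__":
    sample_paper = {
        "title": "Linked Data for the Semantic Web",
        "abstract": "We study ontology matching.",
        "keywords": "ontologies, linked data",
    }
    sample_labels = {"linked data", "semantic web", "ontology matching", "machine learning"}
    print(sorted(classify(sample_paper, sample_labels)))
```

An exact-match lookup like this misses inflected or paraphrased mentions (for instance, "ontologies" does not match a label "ontology"), which is one reason a dedicated classifier is needed in practice.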

Academia/Industry DynAmics Knowledge Graph. The Academia/Industry DynAmics (AIDA) Knowledge Graph [10] describes 21M publications and 8M patents according to the research topics drawn from the Computer Science Ontology. 5.1M publications and 5.6M patents are further characterized according to the type of the authors' affiliations (academia, industry, or collaborative) and 66 industrial sectors (e.g., automotive, financial, energy, electronics) organized in a two-level taxonomy.

People

Steering Committee

Aliaksandr Birukou, Executive Editor, Springer-Verlag GmbH

Enrico Motta, Professor of Knowledge Technologies

Francesco Osborne, Research Fellow

Team

Enrico Motta, Professor of Knowledge Technologies

Francesco Osborne, Research Fellow

Angelo Salatino, Research Associate

Alumni

Andrea Mannocci, Research Associate

Thiviyan Thanapalasingam, Research Assistant

How to Cite CSO

Please cite the following paper:

Salatino, Angelo A., Thiviyan Thanapalasingam, Andrea Mannocci, Francesco Osborne, and Enrico Motta. "The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas." International Semantic Web Conference 2018, Monterey (CA), USA, 2018. http://oro.open.ac.uk/55484/

Relevant Papers

[1] Osborne, F. and Motta, E. (2015) Klink-2: Integrating Multiple Web Sources to Generate Semantic Topic Networks, International Semantic Web Conference 2015, Bethlehem, Pennsylvania, USA

[2] Osborne, F., Motta, E. and Mulholland, P. (2013) Exploring Scholarly Data with Rexplore, International Semantic Web Conference, Sydney, Australia

[3] Osborne, F. and Motta, E. (2012) Mining Semantic Relations between Research Areas, International Semantic Web Conference, Boston, MA

[4] Osborne, F., Salatino, A., Birukou, A. and Motta, E. (2016) Automatic Classification of Springer Nature Proceedings with Smart Topic Miner. International Semantic Web Conference 2016, Kobe, Japan.

[5] Osborne, F., Birukou, A., Thanapalasingam, T. and Motta, E. (2017) Smart Book Recommender: A Semantic Recommendation Engine for Editorial Products. International Semantic Web Conference 2017, Poster Track. Vienna, Austria.

[6] Osborne, F., Lago, P., Muccini, H., Motta, E. (2018) Reducing the Effort for Systematic Reviews in Software Engineering.

[7] Osborne, F., Scavo, G. and Motta, E. (2014) A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities, EKAW 2014, Linkoping, Sweden.

[8] Osborne, F., Scavo, G. and Motta, E. (2014) Identifying diachronic topic-based research communities by clustering shared research trajectories, Extended Semantic Web Conference 2014, Crete, Greece.

[9] Salatino, A., Osborne, F., Thanapalasingam, T. and Motta, E. (2019) The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly Articles, Theory and Practice of Digital Libraries, Oslo, Norway.

[10] Angioni, S.; Salatino, A.; Osborne, F.; Reforgiato Recupero, D. and Motta, E. (2020) Integrating Knowledge Graphs for Analysing Academia and Industry Dynamics, Workshop on Scientific Knowledge Graphs 2020, Lyon, France.

License

This work is licensed under a Creative Commons Attribution 4.0 International License.