KDD-2022 Workshop on Data-Driven Science of Science


Washington, D.C., 8am-12pm EDT, August 15, 2022

***Join online via Zoom: https://zoom.uts.edu.au/j/83652811623 ***

Welcome to the KDD-2022 Workshop on Data-Driven Science of Science

The volume of scientific research and the number of practitioners have grown exponentially over the past decades. Growth in research articles, conference papers, posters, patents, preprints, science and technology reports, and informal content on the Web has created “big data”, which presents both opportunities and challenges to people in academia and industry. At the same time, the development of the science of science has been facilitated by an influx of researchers from the natural, computational, and social sciences. Researchers with diverse and interdisciplinary backgrounds have developed big-data-based capabilities for empirical and generative modeling to advance the knowledge frontier in the science of science.

Compared with traditional bibliometric studies, the current surge in science of science research is distinguished by several features. First, it relies heavily on large-scale datasets, many of which contain millions of authors, papers, and their citations. Additionally, research themes are diverse, ranging from scientific careers to rising star detection, from knowledge diffusion to successful innovation, and from computational hypothesis generation to data-driven knowledge discovery. The applied domains are equally varied, spanning social science and natural science, information science and computer science, economics and management science, and mathematics and physics, among others. Moreover, advanced computational algorithms (e.g., deep learning-based strategies) have been widely adopted in current data-driven science of science studies.

The main goal of science of science research is to explore the evolutionary patterns of science in general and of specific scientific communities. Most related work focuses on developing predictive models of research impact and identifying the driving forces of scientific progress, which in turn helps improve and optimize the formulation and implementation of science policy. In a practical sense, science of science research can help young scientists establish their early careers, better evaluate the performance of scientific projects, track popular research frontiers, and even discover new questions from data. These tasks require the joint efforts of scholars from various fields.

However, the advent of the age of big data also poses enormous challenges, especially for causal inference. Finding patterns of correlation in the data is essential, but it is not the ultimate goal of research. Understanding causal forces enables us to make sense of the complexities of the world and provides more relevant policy implications. Most big data is generated in complex, real-world, non-experimental environments, so researchers who try to capture causal links face a large number of uncontrolled confounders. Drawing causal inferences from big data requires new advancements and novel thinking. This challenge also provides new opportunities to make traditional research in informetrics and scientometrics more policy relevant.

The ongoing COVID-19 pandemic illustrates how external threats can disrupt the scientific community. The increasing complexity of the economy and society, together with growing globalization, has generated big data in public health, the environment, and migration, which enables scientists to investigate how environmental stability influences the scientific system. Citation data, along with other bibliographic datasets, have long been used by the Web community as an important testbed for demonstrating the validity and effectiveness of proposed algorithms and strategies. Many top computer scientists are also excellent researchers in the science of science. The purpose of this workshop is to bridge the two communities (i.e., the knowledge discovery community and the science of science community) as scholarly activities become salient web and social activities that start to generate a ripple effect on broader knowledge discovery communities. This workshop will showcase current data-driven science of science research by highlighting several studies and building a community of researchers to explore questions critical to the future of the field, especially a community within Data Science, so as to facilitate collaboration and inspire innovation. Through discussion of emerging and critical topics in the science of science, this workshop aims to help generate effective solutions for addressing environmental, societal, and technological problems in the scientific community.

[Submission guidelines]

All papers must be original and must not be simultaneously submitted to another journal or conference. Please note that workshop papers will NOT be included in the KDD main conference proceedings.

Workshop papers follow the same guidelines as KDD. Submissions are limited to nine (9) pages, including references, must be in PDF, and must use the ACM Conference Proceeding templates (two-column format). An additional two pages of supplemental material focused on reproducibility can be provided. Proofs and pseudo-code that could not be included in the main nine-page manuscript may also be included in the two-page supplement. As in previous years, supplements should be included in the same file as the main manuscript. In addition, authors have the option of uploading extra files (e.g., code/data) for their work. See the template here:


Topics of Interest

Methods and techniques for knowledge discovery in science of science

Recommendation of publications, authors, journals, and institutions with deep learning

Webometrics and altmetrics

Science mapping and visualization

Science policy and research assessment

University policy and institutional rankings

Knowledge discovery and data mining

Bibliometrics-aided information retrieval

Data sources about scholarly activities

Data accuracy and disambiguation

NLP for science of science

Causal inferences in science of science

Team formation and collaboration

Diversity in innovation and leadership

Scientific career and mentorship

Knowledge diffusion and forgetting

Important Dates

May 26, 2022

Workshop paper submissions

June 20, 2022

Workshop paper notifications

July 9, 2022

Final submission of workshop program and materials

August 15, 2022

Workshop date


Organizers

Yi Bu

Peking University

Meijun Liu

Fudan University

Yujia Zhai

Tianjin Normal University

Ying Ding

University of Texas

Feng Xia

Federation University Australia

Daniel Acuña

Syracuse University

Yi Zhang

University of Technology Sydney

Invited Speakers

James Evans

University of Chicago

Weihua Li

Beihang University

Staša Milojević

Indiana University

Domenic Rosati


Yang Yang

University of Notre Dame

Yian Yin

Cornell University