INTRODUCTION TO DATAOLOGY AND DATA SCIENCE


Yangyong Zhu and Yun Xiong


(Citation: Yangyong Zhu, Yun Xiong. Introduction to Dataology and Data Science. [TR][OL]. 2009. available at: http://datascience.fudan.edu.cn/87/ef/c4833a34799/page.htm
Yangyong Zhu, Yun Xiong. Defining Data Science. 2015. available at: http://arxiv.org/abs/1501.05039)


Informationization is the process of generating data: it stores the objects and phenomena of the real world in cyberspace in the form of data. Data is a representation of nature and life, and it records human behavior in work, daily life, and society. Today a huge amount of data is being generated in cyberspace at an ever-increasing rate, a phenomenon called data explosion. Data explosion forms datanature in cyberspace. Because data is a unique entity, it is necessary to investigate and explore the laws of data in cyberspace. At the same time, data has become an important means of exploring the rules of the universe, life, human behavior, and the development of society. For example, we can study life using data (as in Bioinformatics) or investigate human behavior using data (as in Behavior Informatics). Dataology and Data Science is an umbrella term for the theories, methods, and technologies used to explore the phenomena and laws of datanature in cyberspace. Unlike natural science and social science, Dataology and Data Science takes data in cyberspace as its research object. It is a new science. Dataology and Data Science has two main aspects. One is research on data itself, including data types, data status, data properties, data transformation, and data evolution. The other is to provide a novel research method for natural science and social science, called the scientific research method with data.


Some methods and techniques of Dataology and Data Science already exist, including data acquisition, data storage and management, data security, data analysis, and data visualization. However, Dataology and Data Science still requires fundamental theories and new methods and techniques, for instance: existence of data, measurement of data, time of data, data algebra, data similarity and theory of cluster, data classification and data cyclopedia, data camouflage and data perception, data experiment, and data awareness. Dataology and Data Science will also improve current methods of scientific research to form new ones, and will develop specific theories, methods, and technologies in various fields to form domain dataology, including behavior dataology, biological dataology, brain dataology, meteorological dataology, financial dataology, and geographical dataology.


The research topics of Dataology and Data Science

i) Foundational theory of Dataology and Data Science.
Observation and logical reasoning are the basis of scientific research. In Dataology and Data Science, we should focus on observation methods and data reasoning in datanature, including existence of data, measurement of data, time of data, data algebra, data similarity and theory of cluster, and data classification and data cyclopedia.

ii) Methods of Experiment and Logical Reasoning.
We should establish a methodology of experiment for Dataology and Data Science, together with scientific hypotheses and a theoretical architecture, so that we can explore datanature; identify data types, data status, data properties, patterns of data transformation, and mechanisms of data evolution; and discover the laws of nature and human behavior.

iii) Theories and Methods in Domain Dataology.
We should apply the theories and methods of Dataology and Data Science to various fields and develop specific theories and methods to form domain dataology, including behavior dataology, biological dataology, brain dataology, meteorological dataology, financial dataology, and geographical dataology.

iv) Methods and Technologies for Utilizing and Exploiting Data Resources.
Data resources are among the most important strategic resources, perhaps more important than oil, coal, or minerals, because they are indispensable to human society, politics, and the economy. In addition, the exploitation of oil, coal, and mineral resources itself depends on data resources.


The Framework of Dataology and Data Science

The workflow in Dataology and Data Science is as follows:
  a) First, collect a dataset from datanature;
  b) Second, explore the dataset to grasp its overall characteristics;
  c) Third, analyze the data (e.g., using data mining techniques) or conduct data experiments;
  d) Finally, realize data awareness.
The figure below illustrates the framework of Dataology and Data Science.


[Figure: Framework of Dataology and Data Science]
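
The four workflow steps listed above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the original report: the function names (collect, explore, analyze, summarize) are hypothetical, the "dataset" is synthetic one-dimensional data rather than data gathered from cyberspace, the analysis step uses a simple two-cluster grouping as a stand-in for a data mining technique, and "data awareness" is interpreted loosely as a human-readable summary, since the report does not define it operationally.

```python
import random
import statistics


def collect(n=200, seed=0):
    """Step a: stand-in for collecting a dataset (synthetic values, two groups)."""
    rng = random.Random(seed)
    low = [rng.gauss(10.0, 1.0) for _ in range(n // 2)]
    high = [rng.gauss(20.0, 1.0) for _ in range(n - n // 2)]
    return low + high


def explore(data):
    """Step b: grasp the overall characteristics with summary statistics."""
    return {
        "count": len(data),
        "min": min(data),
        "max": max(data),
        "mean": statistics.fmean(data),
        "stdev": statistics.stdev(data),
    }


def analyze(data, iterations=20):
    """Step c: a simple 1-D two-means grouping (illustrative analysis only)."""
    centers = [min(data), max(data)]  # start from the two extremes
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for x in data:
            # assign each value to the nearer of the two current centers
            idx = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[idx].append(x)
        # move each center to the mean of its group (keep it if the group is empty)
        centers = [statistics.fmean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


def summarize(overview, centers, groups):
    """Step d: a loose reading of 'data awareness' as a readable summary."""
    return (
        f"{overview['count']} values; two groups near "
        f"{centers[0]:.1f} and {centers[1]:.1f} "
        f"with sizes {len(groups[0])} and {len(groups[1])}"
    )


data = collect()
overview = explore(data)
centers, groups = analyze(data)
print(summarize(overview, centers, groups))
```

In a real study, each function would be replaced by domain-specific tooling; the sketch only shows how the output of one step feeds the next.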


Comparison with other sciences

Data is the formal representation of nature in computer systems; information is the phenomena of nature, society, and thinking activities; and knowledge is experience gained from practice. Data can be regarded as symbols and representations of information and knowledge, but it should not be equated with them. The research object, goals, and methods of Dataology and Data Science are essentially different from those of Computer Science, Information Science, and Knowledge Science.

On the one hand, Dataology and Data Science supports natural science and social science. On the other hand, as Dataology and Data Science develops, more and more scientific research will target data directly instead of nature, which will help people understand data and make it easier for them to explore nature and human behavior.



References:
[1] Yangyong Zhu, Ning Zhong, Yun Xiong. Data Explosion, Data Nature and Dataology. In Proceedings of International Conference on Brain Informatics (BI’09). 2009.
[2] Yangyong Zhu, Yun Xiong. Dataology and Data Science (in Chinese with English abstract). Fudan University Press. 2009. ISBN 978-7-309-06956-3 /T.350.