Foundations of Data Science by Avrim Blum
Computer science as an academic discipline began in the 1960s. Emphasis was on programming languages, compilers, operating systems, and the mathematical theory that supported these areas. Courses in theoretical computer science covered finite automata, regular expressions, context-free languages, and computability. In the 1970s, the study of algorithms was added as an important component of theory. The emphasis was on making computers useful. Today, a fundamental change is taking place, and the focus has shifted toward a wealth of applications. There are many reasons for this change. The merging of computing and communications has played an important role. The enhanced ability to observe, collect, and store data in the natural sciences, in commerce, and in other fields calls for a change in our understanding of data and how to handle it in the modern setting. The emergence of the web and social networks as central aspects of daily life presents both opportunities and challenges for theory.
While traditional areas of computer science remain highly important, researchers of the future will increasingly be involved in using computers to understand and extract usable information from massive data arising in applications, not just in making computers useful on specific, well-defined problems. With this in mind, we have written this book to cover the theory we expect to be useful in the next 40 years, just as an understanding of automata theory, algorithms, and related topics gave students an advantage in the last 40 years. One of the major changes is an increased emphasis on probability, statistics, and numerical methods.
Early drafts of the book have been used for both undergraduate and graduate courses. Background material needed for an undergraduate course has been put in the appendix. For this reason, the appendix has homework problems.
Foundations of Data Science – Free Book
The course will focus on the geometric, mathematical, and statistical foundations necessary to understand and computationally exploit scalable data analysis and visualization. Issues of measurement error, noise, and outliers will be central to bounding the precision, bias, and accuracy of the data analysis. This plan is subject to modification, depending on the background of the class and the pace at which we cover material. There will be a midterm exam in class; its content will be similar to the homework exercises.
The book is still a draft, and I am using this version. The target audience includes advanced undergraduate and graduate students. We had some success using this book as core material for an undergraduate class at Penn this spring. Here at IU, a new graduate Master's program in Data Science attracts hundreds of students from diverse backgrounds. While the jury is still out on which topics should be considered fundamental for data science, I think that the Blum-Hopcroft-Kannan book is a good first step in this direction.