Explanatory Data Analysis group

Information Theoretic Data Mining '17-'18

Part of: MSc Computer Science, MSc Computer Science: Data Science
Lecturer: Matthijs van Leeuwen
Assistant: Hugo Manuel Proença
News
  • 11.09.17 Slides and literature of today's lecture have been added.
  • 05.09.17 All available spots in this course have been reserved! Participating students have received a confirmation from me by e-mail. The next edition of the course will start exactly one year from now.
  • 04.09.17 Slides of today's first lecture have been added.
Introduction

How can we gain insight from data? How can we discover and explain structure in data if we don't know what to expect? What is the optimal model for our data? How do we develop principled algorithms for exploratory data mining? To answer these questions, we study and discuss the state of the art in the relatively young research area of information theoretic data mining. We focus on theory, problems, and algorithms, not on implementation and experimentation.

Contents and schedule

The following provides an overview of the contents and schedule of the course; abbreviations for class types are explained below. Slides and additional reading material will be added during the course.

# Date Type Topic Mandatory Optional
1 Mon 4 Sep L Introduction [1](Ch1-6)
2 Mon 11 Sep L Kolmogorov complexity [2](1.2-1.8,2-2.1.1,8.3,8.4) [4] [5,6]
Mon 18 Sep No class
3 Mon 25 Sep L The Minimum Description Length principle
4 Mon 2 Oct L+S Pattern-based modelling
5 Mon 9 Oct S Coding for exploratory data analysis
6 Mon 16 Oct S Finding good models
7 Mon 23 Oct L+S The Maximum Entropy principle
30 Oct - 10 Nov No class; topic selection for presentations and essays, individual tutoring.
8 Thu 16 Nov P Presentations #1
9 Thu 23 Nov P Presentations #2
10 Thu 30 Nov P Presentations #3
11 Thu 7 Dec P Presentations #4
12 Thu 14 Dec P Presentations #5
Fri 22 Dec Essay submission deadline
Class types explained
L: Lecture; just sit back and pay attention. You can read the literature afterwards.
S: Seminar; your active contribution is expected. Prepare by reading the mandatory literature in advance.
L+S: Combination of Lecture and Seminar (= read mandatory literature in advance)
P: Student presentations; your active contribution is expected, but there is no need to prepare (unless you present, of course)
Examination

Attendance at all twelve course meetings is mandatory. The final mark is composed of participation in discussions (20%), presentation including Q&A (30%), and essay (50%).

Literature
  1. Wasserman, L. All of Statistics - A Concise Course in Statistical Inference, Springer, 2004.
  2. Li, M & Vitányi, PMB. An Introduction to Kolmogorov Complexity and Its Applications (3rd ed), Springer, 2008.
  3. Grünwald, P. The Minimum Description Length Principle, Springer, 2007.
  4. Campana, BJL & Keogh, EJ. A Compression-Based Distance Measure for Texture. In: Proceedings of SIAM Data Mining (SDM'10), 2010.
  5. Faloutsos, C & Megalooikonomou, V. On data mining, compression, and Kolmogorov complexity. In: Data Min. Knowl. Discov. 15(1):3-20, 2007.
  6. Cilibrasi, R & Vitányi, PMB. Clustering by Compression. In: IEEE Transactions on Information Theory 51(4):1523-1545, 2005.

Note that this list will be extended during the course.