This course explores the rapidly developing field of data science in the context of data mining and machine learning algorithms for analyzing very large amounts of data. The emphasis will be on large-scale machine learning supported by current technologies including Hadoop and Apache Spark, with MapReduce as a tool for creating parallel algorithms that can process data at this scale. The course will have a large project component, incorporating analyses of large real-world data sets.