CS 178: Machine Learning and Data Mining

Prof. Stephan Mandt
Spring 2021
Day/Time: Tuesday and Thursday, 11:00 am to 12:20 pm (live lecture; videos will be made available to participants)
Location: Zoom, details TBA
TAs: Hao Tang, TBA
Readers: TBA

The course's main site is Canvas. Please check the Canvas site for all details, including homework, resources, the syllabus, and links to Piazza and Gradescope.
The first lecture takes place on Tuesday, March 30, 2021.

FAQs

This course will be held semi-synchronously. Lectures will be held live on Zoom during the regular time slot, and the recordings will be made available on the same day after each class. Students are strongly encouraged to attend the live lectures, but there is no obligation to do so. Exams will take place on the dates provided by the university registrar. Students will be able to schedule their exams flexibly within the exam day.

Course Description

How can a machine learn from experience to become better at a given task? How can we automatically extract knowledge or make sense of massive quantities of data? These are the fundamental questions of machine learning. Machine learning and data mining algorithms use techniques from statistics, optimization, and computer science to create automated systems that can sift through large volumes of data at high speed to make predictions or decisions without human intervention. Machine learning as a field is now pervasive, with applications ranging from the web (search, advertisements, and suggestions) to national security, and from analyzing biochemical interactions to traffic, emissions, and astrophysics. Perhaps most famously, the $1M Netflix Prize stirred up interest in learning algorithms among professionals, students, and hobbyists alike; now, websites like Kaggle host regular open competitions using data from many companies. This class will familiarize you with a broad cross-section of models and algorithms for machine learning, and prepare you for research or industry applications of machine learning techniques.

Course Topics

Nearest neighbor methods

Bayes classifiers, naive Bayes

Linear regression, linear classifiers; perceptrons & logistic regression

VC dimension, shattering, and complexity

Neural networks (multi-layer perceptrons) and deep belief nets

Support vector machines; kernel methods

Decision trees for classification & regression

Ensembles: bagging, gradient boosting, AdaBoost

Unsupervised learning: clustering methods

Dimensionality reduction: (Multivariate Gaussians); PCA/SVD, latent space representations

Recommender Systems and Collaborative Filtering

Time series, Markov models

Advanced Topics

Prerequisites

Appropriate mathematical background in probability and statistics, calculus, and linear algebra.

Programming assignments will require a working familiarity with Python, along with familiarity with data structures and algorithms.

Academic Integrity

All students are expected to be familiar with the policy below. Failure to adhere to this policy can result in a student receiving a failing grade in the class.

Academic integrity is taken seriously. For homework problems or programming assignments you are allowed to discuss the problems or assignments verbally with other class members, but under no circumstances can you look at or copy anyone else's written solutions or code relating to homework problems or programming assignments. All problem solutions and code submitted must be material you have personally written during this quarter, except for (a) material that you clearly indicate and reference as coming from another source, or (b) code provided to you by the TA/reader or instructor.

It is the responsibility of each student to be familiar with UCI's Academic Integrity Policies and UCI's definitions and examples of academic misconduct.