CS 178: Machine Learning and Data Mining
Prof. Stephan Mandt
Day/Time: Tuesday and Thursday 12:30-1:50 pm
TAs: Justus Will, Xingwei Li, Tamanna Hossein, Ruihan Yang
Reader: Kai Nelson
The course's official webpage is on Canvas. Please check the Canvas site for all details
including homework, resources, a syllabus, and links to Piazza and Gradescope.
The first lecture takes place on Tuesday, January 8, 2024.
This course is in-person only. If you can't attend the lectures, please choose another course.
The course has reached its maximum capacity of 235 students. This capacity will not be further increased due to resource constraints.
How can a machine learn from experience to become better at a given task? How can we automatically extract knowledge or make sense of massive quantities of data? These are the fundamental questions of machine learning. Machine learning and data mining algorithms use techniques from statistics, optimization, and computer science to create automated systems that can sift through large volumes of data at high speed to make predictions or decisions without human intervention.
Machine learning as a field is now incredibly pervasive, with applications ranging from the web (search, advertisements, and suggestions) to national security, and from analyzing biochemical interactions to traffic, emissions, and astrophysics. Perhaps most famously, the $1M Netflix prize stirred up interest in learning algorithms among professionals, students, and hobbyists alike; now, websites like Kaggle host regular open competitions on many companies' data.
This class will familiarize you with a broad cross-section of models and algorithms for machine learning, and prepare you for research or industry application of machine learning techniques.
Nearest neighbor methods
Bayes classifiers, naive Bayes
Linear regression, linear classifiers; perceptrons & logistic regression
VC dimension, shattering, and complexity
Neural networks (multi-layer perceptrons) and deep belief nets
Support vector machines; kernel methods
Decision trees for classification & regression
Ensembles: bagging, gradient boosting, AdaBoost
Unsupervised learning: clustering methods
Dimensionality reduction: (Multivariate Gaussians); PCA/SVD, latent space representations
Recommender systems and collaborative filtering
Time series, Markov models
Appropriate mathematical background in probability and statistics, calculus and linear algebra
Programming assignments will require a working familiarity with Python, along with familiarity with data structures and algorithms.
All students are expected to be familiar with the policy below. Failure to adhere to this policy can result in a student receiving a failing grade in the class.
Academic integrity is taken seriously. For homework problems or programming assignments you
are allowed to discuss the problems or assignments verbally with other class members, but
under no circumstances can you look at or copy anyone else's written solutions or code relating
to homework problems or programming assignments. All problem solutions and code submitted must
be material you have personally written during this quarter, except for (a) material that you
clearly indicate and reference as coming from another source, or (b)
code provided to you by the TA/reader or instructor.
It is the responsibility of each student to be familiar with
UCI's Academic Integrity Policies
UCI's definitions and examples of