CS 178: Machine Learning and Data Mining
Prof. Stephan Mandt
Day/Time: Tuesday and Thursday 12:30--1:50 pm (live lecture on Zoom; a video recording will be made available to participants)
Location: Zoom, details TBA.
The course's official webpage is on Canvas. Please check the Canvas site for all details
including homework, resources, a syllabus, and links to Piazza and Gradescope.
The first lecture takes place on Tuesday January 4, 2022.
This course will be held semi-synchronously. The lectures will be held live on Zoom during the regular time slot,
and the recordings will be made available on the same day after each class. Students are strongly encouraged to attend the lectures live in order to ask questions,
but there is no obligation to do so.
Exams will take place on the dates provided by the university registrar. Students will be able to schedule their exams flexibly on the exam day.
How can a machine learn from experience, to become better at a given task? How can we automatically extract knowledge or make sense of massive quantities of data? These are the fundamental questions of machine learning. Machine learning and data mining algorithms use techniques from statistics, optimization, and computer science to create automated systems which can sift through large volumes of data at high speed to make predictions or decisions without human intervention.
Machine learning as a field is now incredibly pervasive, with applications from the web (search, advertisements, and suggestions) to national security, from analyzing biochemical interactions to traffic and emissions to astrophysics. Perhaps most famously, the $1M Netflix Prize stirred up interest in learning algorithms among professionals, students, and hobbyists alike; now, websites like Kaggle host regular open competitions on many companies' data.
This class will familiarize you with a broad cross-section of models and algorithms for machine learning, and prepare you for research or industry application of machine learning techniques.
Nearest neighbor methods
Bayes classifiers, naive Bayes
Linear regression, linear classifiers; perceptrons & logistic regression
VC dimension, shattering, and complexity
Neural networks (multi-layer perceptrons) and deep belief nets
Support vector machines; kernel methods
Decision trees for classification & regression
Ensembles: bagging, gradient boosting, AdaBoost
Unsupervised learning: clustering methods
Dimensionality reduction: (Multivariate Gaussians); PCA/SVD, latent space representations
Recommender systems and collaborative filtering
Time series, Markov models
Appropriate mathematical background in probability and statistics, calculus and linear algebra
Programming assignments will require a working familiarity with Python, along with familiarity with data structures and algorithms.
All students are expected to be familiar with the policy below. Failure to adhere to this policy can result in a student receiving a failing grade in the class.
Academic integrity is taken seriously. For homework problems or programming assignments you
are allowed to discuss the problems or assignments verbally with other class members, but
under no circumstances can you look at or copy anyone else's written solutions or code relating
to homework problems or programming assignments. All problem solutions and code submitted must
be material you have personally written during this quarter, except for (a) material that you
clearly indicate and reference as coming from another source, or (b)
code provided to you by the TA/reader or instructor.
It is the responsibility of each student to be familiar with
UCI's Academic Integrity Policies
UCI's definitions and examples of