CS 178: Machine Learning and Data Mining

Prof. Stephan Mandt
Winter 2024
Day/Time: Tuesday and Thursday 12:30--1:50 pm
Location: HIB-100
TAs: Justus Will, Xingwei Li, Tamanna Hossein, Ruihan Yang
Reader: Kai Nelson

The course's official webpage is on Canvas. Please check the Canvas site for all details, including homework, resources, the syllabus, and links to Piazza and Gradescope.
The first lecture takes place on Tuesday, January 9, 2024.

FAQs

  • This course is in-person only. If you can't attend the lectures, please choose another course.
  • The course has reached its maximum capacity of 235 students. This capacity will not be further increased due to resource constraints.

Course Description

How can a machine learn from experience to become better at a given task? How can we automatically extract knowledge from, or make sense of, massive quantities of data? These are the fundamental questions of machine learning. Machine learning and data mining algorithms use techniques from statistics, optimization, and computer science to create automated systems that can sift through large volumes of data at high speed and make predictions or decisions without human intervention. Machine learning as a field is now incredibly pervasive, with applications ranging from the web (search, advertisements, and suggestions) to national security, and from biochemical interactions to traffic, emissions, and astrophysics. Perhaps most famously, the $1M Netflix Prize stirred up interest in learning algorithms among professionals, students, and hobbyists alike; today, websites like Kaggle host regular open competitions on many companies' data. This class will familiarize you with a broad cross-section of models and algorithms for machine learning and prepare you for research or industry applications of machine learning techniques.

Course Topics

  • Nearest neighbor methods
  • Bayes classifiers, naive Bayes
  • Linear regression, linear classifiers; perceptrons & logistic regression
  • VC dimension, shattering, and complexity
  • Neural networks (multi-layer perceptrons) and deep belief nets
  • Support vector machines; kernel methods
  • Decision trees for classification & regression
  • Ensembles: bagging, gradient boosting, AdaBoost
  • Unsupervised learning: clustering methods
  • Dimensionality reduction: multivariate Gaussians, PCA/SVD, latent space representations
  • Recommender systems and collaborative filtering
  • Time series, Markov models
  • Advanced Topics

Prerequisites

  • An appropriate mathematical background in probability and statistics, calculus, and linear algebra
  • Programming assignments will require a working familiarity with Python, as well as familiarity with basic data structures and algorithms.

Academic Integrity

All students are expected to be familiar with the policy below. Failure to adhere to this policy can result in a failing grade for the course.

Academic integrity is taken seriously. For homework problems or programming assignments, you may discuss the problems or assignments verbally with other class members, but under no circumstances may you look at or copy anyone else's written solutions or code for the homework problems or programming assignments. All problem solutions and code submitted must be material you have personally written during this quarter, except for (a) material that you clearly indicate and reference as coming from another source, or (b) code provided to you by the TA/reader or instructor.

It is the responsibility of each student to be familiar with UCI's Academic Integrity Policies and with UCI's definitions and examples of academic misconduct.