- Crowd-scribing sign-up sheet (access with a CMU account)

- Crowd-scribed class notes (read-only; ask for edit link if in class)

- L01 (Jan 13): Introduction

- L02 (Jan 15): Basics of supervised learning: regression, classification

- Scribe note (Allie Del Giorno)
- Overview of supervised learning (Hastie, Tibshirani, Friedman, 2017) [Elements of statistical learning, chapter 02]
- Statistical learning (James, Witten, Hastie, Tibshirani, 2017) [An introduction to statistical learning, chapter 02]

- L-- (Jan 20): No class (MLK day)

- L03 (Jan 22): Nearest-neighbor methods: k-nn regression and classification

- Scribe note (Yunhan Wen)
- Lectures on the nearest neighbor method (Biau, Devroye, 2015)
- Prototype methods and nearest-neighbors (Hastie, Tibshirani, Friedman, 2017) [Elements of statistical learning, chapter 13]
- Nearest neighbor pattern classification (Cover, Hart, 1967)
- Kernel and nearest-neighbor estimation of a conditional quantile (Bhattacharya, Gangopadhyay, 1990)

- L04 (Jan 27): Predictive inference: conformal prediction

- Scribe note (Ian Waudby-Smith)
- Conformal prediction (Vovk, 2005) [Algorithmic learning in a random world, chapter 02]
- A tutorial on conformal prediction (Shafer, Vovk, 2008)
- Distribution-free predictive inference for regression (Lei, G'Sell, Rinaldo, Tibshirani, Wasserman, 2017)

- L05 (Jan 29): Ensemble methods: boosting (game-theoretic perspective)

- Scribe note (Sasha Podkopaev)
- Boosting (Mohri, Rostamizadeh, Talwalkar, 2018) [Foundations of machine learning, chapter 07]
- The strength of weak learnability (Schapire, 1990)
- Boosting a weak learning algorithm by majority (Freund, 1995)
- The weighted majority algorithm (Littlestone, Warmuth, 1992)
- A decision-theoretic generalization of on-line learning and an application to boosting (Freund, Schapire, 1997)

- L06 (Feb 03): Ensemble methods: boosting (statistical perspective)

- Scribe note (Weichen Wu)
- Boosting (Foundations of machine learning, chapter 07)
- Potential boosters (Duffy, Helmbold, 1999)
- Boosting algorithms as gradient descent (Mason, Baxter, Bartlett, Frean, 1999)
- Greedy function approximation: a gradient boosting machine (Friedman, 2001)

- L07 (Feb 05): Ensemble methods: boosting (computational considerations, applications), guest lecture by Allie

- Scribe note (Tuhinangshu Choudhury)
- SpeedBoost: anytime prediction with uniform near-optimality (Grubb, Bagnell, 2012)

- L08 (Feb 10): Ensemble methods: boosting (generalization)

- Scribe note (Rajshekar Das)
- Boosting (Foundations of machine learning, chapter 07)
- Boosting the margin: a new explanation for the effectiveness of voting methods

- L09 (Feb 12): Quiz 1

- Topics: basics (supervised learning), prototype methods (nearest-neighbor methods), predictive inference (conformal prediction), ensemble methods (boosting)

- L10 (Feb 17): Ensemble methods: bagging, random forests

- Scribe note (Andrew Warren)
- Random forests (Hastie, Tibshirani, Friedman, 2017) [Elements of statistical learning, chapter 15]
- Tree-based methods (James, Witten, Hastie, Tibshirani, 2017) [An introduction to statistical learning, chapter 08]
- Bagging predictors (Breiman, 1994)

- L11 (Feb 19): Variable importance: random forests case study

- Scribe note (Amanda Coston)
- Random decision forests (Ho, 1995)
- Random forests (Breiman, 2001)
- Additional reading: Please stop permuting features: an explanation and alternatives (Hooker, Mentch, 2019)
- Additional reading: Getting better from worse: augmented bagging and a cautionary tale of variable importance (Mentch, Zhou, 2020)

- L12 (Feb 24): Datapoint importance: Shapley values

- Scribe note (Zeyu Tang)
- Notes on the n-person game -- II: The value of an n-person game (Shapley, 1951)
- A unified approach to interpreting model predictions (Lundberg, Lee, 2017)
- Data Shapley: Equitable valuation of data for machine learning (Ghorbani, Zou, 2019)
- Shapley values (Molnar, 2020) [Interpretable machine learning, chapter 05]
- Problems with Shapley-value-based explanations as feature importance measures (Kumar, Venkatasubramanian, Scheidegger, Friedler, 2020)

- L13 (Feb 26): Ensemble methods: stacking

- L14 (Mar 02): Predictive inference: jackknife+

- Scribe note (Max Rubinstein)
- Predictive inference with the jackknife+ (Barber, Candes, Ramdas, Tibshirani, 2019)
- Predictive inference is free with the jackknife+-after-bootstrap (Kim, Xu, Barber, 2020)

- L15 (Mar 04): Predictive inference: leave-one-out

- Scribe note (Naveen Basavaraj)
- Model assessment and selection (Hastie, Tibshirani, Friedman, 2017) [Elements of statistical learning, chapter 07]
- Resampling methods (James, Witten, Hastie, Tibshirani, 2017) [An introduction to statistical learning, chapter 05]

- L-- (Mar 09): No class (spring break)

- L-- (Mar 11): No class (spring break)

- L16 (Mar 16): No class (preparation for the move online due to COVID-19)

- L17 (Mar 18): Mid-class recap: (methods/rows) k-nearest neighbors; boosting; bagging; random forests; stacking; (aspects/columns) algorithms; bias-variance; computation; conformal; (practical) data aspects; explainability/interpretability

- L18 (Mar 23): Quiz 2

- Topics: variable importance (random forest), data importance (Shapley values), ensemble methods (stacking), predictive inference (jackknife+, leave-one-out)

- L19 (Mar 25): Kernel learning: basics (RKHS intro)

- Scribe note (Ankur Mallick)
- Kernel methods (Foundations of machine learning, chapter 06)
- Introduction to RKHS (Gretton, 2015)

- L20 (Mar 30): Kernel learning: basics (RKHS equivalences)

- Scribe note (Lorenzo Tomaselli)
- Kernel methods (Foundations of machine learning, chapter 06)
- Mappings of Probabilities to RKHS and applications (Gretton, 2015)

- L21 (Apr 01): Kernel learning: basics (universal/characteristic kernel)

- Scribe note (Nick Kissel)
- Kernel methods (Foundations of machine learning, chapter 06)
- Mappings of Probabilities to RKHS and applications (Gretton, 2015)

- L22 (Apr 06): Kernel learning: kernel regression, kernel classification (kernel ridge regression, kernel SVM, kernel logistic regression)

- Scribe note (Mike Stanley)
- Regression (Foundations of machine learning, chapter 11)
- Dependence measures using RKHS embeddings (Gretton, 2015)

- L23 (Apr 08): Unsupervised learning: clustering (kernel hierarchical clustering, k-means clustering)

- Scribe note (not available, sorry!)
- Unsupervised learning (Hastie, Tibshirani, Friedman, 2017) [Elements of statistical learning, chapter 14]
- k-means++: the advantages of careful seeding (Arthur, Vassilvitskii, 2006)

- L24 (Apr 13): Unsupervised learning: dimension reduction (PCA, kernel PCA)

- Scribe note (Misha Khodak)
- Unsupervised learning (Hastie, Tibshirani, Friedman, 2017) [Elements of statistical learning, chapter 14]
- Resampling methods (James, Witten, Hastie, Tibshirani, 2017) [An introduction to statistical learning, chapter 05]
- Kernel tricks and nonlinear dimensionality reduction via RBF kernel PCA (Raschka, 2014)
- Kernel principal component analysis and its applications in face recognition and active shape models (Wang, 2012)

- L25 (Apr 15): Unsupervised learning: dimension reduction (stochastic PCA, deep PCA, autoencoders)

- Scribe note (not available, sorry!)
- Unsupervised learning (Hastie, Tibshirani, Friedman, 2017) [Elements of statistical learning, chapter 14]
- Unsupervised learning (James, Witten, Hastie, Tibshirani, 2017) [An introduction to statistical learning, chapter 10]
- The fast convergence of incremental PCA (Balsubramani, Dasgupta, Freund, 2015)
- Extracting and composing robust features with denoising autoencoders (Vincent, Larochelle, Bengio, Manzagol, 2008)
- Autoencoders, unsupervised learning, and deep architectures (Baldi, 2012)
- Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion (Vincent, Larochelle, Lajoie, Manzagol, 2010)
- A PCA-like autoencoder (Ladjal, Newson, Pham, 2019)

- L26 (Apr 17): Guest lecture by Lucas Mentch

- L27 (Apr 20): Neural networks

- Scribe note (Jenn Williams)
- Neural networks (Hastie, Tibshirani, Friedman, 2017) [Elements of statistical learning, chapter 11]
- Breaking the curse of dimensionality with convex neural networks (Bach, 2017)
- Universal approximation bounds for superpositions of a sigmoidal function (Barron, 1993)
- Approximation by superpositions of a sigmoidal function (Cybenko, 1989)
- Approximations of continuous functionals by neural networks with applications to dynamical systems (Chen, Chen, 1993)
- Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its applications to dynamical systems (Chen, Chen, 1995)
- Gradient descent only converges to minimizers (Lee, Simchowitz, Jordan, Recht, 2016)
- First-order methods almost always avoid saddle points: the case of vanishing step-sizes (Panageas, Piliouras, Wang, 2019)
- Benefits of depth in neural networks (Telgarsky, 2016)

- L28 (Apr 22): Quiz 3

- Topics: kernels (basics, regression, classification), unsupervised learning (clustering, PCA, kernel PCA, stochastic PCA, deep PCA, autoencoders)

- L29 (Apr 27): Unsupervised learning (ICA, CCA, SDR)

- Scribe note (Zeyu Tang)
- Independent component analysis: algorithms and applications (Hyvarinen, Oja, 2000)
- Canonical correlation: a tutorial (Borga)
- Sliced inverse regression for dimension reduction (Li)

- L30 (Apr 29): Calibration and end-class recap

- Scribe note (Dhivya Eswaran)
- Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers (Zadrozny, Elkan, 2001)
- Machine learning for everyone
- Topics map (row-focused)

- The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Trevor Hastie, Robert Tibshirani, Jerome Friedman.

- An Introduction to Statistical Learning: With Applications in R. Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani.

- Foundations of Machine Learning. Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar.

- Videos of Larry Wasserman's 36-705 course

- Linear algebra review, videos by Zico Kolter

- Real analysis, calculus, and more linear algebra, videos by Aaditya Ramdas

- Convex optimization prerequisites review from Spring 2015 course, by Nicole Rafidi

- See also Appendix A of Boyd and Vandenberghe (2004) for general mathematical review