Problem Formulation & Planning

Theory

Review supervised learning fundamentals (regression/classification, loss functions, overfitting) with emphasis on evaluation at scale. Full k-fold cross-validation requires k training runs and becomes expensive on large datasets, so discuss when a single holdout split is sufficient and when a time-based split is more appropriate (for example, temporally ordered data, where random splits leak future information into training). Contrast common metrics (MSE, MAE, MAPE, precision/recall, ranking metrics) and discuss the computational cost of computing them on large datasets. Frame each project type formally: recommendation as ranking/utility estimation, segmentation as graph partitioning, forecasting as multi-series prediction. Use short quizzes on choosing appropriate metrics and evaluation strategies under compute constraints.
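The evaluation ideas above can be illustrated with a small sketch. It is not a prescribed implementation: the names `time_based_split` and `StreamingRegressionMetrics`, the `day` and `y` fields, and the toy data are all hypothetical, chosen only for illustration. The sketch assumes a regression task whose rows carry a sortable timestamp, and it accumulates MSE, MAE, and MAPE batch by batch so the full set of predictions never has to sit in memory at once.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


def time_based_split(rows: Sequence[dict], time_key: str, cutoff) -> Tuple[List[dict], List[dict]]:
    """Train on rows strictly before `cutoff`; evaluate on rows at or after it."""
    train = [r for r in rows if r[time_key] < cutoff]
    test = [r for r in rows if r[time_key] >= cutoff]
    return train, test


@dataclass
class StreamingRegressionMetrics:
    """Accumulates MSE, MAE, and MAPE one batch at a time in constant memory."""
    n: int = 0
    sq_err: float = 0.0
    abs_err: float = 0.0
    pct_err: float = 0.0
    n_pct: int = 0  # MAPE is undefined when the true value is 0, so those rows are skipped

    def update(self, y_true: Iterable[float], y_pred: Iterable[float]) -> None:
        for t, p in zip(y_true, y_pred):
            err = p - t
            self.n += 1
            self.sq_err += err * err
            self.abs_err += abs(err)
            if t != 0:
                self.pct_err += abs(err / t)
                self.n_pct += 1

    def result(self) -> dict:
        if self.n == 0:
            raise ValueError("no observations seen")
        return {
            "mse": self.sq_err / self.n,
            "mae": self.abs_err / self.n,
            "mape": self.pct_err / self.n_pct if self.n_pct else float("nan"),
        }


# Toy usage (hypothetical data): ten daily observations, last three held out.
rows = [{"day": d, "y": float(d)} for d in range(1, 11)]
train, test = time_based_split(rows, "day", cutoff=8)

# Baseline: predict the training mean for every held-out row.
baseline = sum(r["y"] for r in train) / len(train)

metrics = StreamingRegressionMetrics()
for i in range(0, len(test), 2):  # feed the test set in batches of two
    batch = test[i:i + 2]
    metrics.update([r["y"] for r in batch], [baseline] * len(batch))

print(metrics.result())
```

Each metric here reduces to a running sum and a count, so evaluation needs a single pass over the data and constant memory. Precision/recall can be accumulated the same way from confusion-matrix counts, while ranking metrics typically require grouping predictions per user or query first, which is one source of the extra cost worth discussing in class.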

Technical

Recap project options and show examples of well-scoped projects. Provide a structured template for the project plan: problem statement, data understanding, planned ETL steps, baseline methods, improved methods, and team roles. In-class: students finalize teams, choose a project track and dataset, and begin drafting execution plans in a shared document or repo. The instructor circulates to check feasibility.