Course Overview & Scale
Theory
Introduce the three problem domains: retail recommender systems, graph-based segmentation, and demand forecasting. Discuss how these tasks look on ‘small data’ (single-machine notebooks) versus ‘large-scale’ settings with millions of users, items, or time series. Define basic ML concepts (features, targets, train/validation/test splits, empirical risk) and highlight what changes when data no longer fits in memory: data locality, communication cost, and the choice between approximate and exact algorithms.
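A small single-machine illustration of two of these concepts may help before moving to the cluster. It is a minimal sketch that assumes a dataset of (features, target) pairs held in a Python list. It splits the data into disjoint train/validation/test sets and computes the empirical risk R̂(f) = (1/n) Σᵢ L(f(xᵢ), yᵢ). The helper names (`train_val_test_split`, `empirical_risk`, `squared_error`) and the toy data are hypothetical and exist only for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def train_val_test_split(
    rows: Sequence[Tuple[X, Y]],
    val_frac: float = 0.15,
    test_frac: float = 0.15,
    seed: int = 0,
) -> Tuple[List[Tuple[X, Y]], List[Tuple[X, Y]], List[Tuple[X, Y]]]:
    """Hypothetical helper: shuffle once, then cut into disjoint train/val/test lists."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]  # everything left over goes to training
    return train, val, test


def empirical_risk(
    predict: Callable[[X], float],
    loss: Callable[[float, Y], float],
    data: Sequence[Tuple[X, Y]],
) -> float:
    """Average loss of `predict` over (features, target) pairs."""
    if not data:
        raise ValueError("empirical risk is undefined on an empty dataset")
    return sum(loss(predict(x), y) for x, y in data) / len(data)


def squared_error(y_hat: float, y: float) -> float:
    """Squared-error loss, a common choice for regression tasks such as forecasting."""
    return (y_hat - y) ** 2


if __name__ == "__main__":
    # Toy data (hypothetical): target is exactly twice the single feature.
    data = [(float(i), 2.0 * i) for i in range(100)]
    train, val, test = train_val_test_split(data)
    print(len(train), len(val), len(test))
    print(empirical_risk(lambda x: 2 * x + 1, squared_error, train))
```

On this toy data the split has 70/15/15 rows, and the biased predictor `2x + 1` has an empirical risk of 1.0 on the training set. This sketch relies on an exact in-memory shuffle. At scale, Spark's `DataFrame.randomSplit` assigns each row to a split by sampling, so split sizes are only approximately proportional to the requested weights. This is a small instance of the exact-versus-approximate trade-off above.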
Technical
Introduce course logistics and give a high-level overview of the ARC cluster, Spark, CubeFS, and Slurm. Present the three fixed project options and their corresponding datasets, walking through the schemas and business context of each. In-class activity: students browse the project briefs and datasets and sketch initial ideas about which track they might choose. Optionally, collect a low-stakes background survey and students' project preferences.