Week 10 Assignments
Textbook Chapter 3 (Preprocessing and Scaling), Preprocessing and Scaling Demo, and D2L Quiz
- Here is the link to the Week 10 demo.
- Download Jupyter Notebook
Part 1: Reading
Read Chapter 3, preprocessing and scaling section (pages 131–139).
This section opens Chapter 3 by introducing preprocessing as the step that prepares raw data for modeling. Unlike the supervised algorithms covered in Chapter 2, preprocessing only looks at the feature data and does not involve a target column. The demo provides a hands-on walkthrough of StandardScaler and MinMaxScaler with detailed explanations of why feature scale matters and how to apply the fit-on-training rule correctly.
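To see why feature scale matters before working through the demo, here is a small sketch (using made-up, wine-like numbers, not the actual dataset) of how a large-scale feature dominates Euclidean distance until the columns are standardized:

```python
import numpy as np

# Hypothetical wine-like features on very different scales:
# proline is in the hundreds, hue is around 1.
X = np.array([
    [1480.0, 1.05],   # sample A
    [1450.0, 0.55],   # sample B
    [1480.0, 1.04],   # sample C (nearly identical to A)
])

# Unscaled Euclidean distances: proline dominates, so the distance
# from A to B is roughly the proline gap (~30), and the large hue
# difference (0.5) barely registers.
print(np.linalg.norm(X[0] - X[1]))   # ≈ 30.004
print(np.linalg.norm(X[0] - X[2]))   # ≈ 0.01

# After standardizing each column (mean 0, standard deviation 1),
# both features contribute on comparable scales.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.linalg.norm(Xs[0] - Xs[1]))
print(np.linalg.norm(Xs[0] - Xs[2]))
```

This is exactly the problem kNN runs into: neighbors are chosen by distance, so whichever feature has the biggest raw range decides who the neighbors are.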
Part 2: Work through the Week 10 Demo
The Week 10 demo covers:
- Understanding what preprocessing is and how it differs from the training step
- Understanding why feature scale affects distance-based algorithms like kNN
- Training StandardScaler using the fit/transform workflow and applying it to training and test sets
- Understanding what StandardScaler computes during fit (mean and standard deviation) and what it produces (mean 0, standard deviation 1)
- Training MinMaxScaler using the same three-step pattern and comparing its output to StandardScaler
- Recognizing which algorithms benefit from scaling and which do not
- Understanding the fit-on-training rule: why the scaler must be fit on training data only and what goes wrong when the rule is broken
- Comparing kNN classification accuracy on the wine dataset before and after scaling
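The core of the demo can be sketched in a few lines. This is a minimal version, assuming scikit-learn's built-in wine dataset, default kNN settings, and an arbitrary random seed; the demo notebook's exact parameters may differ:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline: kNN on the unscaled features.
knn = KNeighborsClassifier().fit(X_train, y_train)
print("unscaled accuracy:", knn.score(X_test, y_test))

# Fit-on-training rule: fit the scaler on X_train ONLY, then use the
# same fitted scaler to transform both the training and the test set.
scaler = StandardScaler().fit(X_train)   # learns per-feature mean_ and scale_
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)      # reuses the training statistics

knn_s = KNeighborsClassifier().fit(X_train_s, y_train)
print("scaled accuracy:", knn_s.score(X_test_s, y_test))
```

Note that the learned statistics live on the fitted scaler object (its `mean_` and `scale_` attributes); calling `fit` on the full dataset instead would leak information about the test set into those statistics.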
Part 3: D2L Quiz
Complete the Week 10 D2L quiz (Preprocessing and Scaling concepts).
The quiz covers:
- Core concepts: what preprocessing is, why it is considered unsupervised, what scaling does and does not change about the data
- StandardScaler: the sklearn.preprocessing module, what it computes during fit, what the transformed values look like
- MinMaxScaler: what it computes during fit, the guaranteed output range, how the three-step fit/transform workflow applies
- The fit-on-training rule: what it requires, what goes wrong when the scaler is fit on the full dataset, where the learned statistics are stored
- Algorithm comparisons: which algorithms benefit from scaling, why decision trees do not require it
- Scaler comparisons: how StandardScaler and MinMaxScaler differ in what they compute and what they produce, accuracy before and after scaling on the wine dataset
- Advanced awareness: RobustScaler and Normalizer (textbook only, pages 131–139)
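As a study aid for the scaler-comparison questions, here is a small sketch (on made-up data) contrasting what the two scalers learn during fit and what their outputs guarantee:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

std = StandardScaler().fit(X)   # learns per-feature mean_ and scale_ (std dev)
mm = MinMaxScaler().fit(X)      # learns per-feature data_min_ and data_max_

X_std = std.transform(X)        # each column: mean 0, standard deviation 1
X_mm = mm.transform(X)          # each column squeezed into [0, 1]

print(X_std.mean(axis=0), X_std.std(axis=0))   # ≈ [0. 0.] [1. 1.]
print(X_mm.min(axis=0), X_mm.max(axis=0))      # [0. 0.] [1. 1.]
```

The key contrast: StandardScaler guarantees the mean and standard deviation of each column but not the range, while MinMaxScaler guarantees the range but not the mean.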
D2L submission checklist
Complete the following in D2L:
- Week 10 D2L quiz (no file uploads required this week)
Note: This week focuses on reading Chapter 3 (pages 131–139), understanding the demo, and demonstrating comprehension through the quiz. There are no screenshot or coding submissions this week. The emphasis is on understanding why feature scale affects distance-based algorithms, how StandardScaler and MinMaxScaler transform features differently, and why the fit-on-training rule must be followed.