scikit-learn is enormous, but most working code uses ~10% of the API. This is that 10%.
What’s in here
- Pipeline and ColumnTransformer: keep preprocessing and model in one object
- the five models that solve most tabular problems: linear/logistic, ridge/lasso, tree, random forest, gradient boosting
- cross_val_score and why a single train/test split lies to you
- GridSearchCV vs RandomizedSearchCV, and when each one wins
- saving and loading models without breaking versioning
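As a quick preview of the first and third items, here is a minimal sketch that wraps preprocessing and a model in one Pipeline and scores it with cross_val_score. The data is synthetic and the column names (age, income, city) are made up for illustration; nothing here comes from a real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data with hypothetical column names, for illustration only.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "age": rng.integers(18, 70, size=n),
    "income": rng.normal(50_000, 15_000, size=n),
    "city": rng.choice(["A", "B", "C"], size=n),
})
y = (X["income"] + rng.normal(0, 10_000, size=n) > 50_000).astype(int)

# Scale the numeric columns, one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Preprocessing and model live in a single estimator object.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Five folds give five accuracy scores instead of one number from one split.
# The preprocessing is re-fit on the training part of each fold.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean(), scores.std())
```

The spread across the five scores is the point: a single train/test split would report just one of those numbers, with no hint of how much it depends on which rows landed in the test set.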
Prerequisites
- Tutorial 1 (training basics) recommended
- Comfortable with pandas DataFrames