Machine Learning Lectures: Steps 5-7
“Big data is at the foundation of all of the megatrends that are happening today, from social to mobile to the cloud to gaming.”
– Chris Lynch
Steps 5-7: Use Algorithms
- Step 5: Get tools
- Step 6: Train models
- Step 7: Debug and tune models
Table of Contents
- Step 5: Algorithm Selection (6:14)
- Step 6: Is Training AI Easy? (1:57)
- Step 6: Ideal Dataset Size (6:40)
- Step 6: Training Faster (2:22)
- Step 6: Statistics versus "statistics" (2:44)
- Step 6: Overfitting (2:56)
- Step 6: Complexity and Regularization (4:26)
- Step 6: Tempting Features (3:08)
- Step 6: Can You Skip The Training Phase In AI? (1:04)
- Step 7: Debugging Your Machine Learning Model (6:22)
- Step 7: Hyper-parameter Tuning (2:29)
- Step 7: What is a Holdout Dataset? (1:36)
- Step 7: Cross Validation (4:32)
- Step 7: Advanced Debugging (2:56)
- Step 7: Can You Skip Tuning? (0:51)
S5: Algorithm Selection
S6: Is Training AI Easy?
S6: Ideal Dataset Size
- Algorithm Selection (see the classifier sketch after this list):
- Support Vector Classifier
- straight-line boundary
- Decision Tree
- horizontal / vertical splits
- Neural Network
- curvilinear boundary
- ML Research
- Finding Patterns
- ML Application
- Assessing models
- Throw away methods that don't meet your needs
- Review of steps
- Training -- Step 6
- How many features should you use?
- Worst: lots of features / few instances
- Better: only a subset of the features
- Better: more instances
- Best: few features / many instances
- Length-to-width ratio (instances to features)
- rule of thumb: about 10 to 1 (roughly ten instances per feature)
- Dimensionality Reduction / Feature Reduction
- PCA (see the sketch after this list)
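A minimal sketch of how the three decision-boundary styles above compare in code, assuming scikit-learn; the toy dataset and settings are illustrative choices, not from the lecture:

```python
# Sketch: fit the three algorithm families from this slide on a toy two-feature dataset.
# Assumes scikit-learn; the dataset and settings are illustrative, not from the lecture.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC                        # straight-line boundary (linear kernel)
from sklearn.tree import DecisionTreeClassifier    # horizontal / vertical splits
from sklearn.neural_network import MLPClassifier   # curvilinear boundary

X, y = make_moons(n_samples=500, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Support Vector Classifier": SVC(kernel="linear"),
    "Decision Tree": DecisionTreeClassifier(max_depth=4),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000),
}

# Applied ML: assess each model and throw away the ones that don't meet your needs.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 3))
```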
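And a minimal PCA sketch for the feature-reduction point, again assuming scikit-learn; the array shapes are illustrative only:

```python
# Sketch: shrink a wide dataset (many features, few instances) with PCA.
# Assumes scikit-learn; the shapes below are illustrative only.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))        # 100 instances x 50 features: far from the ~10:1 guideline

pca = PCA(n_components=10)            # keep 10 components -> 100 instances x 10 features
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                                  # (100, 10)
print(round(pca.explained_variance_ratio_.sum(), 3))    # variance retained by the 10 components
```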
S6: Training Faster
S6: Statistics vs "statistics"
S6: Overfitting
- Use prototyping tools
- Start with a smaller dataset to see if tinkering is worth doing
- Statistics as a philosophical pursuit vs. just trying it and seeing what works
- Tinker till it fits
- Fit means performing well on an objective
- Make it fit!
- Training + Tuning
- fit, overfit, underfit, mess
- Validation: Pass / Fail
- Testing: final, one-time check on untouched data (see the split sketch after this list)
- Model complexity encourages overfitting
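A minimal sketch of carving one dataset into training, validation (debugging/tuning), and test sets, assuming scikit-learn; the 60/20/20 proportions are an illustrative choice, not from the lecture:

```python
# Sketch: carve one dataset into training, validation (debugging/tuning), and test sets.
# Assumes scikit-learn; the 60/20/20 proportions are an illustrative choice.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Split off the final test set first; touch it only once, at the very end.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Split the remainder into training data and a validation set for tinkering.
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))   # 600 200 200
```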
S6: Complexity and Regularization
S6: Tempting Features
S6: Skip Training Phase
- Regularization: focus on simplicity
- Penalties for errors and complexity (see the sketch after this list)
- Avoid training on data from the future
- Avoid training on features that cannot be used in production
- Your goal is to find patterns in the data
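A minimal sketch of a complexity penalty in action, assuming scikit-learn's Ridge regression (an L2 penalty); the data and the alpha value are illustrative:

```python
# Sketch: regularization adds a complexity penalty on top of the error term.
# Assumes scikit-learn's Ridge (L2 penalty); alpha controls the penalty strength.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 30))                  # 40 instances, 30 features: easy to overfit
y = X[:, 0] + 0.5 * rng.normal(size=40)        # only the first feature actually matters

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)            # penalize large coefficients (complexity)

# The regularized model keeps its coefficients smaller instead of chasing the noise.
print(np.abs(plain.coef_).sum().round(2), np.abs(ridge.coef_).sum().round(2))
```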
S7: Debugging Model
S7: Hyper-parameter Tuning
S7: What is a Holdout Set?
- Debugging / Tuning
- Needs its own dataset (the holdout set)
- set aside in advance or split off from the training data
- Check performance on the debugging data
- which instances the model got wrong
- Look at things that went wrong
- see if there is something that should be added
- Tuning: Hyperparameters - numerical settings in an algorithm
- Hyperparameters are set before the algorithm runs
- Parameters are set using the data
- Use a "for" loop
S7: Cross Validation
S7: Advanced Debugging
S7: Can You Skip Tuning?
- Cross-validation: effectively "cross-tuning" (see the sketch after this list)
- k-fold cross validation
- k is the number of non-overlapping pieces
- train, evaluate, then store the score
- move to the next setting
- Aggregated performance
- Result is tuned model
- Check model stability
- debug inside of the model
- Enough data?
- Overfitting?
- Susceptible to outliers?
- Tuning tends to be more important later in ML
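A minimal sketch of k-fold cross-validation used for tuning, assuming scikit-learn; k = 5 and the candidate depths are illustrative:

```python
# Sketch: k-fold cross-validation used as "cross-tuning": for each candidate setting,
# train and evaluate on k non-overlapping folds, store the scores, then aggregate.
# Assumes scikit-learn; k = 5 and the candidate depths are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

for depth in [2, 4, 8, 16]:                        # move to the next setting each pass
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(model, X, y, cv=5)    # train/evaluate on 5 non-overlapping folds
    # Mean = aggregated performance; the spread hints at model stability.
    print(depth, round(scores.mean(), 3), round(scores.std(), 3))
```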