Machine Learning Lectures: Steps 5-7

“Big data is at the foundation of all of the megatrends that are happening today, from social to mobile to the cloud to gaming.”
Chris Lynch

S5: Algorithm Selection

S6: Is Training AI Easy?

S6: Ideal Dataset Size

  • Algorithm Selection:
    • Support Vector Classifier
      • straight-line boundary
    • Decision Tree
      • horizontal / vertical boundaries
    • Neural Network
      • curvilinear boundary
  • ML Research
    • Finding Patterns
  • ML Application
    • Assessing models 
  • Throw away methods that don't meet your needs
  • Review of steps
  • Training -- Step 6
  • How many features should you use?
    • Worst: many features, few instances
    • Better: only a subset of features
    • Better: more instances
    • Best: far more instances than features
  • Length-to-width ratio (instances to features)
    • rule of thumb: about 10 to 1
  • Dimensionality Reduction / Feature Reduction
  • PCA
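A minimal sketch of PCA as feature reduction, assuming the data is a numpy array of shape (instances, features); the `pca` helper and the toy data are illustrative, not from the lecture:

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components.

    X: (n_instances, n_features) array.
    Returns a reduced array of shape (n_instances, n_components).
    """
    # Center each feature at zero mean -- PCA assumes centered data.
    X_centered = X - X.mean(axis=0)
    # SVD: the rows of Vt are the principal directions, ordered by variance.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))        # 100 instances, 8 features
X_reduced = pca(X, n_components=2)   # keep the 2 highest-variance directions
print(X_reduced.shape)               # (100, 2)
```

The components come out ordered, so the first column always carries at least as much variance as the second.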

S6: Training Faster

S6: Statistics vs "Statistics"

S6: Overfitting 

  • Use prototyping tools
  • Start with a smaller dataset to see if tinkering is worth doing
  • Statistics: a philosophical pursuit vs. simply trying it and seeing what works
  • Tinker till it fits
  • A model "fits" when it performs well on the objective
  • Make it fit!
  • Training + Tuning
    • fit, overfit, underfit, mess
  • Validation: pass / fail
  • Testing: final check on unseen data
  • Model complexity encourages overfitting
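A quick demonstration of the last bullet: as model complexity (here, polynomial degree) grows, training error keeps dropping, while error on held-out data often gets worse. The data and degrees are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=60)
y = np.sin(3 * x) + rng.normal(scale=0.3, size=60)  # noisy target

# Hold out part of the data so overfitting is detectable.
x_train, y_train = x[:40], y[:40]
x_val, y_val = x[40:], y[40:]

def mse(coeffs, xs, ys):
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2)

train_err, val_err = {}, {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err[degree] = mse(coeffs, x_train, y_train)
    val_err[degree] = mse(coeffs, x_val, y_val)

# Training error can only improve with a richer model (nested bases)...
# ...but validation error for the high-degree fit is often worse.
```

The degree-9 model memorizes training noise; comparing `train_err` and `val_err` side by side is what exposes it.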

S6: Complexity and Regularization

S6: Tempting Features

S6: Skip Training Phase 

  • Regularization: focus on simplicity
  • Penalties for errors and complexity
  • Avoid training on data from the future
  • Avoid training on features that cannot be used in production
  • Your goal is to find patterns in the data 
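One way to see "penalties for errors and complexity" concretely is ridge regression, which adds an L2 penalty on weight size to the error term. A minimal numpy sketch (the `ridge_fit` helper and data are hypothetical):

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Linear regression with an L2 complexity penalty (ridge).

    Minimizes ||Xw - y||^2 + alpha * ||w||^2; larger alpha favors simplicity.
    """
    n_features = X.shape[1]
    # Closed-form solution: (X^T X + alpha * I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 10))
y = X[:, 0] + rng.normal(scale=0.1, size=50)  # only feature 0 matters

w_loose = ridge_fit(X, y, alpha=0.01)   # barely penalized
w_tight = ridge_fit(X, y, alpha=100.0)  # heavily penalized
# The heavier penalty shrinks the weight vector toward zero.
print(np.linalg.norm(w_tight) < np.linalg.norm(w_loose))  # True
```

The penalty trades a little training error for a simpler, more stable model.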

S7: Debugging Model

S7: Hyper-parameter Tuning

S7: What is a Holdout Set?

  • Debugging / Tuning
  • Need its own dataset
    • pre-save or take out of training data
  • Check performance in debugging data
    • what instances model got wrong
  • Look at things that went wrong
    • see if there is something that should be added
  • Tuning: Hyperparameters - numerical settings in an algorithm
  • Hyperparameters are set before the algorithms runs 
  • Parameters are set using the data 
  • Use a "for" loop

S7: Cross Validation

S7: Advanced Debugging

S7: Can You Skip Tuning?

  • Cross-validation: tuning across multiple splits of the data
  • k-fold cross validation
    • k is the number of non-overlapping pieces
    • train, evaluate then store
    • move to the next setting
  • Aggregated performance 
  • Result is tuned model
  • Check model stability
    • debug inside of the model
  • Enough data?
  • Overfitting?
  • Susceptible to outliers?
  • Tuning tends to be more important later in an ML project
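The k-fold recipe above (k non-overlapping pieces; train, evaluate, store; move to the next setting; aggregate) can be sketched as follows; the polynomial-degree hyperparameter and the data are illustrative choices, not from the lecture:

```python
import numpy as np

def k_fold_cv_error(x, y, degree, k=5):
    """Average holdout error of a degree-`degree` polynomial over k folds."""
    folds = np.array_split(np.arange(len(x)), k)  # k non-overlapping pieces
    errors = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        errors.append(np.mean((np.polyval(coeffs, x[val_idx]) - y[val_idx]) ** 2))
    return float(np.mean(errors))  # aggregated performance across folds

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, size=100)
y = 2 * x + rng.normal(scale=0.1, size=100)

# Evaluate each hyperparameter setting, then keep the best one.
scores = {d: k_fold_cv_error(x, y, d) for d in (1, 2, 3)}
best = min(scores, key=scores.get)
```

Because every instance serves as validation data exactly once, the aggregated score is more stable than a single holdout split, which also makes it a useful check on model stability.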