Machine Learning Lectures: Steps 8-9

"By far, the greatest danger of Artificial Intelligence is that people conclude too early that they understand it.”
Eliezer Yudkowsky

S8: When Validation Fails

S8: Validation is Blind

S8: Validation Keeps You Safe

  • Validation: evaluate the model on a dataset it was not trained on
  • When it fails, iterate (see the sketch after this list):
    • engineering new features,
    • training on feature subsets,
    • running different algorithms,
    • tuning your algorithm,
    • changing model complexity
  • Then check again with fresh data
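A minimal sketch of that iterate-and-validate loop, assuming scikit-learn and a toy synthetic dataset (neither of which is prescribed by the lecture): a few candidate models are fit on the training split and compared only by their validation score.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for a real project dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# Candidate models / settings to iterate over when validation says "not good enough".
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "shallow_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "deep_tree": DecisionTreeClassifier(random_state=0),
}

for name, model in candidates.items():
    model.fit(X_train, y_train)                           # train on training data only
    score = accuracy_score(y_val, model.predict(X_val))   # judge on held-out validation data
    print(f"{name}: validation accuracy = {score:.3f}")
```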

  • Validation is blind
  • Look only at the final performance metric (a score-only helper is sketched below)
  • Don't dig into your validation data to debug the model!
  • Validation is the most important step in machine learning
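One way to keep validation blind is to expose nothing but the final metric; a minimal sketch assuming scikit-learn (the helper name is illustrative, not from the lecture):

```python
from sklearn.metrics import accuracy_score

def blind_validation_score(model, X_val, y_val) -> float:
    """Return only the final performance metric.

    The validation examples themselves are never printed, plotted,
    or otherwise inspected, so they cannot leak into debugging.
    """
    return accuracy_score(y_val, model.predict(X_val))
```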

S9: Testing vs Validation

S9: 12 Steps of Statistics

S9: Interpreting Test Output

  • Standards: deciding whether the model meets them requires a hypothesis test

  • Statistics is the science of changing your mind
  • Hypothesis Testing
  • 12 Steps of Statistics:
    • (1) default action, (2) operationalization, (3) population, (4) simulation, (5) data strategy, (6) assumption, (7) hypotheses, (8) method selection, (9) power analysis, (10) collection, (11) testing, (12) reporting
  • Output is a p-value or confidence interval
  • p-value
    • quiz answer: none of the above (the common intuitive definitions are all wrong)
    • reports the outcome of a hypothesis test
    • a p-value is the probability of obtaining a sample at least as extreme as the one we just observed in a world where the null hypothesis is actually true (see the simulation sketch after this list)
    • a small p-value makes your null hypothesis look ridiculous
  • confidence interval
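A minimal simulation-based sketch of that p-value definition, using only numpy; the scenario (60 successes out of 100 trials, null success rate 0.5) is hypothetical and not taken from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

n_trials, observed_successes, null_rate = 100, 60, 0.5

# Simulate many datasets from a world where the null hypothesis is true.
simulated = rng.binomial(n_trials, null_rate, size=100_000)

# Two-sided p-value: how often is a simulated result at least as far from the
# null expectation as the result we actually observed?
observed_gap = abs(observed_successes - n_trials * null_rate)
p_value = np.mean(np.abs(simulated - n_trials * null_rate) >= observed_gap)

print(f"p-value ~ {p_value:.3f}")  # a small value makes the null look ridiculous
```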

S9: Understanding p-values

S9: Statistical Significance

  • Stronger evidence against the null hypothesis: lower p-value
  • Decision threshold: the significance level, chosen before testing (see the decision sketch below)
  • Testing is the final frontier
  • Collect a new test dataset
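How the threshold turns a p-value into a decision, with hypothetical numbers (neither value comes from the lecture):

```python
alpha = 0.05     # significance level, committed to before looking at the data
p_value = 0.012  # hypothetical output of a hypothesis test

if p_value <= alpha:
    print("Statistically significant: reject the null, switch away from the default action.")
else:
    print("Not significant: fail to reject the null, stick with the default action.")
```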

S9: Game Over?

S9: A Commitment To Testing

  • Statistical rigor
  • If testing fails, the only option is to start over again
  • Do not simply re-test the same model on yet another dataset (see the data-split sketch after this list)
  • I will never test on data that was involved in any way in training or validation
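One way to honor that commitment is to carve the test set off first and touch it exactly once at the very end; a minimal sketch assuming scikit-learn and a toy dataset (not prescribed by the lecture):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy data standing in for a real project dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Carve off the test set first; it stays locked away until the single, final test.
X_work, X_test, y_work, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Only the remainder is ever used for training and (blind) validation.
X_train, X_val, y_train, y_val = train_test_split(X_work, y_work, test_size=0.25, random_state=0)
```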