Machine Learning Lectures: Steps 8-9
“By far, the greatest danger of Artificial Intelligence is that people conclude too early that they understand it.”
— Eliezer Yudkowsky
Steps 8-9: Check Algorithms
- Step 8: Validate Your Model
- Step 9: Test Your Model
Table of Contents
- Step 8: When Validation Fails (4:19)
- Step 8: Validation is Blind (5:27)
- Step 8: Validation Keeps You Safe (3:03)
- Step 9: What's the difference between testing and validation? (2:42)
- Step 9: The 12 Steps of Statistics (8:11)
- Step 9: Interpreting Test Output (6:08)
- Step 9: Understanding p-values (3:33)
- Step 9: Statistical Significance (4:19)
- Step 9: Game Over? (2:05)
- Step 9: A Commitment To Testing (3:51)
S8: When Validation Fails
S8: Validation is Blind
S8: Validation Keeps You Safe
- Validation: evaluate the model on a different dataset
- When it fails ...
- engineering new features,
- training on feature subsets,
- running different algorithms,
- tuning your algorithm,
  - changing model complexity
- Check with fresh data
- Validation is blind
- Only look at the final performance metric
- Don't debug your validation data!
- Validation is the most important step in machine learning
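The validation loop described above can be sketched as follows; the dataset, model choice, and split sizes are illustrative assumptions, not from the lecture (this assumes scikit-learn is available):

```python
# Minimal sketch of validation: train on one slice of data, then
# evaluate on fresh data the model has never seen.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out fresh data that plays no role in training.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Validation is blind: look only at the final performance metric,
# never at the individual validation examples.
val_score = model.score(X_val, y_val)
print(round(val_score, 2))
```

If the validation score disappoints, you go back and engineer features, tune, or swap algorithms, then validate again on the held-out data.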
S9: Testing vs Validation
S9: 12 Steps of Statistics
S9: Interpreting Test Output
- Standards: you need a hypothesis test
- Statistics is the science of changing your mind
- Hypothesis Testing
- 12 Steps of Statistics:
  1. default action
  2. operationalization
  3. population
  4. simulation
  5. data strategy
  6. assumption
  7. hypotheses
  8. method selection
  9. power analysis
  10. collection
  11. testing
  12. reporting
- Output is a p-value or confidence interval
- p-value
  - reports the outcome of a hypothesis test
  - the probability of obtaining a sample at least as extreme as the one we just observed, in a world where the null hypothesis is actually true
  - a small p-value makes your null hypothesis look ridiculous
- confidence interval
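The p-value definition above can be made concrete with a toy simulation; the coin-flip setup and every number here are hypothetical illustrations:

```python
# Simulate the null world (a fair coin) and count how often we see a
# sample at least as extreme as the one observed. That fraction is an
# estimate of the one-sided p-value.
import random

random.seed(0)

observed_heads = 16      # hypothetical observation: 16 heads in 20 flips
n_flips, n_sims = 20, 100_000

extreme = sum(
    sum(random.random() < 0.5 for _ in range(n_flips)) >= observed_heads
    for _ in range(n_sims)
)
p_value = extreme / n_sims
print(p_value)  # small value -> the fair-coin null looks ridiculous
```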
S9: Understanding p-values
S9: Statistical Significance
- Stronger Evidence: lower p-value
- Decision Threshold: the significance level
- Testing is the final frontier
- Collect a new test dataset
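The decision rule implied above reduces to comparing the p-value against a significance level chosen before testing; the 0.05 threshold below is a common convention used for illustration, not a value from the lecture:

```python
# Reject the null hypothesis only when the evidence is strong enough,
# i.e. when the p-value falls at or below the pre-chosen significance level.
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the hypothesis-test decision for a given p-value."""
    return "reject null" if p_value <= alpha else "fail to reject null"

print(decide(0.003))  # strong evidence against the null
print(decide(0.20))   # weak evidence; stick with the default action
```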
S9: Game Over?
S9: A Commitment To Testing
- Statistical rigor
- If testing fails, you can only start over again
- Do not simply retest the failed model on a new dataset
- I will never test on data that was involved in any way in training or validation
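The commitment above can be enforced mechanically by locking the test set away before any development begins; this sketch assumes scikit-learn, and the data and split sizes are illustrative:

```python
# Carve out the test set FIRST, before any modeling decisions, so that
# no test example is ever involved in training or validation.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)   # toy feature matrix
y = np.arange(50) % 2               # toy labels

# First cut: the test set is sealed until the very end.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Second cut: train vs. validation, used freely during development.
X_train, X_val, y_train, y_val = train_test_split(
    X_dev, y_dev, test_size=0.25, random_state=0)

# Verify the commitment: no test row appears in train or validation.
dev_rows = {tuple(r) for r in np.vstack([X_train, X_val])}
assert all(tuple(r) not in dev_rows for r in X_test)
print(len(X_train), len(X_val), len(X_test))
```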