Machine Learning Lectures: Steps 8-9

"By far, the greatest danger of Artificial Intelligence is that people conclude too early that they understand it.”
Eliezer Yudkowsky

S8: When Validation Fails

S8: Validation is Blind

S8: Validation Keeps You Safe

  • Validation: evaluate the model on a dataset it was not trained on
  • When it fails, iterate (see the sketch after this list):
    • engineering new features,
    • training on feature subsets,
    • running different algorithms,
    • tuning your algorithm,
    • changing model complexity
  • Then check again with fresh data
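A minimal sketch of that iterate-and-validate loop, assuming scikit-learn and a toy synthetic dataset (neither of which is prescribed by the lecture): a few candidate models are fit on the training split and compared only by their validation score.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for a real project dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# Candidate models / settings to iterate over when validation says "not good enough".
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "shallow_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "deep_tree": DecisionTreeClassifier(random_state=0),
}

for name, model in candidates.items():
    model.fit(X_train, y_train)                           # train on training data only
    score = accuracy_score(y_val, model.predict(X_val))   # judge on held-out validation data
    print(f"{name}: validation accuracy = {score:.3f}")
```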

  • Validation is blind
  • Look only at the final performance metric (a score-only helper is sketched below)
  • Don't dig into your validation data to debug the model!
  • Validation is the most important step in machine learning
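One way to keep validation blind is to expose nothing but the final metric; a minimal sketch assuming scikit-learn (the helper name is illustrative, not from the lecture):

```python
from sklearn.metrics import accuracy_score

def blind_validation_score(model, X_val, y_val) -> float:
    """Return only the final performance metric.

    The validation examples themselves are never printed, plotted,
    or otherwise inspected, so they cannot leak into debugging.
    """
    return accuracy_score(y_val, model.predict(X_val))
```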

S9: Testing vs Validation

S9: 12 Steps of Statistics

S9: Interpreting Test Output

  • Standards: deciding whether the model meets them requires a hypothesis test

  • Statistics is the science of changing your mind
  • Hypothesis Testing
  • 12 Steps of Statistics:
    • (1) default action, (2) operationalization, (3) population, (4) simulation, (5) data strategy, (6) assumption, (7) hypotheses, (8) method selection, (9) power analysis, (10) collection, (11) testing, (12) reporting
  • Output is a p-value or confidence interval
  • p-value
    • quiz answer: none of the above (the common intuitive definitions are all wrong)
    • reports the outcome of a hypothesis test
    • a p-value is the probability of obtaining a sample at least as extreme as the one we just observed in a world where the null hypothesis is actually true (see the simulation sketch after this list)
    • a small p-value makes your null hypothesis look ridiculous
  • confidence interval
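A minimal simulation-based sketch of that p-value definition, using only numpy; the scenario (60 successes out of 100 trials, null success rate 0.5) is hypothetical and not taken from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

n_trials, observed_successes, null_rate = 100, 60, 0.5

# Simulate many datasets from a world where the null hypothesis is true.
simulated = rng.binomial(n_trials, null_rate, size=100_000)

# Two-sided p-value: how often is a simulated result at least as far from the
# null expectation as the result we actually observed?
observed_gap = abs(observed_successes - n_trials * null_rate)
p_value = np.mean(np.abs(simulated - n_trials * null_rate) >= observed_gap)

print(f"p-value ~ {p_value:.3f}")  # a small value makes the null look ridiculous
```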

S9: Understanding p-values

S9: Statistical Significance

  • Stronger evidence against the null hypothesis: lower p-value
  • Decision threshold: the significance level, chosen before testing (see the decision sketch below)
  • Testing is the final frontier
  • Collect a new test dataset
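How the threshold turns a p-value into a decision, with hypothetical numbers (neither value comes from the lecture):

```python
alpha = 0.05     # significance level, committed to before looking at the data
p_value = 0.012  # hypothetical output of a hypothesis test

if p_value <= alpha:
    print("Statistically significant: reject the null, switch away from the default action.")
else:
    print("Not significant: fail to reject the null, stick with the default action.")
```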

S9: Game Over?

S9: A Commitment To Testing

  • Statistical rigor
  • If testing fails, the only option is to start over again
  • Do not simply re-test the same model on yet another dataset (see the data-split sketch after this list)
  • I will never test on data that was involved in any way in training or validation
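One way to honor that commitment is to carve the test set off first and touch it exactly once at the very end; a minimal sketch assuming scikit-learn and a toy dataset (not prescribed by the lecture):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy data standing in for a real project dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Carve off the test set first; it stays locked away until the single, final test.
X_work, X_test, y_work, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Only the remainder is ever used for training and (blind) validation.
X_train, X_val, y_train, y_val = train_test_split(X_work, y_work, test_size=0.25, random_state=0)
```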