Machine Learning Lectures: Steps 0-1

“We are entering a new world. The technologies of machine learning, speech recognition, and natural language understanding are reaching a nexus of capability. The end result is that we’ll soon have artificially intelligent assistants to help us in every aspect of our lives.”
-- Amy Stapleton

The 12 Steps of Applied AI / ML

  • Step 0: Reality check and setup
  • Step 1: Define your objectives
  • Step 2: Get access to data
  • Step 3: Split your data
  • Step 4: Explore your data
  • Step 5: Prepare your tools
  • Step 6: Use your tools to train some models
  • Step 7: Debug, analyze, and tune
  • Step 8: Validate your models
  • Step 9: Test your model
  • Step 10: Productionize your system
  • Step 11: Run live experiments to launch safely
  • Step 12: Monitor and maintain

12 Steps to Applied AI: https://medium.com/swlh/12-steps-to-applied-ai-2fdad7fdcdf3

S0: Where To Start with Applied AI?

S0: Classification vs. Regression

S0: Instances, Features, Targets

  • Begin with labels
  • Articulate the decisions you want to make
  • What kind of label do you want?
    • Classification: a category (binary or multiclass)
    • Regression: a number (numerical prediction)
  • Jargon: instance / feature / label
    • Instance: a row; one example
    • Feature: a column (an input variable)
    • Label (target): the answer, the truth, or the response
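A minimal sketch of the jargon in plain Python, assuming a tiny made-up table (the column names and values are hypothetical). Each dict is one instance (a row), `FEATURES` lists the feature columns, and the same rows carry two kinds of label: a category for classification and a number for regression.

```python
# A tiny, made-up dataset (hypothetical values, for illustration only).
# Each dict is one instance (a row); the keys are the columns.
rows = [
    {"weight_kg": 4.1, "ear_length_cm": 6.5, "is_cat": "Cat", "age_years": 3.0},
    {"weight_kg": 30.0, "ear_length_cm": 10.2, "is_cat": "No Cat", "age_years": 7.5},
    {"weight_kg": 3.6, "ear_length_cm": 5.9, "is_cat": "Cat", "age_years": 1.5},
]

FEATURES = ["weight_kg", "ear_length_cm"]  # input columns (variables)
CLASS_LABEL = "is_cat"                     # a category -> classification
NUMERIC_LABEL = "age_years"                # a number   -> regression

def split_features_and_label(data, label):
    """Return (X, y): the feature values and the label for each instance."""
    X = [[row[f] for f in FEATURES] for row in data]
    y = [row[label] for row in data]
    return X, y

X, y_class = split_features_and_label(rows, CLASS_LABEL)
_, y_reg = split_features_and_label(rows, NUMERIC_LABEL)

print(len(X), "instances,", len(FEATURES), "features")
print("classification labels:", y_class)
print("regression labels:", y_reg)
```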

S0: Supervised Learning

S0: Unsupervised Learning

S0: Semi-supervised Learning

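  • Supervised: for any example you give the system, you have the correct label handy
    • Keywords: labeled data
  • Unsupervised: no labels; the system splits the examples into groups on its own
    • It just separates them into groups (e.g., two groups) without knowing what the groups mean
    • Keywords: data mining / clustering
  • Semi-supervised: you have only some of the correct labels
    • A blend of supervised and unsupervised learning
    • Keywords: partial guidance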
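The three settings can be contrasted on the same toy data. This is a sketch under stated assumptions: the 1-D points are made up, 1-nearest-neighbor stands in for "a supervised learner", a two-cluster k-means stands in for "clustering", and cluster-then-label is just one simple semi-supervised scheme among many.

```python
# Toy 1-D data (made-up numbers) with two obvious groups.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["small", "small", "small", "big", "big", "big"]

def nearest_neighbor(x, xs, ys):
    """Supervised: predict the label of the closest labeled example."""
    best = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return ys[best]

def two_means(xs, iters=10):
    """Unsupervised: split points into two groups (1-D k-means with k=2).
    Returns a group id (0 or 1) per point; the groups have no names."""
    c0, c1 = min(xs), max(xs)  # simple initialization of the two centers
    groups = []
    for _ in range(iters):
        groups = [0 if abs(x - c0) <= abs(x - c1) else 1 for x in xs]
        g0 = [x for x, g in zip(xs, groups) if g == 0]
        g1 = [x for x, g in zip(xs, groups) if g == 1]
        if g0:
            c0 = sum(g0) / len(g0)
        if g1:
            c1 = sum(g1) / len(g1)
    return groups

def cluster_then_label(xs, partial):
    """Semi-supervised (one simple scheme): cluster all points, then name each
    group after a labeled point inside it. `partial` maps index -> label."""
    groups = two_means(xs)
    group_name = {}
    for i, lab in partial.items():
        group_name.setdefault(groups[i], lab)
    return [group_name.get(g) for g in groups]

print(nearest_neighbor(1.1, points, labels))               # uses all labels
print(two_means(points))                                   # groups, no names
print(cluster_then_label(points, {0: "small", 3: "big"}))  # only 2 labels
```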

S0: Reinforcement Learning

S0: What is Data Science?

S0: Data Science Flowchart

  • Reinforcement learning: feedback is not immediate for every move; it is delayed
    • Keywords: sequence of actions, reward / punishment, delayed feedback, system influences its environment
    • Expect lots of failure: the system learns by trial and error
  • Map of data science, by number of decisions:
    • Analytics: none - get inspired
    • ML/AI: many - make a recipe
    • Statistics: few - decide wisely
  • Are we making decisions?
    • No - Use descriptive analytics
    • Yes - Can you look up the answer?
      • Yes - Use descriptive analytics
      • No - Many decisions?
        • Yes - Use ML
        • No (few decisions) - Do you need to control risk?
          • No - Use descriptive analytics
          • Yes - Use statistical inference
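The flowchart above can be read as a chain of yes/no questions. A minimal sketch; the function and parameter names are hypothetical, not part of the original slides.

```python
def choose_approach(making_decisions, can_look_up_answer=False,
                    many_decisions=False, need_risk_control=False):
    """Walk the data science flowchart and return the suggested approach.
    Parameter names are illustrative; each one answers a flowchart question."""
    if not making_decisions:
        return "descriptive analytics"
    if can_look_up_answer:
        return "descriptive analytics"
    if many_decisions:
        return "machine learning"
    # Few decisions: use statistics only if we need to control risk.
    if need_risk_control:
        return "statistical inference"
    return "descriptive analytics"

print(choose_approach(False))
print(choose_approach(True, many_decisions=True))
print(choose_approach(True, need_risk_control=True))
```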

S0: Don't forget Data

S1: What is "good behavior"?

S1: False Positives & True Negatives

  • ML does not work without data
  • The system needs data to learn from
  • Set objectives first!
  • Objectives can be nuanced
  • What are you trying to accomplish?
  • Write down the output label (e.g., Cat / No Cat)
  • Consider the mistakes the system can make
  • Assign a project score
  • Create a performance metric
  • Think about the loss function
  • Compare the functions (performance metric vs. loss function)
  • Set performance criteria
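To make "consider mistakes / assign a project score / create a performance metric" concrete, here is a sketch for the Cat / No Cat example. The point values are hypothetical; in a real project they would come from what each kind of mistake costs the business.

```python
# Hypothetical project scores for each outcome of a Cat / No Cat system.
# Keys are (predicted, actual); the point values are made up.
SCORES = {
    ("Cat", "Cat"): 1,          # true positive: found a real cat
    ("Cat", "No Cat"): -5,      # false positive: claimed a cat that isn't there
    ("No Cat", "Cat"): -1,      # false negative: missed a cat
    ("No Cat", "No Cat"): 0,    # true negative: correctly said no cat
}

def project_score(predicted, actual):
    """Performance metric: average project score per example."""
    total = sum(SCORES[(p, a)] for p, a in zip(predicted, actual))
    return total / len(actual)

predicted = ["Cat", "Cat", "No Cat", "No Cat"]
actual = ["Cat", "No Cat", "Cat", "No Cat"]
print(project_score(predicted, actual))  # (1 - 5 - 1 + 0) / 4 = -1.25
```

Because a false positive costs 5 points here and a miss only 1, this metric rewards a cautious system; different weights could make a different model the winner.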

S1: Confusion Matrix

S1: Performance Metrics

S1: Ground Truth

  • Project scoring
    • True Positive
    • False Positive
    • False Negative
    • True Negative
  • How do you calculate the score?
  • Business performance metric
  • Common metrics:
    • Accuracy
    • Precision
    • Recall
    • Blends of the above (e.g., the F1 score, the harmonic mean of precision and recall)
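The confusion matrix counts come from comparing each prediction with the ground truth, and the standard metrics follow from those counts. A sketch in plain Python; the example lists are made up, "Cat" is assumed to be the positive label, and F1 is shown as one common blend of precision and recall.

```python
from collections import Counter

def confusion_counts(predicted, actual, positive="Cat"):
    """Count TP, FP, FN, TN by comparing predictions with the ground truth."""
    counts = Counter()
    for p, a in zip(predicted, actual):
        found = (p == positive)   # did it think it found the label?
        correct = (p == a)        # did it get it right?
        if found:
            counts["TP" if correct else "FP"] += 1
        else:
            counts["TN" if correct else "FN"] += 1
    return counts

def metrics(c):
    """Accuracy, precision, recall, and F1 from the four counts."""
    tp, fp, fn, tn = c["TP"], c["FP"], c["FN"], c["TN"]
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

predicted = ["Cat", "Cat", "Cat", "No Cat", "No Cat", "No Cat"]
actual = ["Cat", "Cat", "No Cat", "Cat", "No Cat", "No Cat"]
c = confusion_counts(predicted, actual)
print(dict(c))      # 2 TP, 1 FP, 1 FN, 2 TN
print(metrics(c))
```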

S1: Precision vs Recall

S1: What is Optimization?

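  • Two questions name each outcome:
  • 1. Did it get it right or not? True/False
  • 2. Did it think it found a (label)? Positive/Negative
  • Combine the two answers: e.g., a False Positive is a wrong (False) claim to have found a (label) (Positive)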

  • Precision is the fraction of the (label)s the system found that really are (label)s
  • Precision: "Don't waste my time." Leave things out if you must, but return things that are correct
  • High precision returns high-quality results, but leaves some items out
  • Precision = TP / (TP + FP)

  • Recall is the fraction of all the actual (label)s (e.g., in all the images) that the ML system found
  • Recall: "I need an exhaustive list and am willing to accept duds"
  • High recall returns most of the good items, but also lots of duds
  • Recall = TP / (TP + FN)
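Many classifiers output a score, and a threshold turns that score into Positive / Negative. Assuming such scores (the numbers below are made up), sweeping the threshold shows the trade-off described above: a strict threshold leaves items out (high precision, lower recall), while a loose one returns more duds (lower precision, high recall).

```python
# Made-up classifier scores (higher = more confident it's a cat) and the truth.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
is_cat = [True, True, False, True, False, True, False, False]

def precision_recall(threshold):
    """Call every score >= threshold a Positive, then compute both metrics."""
    found = [s >= threshold for s in scores]
    tp = sum(f and t for f, t in zip(found, is_cat))
    fp = sum(f and not t for f, t in zip(found, is_cat))
    fn = sum((not f) and t for f, t in zip(found, is_cat))
    precision = tp / (tp + fp) if tp + fp else 0.0  # TP / (TP + FP)
    recall = tp / (tp + fn) if tp + fn else 0.0     # TP / (TP + FN)
    return precision, recall

for threshold in (0.85, 0.5, 0.25):
    p, r = precision_recall(threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

With these numbers, the thresholds 0.85, 0.5, and 0.25 give precision/recall of 1.00/0.50, 0.60/0.75, and 0.57/1.00.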

 

  • Optimization is about picking the best parameters
  • Find values that give the best performance metric
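Optimization in its simplest form: try candidate parameter values and keep the one with the best performance metric. A sketch assuming the only parameter is a decision threshold, using made-up scores and a brute-force grid search (real training typically uses smarter methods, such as gradient descent on a loss function).

```python
# Made-up scores and ground truth; the "parameter" we optimize is the threshold.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
is_cat = [True, True, True, False, False, True, False, False]

def accuracy(threshold):
    """Performance metric: fraction of examples classified correctly."""
    correct = sum((s >= threshold) == t for s, t in zip(scores, is_cat))
    return correct / len(scores)

# Grid search: try each candidate value and keep the one with the best metric.
candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
best = max(candidates, key=accuracy)
print(best, accuracy(best))
```

Here the best threshold is 0.75 with accuracy 0.875; when several candidates tie, `max` keeps the first one.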

S1: Loss Function

S1: Setting Performance Criteria

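  • Loss function (also called objective function or cost function)
  • Measures "badness"; training tries to bring it down
  • Compare the functions: accuracy (the performance metric) vs. the loss function
  • Performance criteria: the minimum performance you will accept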
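Accuracy and a loss function can disagree about how bad a model is. A sketch, assuming log loss (binary cross-entropy) as the loss function; the probabilities and the `MIN_ACCURACY` criterion are made up. Both models make the same single mistake, so their accuracy is identical, but the loss punishes the confident mistake much more.

```python
import math

def log_loss(probs, truth):
    """Binary cross-entropy: average "badness" of predicted probabilities.
    Confident wrong answers are punished much more than hesitant ones."""
    eps = 1e-15
    total = 0.0
    for p, t in zip(probs, truth):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -math.log(p) if t else -math.log(1 - p)
    return total / len(truth)

def accuracy(probs, truth, threshold=0.5):
    """Fraction of examples where the thresholded prediction matches the truth."""
    return sum((p >= threshold) == t for p, t in zip(probs, truth)) / len(truth)

truth = [True, True, False, False]
model_a = [0.9, 0.8, 0.2, 0.6]    # one mistake, made hesitantly
model_b = [0.9, 0.8, 0.2, 0.99]   # the same mistake, made confidently

for name, probs in (("A", model_a), ("B", model_b)):
    print(name, accuracy(probs, truth), round(log_loss(probs, truth), 3))

# Performance criterion (hypothetical): the minimum accuracy we will accept.
MIN_ACCURACY = 0.7
print(accuracy(model_a, truth) >= MIN_ACCURACY)
```

Model A's log loss is about 0.367 and model B's about 1.289, while both have accuracy 0.75.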