Machine Learning Lectures: Steps 0-1
“We are entering a new world. The technologies of machine learning, speech recognition, and natural language understanding are reaching a nexus of capability. The end result is that we’ll soon have artificially intelligent assistants to help us in every aspect of our lives.”
-- Amy Stapleton
Steps 0-1: Ask The Right Questions
- Step 0: Find an application where ML is useful
- Step 1: Define Your Objective
- Step 0: Where to start? (1:35)
- Step 0: Classification vs Regression (1:19)
- Step 0: Instances, Features, Targets (1:03)
- Step 0: Supervised Learning (1:54)
- Step 0: Unsupervised Learning (7:28)
- Step 0: Semi-supervised Learning (2:36)
- Step 0: Reinforcement Learning (2:59)
- Step 0: What is Data Science? (2:54)
- Step 0: Data Science Flowchart (4:53)
- Step 0: Don't forget data (1:45)
- Step 1: What is "good behavior"? (1:57)
- Step 1: False Positives & True Negatives (1:23)
- Step 1: Confusion Matrix (2:54)
- Step 1: Performance Metrics (2:08)
- Step 1: Ground Truth (1:43)
- Step 1: Precision vs Recall (5:47)
- Step 1: What is Optimization (2:07)
- Step 1: Loss Function (6:32)
- Step 1: Setting Performance Criteria (3:39)
The 12 Steps of Applied AI / ML
- Step 0: Reality check and setup
- Step 1: Define your objectives
- Step 2: Get access to data
- Step 3: Split your data
- Step 4: Explore your data
- Step 5: Prepare your tools
- Step 6: Use your tools to train some models
- Step 7: Debug, analyze, and tune
- Step 8: Validate your models
- Step 9: Test your model
- Step 10: Productionize your system
- Step 11: Run live experiments to launch safely
- Step 12: Monitor and maintain
12 Steps to Applied AI: https://medium.com/swlh/12-steps-to-applied-ai-2fdad7fdcdf3
S0: Where To Start with Applied AI?
- Begin with labels
- Articulate the decisions you want
S0: Classification vs. Regression
- What kind of label do you want?
- Classification: a category (binary or multiclass)
- Regression: a number (numerical prediction)
S0: Instances, Features, Targets
- Jargon: instance / feature / label
- Instance: a row; one example
- Feature: a column (a variable)
- Label (target): the answer, truth, or response
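To make the jargon concrete, here is a minimal Python sketch built on a hypothetical Cat / No Cat dataset. The feature names `weight_kg` and `purrs` and the helper `split_features_and_labels` are invented for illustration. Each dict is one instance (row). Every key except `label` is a feature (column), and `label` is the target. The labels are categories, so this would be a classification problem. A numeric label such as a price would make it regression.

```python
# Hypothetical toy dataset: each dict is one instance (row).
# Keys other than "label" are features (columns); "label" is the target.
dataset = [
    {"weight_kg": 4.1, "purrs": 1, "label": "cat"},
    {"weight_kg": 30.0, "purrs": 0, "label": "no cat"},
    {"weight_kg": 3.5, "purrs": 1, "label": "cat"},
]

def split_features_and_labels(rows, label_key="label"):
    """Return (feature_names, X, y): feature names, feature rows, and labels."""
    feature_names = [k for k in rows[0] if k != label_key]
    X = [[row[name] for name in feature_names] for row in rows]
    y = [row[label_key] for row in rows]
    return feature_names, X, y

feature_names, X, y = split_features_and_labels(dataset)
print(feature_names)  # ['weight_kg', 'purrs']
print(y)              # ['cat', 'no cat', 'cat']
```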
S0: Supervised Learning
- For every example you give the system, you have the correct label handy
- Keywords: labeled data
S0: Unsupervised Learning
- No labels: the system just separates the examples into groups
- Example: splitting the data into two groups
- Keywords: data mining / clustering
S0: Semi-supervised Learning
- You have only some of the correct labels
- A blend of supervised and unsupervised learning
- Keywords: partial guidance
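The unsupervised case (splitting data into two groups) can be sketched with a tiny clustering routine. This sketch assumes k-means with k = 2 on one-dimensional numbers. The lecture does not name a specific algorithm, and `two_means` is a hypothetical helper. No labels are passed in. The routine only groups values that sit close together.

```python
def two_means(values, iterations=10):
    """Split unlabeled 1-D values into two groups (k-means with k = 2)."""
    centroids = [min(values), max(values)]  # start at the two extremes
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            # Assign each value to its nearest centroid.
            nearest = 0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            groups[nearest].append(v)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return groups

low, high = two_means([1.0, 1.2, 0.8, 9.5, 10.1, 9.9])
print(low)   # [1.0, 1.2, 0.8]
print(high)  # [9.5, 10.1, 9.9]
```

The algorithm finds the split but cannot say what the groups *mean*. Attaching meaning requires labels, which is where supervised and semi-supervised learning come in.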
S0: Reinforcement Learning
- Feedback is not immediate for every move; it is delayed
- Keywords: sequence of actions, reward / punishment, delayed feedback, the system influences its environment
- Expect lots of failure along the way
S0: What is Data Science?
- Map of data science (by number of decisions):
- Analytics: none; get inspired
- ML/AI: many; make a recipe
- Statistics: few; decide wisely
S0: Data Science Flowchart
- Are we making decisions?
- -- No: use descriptive analytics
- -- Yes: can you look up the answer?
- ---- Yes: use descriptive analytics
- ---- No: are there many decisions?
- ------ Yes: use ML
- ------ No (few decisions): do you need to control risk?
- -------- No: use descriptive analytics
- -------- Yes: use statistical inference
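The data science flowchart can be written as a small decision function. The name `choose_approach` and its yes/no parameters are hypothetical. Each `if` mirrors one question in the chart, in the same order.

```python
def choose_approach(making_decisions, can_look_up_answer, many_decisions, need_risk_control):
    """Walk the data science flowchart and return the suggested approach."""
    if not making_decisions:
        return "descriptive analytics"
    if can_look_up_answer:
        return "descriptive analytics"
    if many_decisions:
        return "machine learning"
    # Few decisions: statistics is for when you need to control risk.
    if need_risk_control:
        return "statistical inference"
    return "descriptive analytics"

print(choose_approach(making_decisions=True, can_look_up_answer=False,
                      many_decisions=True, need_risk_control=False))  # machine learning
```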
S0: Don't forget Data
- ML does not work without data
- You need data to learn from
S1: What is "good behavior"?
- Set objectives first!
- Objectives can be nuanced
- What are you trying to accomplish?
- Write down the output label (e.g., Cat / No Cat)
- Consider the possible mistakes
- Assign a project score
- Create a performance metric
- Think about the loss function
- Compare the functions (performance metric vs. loss function)
- Set performance criteria
S1: False Positives & True Negatives
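One way to assign a project score is to give each kind of outcome a point value and average over all examples. The point values in `OUTCOME_POINTS` are made-up assumptions. Here a missed cat (false negative) is treated as five times worse than a false alarm (false positive). In a real project, the values would reflect what each kind of mistake actually costs. `project_score` is a hypothetical helper.

```python
# Hypothetical point values for each outcome in a Cat / No Cat project.
OUTCOME_POINTS = {"TP": 1.0, "TN": 1.0, "FP": -1.0, "FN": -5.0}

def project_score(counts, points=OUTCOME_POINTS):
    """Average points per example, given counts of TP, FP, FN, TN."""
    total_examples = sum(counts.values())
    total_points = sum(points[outcome] * n for outcome, n in counts.items())
    return total_points / total_examples

print(project_score({"TP": 40, "FP": 10, "FN": 5, "TN": 45}))  # 0.5
```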
S1: Confusion Matrix
- Project scoring: every prediction lands in one of four cells
- True Positive
- False Positive
- False Negative
- True Negative
S1: Performance Metrics
- How do we calculate the score?
- Business performance metric
- Accuracy
- Precision
- Recall
- Blends of the above
S1: Ground Truth
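The four confusion-matrix cells are filled by comparing each prediction with the ground-truth label. This is a minimal sketch that assumes binary labels, with `"cat"` as the positive class. `confusion_counts` and `accuracy` are hypothetical helpers, not a library API.

```python
def confusion_counts(ground_truth, predictions, positive="cat"):
    """Count TP, FP, FN, TN by comparing binary predictions with ground truth."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for truth, predicted in zip(ground_truth, predictions):
        # True/False: was the prediction right? Positive/Negative: what was predicted?
        key = ("T" if predicted == truth else "F") + ("P" if predicted == positive else "N")
        counts[key] += 1
    return counts

def accuracy(counts):
    """Fraction of all predictions that were correct."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())

ground_truth = ["cat", "cat", "no cat", "no cat", "cat"]
predictions = ["cat", "no cat", "cat", "no cat", "cat"]
counts = confusion_counts(ground_truth, predictions)
print(counts)            # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 1}
print(accuracy(counts))  # 0.6
```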
S1: Precision vs Recall
- Two questions:
- 1. Did it get it right or not? True/False
- 2. Did it think it found a (label)? Positive/Negative
- Precision is the fraction of the items the system flagged as (label) that really are (label)
- Precision -- "don't waste my time": it may leave things out, but what it returns should be correct
- Precision -- returns high-quality results, but leaves some items out
- Precision: TP / (TP + FP)
- Recall is the proportion of all the actual (label) in all the images that the ML system found
- Recall -- "I need an exhaustive list and am willing to accept duds"
- Recall -- returns lots of duds, but returns most of the good items
- Recall: TP / (TP + FN)
S1: What is Optimization
- Optimization is about picking the best parameters
- Find the values that give the best performance metric
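The precision and recall formulas can be coded directly. The optimization idea can then be shown as a brute-force search over a decision threshold. Several parts of this sketch are assumptions. The scores and labels are invented. The threshold stands in for "parameters". F1, the harmonic mean of precision and recall, is used as one example of a blended metric. Real training usually tunes model weights with gradient-based methods rather than checking a short list of candidates.

```python
def precision(tp, fp):
    """Of everything flagged positive, the fraction that really is positive."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Of everything that really is positive, the fraction the system found."""
    return tp / (tp + fn) if tp + fn else 0.0

def f1(tp, fp, fn):
    """Harmonic mean of precision and recall (one common blend)."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0

def best_threshold(scores, truths, candidates, metric=f1):
    """Brute-force optimization: keep the threshold with the highest metric."""
    best_t, best_value = None, float("-inf")
    for t in candidates:
        flagged = [s >= t for s in scores]
        tp = sum(f and y for f, y in zip(flagged, truths))
        fp = sum(f and not y for f, y in zip(flagged, truths))
        fn = sum(not f and y for f, y in zip(flagged, truths))
        value = metric(tp, fp, fn)
        if value > best_value:
            best_t, best_value = t, value
    return best_t, best_value

# Invented model scores and ground-truth labels (True = cat).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
truths = [True, True, False, True, False, False]
print(best_threshold(scores, truths, candidates=[0.2, 0.5, 0.7]))  # threshold 0.7 wins
```

A lower threshold raises recall at the cost of precision. Which threshold is "best" therefore depends entirely on which metric you chose to optimize.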
S1: Loss Function
- Loss function (also called objective function or cost function)
- Training brings the "badness" down
- Compare the functions: accuracy (the performance metric) vs. the loss function
S1: Setting Performance Criteria
- Decide the minimum performance you will accept
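A small comparison shows why a loss function and accuracy are different tools. This sketch assumes binary cross-entropy (log loss) as the loss and a hypothetical minimum accuracy of 0.9 as the performance criterion. Neither choice comes from the lecture. Two sets of predictions can have the same accuracy but very different loss, because the loss also rewards confidence in the right answer.

```python
import math

def accuracy_at(probs, truths, threshold=0.5):
    """Fraction of examples where the thresholded prediction matches the truth."""
    hits = sum((p >= threshold) == y for p, y in zip(probs, truths))
    return hits / len(truths)

def log_loss(probs, truths, eps=1e-12):
    """Mean binary cross-entropy: the "badness" that training tries to bring down."""
    total = 0.0
    for p, y in zip(probs, truths):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -math.log(p) if y else -math.log(1 - p)
    return total / len(truths)

MIN_ACCURACY = 0.9  # hypothetical minimum performance we agreed to accept

truths = [True, False, True]
confident = [0.95, 0.05, 0.90]
hesitant = [0.55, 0.45, 0.60]

for name, probs in [("confident", confident), ("hesitant", hesitant)]:
    acc = accuracy_at(probs, truths)
    # Both reach accuracy 1.0 and pass the criterion; only the loss tells them apart.
    print(name, acc, round(log_loss(probs, truths), 3), acc >= MIN_ACCURACY)
```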