Guide to AI Algorithms
“AI is good at describing the world as it is today with all of its biases, but it does not know how the world should be.”
-- Joanne Chen, Partner, Foundation Capital
AI Algorithms
- Clustering and k-Means
- Lazy learning and k-NN
- Perceptron
- Maximal Margin Classifier
- Support Vector Classifier
- Support Vector Machines
- Decision Trees
- Bootstrap Aggregation (Bagging)
- Random Forests
- Ensemble Models
- Naive Bayes
- Linear Regression
- Logistic Regression
- Neural Networks / Deep Learning
- Opening the Black Box (0:57)
- Clustering and k-Means (7:48)
- Lazy Learning and k-NN (4:17)
- The Curse Of Dimensionality (2:07)
- Perceptron (1:58)
- Maximal Margin Classifier (1:32)
- Support Vector Classifier (3:27)
- Support Vector Machines (5:17)
- Decision Trees (5:45)
- Interpretability Debate (2:23)
- Tree and SVM Compared (2:48)
- Bootstrap Aggregation (1:37)
- Random Forest (1:24)
- Ensemble Models (2:32)
- Naive Bayes (2:57)
- Naive Bayes Classifier (4:34)
- What is Regression (1:49)
- Linear Regression (2:39)
- Logistic Regression (3:33)
- Sigmoid Functions (2:24)
- Ranking and Classification at Scale (1:40)
- Deep Learning by Analogy (6:24)
- What's Inside a Neural Network (5:16)
- Using AI For Automatic Feature Extraction (4:17)
- Components of a Neural Network (2:27)
- Backpropagation (3:23)
- Gotchas of Deep Learning (3:17)
- Neural Network Architecture (2:03)
- When to Use Neural Networks (2:43)
Opening the Black Box
Clustering and k-Means
Lazy Learning and k-NN
- Black Box
- Algorithms to cover
- Clustering: goal is to find homogeneous groups in the data
- k-Means: "k" equals number of clusters; "means" is the center / centroids to be computed
- Picking k: for analytics, start with k = 2 and look for anything interesting; for machine learning, choose k based upon the objectives of the ML task
- Converge: Yes, because k-Means minimizes a loss function
- Same result every run: No -- the outcome depends on the initial centroids
- k-NN: k Nearest Neighbors; classification technique
- has nothing in common with k-Means beyond the letter "k"
- k is a hyperparameter
- k-NN classifies based on the labels of an instance's nearest neighbors
- use when you have labels, many instances, and few features (a minimal sketch of both algorithms follows this list)
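The following is a minimal sketch of k-Means and k-NN using scikit-learn; the toy 2-D data, the choice of k, and the query point are illustrative assumptions, not values from the course.

```python
# Minimal k-Means and k-NN sketch on toy 2-D data (illustrative values only).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Two loose blobs of points around (0, 0) and (5, 5).
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])

# k-Means: unsupervised -- no labels; pick k and compute the centroids.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("centroids:\n", kmeans.cluster_centers_)

# k-NN: supervised -- needs labels; classifies a new point by majority
# vote of its k nearest training instances (k is a hyperparameter).
y = np.array([0] * 20 + [1] * 20)
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print("prediction for (4, 4):", knn.predict([[4.0, 4.0]]))
```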
The Curse of Dimensionality
Perceptron
Maximal Margin Classifier
- Dimensionality: the number of features (dimensions); as it grows, distance-based methods like k-NN struggle (the curse of dimensionality)
- Support Vector Machine (SVM)
- Linear boundaries only: lines (or planes / hyperplanes in higher dimensions)
- Perceptron: learns a separating line between the two classes (a minimal sketch follows this list)
- Many separating lines can exist; they differ in position and in the width of the margin around them
- MMC: Maximal Margin Classifier -- the largest margin around the boundary
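Below is a rough numpy sketch of the perceptron update rule on a tiny, linearly separable data set; the points, learning rate, and number of passes are made up for illustration.

```python
# Perceptron sketch: learn a separating line w.x + b = 0 on toy data.
import numpy as np

# Four points with labels in {-1, +1}, linearly separable by design.
X = np.array([[2.0, 3.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w = np.zeros(2)   # weights (zero initialization is fine for a perceptron)
b = 0.0           # bias
lr = 0.1          # learning rate (illustrative)

for _ in range(20):                        # a few passes over the data
    for xi, yi in zip(X, y):
        if yi * (np.dot(w, xi) + b) <= 0:  # point is misclassified
            w += lr * yi * xi              # nudge the line toward it
            b += lr * yi

print("weights:", w, "bias:", b)
```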
Support Vector Classifier
Support Vector Machines
Decision Trees
- Support Vectors: the points closest to the boundary -- the points that matter for placing it
- Use a Support Vector Classifier when the data is not linearly separable
- Uses a loss function that penalizes points on the wrong side of the margin
- the kernel trick
- add another dimension
- use a plane to cut
- transformed the underlying space
- use an SVM if you need a flexible boundary (a sketch follows this list)
- Tree-based Methods
- "If this, then that" rules
- Pruning algorithm: a tuning step that trims branches to reduce overfitting
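Here is a hedged scikit-learn sketch of the ideas above: a soft-margin linear classifier, a kernel SVM whose RBF kernel plays the role of the added dimension, and a small decision tree. The toy data set and all parameter values (C, gamma, max_depth) are illustrative assumptions.

```python
# Linear SVC, kernel SVM, and a decision tree on toy non-linear data.
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (200, 2))
# Label is 1 inside a circle of radius 1 -- not linearly separable.
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0).astype(int)

# Soft-margin linear classifier: C controls how hard the loss function
# penalizes points on the wrong side of the margin.
linear = SVC(kernel="linear", C=1.0).fit(X, y)

# Kernel trick: the RBF kernel implicitly adds dimensions so a plane can
# cut the transformed space -- a flexible, non-linear boundary.
rbf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

# Decision tree: "if this, then that" splits, one feature at a time;
# limiting max_depth is a crude stand-in for pruning.
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

for name, model in [("linear", linear), ("rbf", rbf), ("tree", tree)]:
    print(name, "training accuracy:", round(model.score(X, y), 3))
```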
Interpretability Debate
Tree and SVM Compared
Bootstrap Aggregation (Bagging)
- Decision Trees: easy to describe, hard to interpret
- Interpretability Debate:
- trees find rules that split the space on one feature at a time
- to get a different boundary, either alter the features or alter the algorithm
- Bagging (bootstrap aggregation): sample the training data with replacement, fit a model to each sample, and combine their predictions
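A small sketch of bagging with scikit-learn's BaggingClassifier (its default base model is a decision tree); the toy data and the number of estimators are assumptions for illustration.

```python
# Bagging sketch: fit many models on bootstrap samples, then vote.
import numpy as np
from sklearn.ensemble import BaggingClassifier

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Each of the 25 base models sees its own sample drawn with replacement
# (bootstrap=True); predictions are combined by voting.
bagged = BaggingClassifier(n_estimators=25, bootstrap=True,
                           random_state=0).fit(X, y)
print("training accuracy:", round(bagged.score(X, y), 3))
```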
Random Forest
Ensemble Models
Bayes Rule
- Random Forests: sample the instances, sample the features, build a tree, and repeat
- Ensembles: build lots of different models; let each vote on a new instance
- Bayes Rule: P(A|B) = P(B|A) P(A) / P(B)
  - P() -> probability of
  - A -> label value
  - B -> feature value or evidence
  - | -> conditioned on
- Example: P(cat|evidence) = P(evidence|cat) * P(cat) / P(evidence)
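A tiny worked example of the rule in plain Python; all of the probabilities below are invented purely to show the arithmetic.

```python
# Bayes rule: P(cat | evidence) = P(evidence | cat) * P(cat) / P(evidence)
# The numbers are made up for illustration only.
p_cat = 0.3                  # prior: P(cat)
p_evidence_given_cat = 0.8   # likelihood: P(evidence | cat)
p_evidence = 0.5             # evidence: P(evidence)

p_cat_given_evidence = p_evidence_given_cat * p_cat / p_evidence
print(p_cat_given_evidence)  # 0.48
```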
Naive Bayes Classifier
What is Regression?
Linear Regression
- Naive Bayes assumes the features are all independent (given the label)
- Use when: the features are categorical; there are many features (sketches follow this list)
- Regression: fitting a model that predicts a numerical value
- Use when you have numerical labels
- and when the value of a feature is meaningful in itself, not just whether it crosses a threshold
- More Regression:
- logistic regression
- Poisson regression
- polynomial regression
- kernel regression
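Below are minimal scikit-learn sketches of a Naive Bayes classifier and a linear regression fit; the count matrix, labels, and numbers are illustrative assumptions.

```python
# Naive Bayes on categorical-style counts, and linear regression on numbers.
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LinearRegression

# Naive Bayes: treats every feature as independent given the label.
X_counts = np.array([[3, 0, 1], [2, 0, 2], [0, 4, 1], [1, 3, 0]])
y_cat = np.array([0, 0, 1, 1])
nb = MultinomialNB().fit(X_counts, y_cat)
print("Naive Bayes prediction:", nb.predict([[0, 2, 1]]))

# Linear regression: numerical labels, fit a line y = w * x + b.
X_num = np.array([[1.0], [2.0], [3.0], [4.0]])
y_num = np.array([2.1, 4.0, 6.2, 7.9])   # roughly y = 2x
reg = LinearRegression().fit(X_num, y_num)
print("slope:", reg.coef_[0], "intercept:", reg.intercept_)
```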
Logistic Regression
Sigmoid Functions
Ranking and Classification at Scale
- Logistic Regression: used for binary classification
- Interpret the output as a probability
- Sigmoid: an S-shaped curve with diminishing returns toward the extremes; it squashes any input into the range (0, 1)
- use when you have binary labels
- or when you want to do ranking, or want probabilities rather than hard labels (a sketch follows this list)
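A short sketch of the sigmoid function and a logistic regression whose outputs can be read as probabilities (useful for ranking); the toy data is an assumption.

```python
# Sigmoid squashes any real value into (0, 1); logistic regression fits a
# weighted sum of the features and passes it through the sigmoid.
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.007, 0.5, 0.993]

# Binary labels; predict_proba gives probabilities, which can be ranked.
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[2.0]]))  # [P(class 0), P(class 1)]
```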
Deep Learning by Analogy
What's Inside a Neural Network
Automatic Feature Extraction
- Deep Learning: more than one layer
- All neural networks today are deep learning
- layers upon layers of data transformation
- Hidden Layer: made up of neurons (units)
- Activation Function: adds non-linearity
- Weighted Sum: each neuron computes a weighted sum of its inputs, then applies the activation function (a one-neuron sketch follows this list)
- ReLU: a common activation function, max(0, x)
- High-Level Representation: deeper layers build higher-level representations of the input -- automatic feature extraction
- Lets you exploit complex structure in data
- Check your data set!
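Here is a numpy sketch of what a single hidden-layer neuron does: a weighted sum of its inputs followed by a ReLU activation; the inputs, weights, and bias are illustrative.

```python
# One neuron: weighted sum of inputs, then a non-linear activation (ReLU).
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

x = np.array([0.5, -1.0, 2.0])   # inputs to the neuron (illustrative)
w = np.array([0.3, 0.8, -0.2])   # learned weights (illustrative)
b = 0.1                          # bias

z = np.dot(w, x) + b             # weighted sum
a = relu(z)                      # activation adds the non-linearity
print("weighted sum:", z, "activation:", a)
```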
Components of a Neural Network
Backpropagation
Gotchas of Deep Learning
- How many layers to use
- How many units in each layer
- What activation function to use
- What is being learned: Weights
- Backpropagation: the algorithm for optimizing the weights (a tiny training sketch follows this list)
- forward propagation: compute the prediction layer by layer
- backward propagation: push the error back through the network to update the weights
- Don't initialize the weights to zero or to identical values
- Pick random starting weights
- Pros: complex transformations might succeed where all other models failed
- Cons: more effort and resources to train than simpler models; overfitting
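The following is a tiny, self-contained backpropagation sketch in numpy: a two-layer network learning XOR. The architecture, learning rate, and iteration count are assumptions chosen only to make the forward and backward passes visible.

```python
# Tiny neural network trained with backpropagation on XOR (illustrative).
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Random (not zero, not identical) starting weights for the two layers.
W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros((1, 4))
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros((1, 1))
lr = 0.5

for _ in range(5000):
    # Forward propagation: compute the prediction layer by layer.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward propagation: push the error back to get weight gradients.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent updates to the weights and biases.
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round(2))  # should approach [[0], [1], [1], [0]] for most seeds
```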
Neural Network Architecture
When to Use Neural Networks
- Pick the architecture: how many layers, how many units per layer, and which activation functions
- Try it first if ...
- you expect complicated relationships between features and labels