Guide to AI Algorithms

“AI is good at describing the world as it is today with all of its biases, but it does not know how the world should be.”
--  Joanne Chen, Partner, Foundation Capital

Opening the Black Box

Clustering and k-Means

Lazy Learning and k-NN

  • Black Box
  • Algorithms to cover
  • Clustering: goal is to find homogeneous groups in the data
  • k-Means: "k" is the number of clusters; "means" refers to the cluster centers (centroids) to be computed
  • Picking k: for analytics, start with 2 and look for anything interesting; for machine learning, choose k based on the objectives of the ML task
  • Does it converge? Yes, because each iteration reduces a loss function
  • Does it give the same result every run? No; the outcome depends on the starting centroids
  • k-NN: k Nearest Neighbors; classification technique
  • despite the shared "k", it has nothing in common with k-Means
  • k is a hyperparameter 
  • k-NN classifies an instance based on the labels of its nearest neighbors
  • use it when you have labels, many instances, and few features (both algorithms are sketched below)
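
A minimal sketch of both algorithms in Python, using scikit-learn (not part of these notes) on a made-up toy data set:

    # k-Means and k-NN on invented toy data.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.neighbors import KNeighborsClassifier

    X = np.array([[1.0, 1.1], [1.2, 0.9], [5.0, 5.2], [5.1, 4.8]])

    # k-Means with k = 2: alternates between assigning points to the nearest
    # centroid and recomputing the centroids until the assignments stop changing.
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(km.labels_, km.cluster_centers_)

    # k-NN: a lazy learner; it just stores the labeled instances and classifies
    # a new point by majority vote among its k nearest neighbors.
    y = np.array([0, 0, 1, 1])
    knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
    print(knn.predict([[4.9, 5.0]]))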

Dimensionality

Perceptron

Maximal Margin Classifier

  • Dimensionality: the number of features (dimensions) in the data
  • Support Vector Machine (SVM)
  • Lines or planes (hyperplanes) only
  • Perceptron: learns the separator line (see the sketch after this list)
  • open questions: how wide a margin around the line, and which of the many possible lines to pick
  • MMC: Maximal Margin Classifier -- the largest margin around the boundary
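
A minimal sketch of the perceptron update rule in plain NumPy; the toy data below is invented and linearly separable:

    # Perceptron: learn a separating line w.x + b = 0 by nudging the weights
    # whenever a training point is misclassified.
    import numpy as np

    X = np.array([[2.0, 3.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.5]])
    y = np.array([1, 1, -1, -1])                  # labels in {-1, +1}

    w = np.zeros(2)
    b = 0.0
    for _ in range(20):                           # a few passes over the data
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:     # misclassified (or on the line)
                w += yi * xi                      # tilt the line toward the point
                b += yi
    print(w, b)

The perceptron stops at whichever separating line it finds; the maximal margin classifier instead picks the line with the widest margin around it.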

Support Vector Classifier

Support Vector Machines

Decision Trees

  • Support Vectors: the points closest to the boundary; they are the only points that matter
  • use a Support Vector Classifier (soft margin) when the data are not perfectly linearly separable
  • it uses a loss function to trade off margin width against misclassified points
  • the kernel trick:
  • add another dimension to the data
  • use a plane to cut between the classes in the higher-dimensional space
  • this transforms the underlying space so the classes become separable
  • use an SVM if you need a flexible boundary (see the sketch after this list)
  • Tree-based Methods
  • decision trees encode "if this, then that" rules

  • Pruning: a tuning step that cuts back branches of the tree to reduce overfitting
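
A rough sketch of both ideas in Python with scikit-learn (an assumption, not something from the notes) on invented toy data: an RBF-kernel SVM handles a pattern no straight line can separate, and a small decision tree prints its if/then rules:

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier, export_text

    # XOR-like data: not linearly separable in the original two dimensions.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y_xor = np.array([0, 1, 1, 0])

    # The RBF kernel implicitly transforms the underlying space so that a plane
    # can cut between the classes in the transformed space.
    svm = SVC(kernel="rbf", gamma=2.0, C=10.0).fit(X, y_xor)
    print(svm.predict(X))                         # recovers [0, 1, 1, 0]

    # A decision tree splits on one feature at a time, producing if/then rules.
    y_tree = np.array([0, 0, 1, 1])               # here the label depends only on the first feature
    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y_tree)
    print(export_text(tree, feature_names=["x1", "x2"]))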

Interpretability Debate

Tree and SVM Compared

Bootstrap Aggregation (Bagging)

  • Decision Trees: easy to describe, hard to interpret 
  • Interpretability Debate
  • trees find rules that split the space on one feature at a time
  • to get a different boundary, alter the features or alter the algorithm
  • Bagging: sample the training data with replacement, fit a tree to each sample, and let the trees vote (sketched below)
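
A rough sketch of bagging on invented one-feature data, using scikit-learn trees as the base models:

    # Bagging: bootstrap samples (drawn with replacement) plus a vote over the trees.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
    y = np.array([0, 0, 0, 1, 1, 1])

    trees = []
    for _ in range(25):
        idx = rng.integers(0, len(X), size=len(X))   # sample instances with replacement
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

    # Each tree votes on a new instance; the majority wins.
    votes = np.array([t.predict([[2.0]])[0] for t in trees])
    print(np.bincount(votes).argmax())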

Random Forest

Ensemble Models

Bayes Rule

  • Random Forests: sample instances, sample features, fit a tree, repeat
  • Ensembles: build lots of different models; let each vote on a new instance
  • Bayes Rule:
    P() -> probability of;
    A -> label value;
    B -> feature value or evidence;
    | -> condition on;
  • P(A|B) = P(B|A) P(A) / P(B) 
  • P(cat|evidence) = P(evidence|cat) * P(cat) / P(evidence)
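
A quick worked check of the formula in plain Python; the prior and likelihood numbers are invented for illustration:

    # P(cat | evidence) = P(evidence | cat) * P(cat) / P(evidence)
    p_cat = 0.3                       # prior: P(cat)
    p_evidence_given_cat = 0.8        # likelihood: P(evidence | cat)
    p_evidence_given_not_cat = 0.1    # likelihood under "not cat"

    # Total probability of the evidence across both label values.
    p_evidence = p_evidence_given_cat * p_cat + p_evidence_given_not_cat * (1 - p_cat)

    p_cat_given_evidence = p_evidence_given_cat * p_cat / p_evidence
    print(round(p_cat_given_evidence, 3))   # 0.24 / 0.31, about 0.774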

Naive Bayes Classifier

What is Regression?

Linear Regression

  • Naive Bayes assumes features are all independent
  • use when you have categorical features and many features (sketched after this list)
  • Regression: fitting models that predict numerical values
  • use when you have numerical labels
  • and when the value of a feature is more meaningful than just a threshold
  • More Regression:
    • logistic regression
    • Poisson regression
    • polynomial regression
    • kernel regression  
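
Minimal sketches of both, using scikit-learn on made-up toy data (the library and the numbers are assumptions for illustration):

    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.linear_model import LinearRegression

    # Naive Bayes: treats every feature as independent given the label.
    X_cls = np.array([[1.0, 0.0], [1.2, 0.1], [3.0, 1.0], [3.2, 0.9]])
    y_cls = np.array([0, 0, 1, 1])
    nb = GaussianNB().fit(X_cls, y_cls)
    print(nb.predict([[3.1, 1.1]]))

    # Linear regression: fits a line to numerical labels; the coefficient says
    # how much the prediction changes per unit of the feature.
    X_reg = np.array([[1.0], [2.0], [3.0], [4.0]])
    y_reg = np.array([2.1, 3.9, 6.2, 8.1])        # roughly y = 2x
    lr = LinearRegression().fit(X_reg, y_reg)
    print(lr.coef_, lr.intercept_)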

Logistic Regression

Sigmoid Functions

Ranking and Classification at Scale

  • Logistic Regression: used for binary classification
  • Interpret the output as a probability
  • Diminishing returns: the sigmoid curve flattens at the extremes
  • use it if you have binary labels
  • or if you want to do ranking or to get probabilities as outputs (sketched below)
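
A small sketch of the sigmoid and of logistic regression with scikit-learn; the toy data is invented:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # The sigmoid squashes a weighted sum into a probability between 0 and 1,
    # with diminishing returns at the extremes.
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    print(sigmoid(np.array([-4.0, 0.0, 4.0])))    # about 0.018, 0.5, 0.982

    # Logistic regression on binary labels; predict_proba gives scores that can
    # be used for ranking.
    X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
    y = np.array([0, 0, 0, 1, 1, 1])
    clf = LogisticRegression().fit(X, y)
    print(clf.predict_proba([[2.2]]))             # [P(label 0), P(label 1)]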

Deep Learning by Analogy

What's Inside a Neural Network

Automatic Feature Extraction

  • Deep Learning: more than one layer
  • All neural networks today are deep learning
  • layers upon layers of data transformation
  • Hidden Layer: made up of neurons (units)
  • Activation Function: adds non-linearity
  • Weighted Sum: each neuron computes a weighted sum of its inputs, then applies the activation function
  • ReLU is a common choice of activation function
  • High-Level Representation: deeper layers build higher-level representations of the data
  • lets you exploit complex structure in the data (a forward-pass sketch follows this list)
  • Check your data set!
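
A hand-rolled forward pass in NumPy showing the weighted sums, the ReLU activation, and the layered transformation; the weights are made-up numbers:

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)                 # adds non-linearity

    x = np.array([1.0, 2.0])                      # input features
    W1 = np.array([[0.5, -1.0],
                   [0.25, 0.75]])                 # hidden-layer weights (2 inputs -> 2 units)
    b1 = np.array([0.1, -0.2])
    W2 = np.array([1.0, -0.5])                    # output weights (2 hidden units -> 1 output)
    b2 = 0.05

    h = relu(W1 @ x + b1)                         # hidden layer: weighted sum, then activation
    out = W2 @ h + b2                             # output layer: another weighted sum
    print(h, out)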

Components of a Neural Network

Backpropagation

Gotchas of Deep Learning

  • How many layers to use
  • How many units in each layer
  • What activation function to use
  • What is being learned: Weights
  • Backpropagation: optimizing weights
  • forward propagation: compute the predictions and the loss
  • backward propagation: push the error back through the layers to get the weight updates
  • don't initialize the weights to zero or all to the same number
  • pick random starting weights instead (see the sketch after this list)
  • Pros: complex transformations might succeed where other models have failed
  • Cons: more effort and resources to train than simpler models; prone to overfitting
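
A rough sketch of backpropagation for a one-hidden-layer network in plain NumPy; the data, layer size, learning rate, and step count are all invented for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(64, 2))
    y = X[:, 0] * X[:, 1]                         # a non-linear target

    # Pick random starting weights (not zeros, not all the same number).
    W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)
    w2 = rng.normal(scale=0.5, size=8);      b2 = 0.0
    lr = 0.05

    for step in range(500):
        # Forward propagation: weighted sums + ReLU, then the loss.
        z1 = X @ W1 + b1
        h = np.maximum(0.0, z1)
        pred = h @ w2 + b2
        err = pred - y
        loss = np.mean(err ** 2)

        # Backward propagation: push the error back through each layer
        # to get the gradient of the loss with respect to every weight.
        d_pred = 2.0 * err / len(X)
        g_w2 = h.T @ d_pred
        g_b2 = d_pred.sum()
        d_h = np.outer(d_pred, w2)
        d_z1 = d_h * (z1 > 0)                     # ReLU gradient
        g_W1 = X.T @ d_z1
        g_b1 = d_z1.sum(axis=0)

        # Gradient descent: nudge every weight against its gradient.
        W1 -= lr * g_W1; b1 -= lr * g_b1
        w2 -= lr * g_w2; b2 -= lr * g_b2

    print(round(loss, 4))                         # should have dropped from its starting value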

Neural Network Architecture

When to Use Neural Networks

  • Pick the architecture

  • Try a neural network first if you expect complicated relationships between features and labels