Guide to AI Algorithms
“AI is good at describing the world as it is today with all of its biases, but it does not know how the world should be.”
-- Joanne Chen, Partner, Foundation Capital
AI Algorithms
- Clustering and k-Means
- Lazy learning and k-NN
- Perceptron
- Maximal Margin Classifier
- Support Vector Classifier
- Support Vector Machines
- Decision Trees
- Bootstrap Aggregation (Bagging)
- Random Forests
- Ensemble Models
- Naive Bayes
- Linear Regression
- Logistic Regression
- Neural Networks / Deep Learning
- Opening the Black Box (0:57)
- Clustering and k-Means (7:48)
- Lazy Learning and k-NN (4:17)
- The Curse Of Dimensionality (2:07)
- Perceptron (1:58)
- Maximal Margin Classifier (1:32)
- Support Vector Classifier (3:27)
- Support Vector Machines (5:17)
- Decision Trees (5:45)
- Interpretability Debate (2:23)
- Tree and SVM Compared (2:48)
- Bootstrap Aggregation (1:37)
- Random Forest (1:24)
- Ensemble Models (2:32)
- Naive Bayes (2:57)
- Naive Bayes Classifier (4:34)
- What is Regression (1:49)
- Linear Regression (2:39)
- Logistic Regression (3:33)
- Sigmoid Functions (2:24)
- Ranking and Classification at Scale (1:40)
- Deep Learning by Analogy (6:24)
- What's Inside a Neural Network (5:16)
- Using AI For Automatic Feature Extraction (4:17)
- Components of a Neural Network (2:27)
- Backpropagation (3:23)
- Gotchas of Deep Learning (3:17)
- Neural Network Architecture (2:03)
- When to Use Neural Networks (2:43)
Opening the Black Box
Clustering and k-Means
Lazy Learning and k-NN
- Black Box
- Algorithms to cover
- Clustering: goal is to find homogeneous groups in the data
- k-Means: "k" equals number of clusters; "means" is the center / centroids to be computed
- Picking k: for analytics, start with k = 2 and look for anything interesting; for machine learning, choose k based upon the objectives of the ML task
- Converge: Yes, because k-Means minimizes a loss function
- Same result every run: No -- the outcome depends on the initial centroids
- k-NN: k Nearest Neighbors; classification technique
- has nothing in common with k-Means beyond the letter "k"
- k is a hyperparameter
- k-NN classifies based on the labels of an instance's nearest neighbors
- use when you have labels, many instances, and few features (a minimal sketch of both algorithms follows this list)
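The following is a minimal sketch of k-Means and k-NN using scikit-learn; the toy 2-D data, the choice of k, and the query point are illustrative assumptions, not values from the course.

```python
# Minimal k-Means and k-NN sketch on toy 2-D data (illustrative values only).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Two loose blobs of points around (0, 0) and (5, 5).
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])

# k-Means: unsupervised -- no labels; pick k and compute the centroids.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("centroids:\n", kmeans.cluster_centers_)

# k-NN: supervised -- needs labels; classifies a new point by majority
# vote of its k nearest training instances (k is a hyperparameter).
y = np.array([0] * 20 + [1] * 20)
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print("prediction for (4, 4):", knn.predict([[4.0, 4.0]]))
```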
The Curse of Dimensionality
Perceptron
Maximal Margin Classifier
- Dimensionality: the number of features (dimensions); as it grows, distance-based methods like k-NN struggle (the curse of dimensionality)
- Support Vector Machine (SVM)
- Linear boundaries only: lines (or planes / hyperplanes in higher dimensions)
- Perceptron: learns a separating line between the two classes (a minimal sketch follows this list)
- Many separating lines can exist; they differ in position and in the width of the margin around them
- MMC: Maximal Margin Classifier -- the largest margin around the boundary
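Below is a rough numpy sketch of the perceptron update rule on a tiny, linearly separable data set; the points, learning rate, and number of passes are made up for illustration.

```python
# Perceptron sketch: learn a separating line w.x + b = 0 on toy data.
import numpy as np

# Four points with labels in {-1, +1}, linearly separable by design.
X = np.array([[2.0, 3.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w = np.zeros(2)   # weights (zero initialization is fine for a perceptron)
b = 0.0           # bias
lr = 0.1          # learning rate (illustrative)

for _ in range(20):                        # a few passes over the data
    for xi, yi in zip(X, y):
        if yi * (np.dot(w, xi) + b) <= 0:  # point is misclassified
            w += lr * yi * xi              # nudge the line toward it
            b += lr * yi

print("weights:", w, "bias:", b)
```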
Support Vector Classifier
Support Vector Machines
Decision Trees
- Support Vectors: the points closest to the boundary -- the points that matter for placing it
- Use a Support Vector Classifier when the data is not linearly separable
- Uses a loss function that penalizes points on the wrong side of the margin
- the kernel trick
- add another dimension
- use a plane to cut
- transformed the underlying space
- use an SVM if you need a flexible boundary (a sketch follows this list)
- Tree-based Methods
- "If this, then that" rules
- Pruning algorithm: a tuning step that trims branches to reduce overfitting
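Here is a hedged scikit-learn sketch of the ideas above: a soft-margin linear classifier, a kernel SVM whose RBF kernel plays the role of the added dimension, and a small decision tree. The toy data set and all parameter values (C, gamma, max_depth) are illustrative assumptions.

```python
# Linear SVC, kernel SVM, and a decision tree on toy non-linear data.
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (200, 2))
# Label is 1 inside a circle of radius 1 -- not linearly separable.
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0).astype(int)

# Soft-margin linear classifier: C controls how hard the loss function
# penalizes points on the wrong side of the margin.
linear = SVC(kernel="linear", C=1.0).fit(X, y)

# Kernel trick: the RBF kernel implicitly adds dimensions so a plane can
# cut the transformed space -- a flexible, non-linear boundary.
rbf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

# Decision tree: "if this, then that" splits, one feature at a time;
# limiting max_depth is a crude stand-in for pruning.
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

for name, model in [("linear", linear), ("rbf", rbf), ("tree", tree)]:
    print(name, "training accuracy:", round(model.score(X, y), 3))
```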
Interpretability Debate
Tree and SVM Compared
Bootstrap Aggregation (Bagging)
- Decision Trees: easy to describe, hard to interpret
- Interpretability Debate:
- trees find rules that split the space on one feature at a time
- to get a different boundary, either alter the features or alter the algorithm
- Bagging (bootstrap aggregation): sample the training data with replacement, fit a model to each sample, and combine their predictions
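A small sketch of bagging with scikit-learn's BaggingClassifier (its default base model is a decision tree); the toy data and the number of estimators are assumptions for illustration.

```python
# Bagging sketch: fit many models on bootstrap samples, then vote.
import numpy as np
from sklearn.ensemble import BaggingClassifier

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Each of the 25 base models sees its own sample drawn with replacement
# (bootstrap=True); predictions are combined by voting.
bagged = BaggingClassifier(n_estimators=25, bootstrap=True,
                           random_state=0).fit(X, y)
print("training accuracy:", round(bagged.score(X, y), 3))
```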
Random Forest
Ensemble Models
Bayes Rule
- Random Forests: sample the instances, sample the features, build a tree, and repeat
- Ensembles: build lots of different models; let each vote on a new instance
- Bayes Rule: P(A|B) = P(B|A) P(A) / P(B)
  - P() -> probability of
  - A -> label value
  - B -> feature value or evidence
  - | -> conditioned on
- Example: P(cat|evidence) = P(evidence|cat) * P(cat) / P(evidence)
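A tiny worked example of the rule in plain Python; all of the probabilities below are invented purely to show the arithmetic.

```python
# Bayes rule: P(cat | evidence) = P(evidence | cat) * P(cat) / P(evidence)
# The numbers are made up for illustration only.
p_cat = 0.3                  # prior: P(cat)
p_evidence_given_cat = 0.8   # likelihood: P(evidence | cat)
p_evidence = 0.5             # evidence: P(evidence)

p_cat_given_evidence = p_evidence_given_cat * p_cat / p_evidence
print(p_cat_given_evidence)  # 0.48
```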
Naive Bayes Classifier
What is Regression?
Linear Regression
- Naive Bayes assumes the features are all independent (given the label)
- Use when: the features are categorical; there are many features (sketches follow this list)
- Regression: fitting a model that predicts a numerical value
- Use when you have numerical labels
- and when the value of a feature is meaningful in itself, not just whether it crosses a threshold
- More Regression:
- logistic regression
- Poisson regression
- polynomial regression
- kernel regression
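Below are minimal scikit-learn sketches of a Naive Bayes classifier and a linear regression fit; the count matrix, labels, and numbers are illustrative assumptions.

```python
# Naive Bayes on categorical-style counts, and linear regression on numbers.
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LinearRegression

# Naive Bayes: treats every feature as independent given the label.
X_counts = np.array([[3, 0, 1], [2, 0, 2], [0, 4, 1], [1, 3, 0]])
y_cat = np.array([0, 0, 1, 1])
nb = MultinomialNB().fit(X_counts, y_cat)
print("Naive Bayes prediction:", nb.predict([[0, 2, 1]]))

# Linear regression: numerical labels, fit a line y = w * x + b.
X_num = np.array([[1.0], [2.0], [3.0], [4.0]])
y_num = np.array([2.1, 4.0, 6.2, 7.9])   # roughly y = 2x
reg = LinearRegression().fit(X_num, y_num)
print("slope:", reg.coef_[0], "intercept:", reg.intercept_)
```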
Logistic Regression
Sigmoid Functions
Ranking and Classification at Scale
- Logistic Regression: used for binary classification
- Interpret the output as a probability
- Sigmoid: an S-shaped curve with diminishing returns toward the extremes; it squashes any input into the range (0, 1)
- use when you have binary labels
- or when you want to do ranking, or want probabilities rather than hard labels (a sketch follows this list)
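A short sketch of the sigmoid function and a logistic regression whose outputs can be read as probabilities (useful for ranking); the toy data is an assumption.

```python
# Sigmoid squashes any real value into (0, 1); logistic regression fits a
# weighted sum of the features and passes it through the sigmoid.
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.007, 0.5, 0.993]

# Binary labels; predict_proba gives probabilities, which can be ranked.
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[2.0]]))  # [P(class 0), P(class 1)]
```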
Deep Learning by Analogy
What's Inside a Neural Network
Automatic Feature Extraction
- Deep Learning: more than one layer
- All neural networks today are deep learning
- layers upon layers of data transformation
- Hidden Layer: made up of neurons (units)
- Activation Function: adds non-linearity
- Weighted Sum: each neuron computes a weighted sum of its inputs, then applies the activation function (a one-neuron sketch follows this list)
- ReLU: a common activation function, max(0, x)
- High-Level Representation: deeper layers build higher-level representations of the input -- automatic feature extraction
- Lets you exploit complex structure in data
- Check your data set!
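Here is a numpy sketch of what a single hidden-layer neuron does: a weighted sum of its inputs followed by a ReLU activation; the inputs, weights, and bias are illustrative.

```python
# One neuron: weighted sum of inputs, then a non-linear activation (ReLU).
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

x = np.array([0.5, -1.0, 2.0])   # inputs to the neuron (illustrative)
w = np.array([0.3, 0.8, -0.2])   # learned weights (illustrative)
b = 0.1                          # bias

z = np.dot(w, x) + b             # weighted sum
a = relu(z)                      # activation adds the non-linearity
print("weighted sum:", z, "activation:", a)
```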
Components of a Neural Network
Backpropagation
Gotchas of Deep Learning
- How many layers to use
- How many units in each layer
- What activation function to use
- What is being learned: Weights
- Backpropagation: the algorithm for optimizing the weights (a tiny training sketch follows this list)
- forward propagation: compute the prediction layer by layer
- backward propagation: push the error back through the network to update the weights
- Don't initialize the weights to zero or to identical values
- Pick random starting weights
- Pros: complex transformations might succeed where all other models failed
- Cons: more effort and resources to train than simpler models; overfitting
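The following is a tiny, self-contained backpropagation sketch in numpy: a two-layer network learning XOR. The architecture, learning rate, and iteration count are assumptions chosen only to make the forward and backward passes visible.

```python
# Tiny neural network trained with backpropagation on XOR (illustrative).
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Random (not zero, not identical) starting weights for the two layers.
W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros((1, 4))
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros((1, 1))
lr = 0.5

for _ in range(5000):
    # Forward propagation: compute the prediction layer by layer.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward propagation: push the error back to get weight gradients.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent updates to the weights and biases.
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round(2))  # should approach [[0], [1], [1], [0]] for most seeds
```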
Neural Network Architecture
When to Use Neural Networks
- Pick the architecture: how many layers, how many units per layer, and which activation functions
- Try it first if ...
- you expect complicated relationships between features and labels