Support Vector Machines (SVMs) reigned supreme in machine learning before the deep learning revolution.
This exercise compares Linear and RBF SVMs: how each classifies two test vectors after being trained on the same six training vectors.
Walkthrough
[1] Given
xi: Six training vectors (blue rows 🟦)
yi: Labels
Using xi and yi, we learned ai and b (red borders):
↳ai: coefficient for each training vector i.
• Non-zero: A Support Vector that defines the decision boundary
• Zero: Too far from the decision boundary, ignored
↳b: bias (how much the decision boundary should be shifted)
x’j: Two test vectors (yellow columns 🟨)
(To simplify hand calculation, training and test vectors are not normalized.)
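Everything above plugs into one decision rule, sign( Σ ai·yi·K(xi, x'j) + b ), which steps [4]-[6] and [17]-[19] carry out by hand. A minimal NumPy sketch of that rule (the function name and array layout are mine, not part of the exercise):

```python
import numpy as np

def svm_decision(k_col, a, y, b):
    """Classify one test vector from its kernel column.

    k_col : K(x_i, x') for every training vector x_i (one column of K)
    a     : learned coefficients a_i (zero for non-support vectors)
    y     : labels y_i (+1 or -1)
    b     : learned bias
    """
    score = np.sum(a * y * k_col) + b   # weighted combination of kernel values
    return np.sign(score)               # +1 or -1: the predicted class
```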
~ 𝗟𝗶𝗻𝗲𝗮𝗿 𝗦𝗩𝗠 ~
🟩 Kernel Matrix (K) [2]-[3]
[2] Test Vector 1
↳ Take dot product between the test vector 🟨 and every training vector 🟦
↳ The dot product measures how similar two vectors are; if the vectors were normalized, it would equal their cosine similarity (see the NumPy sketch after [3])
↳ Output: 1st column of K
[3] Test Vector 2
↳ Similar to [2]
↳ Output: 2nd column of K
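A sketch of steps [2]-[3] in NumPy. The arrays below are stand-ins with the right shapes, not the values from the figure: the linear kernel matrix is simply every training-vector / test-vector dot product.

```python
import numpy as np

# Stand-in data (NOT the figure's numbers): 6 training rows, 2 test columns.
X_train = np.array([[1, 2, 1],
                    [0, 1, 2],
                    [2, 0, 1],
                    [1, 1, 3],
                    [3, 2, 0],
                    [2, 3, 1]])      # shape (6, 3): one training vector per row
X_test = np.array([[2, 1],
                   [4, 0],
                   [3, 2]])          # shape (3, 2): one test vector per column

K = X_train @ X_test                 # shape (6, 2): K[i, j] = dot(x_i, x'_j)
```

Column j of K is then fed to the weighted combination in [5].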
🟧 Decision Boundary [4]-[6]
[4] Unsigned Coefficients → Signed Weights
↳ Multiply each coefficient by its corresponding label
↳ The 2nd training vector is NOT a support vector because its coefficient is 0.
[5] Weighted Combination
↳ Multiply the weights with K, then add the bias
↳ Output: a “signed” score; its sign tells which side of the decision boundary each test vector falls on (checked in code after [6])
...
X'1: (2)*13+0+(1)*12+(-1)*15+(1)*19+(-1)*27+(2) = 17
X'2: (2)*2+0+(1)*3+(-1)*4+(1)*8+(-1)*8+(2) = 5
...
[6] Classify
↳ Take the sign
...
X'1: 17 > 0 → Positive +
X'2: 5 > 0 → Positive +
...
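The linear-SVM arithmetic in [4]-[6] can be checked directly from the values shown above. The signed weights and kernel entries below are read off the expressions in [5]; the 2nd kernel entry is never shown (its weight is 0), so a placeholder stands in for it.

```python
import numpy as np

w = np.array([2, 0, 1, -1, 1, -1])   # signed weights a_i * y_i from step [4]
b = 2                                # bias
K = np.array([[13, 2],               # kernel columns for X'1 and X'2 (from step [5])
              [0, 0],                # placeholder: weight is 0, so the value never matters
              [12, 3],
              [15, 4],
              [19, 8],
              [27, 8]])

scores = w @ K + b                   # -> array([17, 5]), matching [5]
labels = np.sign(scores)             # -> array([1, 1]): both Positive, matching [6]
```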
~ 𝗥𝗕𝗙 𝗦𝗩𝗠 ~
Given
ai: Learned coefficients
b: Learned bias
🟩 Kernel Matrix (K) [7]-[16]
𝘛𝘦𝘴𝘵 𝘝𝘦𝘤𝘵𝘰𝘳 1 (𝘟’1) 🟨
L2 Distance 🟪 [7]-[9]
[7] Squared Difference
...
i=1: (1-2)^2=1, (2-4)^2=4, (1-3)^2=4
...
[8] Sum
[9] Square Root
[10] Negative Scaling
↳ Multiply by -1: L2 is a distance metric; negating it (and then exponentiating in [11]) converts distance into similarity
↳ Multiply by gamma γ: gamma controls how much influence each training example has. A small gamma means each training example pulls the decision boundary more gently, giving smoother decision boundaries (see the numerical illustration after this list)
↳ The result is the “negative scaled L2”
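A quick numerical feel for the gamma remark above, using the L2 distance of 3 from step [9] and arbitrarily chosen gammas (the exercise's actual gamma is defined in the figure): after the exponentiation in [11], a larger gamma makes the kernel value, and hence the training example's influence, fall off much faster.

```python
import numpy as np

d = 3.0                               # the L2 distance computed in [9]
for gamma in (0.1, 0.5, 2.0):         # arbitrary illustrative values
    print(gamma, np.exp(-gamma * d))  # 0.1 -> 0.74, 0.5 -> 0.22, 2.0 -> 0.0025
```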
[11] Exponentiate
↳ Raise e to the power of the “negative scaled L2”
↳ Use the provided table to look up the value of e raised to that power
↳ Output: The 1st column of K
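A sketch of steps [7]-[11] for one training/test pair. The two vectors are illustrative values consistent with the squared differences in [7]; gamma is an assumed placeholder since its value comes from the figure.

```python
import numpy as np

x_i   = np.array([1, 2, 1])   # illustrative training vector (matches the diffs in [7])
x_t1  = np.array([2, 4, 3])   # illustrative test vector X'1
gamma = 0.1                   # assumed placeholder

sq_diff = (x_i - x_t1) ** 2   # [7] squared difference -> [1, 4, 4]
total   = sq_diff.sum()       # [8] sum -> 9
l2      = np.sqrt(total)      # [9] square root -> 3
scaled  = -gamma * l2         # [10] negative scaling -> -0.3
k_val   = np.exp(scaled)      # [11] exponentiate -> ~0.74, one entry of K's 1st column
```

Repeating this for all six training vectors fills the 1st column of K.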
𝘛𝘦𝘴𝘵 𝘝𝘦𝘤𝘵𝘰𝘳 2 (𝘟’2) 🟨
L2 Distance 🟪 [12]-[14]
[12] Squared Difference
[13] Sum
[14] Square Root
[15] Negative Scaling
[16] Exponentiate
↳ Output: The 2nd column of K
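Steps [7]-[16] can also be done for all training/test pairs at once with broadcasting; same stand-in data caveat as the linear sketch (these are not the figure's numbers, and gamma is assumed):

```python
import numpy as np

X_train = np.array([[1, 2, 1],
                    [0, 1, 2],
                    [2, 0, 1],
                    [1, 1, 3],
                    [3, 2, 0],
                    [2, 3, 1]])                    # (6, 3) stand-in training vectors
X_test = np.array([[2, 1],
                   [4, 0],
                   [3, 2]])                        # (3, 2) stand-in test vectors, one per column
gamma = 0.1                                        # assumed placeholder

diffs = X_train[:, None, :] - X_test.T[None, :, :] # (6, 2, 3) pairwise differences
l2    = np.sqrt((diffs ** 2).sum(axis=-1))         # (6, 2) L2 distances: [7]-[9] and [12]-[14]
K     = np.exp(-gamma * l2)                        # (6, 2) kernel matrix: [10]-[11] and [15]-[16]
```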
🟧 Decision Boundary [17]-[19]
[17] Unsigned Coefficients → Signed Weights
[18] Weighted Combination
...
X'1: (20)*0.7+(-10)*0.8+(10)*0.9+(-10)*0.7+0+(-10)*0.9+(-1) = -2
X'2: (20)*0.9+(-10)*0.7+(10)*0.8+(-10)*0.8+0+(-10)*0.9+(-1) = 1
...
[19] Classify
...
X'1: -2 < 0 → Negative -
X'2: 1 > 0 → Positive +
...
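And as with the linear case, the RBF arithmetic in [17]-[19] checks out with the values shown above (the 5th kernel entry is never shown because its weight is 0, so a placeholder stands in):

```python
import numpy as np

w = np.array([20, -10, 10, -10, 0, -10])   # signed weights a_i * y_i from step [17]
b = -1                                     # bias
K = np.array([[0.7, 0.9],                  # RBF kernel columns for X'1 and X'2 (step [18])
              [0.8, 0.7],
              [0.9, 0.8],
              [0.7, 0.8],
              [0.0, 0.0],                  # placeholder: weight is 0, value never matters
              [0.9, 0.9]])

scores = w @ K + b                         # -> [-2., 1.] (up to float rounding), matching [18]
labels = np.sign(scores)                   # -> [-1., 1.]: Negative, then Positive, matching [19]
```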