# Why is the second Principal Component orthogonal to the first one?

Because **the second Principal Component** should capture the highest variance **from what is left** after the first Principal Component has explained as much of the data as it can. (The first principal component has the largest possible variance; that is, it accounts for as much of the variability in the data as possible.)

Where, then, should we look for the variance left over after the first?

In the orthogonal direction.
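This orthogonality can be checked numerically: the principal directions returned by an SVD of the covariance matrix are orthonormal by construction, so the first two directions have a dot product of (numerically) zero. A small sketch, using made-up correlated data for illustration:

```python
import numpy as np

# Toy 2-D data with correlated features (illustrative only, not the article's data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.0], [1.2, 0.5]])

# Center the data and take the SVD of its covariance matrix
X_centered = X - X.mean(axis=0)
cov = X_centered.T @ X_centered / len(X)
U, S, Vt = np.linalg.svd(cov)

# Columns of U are the principal directions; their dot product is ~0
print(np.dot(U[:, 0], U[:, 1]))
```

The singular values in `S` are sorted in decreasing order, so `U[:, 0]` is always the direction of largest variance and `U[:, 1]` the best direction among those orthogonal to it.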

Below is the Python code that generates the graph above.

```python
import matplotlib.pyplot as plt

# Before running PCA, it is important to first normalize X
X_norm, mu, sigma = featureNormalize(X)

# Run PCA
U, S, V = pca(X_norm)

plt.figure(figsize=(7, 5))
plot = plt.scatter(X[:, 0], X[:, 1], s=30, facecolors='none', edgecolors='b')
plt.title("PCA — Eigenvectors Shown", fontsize=20)
plt.xlabel('x1', fontsize=16)
plt.ylabel('x2', fontsize=16)
plt.grid(True)

# Columns of U hold the principal directions (NumPy SVD convention);
# each line is drawn from the mean, scaled by its singular value
plt.plot([mu[0], mu[0] + 1.5 * S[0] * U[0, 0]],
         [mu[1], mu[1] + 1.5 * S[0] * U[1, 0]],
         color='red', linewidth=3,
         label='First Principal Component')
plt.plot([mu[0], mu[0] + 1.5 * S[1] * U[0, 1]],
         [mu[1], mu[1] + 1.5 * S[1] * U[1, 1]],
         color='green', linewidth=3,
         label='Second Principal Component')

leg = plt.legend(loc=4)
plt.show(block=False)
```
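The snippet above calls `featureNormalize` and `pca`, which are not defined here. A minimal sketch of what such helpers could look like, assuming the common zero-mean/unit-variance scaling and PCA via SVD of the covariance matrix (the article's own versions may differ):

```python
import numpy as np

def featureNormalize(X):
    """Scale each feature to zero mean and unit variance.

    Returns the normalized data along with the per-feature
    mean and standard deviation used for the scaling.
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

def pca(X_norm):
    """PCA on already-normalized data via SVD of the covariance matrix.

    Columns of U are the principal directions, S holds the
    corresponding variances in decreasing order.
    """
    m = X_norm.shape[0]
    Sigma = (X_norm.T @ X_norm) / m
    U, S, V = np.linalg.svd(Sigma)
    return U, S, V
```

With these definitions, `U[:, 0]` and `U[:, 1]` are the orthogonal eigenvector directions drawn in red and green above.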