A soft intro to Linear Algebra (LA-1)

In this post I would be assuming that the reader is familiar with matrices and their basic operations. We are going to cover the following topics in this post:

Linear independence and Dimension

System of Linear Equations

I am including this section for the sake of completeness. I had to include this because it’s actually the basis of linear algebra. If you know about matrices which is assumed here then you know about solving system of linear equations, when they possess solutions, when they do not, when the solution is unique. In general, a system of linear equations with real coefficients looks like this:

System of ‘m’ linear equations in ‘n’ unknowns

This can be written in compact form using matrices as: $AX=B$ , where

From this you can easily guess that wherever there is some sort of solving of system of linear equations we see matrix which is an essential part of linear algebra. A matrix is a representation of a linear mapping which we would be focusing on in next post on linear algebra. So no wonder why we use various decomposition techniques of matrices for solving and simplifying our system of linear equations. The set of all real matrices is a vector space but what actually is a vector space, let’s see.

Vector Space

In simple sense, it is the generalization of vectors and their operations. A vector space is a mathematical object in which there is a binary operation called addition, which makes sense along with subtraction, contains a zero vector also called origin, multiplication by scalars, here scalars mean either set of real numbers or set of complex numbers. I am not writing all the properties in detail because it would most certainly scare you. In this post we would be dealing only with vector spaces where underlying scalar field is $\mathbb{R}$ . Such a vector space is called a real vector space or a vector space over $\mathbb{R}$ . A vector space is also called a linear space. The simplest example can be the real line, slightly more better example is the plane, $\mathbb{R}^2$ which is a set of 2-tuples,i.e., $(x,y)$ where $x,y \in \mathbb{R}$ . We generally denote an element of a vector space which is called as a vector by small letters such as $u,v,w$ , so here let $v_1=(x_1,y_1),v_2=(x_2,y_2) \in \mathbb{R}^2$ be two vectors. How do we add these two and multiply it with a real number? Let $a\in \mathbb{R}$ then $a.v_1=a.(x_1,y_1)=(ax_1,ay_1)$ , so multiplication with scalar makes sense. For addition, $v_1+v_2=(x_1,y_1)+(x_2,y_2)=(x_1+x_2,y_1+y_2)$ which is again an element, a vector in $\mathbb{R}^2$ .

Examples:

$\mathbb{R}^n$ is a vector space over $\mathbb{R}$ for any $n \in \mathbb{N}$ with the operations defined above for $\mathbb{R}^2$ .
Let $P_m(x)$ be the set of all real polynomials of degree less than or equal to $m$ , i.e. $\{a_0+a_1 x+a_2 x^2 +...+a_m x^m: a_i\in \mathbb{R} \forall i\}$ . You can easily show that this is a vector space with usual polynomial addition & subtraction, contains a zero polynomial, also multiplication by scalar real value is well defined, so it satisfies all the properties needed for it to be a vector space.
Consider $C([0,1])$ , the set of all real valued continuous functions defined on the interval $[0,1]$ with addition defined as $(f+g)(x)=f(x)+g(x), \forall f,g \in C([0,1])$ , it is called as pointwise addition. It is also a vector space with this operation, zero function works as origin.
Consider $M_{m\times n}$ , the set of all real matrices of order $m\times n$ , with componentwise (usual) addition and subtraction, usual scalar multiplication by reals, where zero matrix is the origin or zero vector. With all these $M_{m\times n}$ is a real vector space.

Non-examples:

$\mathbb{R} \backslash \{0\}$ , that is the set of real numbers without $0$ is not a vector space under usual addition, subtraction and scalar multiplication since it does not contain origin, zero vector.
Let $Q_n(x)$ be the set of all real polynomials of degree $n$ , this is not a vector space because sum of two elements in it may not be again an element of it. For example, let $n=2$ , $a(x)=5x^2+2x+1$ and $b(x)=-5x^2+3x+3$ , sum of these two polynomials is $5x+4$ which is not a second degree polynomial.

Vector subspace:

We call a subset $W$ of a vector space $V$ , a vector subspace if that subset is itself a vector space, that is $W$ satisfies all the properties of the vector space. The simple example is a straight line passing through origin is vector subspace of the plane, and a straight line which does not pass through origin in plane is not a vector subspace. In the following diagram $W$ is vector subspace while $U$ is not. We would come across more important types of vector subspaces later.

Linear Independence

In simple sense we say two vectors are linearly independent when one vector say $v_1$ is not a multiple of say another vector $v_2$ that is, $v_1 \neq \alpha. v_2,$ for any $\alpha \in \mathbb{R}$ . Extending this to the case of many vectors we say that $v$ is a linear combination of say $\{v_1,v_2,...,v_k\}$ if $v=\sum^{k}_{i=1}\alpha_i v_i$ for some $\alpha_i \in \mathbb{R},i=1,...,k$ . These $\alpha_i$ ‘s are called the coefficients of the linear combination, and when not all of them are zero the linear combination is called non-trivial. We say that the set of vectors $\{v_1,v_2,...,v_k\}$ is linearly independent when none of them can be written as a linear combination of the rest of the vectors in the given set. In other words, when $a_1v_1+a_2v_2+...+a_kv_k=0$ , then all the $a_i$ ‘s must be zero otherwise we can write, say if $a_1 \neq 0$ , $v_1=\frac{1}{a_1}(a_2v_2+...+a_kv_k)$ which means that the given set of vectors is not linearly independent. We use this method to find whether a given set of vectors is linearly independent. If we can find some non-trivial linear combination which sums to $0$ , then we say that the set is linearly dependent. Consider $\{(1,0),(2,3)\}$ , to check for linear independency we equate their linear combination to zero, we have $a(1,0)+b(2,3)=(0,0) \implies (a+2b,3b)=(0,0)$ which gives $b=0$ and hence $a=0$ , so these two vectors are linearly independent. Now add one more vector to this set say, $(1,1)$ and again do the same thing. In this case we get, $a(1,0)+b(2,3)+c(1,1)=(0,0) \implies (a+2b+c,3b+c)=(0,0)$ which is a system of two equations in three unknowns so we can choose any non-zero value for $c$ and solve this system, therefore not all the coefficients are zero hence these vectors are linearly dependent. In fact, you can check that in plane that is, in $\mathbb{R}^2$ any three vectors are going to be linearly dependent, which is the next topic for our discussion, what is dimension(for a vector space)?

Dimension of a vector space:

We have already defined what a linear combination is. The set of all linear combination of say $\{v_1,v_2,...,v_k\} \subset V$ is say $W=\{v:v= \sum^{k}_{i=1}\alpha_i v_i$ for some $\alpha_i \in \mathbb{R},i=1,...,k\}$ . You can check that $W$ is a vector subspace of $V$ . The vector subspace $W$ generated by $\{v_1,v_2,...,v_k\}$ is called a linear span of these vectors or the subspace generated by these vectors. We call a subset $\{v_1,v_2,...,v_k\}$ a basis of $V$ if $V$ is generated by $\{v_1,v_2,...,v_k\}$ and $\{v_1,v_2,...,v_k\}$ is a linearly independent set. The dimension of a vector space is defined as the number of vectors in the basis set. See here I am calling it ‘The’ dimension not ‘a’ dimension because I can prove that any basis of a vector space has the same number of vectors. The cardinality of any two basis would be same not necessarily the elements or vectors for example, $\{(1,0),(0,1)\}$ is a basis of $\mathbb{R}^2$ , in fact it is called a standard basis and $\{(1,1),(-1,2)\}$ is also a basis of $\mathbb{R}^2$ (you can check it by verifying that any element or vector of $\mathbb{R}^2$ can be written as a linear combination of these two vectors). The rough idea about the same number of basis element is that suppose we have two basis one with $n$ vectors and other with $m$ vectors both generating the vector space $V$ , then without loss of generality assume that $n >m$ . Now we know that basis with $m$ elements also generates the whole space $V$ so then other extra vectors in another basis with $n$ elements must be in some linear combination of the basis with $m$ elements but as we know both are basis so we must have $n=m$ .

Linear Algebra and its scope

The linear algebra began with the study of system of linear equations history of which goes back to an ancient Chinese text and evolved into more abstract study of finite dimensional vector spaces and the linear transformations/mappings between them. The modern way of doing linear algebra is mostly attributed to Hermann Grassmann, J.J. Sylvester and Arthur Cayley, for the brief history refer this. The eigenvalues & eigenvectors, the diagonalization, the Cholesky decomposition, the QR decomposition, singular value decomposition, the rational and Jordan normal forms, these are some of the most important concepts and tools in linear algebra. The modern treatment of data is in matrix form (arrays) with features and targets as columns so various classification and regression method requires careful use of techniques from linear algebra. These methods and techniques from linear algebra is used in Machine Learning in various ways such as least squares method in regression analysis, singular value decomposition (SVD) in Principal Components Analysis (PCA), linear maps to transform non-linear data to linear form using Locally Linear Embedding (LLE), other decomposition techniques to simplify the computations in algorithms and in dimension reduction of the given data. So the good knowledge of linear algebraic tools is essential in understanding of various machine learning and Artificial Intelligence algorithms.

For basic introduction of linear algebra one could refer to [1] and for higher abstract level [2] is good enough. And if you are interested in matrix computations [3] is a fine book.

References

Learn, Think & Do