SciPy Tutorial

SciPy, pronounced as Sigh Pi, is a scientific python open source, distributed under the BSD licensed library to perform Mathematical, Scientific and Engineering Computations.

The SciPy library depends on NumPy, which provides convenient and fast N-dimensional array manipulation. The SciPy library is built to work with NumPy arrays and provides many user-friendly and efficient numerical practices such as routines for numerical integration and optimization. Together, they run on all popular operating systems, are quick to install and are free of charge. NumPy and SciPy are easy to use, but powerful enough to depend on by some of the world's leading scientists and engineers.

Basic Functionality

By default, all the NumPy functions have been available through the SciPy namespace. There is no need to import the NumPy functions explicitly, when SciPy is imported. The main object of NumPy is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of positive integers. In NumPy, dimensions are called as axes. The number of axes is called as rank.

Now, let us revise the basic functionality of Vectors and Matrices in NumPy. As SciPy is built on top of NumPy arrays, understanding of NumPy basics is necessary. As most parts of linear algebra deals with matrices only.

NumPy Vector

A Vector can be created in multiple ways. Some of them are described below.

Converting Python array-like objects to NumPy
Let us consider the following example.

import numpy as np
list = [1,2,3,4]
arr = np.array(list)
print arr 

The output of the above program will be as follows.

[1 2 3 4] 

Intrinsic NumPy Array Creation

NumPy has built-in functions for creating arrays from scratch. Some of these functions are explained below.

Using zeros()

The zeros(shape) function will create an array filled with 0 values with the specified shape. The default dtype is float64. Let us consider the following example.

import numpy as np
print np.zeros((2, 3)) 

The output of the above program will be as follows.

array([ [ 0., 0., 0.] ,
[ 0., 0., 0.] ])

Using ones()

The ones(shape) function will create an array filled with 1 values. It is identical to zeros in all the other respects. Let us consider the following example.

import numpy as np
print np.ones((2, 3)) 

The output of the above program will be as follows.

array([ [ 1., 1., 1.],
[ 1., 1., 1.] ]) 

Using arange()

The arange() function will create arrays with regularly incrementing values. Let us consider the following example.

import numpy as np
print np.arange(7) 

The above program will generate the following output.

 array([0, 1, 2, 3, 4, 5, 6]) 

Defining the data type of the values

Let us consider the following example.

 import numpy as np
arr = np.arange(2, 10, dtype = np.float)
print arr
print "Array Data Type :",arr.dtype 

The above program will generate the following output.

[ 2. 3. 4. 5. 6. 7. 8. 9.]
Array Data Type : float64 

Using linspace()

The linspace() function will create arrays with a specified number of elements, which will be spaced equally between the specified beginning and end values. Let us consider the following example.

import numpy as np
print np.linspace(1., 4., 6) 

The above program will generate the following output.

array([ 1. , 1.6, 2.2, 2.8, 3.4, 4. ]) 

Matrix

A matrix is a specialized 2-D array that retains its 2-D nature through operations. It has certain special operators, such as * (matrix multiplication) and ** (matrix power). Let us consider the following example.

import numpy as np
print np.matrix('1 2; 3 4') 

The above program will generate the following output.

matrix([ [1, 2],
[3, 4]  ])

Conjugate Transpose of Matrix

This feature returns the (complex) conjugate transpose of self. Let us consider the following example.

import numpy as np 
mat = np.matrix('1 2; 3 4')
print mat.H

The above program will generate the following output.

matrix([ [1, 3],
        [2, 4] ]) 

Transpose of Matrix

This feature returns the transpose of self. Let us consider the following example.

import numpy as np
mat = np.matrix('1 2; 3 4')
mat.T 

The above program will generate the following output.

matrix([ [1, 3],
        [2, 4]  ])

When we transpose a matrix, we make a new matrix whose rows are the columns of the original. A conjugate transposition, on the other hand, interchanges the row and the column index for each matrix element. The inverse of a matrix is a matrix that, if multiplied with the original matrix, results in an identity matrix.

Cluster

K-means clustering is a method for finding clusters and cluster centers in a set of unlabelled data. Intuitively, we might think of a cluster as – comprising of a group of data points, whose inter-point distances are small compared with the distances to points outside of the cluster. Given an initial set of K centers, the K-means algorithm iterates the following two steps −

  • For each center, the subset of training points (its cluster) that is closer to it is identified than any other center.
  • The mean of each feature for the data points in each cluster are computed, and this mean vector becomes the new center for that cluster.

These two steps are iterated until the centers no longer move or the assignments no longer change. Then, a new point x can be assigned to the cluster of the closest prototype. The SciPy library provides a good implementation of the K-Means algorithm through the cluster package. Let us understand how to use it.

K-Means Implementation

We will understand how to implement K-Means in SciPy.

Import K-Means

We will see the implementation and usage of each imported function.

from SciPy.cluster.vq import kmeans,vq,whiten 

Data generation

We have to simulate some data to explore the clustering.

from numpy import vstack,array 
from numpy.random import rand
# data generation with three features
data = vstack((rand(100,3) + array([.5,.5,.5]),rand(100,3))

N ow, we have to check for data. The above program will generate the following output.

array([ [ 1.48598868e+00, 8.17445796e-01, 1.00834051e+00],
       [ 8.45299768e-01, 1.35450732e+00, 8.66323621e-01],
       [ 1.27725864e+00, 1.00622682e+ 00, 8.43735610e-01],
       …………….

Normalize a group of observations on a per feature basis. Before running K-Means, it is beneficial to rescale each feature dimension of the observation set with whitening. Each feature is divided by its standard deviation across all observations to give it unit variance.

Whiten the data

We have to use the following code to whiten the data.

# whitening of data
data = whiten(data) 

Compute K-Means with Three Clusters

Let us now compute K-Means with three clusters using the following code.

# computing K-Means with K = 3 (2 clusters)
centroids,_ = kmeans(data,3) 

The above code performs K-Means on a set of observation vectors forming K clusters. The K-Means algorithm adjusts the centroids until sufficient progress cannot be made, i.e. the change in distortion, since the last iteration is less than some threshold. Here, we can observe the centroid of the cluster by printing the centroids variable using the code given below.

 print(centroids)

The above code will generate the following output.

print(centroids)[ [ 2.26034702 1.43924335 1.3697022 ] 
                  [ 2.63788572 2.81446462 2.85163854]
                  [ 0.73507256 1.30801855 1.44477558] ]

Assign each value to a cluster by using the code given below.

# assign each sample to a cluster 
clx,_ = vq(data,centroids)

The vq function compares each observation vector in the ‘M’ by ‘N’ obs array with the centroids and assigns the observation to the closest cluster. It returns the cluster of each observation and the distortion. We can check the distortion as well. Let us check the cluster of each observation using the following code.

# check clusters of observation
print (clx) 

The above code will generate the following output.

array([1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 2, 0, 2, 0, 1, 1, 1,
0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0,
0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 0, 2, 2, 2, 2, 2, 0, 0, 
2, 2, 2, 1, 0, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 0, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)

The distinct values 0, 1, 2 of the above array indicate the clusters.

Constants

SciPy constants package provides a wide range of constants, which are used in the general scientific area.

SciPy Constants Package

The scipy.constants package provides various constants. We have to import the required constant and use them as per the requirement. Let us see how these constant variables are imported and used.

To start with, let us compare the ‘pi’ value by considering the following example.

#Import pi constant from both the packages
from scipy.constants import pi
from math import pi
print("sciPy - pi = %.16f"%scipy.constants.pi)
print("math - pi = %.16f"%math.pi)

The above program will generate the following output.

sciPy - pi = 3.1415926535897931
math - pi = 3.1415926535897931  

The easy way to get which key is for which function is with the scipy.constants.find() method. Let us consider the following example.

import scipy.constants
res = scipy.constants.physical_constants["alpha particle mass"]
print res 

The above program will generate the following output.

 [
   'alpha particle mass',
   'alpha particle mass energy equivalent',
   'alpha particle mass energy equivalent in MeV',
   'alpha particle mass in u',
   'electron to alpha particle mass ratio'
]

 The list of keys, else nothing if the keyword does not match.

Integrate

When a function cannot be integrated analytically, or is very difficult to integrate analytically, one generally turns to numerical integration methods. SciPy has a number of routines for performing numerical integration. Most of them are found in the same scipy.integrate library.

Single Integrals

The Quad function is the workhorse of SciPy’s integration functions. Numerical integration is sometimes called quadrature, hence the name. It is normally the default choice for performing single integrals of a function f(x) over a given fixed range from a to b.

The general form of quad is scipy.integrate.quad(f, a, b), Where ‘f’ is the name of the function to be integrated. Whereas, ‘a’ and ‘b’ are the lower and upper limits, respectively. Let us see an example of the Gaussian function, integrated over a range of 0 and 1.

We first need to define the function → f(x)=e−x^2, this can be done using a lambda expression and then call the quad method on that function.

import scipy.integrate
from numpy import exp 
f= lambda x:exp(-x**2)
i = scipy.integrate.quad(f, 0, 1)
print i

The above program will generate the following output.

(0.7468241328124271, 8.291413475940725e-15) 

The quad function returns the two values, in which the first number is the value of integral and the second value is the estimate of the absolute error in the value of integral.

Note − Since quad requires the function as the first argument, we cannot directly pass exp as the argument. The Quad function accepts positive and negative infinity as limits. The Quad function can integrate standard predefined NumPy functions of a single variable, such as exp, sin and cos.

Multiple Integrals

The mechanics for double and triple integration have been wrapped up into the functions dblquad, tplquad and nquad. These functions integrate four or six arguments, respectively. The limits of all inner integrals need to be defined as functions.

Double Integrals

The general form of dblquad is scipy.integrate.dblquad(func, a, b, gfun, hfun). Where, func is the name of the function to be integrated, ‘a’ and ‘b’ are the lower and upper limits of the x variable, respectively, while gfun and hfun are the names of the functions that define the lower and upper limits of the y variable.

We define the functions f, g, and h, using the lambda expressions. Note that even if g and h are constants, as they may be in many cases, they must be defined as functions, as we have done here for the lower limit.

import scipy.integrate
from numpy import exp
from math import sqrt
f = lambda x, y : 16*x*y
g = lambda x : 0
h = lambda y : sqrt(1-4*y**2) 
i = scipy.integrate.dblquad(f, 0, 0.5, g, h)
print i

The above program will generate the following output.

(0.5, 1.70923500125 94845e-14)

In addition to the routines described above, scipy.integrate has a number of other integration routines, including nquad, which performs n-fold multiple integration, as well as other routines that implement various integration algorithms. However, quad and dblquad will meet most of our needs for numerical integration.

Page structure