### What Is The Agglomerative Clustering Algorithm

Agglomerative clustering is an unsupervised machine learning algorithm used to group data that is not labelled. It starts by treating each data point as its own cluster. At every iteration the two closest clusters are merged to form a new cluster. This process repeats until all data points belong to a single cluster or the desired number of clusters is reached.

Agglomerative clustering follows a bottom-up approach to merging clusters. The algorithm builds a hierarchy of data points based on which points are nearest to each other, and this hierarchy is represented as a dendrogram.
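The bottom-up merging can be sketched in a few lines of Python. This is only an illustrative, unoptimized version, assuming the distance between two clusters is the distance between their nearest points. The function name `naive_agglomerative` and the sample points are made up for this example.

```python
import numpy as np

def naive_agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain (nearest-point distance)."""
    points = np.asarray(points, dtype=float)
    # start with every point in its own cluster (clusters hold point indices)
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        # find the pair of clusters whose nearest points are closest
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # merge cluster b into cluster a
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# hypothetical sample points
points = [[0, 0], [1, 0], [5, 5], [5, 7], [20, 0]]
result = naive_agglomerative(points, 3)
print(result)
```

Library implementations such as scipy and sklearn do the same kind of merging far more efficiently, so this loop is only meant to show the idea.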

Agglomerative clustering works on the distances between data points. To calculate the distance between two points, use the **Euclidean distance formula**.
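For two points p and q, the Euclidean distance is the square root of the sum of squared differences of their coordinates. A minimal sketch with numpy follows. The helper name `euclidean_distance` and the sample points are chosen only for illustration.

```python
import numpy as np

def euclidean_distance(p, q):
    """Square root of the sum of squared coordinate differences."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(np.sum((p - q) ** 2)))

# differences are 3 and 4, so the distance is sqrt(9 + 16)
d = euclidean_distance([1, 2], [4, 6])
print(d)  # 5.0
```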

### Methods Of Calculating Distance Between Two Clusters

Many methods are available for calculating the distance between clusters. Some of them are explained below.

- Single Nearest Distance: The single nearest distance method calculates the distance between two clusters based on the nearest points of the two clusters. This is also called Single Linkage or the Nearest Point Algorithm.
- Complete Farthest Distance: The complete farthest distance method calculates the distance between two clusters based on the farthest points of the two clusters. This is also called Complete Linkage, the Farthest Point Algorithm or the Voor Hees Algorithm.
- Average Distance: The average distance method calculates the distance as the average of the distances between all pairs of points in the two clusters. This is also called average linkage or the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) algorithm.
- Centroid Distance: The centroid distance method calculates the distance between the centroid points of the two clusters. This is also called centroid linkage or UPGMC (Unweighted Pair Group Method using Centroids).
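To make the difference between these methods concrete, the sketch below computes all four distances for two small clusters. The cluster coordinates are invented for this example, and `cdist` from scipy is used to get every pairwise Euclidean distance.

```python
import numpy as np
from scipy.spatial.distance import cdist

# two hypothetical clusters of 2-D points
cluster_a = np.array([[0.0, 0.0], [1.0, 0.0]])
cluster_b = np.array([[4.0, 0.0], [6.0, 0.0]])

# all pairwise Euclidean distances between points of a and points of b
pairwise = cdist(cluster_a, cluster_b)

single = pairwise.min()     # nearest pair of points
complete = pairwise.max()   # farthest pair of points
average = pairwise.mean()   # mean over all pairs
# distance between the two cluster centroids
centroid = np.linalg.norm(cluster_a.mean(axis=0) - cluster_b.mean(axis=0))

print(single, complete, average, centroid)  # 3.0 6.0 4.5 4.5
```

Average and centroid distance happen to agree here because all points lie on one line. In general they differ.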

### Install The Packages Required To Implement Agglomerative Clustering

`pip install numpy pandas matplotlib scikit-learn scipy`

Here we implement an example of car classification with agglomerative clustering, using the sklearn and scipy packages.

### Data Set On Which The Clustering Algorithm Is Applied

### Import Required Packages

```
import numpy as np
# pandas is used to read data from the csv file
import pandas as pd
# used to plot graphs
import matplotlib.pyplot as plt
# used to pick colors from a colormap
import matplotlib.cm as cm
# used to generate the distance matrix
from scipy.spatial import distance_matrix
# used to perform agglomerative clustering
from scipy.cluster.hierarchy import linkage
# used to generate dendrograms
from scipy.cluster.hierarchy import dendrogram
# used to extract N clusters from the linkage result
from scipy.cluster.hierarchy import fcluster
# used in preprocessing
from sklearn.preprocessing import MinMaxScaler
# used to perform clustering with sklearn
from sklearn.cluster import AgglomerativeClustering
```

### Read Data From CSV File And Data Pre-processing

To read data from a csv file, use the pandas `read_csv` method, which reads the file and returns a data frame object. The pre-processing step removes rows with null values from the data set and scales all values to the range 0 to 1. It also generates a distance matrix for the data set.

```
data = pd.read_csv('cars_clus.csv')
# convert to numeric values; entries that cannot be parsed become NaN
data = data.apply(pd.to_numeric, errors='coerce')
# drop rows that contain null values
data = data.dropna()
# reset the index after dropping null rows
data = data.reset_index(drop=True)
# keep a copy of the data frame; it is used later for labels and plotting
dataf = data.copy()
print(data.head())
# scale all the values to the (0, 1) range
data = MinMaxScaler().fit_transform(data)
# generate the distance matrix
dm = distance_matrix(data, data)
```
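The same pre-processing steps can be tried on a tiny in-memory table before running them on the real file. The values below are invented for illustration (only the column names `engine_s` and `horsepow` come from the car data set), and the `"n/a"` entry stands in for a value that cannot be parsed as a number.

```python
import numpy as np
import pandas as pd
from scipy.spatial import distance_matrix
from sklearn.preprocessing import MinMaxScaler

# hypothetical rows; "n/a" plays the role of a missing value
toy = pd.DataFrame({
    "engine_s": ["1.8", "3.2", "n/a", "3.5"],
    "horsepow": ["140", "225", "210", "300"],
})

# unparseable entries become NaN, then the affected rows are dropped
toy = toy.apply(pd.to_numeric, errors="coerce")
toy = toy.dropna().reset_index(drop=True)

# scale each column to the 0..1 range
scaled = MinMaxScaler().fit_transform(toy)

# square matrix of Euclidean distances between every pair of rows
dm_toy = distance_matrix(scaled, scaled)

print(toy.shape)     # (3, 2): the row with "n/a" is gone
print(dm_toy.shape)  # (3, 3)
```

Note that a column made up entirely of text would become all NaN under `errors="coerce"`, so `dropna()` would then remove every row. Such columns are usually dropped or selected away first.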

### Perform Agglomerative Clustering Using Scipy

```
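# perform agglomerative clustering with complete linkage
# note: linkage expects raw observations or a condensed distance matrix;
# a square matrix such as dm is treated as a set of observation vectors
model = linkage(dm, 'complete')
# cut the hierarchy into at most 5 clusters
clusters = fcluster(model, 5, criterion='maxclust')
clusters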
```

array([1, 3, 3, 3, 1, 3, 2, 3, 3, 3, 3, 3, 3, 3, 2, 2, 3, 4, 1, 3, 3, 3, 3, 2, 1, 5, 3, 3, 3, 3, 3, 3, 3, 1, 3], dtype=int32)
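Because `linkage` accepts either the raw observations or a condensed (1-D) distance vector, one alternative is to build the condensed form with `pdist`. The sketch below uses made-up points rather than the car data, so its labels are not comparable with the output above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# hypothetical observations: two tight pairs and one isolated point
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [30.0, 0.0]])

# condensed distance vector: one entry per pair of rows, n*(n-1)/2 in total
condensed = pdist(X)

# complete linkage on the condensed distances
Z = linkage(condensed, method="complete")

# cut the hierarchy into at most 3 clusters
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```

Passing `X` itself to `linkage` gives the same hierarchy, since scipy then computes the pairwise Euclidean distances internally.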

### Plot Dendrogram

```
fig = plt.figure(figsize=(20, 30))

# this function builds the label shown for each data point in the dendrogram
def leaf_label(i):
    return '[%s|%s|%s]' % (dataf['engine_s'][i], dataf['horsepow'][i], dataf['wheelbas'][i])

# plot the dendrogram
dendog = dendrogram(model, leaf_label_func=leaf_label, leaf_rotation=90, leaf_font_size=18, orientation='top')
plt.show()
```
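To check what `leaf_label_func` produces without drawing anything, `dendrogram` can be called with `no_plot=True`, which only returns the computed layout. This is a small sketch on invented points; the `names` list is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# hypothetical observations and one name per observation
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [30.0, 0.0]])
names = ["a", "b", "c", "d", "e"]

Z = linkage(X, method="complete")

# no_plot=True skips drawing and returns a dict describing the tree
tree = dendrogram(Z, no_plot=True, leaf_label_func=lambda i: names[i])

# 'ivl' holds the leaf labels in the order they appear along the axis
print(tree["ivl"])
```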

### Perform Agglomerative Clustering Using Sklearn

```
# create the agglomerative clustering model
modelc = AgglomerativeClustering(n_clusters=5, linkage='complete')
# fit the model on the scaled data
modelc.fit(data)
# print the label assigned to each data point
print(modelc.labels_)
# append the labels to the data set
dataf['cluster'] = modelc.labels_
dataf.head()
```

array([1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 1, 0, 0, 0, 4, 1, 2, 0, 1, 1, 0, 1, 0, 0, 1, 1], dtype=int64)
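The label numbers themselves are arbitrary: scipy's `fcluster` numbers clusters from 1 and sklearn from 0, and the same group can get a different number in each library. One way to compare two labelings is the adjusted Rand index, which only looks at which points are grouped together. The sketch below uses invented points, not the car data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

# hypothetical observations: two tight pairs and one isolated point
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [30.0, 0.0]])

# scipy: complete linkage on the raw observations, cut into 3 clusters
scipy_labels = fcluster(linkage(X, method="complete"), t=3, criterion="maxclust")

# sklearn: the same linkage and number of clusters
sk_labels = AgglomerativeClustering(n_clusters=3, linkage="complete").fit_predict(X)

# 1.0 means both labelings describe the same grouping
score = adjusted_rand_score(scipy_labels, sk_labels)
print(scipy_labels, sk_labels, score)
```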

### Plot Graph Of Cluster

Use a scatter plot to show the clusters in a graph, and use a matplotlib colormap to give each cluster its own color.

```
# identify the number of clusters
no_clusters = max(modelc.labels_) + 1
# choose one color per cluster from a matplotlib colormap
colors = cm.rainbow(np.linspace(0, 1, no_clusters))
# create the list of cluster labels
c_label = list(range(0, no_clusters))
for color, label in zip(colors, c_label):
    # select the rows that belong to this cluster
    subset = dataf[dataf.cluster == label]
    plt.scatter(subset.engine_s, subset.horsepow, color=color, label="cluster" + str(label))
plt.legend()
plt.xlabel("Engine size")
plt.ylabel("Horse power")
plt.show()
```
