Unsupervised Learning - Preliminaries
Distance Calculation
Distance is calculated between two records, between a record and a cluster, or between two clusters.
Distance Properties:
- Should be non-negative (distance ≥ 0)
- Distance from a record to itself is equal to 0
- Satisfies Symmetry (Distance between records 'i' & 'j' is equal to the distance between records 'j' & 'i')
If the variables differ in scale or have different units, standardise or normalise them before computing the distance. Otherwise the variable with the largest numeric range dominates the result.
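As a rough illustration of why scaling matters, the sketch below standardises two columns to z-scores and compares Euclidean distances before and after. The records (age in years, income in rupees) and the helper names `standardise` and `euclidean` are made up for this example.

```python
import math

def standardise(column):
    """Return z-scores: (value - mean) / population standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column]

def euclidean(a, b):
    """Euclidean distance between two equal-length records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical data: three records with very different units per variable
ages = [25, 30, 50]
incomes = [50_000, 90_000, 52_000]

raw = list(zip(ages, incomes))
scaled = list(zip(standardise(ages), standardise(incomes)))

# Raw data: income dominates, so record 0 looks much closer to record 2
raw_d01 = euclidean(raw[0], raw[1])
raw_d02 = euclidean(raw[0], raw[2])

# Standardised data: the 25-year age gap to record 2 now counts as well
std_d01 = euclidean(scaled[0], scaled[1])
std_d02 = euclidean(scaled[0], scaled[2])

print(raw_d01 > raw_d02)   # True
print(std_d01 < std_d02)   # True
```

With these numbers the nearest neighbour of record 0 changes once the variables are put on a common scale, which is the effect the advice above guards against.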
Distance Calculations
Distance Metrics for Continuous Data
- Mahalanobis Distance, which is calculated using the inverse of the covariance matrix and so accounts for correlation between variables
- Manhattan Distance, also called the L1 norm
- Euclidean Distance, also called the L2 norm
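The three metrics above can be sketched in plain Python as follows. This is a minimal illustration, not a library implementation: the function names are chosen here, and `mahalanobis_2d` only handles two variables so that the 2x2 covariance matrix can be inverted by hand.

```python
import math

def manhattan(a, b):
    """L1 norm of the difference: sum of absolute coordinate gaps."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """L2 norm of the difference: square root of summed squared gaps."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mahalanobis_2d(a, b, cov):
    """Mahalanobis distance for two variables, given a 2x2 covariance matrix."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    # Closed-form inverse of a 2x2 matrix
    inv = [[s22 / det, -s12 / det],
           [-s21 / det, s11 / det]]
    d = [a[0] - b[0], a[1] - b[1]]
    quad = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    return math.sqrt(quad)

p, q = (1.0, 2.0), (4.0, 6.0)
identity = [[1.0, 0.0], [0.0, 1.0]]

print(manhattan(p, q))                  # 7.0
print(euclidean(p, q))                  # 5.0
print(mahalanobis_2d(p, q, identity))   # 5.0
```

With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean distance. A larger variance on one variable shrinks that variable's contribution to the distance.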
Distance Metrics for Binary Categorical Data
- Binary Euclidean Distance
- Simple Matching Coefficient
- Jaccard's Coefficient
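A small sketch of the three binary measures, assuming records are equal-length lists of 0s and 1s. The function names are illustrative. Note that the Simple Matching Coefficient and Jaccard's Coefficient are similarities (1 means identical), and subtracting them from 1 is a common way to turn them into distances.

```python
import math

def match_counts(a, b):
    """Return (both 1, both 0, mismatches) for two 0/1 vectors."""
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    n00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    return n11, n00, len(a) - n11 - n00

def binary_euclidean(a, b):
    """Euclidean distance on 0/1 data: square root of the mismatch count."""
    return math.sqrt(match_counts(a, b)[2])

def simple_matching(a, b):
    """Share of positions that agree, counting 1-1 and 0-0 matches."""
    n11, n00, _ = match_counts(a, b)
    return (n11 + n00) / len(a)

def jaccard(a, b):
    """Share of agreeing positions, ignoring 0-0 matches."""
    n11, _, mismatches = match_counts(a, b)
    if n11 + mismatches == 0:
        return 0.0  # convention chosen for this sketch when both are all zeros
    return n11 / (n11 + mismatches)

a = [1, 0, 0, 1, 0, 0]
b = [1, 0, 0, 0, 0, 1]

print(match_counts(a, b))   # (1, 3, 2)
```

Here four of six positions agree, so the Simple Matching Coefficient is 4/6, while Jaccard's Coefficient is 1/3 because the three 0-0 matches are left out. Jaccard is usually the better fit when a 1 is rare and a shared 0 carries little information.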
Distance Metrics for Categorical Data (> 2 categories)
- Distance is 0 if both items have the same category
- Distance is 1 otherwise
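The 0/1 rule above is short enough to write directly. Averaging it over several categorical variables, as `mismatch_fraction` does, is one possible way to extend it to whole records and is an assumption of this sketch rather than something the rule itself prescribes.

```python
def category_distance(a, b):
    """0 if the two categories are the same, 1 otherwise."""
    return 0 if a == b else 1

def mismatch_fraction(rec_a, rec_b):
    """Average the 0/1 distances over all categorical variables (assumed extension)."""
    return sum(category_distance(x, y) for x, y in zip(rec_a, rec_b)) / len(rec_a)

# Hypothetical records: (colour, body type, fuel)
car_a = ("red", "suv", "petrol")
car_b = ("red", "sedan", "diesel")

print(category_distance("red", "red"))    # 0
print(category_distance("suv", "sedan"))  # 1
```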
Distance Metrics when both Quantitative Data & Categorical Data exist in a dataset
- Gower's General Dissimilarity Coefficient
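A simplified sketch of Gower's coefficient with equal weights: each numeric variable contributes its absolute difference divided by that variable's range, each categorical variable contributes 0 or 1, and the contributions are averaged. The function name, the `ranges` argument and the sample records are assumptions made for illustration. Missing values and per-variable weights, which the full coefficient supports, are left out.

```python
def gower_distance(rec_a, rec_b, ranges):
    """Gower dissimilarity with equal weights.

    ranges[k] is (max - min) of variable k over the dataset when the
    variable is numeric (assumed > 0), or None when it is categorical.
    """
    total = 0.0
    for x, y, r in zip(rec_a, rec_b, ranges):
        if r is None:
            total += 0.0 if x == y else 1.0   # categorical: simple mismatch
        else:
            total += abs(x - y) / r           # numeric: range-scaled gap
    return total / len(ranges)

# Hypothetical records: (age, income, city)
a = (25, 50_000, "Chennai")
b = (35, 70_000, "Pune")
ranges = (40, 80_000, None)   # assumed dataset ranges for age and income

print(gower_distance(a, b, ranges))   # 0.5
```

Every contribution lies between 0 and 1, so the result does too, and no separate standardisation step is needed for the numeric variables.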
Linkages
Linkages - A linkage defines how the distance between a record & a cluster, or between two clusters, is measured.
Single Linkage - The smallest distance between a record and a cluster, or between two clusters, measured between their closest members.
- Single Linkage is also called Nearest Neighbor
- Emphasis is on close records or regions and not on overall structure of Data
- Capable of clustering non-elliptical shaped regions
- Gets influenced greatly by outliers or noisy data
Complete Linkage - The largest distance between a record and a cluster, or between two clusters, measured between their farthest members.
- Complete Linkage is also called Farthest Neighbor
- Complete Linkage is also sensitive to outliers
Average Linkage - The mean of all pairwise distances between a record and the members of a cluster, or between the members of two clusters.
- Average Linkage is also called Group Average
- Computationally expensive, because every pairwise distance between the two clusters has to be evaluated
Centroid Linkage - The distance between the centroids of two clusters, or between a record and a cluster's centroid.
- Centroid Linkage is also called Centroid Similarity
Ward's Criterion - The increase in the SSE (sum of squared errors) clustering criterion that results from merging two clusters into one.
- This is also called Ward's Minimum Variance method; at each step it merges the pair of clusters that gives the smallest increase in total within-cluster variance
Group Averaged Agglomerative Clustering (GAAC)
- Two clusters are merged based on cardinality of the clusters and centroid of clusters
- Cardinality is the number of elements in the cluster
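To make the linkage definitions concrete, here is a minimal sketch of single, complete, average and centroid linkage plus Ward's SSE increase, using Euclidean distance on small hand-made clusters. The function names and the two example clusters are assumptions for illustration. A single record can be passed as a cluster with one member. GAAC is not covered.

```python
import math
from itertools import product

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(c1, c2):
    """Distance between the closest pair of members."""
    return min(euclidean(a, b) for a, b in product(c1, c2))

def complete_linkage(c1, c2):
    """Distance between the farthest pair of members."""
    return max(euclidean(a, b) for a, b in product(c1, c2))

def average_linkage(c1, c2):
    """Mean of all pairwise distances across the two clusters."""
    return sum(euclidean(a, b) for a, b in product(c1, c2)) / (len(c1) * len(c2))

def centroid(cluster):
    """Coordinate-wise mean of the cluster's members."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))

def centroid_linkage(c1, c2):
    return euclidean(centroid(c1), centroid(c2))

def sse(cluster):
    """Sum of squared distances from each member to the centroid."""
    c = centroid(cluster)
    return sum(euclidean(p, c) ** 2 for p in cluster)

def ward_increase(c1, c2):
    """Increase in total SSE caused by merging the two clusters."""
    return sse(c1 + c2) - sse(c1) - sse(c2)

# Two hypothetical clusters, given as lists of 2-D records
left = [(0.0, 0.0), (0.0, 2.0)]
right = [(3.0, 0.0), (3.0, 2.0)]

print(single_linkage(left, right))     # 3.0
print(centroid_linkage(left, right))   # 3.0
print(ward_increase(left, right))      # 9.0
```

For these clusters the complete linkage is the diagonal distance, the square root of 13, and the average linkage falls between the single and complete values. In an agglomerative procedure, the pair of clusters with the smallest linkage value is the one merged at each step.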