Decision Tree in a Cheat Sheet
A decision tree is a supervised, non-parametric machine learning technique used for both classification and regression.
Decision Trees are represented as nodes:
- Root Node, represented as a rectangle or a square: ▭ or □
- Branch/Internal Node, represented as a circle: ○
- Leaf/Terminal Node, represented as a triangle or a dot: △ or ●
Information Gain:
Information gain measures the decrease in entropy after the dataset is split on an attribute. Its value ranges between 0 and 1.
The formula: Information Gain (IG) = Entropy(before split) − Weighted Entropy(after split).
Entropy:
Entropy is a measure of impurity, also called a measure of uncertainty. For binary classification its value ranges between 0 and 1.
Gini Index:
The Gini Index measures impurity: a value of 0 means a perfectly pure node. It is the splitting criterion used by the CART algorithm, and its value ranges between 0 and 1.
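The three measures above can be sketched in a few lines of plain Python. This is a minimal illustration with made-up class counts, not library code:

```python
from math import log2

def entropy(counts):
    """Shannon entropy of a list of class counts; 0 = pure, 1 = max impurity (binary)."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gini(counts):
    """Gini index of a list of class counts; 0 = pure node."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def information_gain(parent, children):
    """IG = entropy(parent) - weighted average entropy of the child nodes."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# A 50/50 binary node is maximally impure:
print(entropy([5, 5]))   # 1.0
print(gini([5, 5]))      # 0.5
# Splitting it into two mostly-pure children yields a positive gain:
print(round(information_gain([5, 5], [[4, 1], [1, 4]]), 3))  # 0.278
```

A split that produced perfectly pure children would achieve the maximum gain of 1.0 here.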
- Stacking: An ensemble learning approach that uses a meta-classifier or meta-regressor to combine the predictions of several base classification or regression models.
- Voting: Combines the predictions from multiple machine learning algorithms.
- Hard Voting: The class that receives the most votes is selected as the output class.
- Soft Voting: The predicted probabilities for each class are averaged, and the class with the highest average probability is selected.
- Bagging: Short for Bootstrap Aggregation. It improves accuracy and reduces over-fitting.
- Random Forest: An extension of Bagging that also samples random subsets of features, further reducing over-fitting.
- AdaBoost: Builds a strong classifier by combining many weak classifiers, iteratively re-weighting the samples the weak classifiers get wrong.
- Gradient Boosting: Fits each new model to the gradient of a chosen loss function in order to reduce it. It works well with categorical and count data and also handles missing data well.
- XGBoost: An optimized implementation of gradient boosting, applicable to both classification and regression models.
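The difference between hard and soft voting can be shown with plain Python and made-up predictions from three hypothetical classifiers (no library needed):

```python
from collections import Counter

# Hard voting: each classifier casts one vote; the majority class wins.
hard_preds = ["dog", "cat", "dog"]
hard_winner = Counter(hard_preds).most_common(1)[0][0]
print(hard_winner)  # dog

# Soft voting: average the per-class probabilities, pick the highest mean.
# The same three classifiers, but now with their confidence values.
soft_probs = [
    {"cat": 0.45, "dog": 0.55},  # classifier 1 (barely prefers dog)
    {"cat": 0.90, "dog": 0.10},  # classifier 2 (strongly prefers cat)
    {"cat": 0.40, "dog": 0.60},  # classifier 3 (barely prefers dog)
]
avg = {cls: sum(p[cls] for p in soft_probs) / len(soft_probs)
       for cls in ("cat", "dog")}
soft_winner = max(avg, key=avg.get)
print(soft_winner)  # cat
```

Note the two schemes can disagree: two weakly confident "dog" votes lose to one strongly confident "cat" under soft voting, because probabilities carry the confidence information that raw votes discard.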
Libraries to import in Python for Decision Trees and Ensembles
- from sklearn.preprocessing import LabelEncoder - Encodes categorical labels as integers (label encoding)
- from sklearn.preprocessing import scale - Standardizes features during preprocessing
- from sklearn.model_selection import train_test_split - Splits the data into train and test sets
- from sklearn.tree import DecisionTreeClassifier as DT - Builds a decision tree classifier
- from sklearn import tree - Used to generate and draw trees
- from sklearn.metrics import accuracy_score - Computes classification accuracy (subset accuracy in the multilabel case)
- from sklearn.metrics import confusion_matrix - Evaluates the quality of the classifier's output
- from sklearn.ensemble import VotingClassifier - Predicts by combining the votes of several classifiers
- from sklearn.ensemble import BaggingClassifier - Fits base classifiers on random subsets of the original dataset and aggregates their individual predictions
- from sklearn.ensemble import RandomForestClassifier - Used for both classification and regression
- from sklearn.ensemble import AdaBoostClassifier - Combines multiple weak classifiers to increase accuracy
- from sklearn.ensemble import GradientBoostingClassifier - Builds an additive model that minimizes a loss function
- import xgboost as xgb - XGBoost, an extension of gradient boosting optimized for speed and performance
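The imports above fit together in a short end-to-end workflow. A minimal sketch, assuming scikit-learn is installed and using the built-in iris dataset as a stand-in for your own data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier as DT
from sklearn.metrics import accuracy_score, confusion_matrix

# Load a small multiclass dataset and split it into train/test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit a tree using entropy as the impurity measure; cap the depth
# to limit over-fitting.
model = DT(criterion="entropy", max_depth=3, random_state=42)
model.fit(X_train, y_train)

# Evaluate on held-out data.
preds = model.predict(X_test)
print(accuracy_score(y_test, preds))
print(confusion_matrix(y_test, preds))
```

Swapping `DT` for `RandomForestClassifier` or `GradientBoostingClassifier` from the list above leaves the rest of the workflow unchanged, which is what makes the scikit-learn estimator API convenient for comparing models.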
Libraries to install in R for Decision Tree and Ensemble
- library(caTools) - Utility functions, including sample.split for train/test splitting
- library(C50) - C5.0 decision tree classification model
- library(rpart) - Recursive Partitioning and Regression Trees
- library(gmodels) - Model-fitting and cross-tabulation utilities
- library(caret) - Classification and Regression Training
- library(randomForest) - Random Forest algorithm for classification and regression
- library(adabag) - AdaBoost classification with bagging and boosting
- library(gbm) - Gradient Boosting Machine
- library(xgboost) - XGBoost, an extension of gradient boosting supporting both classification and regression
Hyperparameters in Decision Tree and Ensemble models

| Hyperparameter | Input Values | Default Value |
|---|---|---|
| max_depth | Integer or None, optional | None |
| min_samples_split | Integer, float, optional | 2 |
| min_samples_leaf | Integer, float, optional | 1 |
| min_weight_fraction_leaf | Float, optional | 0 |
| max_features | Integer, float, string or None, optional | None |
| random_state | Integer, RandomState instance or None, optional | None |
| min_impurity_decrease | Float, optional | 0 |
| base_estimator | Estimator object | Decision Tree |
| n_estimators | Integer | 10 |
| random_state | Seed | None |
| n_jobs | Integer or None | None |
| criterion | String | "gini" |
| min_samples_leaf | Integer | 1 |
| oob_score | Boolean | False |
| learning_rate | Float | 1 |
| colsample_bylevel | Float | 1 |
| colsample_bytree | Float | 1 |
| subsample | Float | 1 |
| eta | Float | 0.3 |
| min_child_weight | Integer | 1 |
| gamma | Float | 0 |
| alpha | Float | 0 |
| lambda | Float | 1 |
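The table above mixes decision-tree, ensemble, and XGBoost hyperparameters. A short sketch of how the scikit-learn ones are passed to estimators (the specific values here are illustrative, not recommendations):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Decision-tree hyperparameters from the table.
tree = DecisionTreeClassifier(
    criterion="gini",             # splitting criterion (CART default)
    max_depth=4,                  # None would let the tree grow fully
    min_samples_split=2,          # minimum samples required to split a node
    min_samples_leaf=1,           # minimum samples required at a leaf
    min_weight_fraction_leaf=0.0,
    max_features=None,            # consider all features at each split
    min_impurity_decrease=0.0,
    random_state=0,               # seed for reproducibility
)

# Ensemble-level hyperparameters from the table.
forest = RandomForestClassifier(
    n_estimators=100,             # number of trees in the forest
    oob_score=True,               # estimate accuracy on out-of-bag samples
    n_jobs=-1,                    # use all available CPU cores
    random_state=0,
)

print(tree.get_params()["max_depth"])       # 4
print(forest.get_params()["n_estimators"])  # 100
```

The XGBoost-specific entries (eta, gamma, alpha, lambda, subsample, colsample_bytree, colsample_bylevel, min_child_weight) are passed analogously to `xgb.XGBClassifier` or in the params dict of `xgb.train`.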