Decision Trees: The Branching Path to Better Choices

January 26, 2024

Assume you're a traveller planning your next weekend getaway and you have two amazing options: a relaxing beach trip to Goa or a thrilling mountain adventure in Lonavala. But how do you decide one way or the other? This is where Decision Trees come in. They are like flowcharts, visually representing all the scenarios. Let's dig deeper into this example.

Our first question would be whether the person is interested in beaches or mountains. Another decisive question could be whether the person is fit enough for a trek or a hike.

Let's look at a visual representation of how a Decision Tree could help in this.

Shown above is a standard representation of a hypothetical situation, but there's a lot of math that goes into it. Ready for it? Let's get right into it!

Decision Tree Variants

Decision trees are not restricted to classification; they can also be used for regression. Each variant performs a different calculation at each node. The variants are:

  • Decision Tree Classifier
  • Decision Tree Regressor

The Decision Tree evaluates each given feature to find the most relevant one and splits on that feature first. This is repeated until all the features are evaluated or a leaf node is encountered. A leaf node is defined as a node that has no children; this node is used for prediction. As for tuning, the depth of the decision tree can be restricted, meaning that the longest path between the root node and any leaf node is limited to a specified number of splits.
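As a quick illustration, here is a minimal sketch of how this depth restriction is exposed in scikit-learn (the same library used in the implementation section later in this post); the parameter value here is only an example:

# Depth-limited decision trees in scikit-learn (illustrative values)
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# No root-to-leaf path will be longer than 2 splits
clf = DecisionTreeClassifier(max_depth=2)

# The regressor variant exposes the same tuning parameter
reg = DecisionTreeRegressor(max_depth=2)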

Decision Tree Classifier

The decision tree classifier is very similar to a flow chart. Each node of the decision tree has certain evaluations that go into it. There are two common ways to do this: one is the Information Gain method, and the other is the Gini Index method. Let's look at the steps involved in building each and the mathematics and statistics behind them.

Decision Tree Classifier: Information Gain Method

Step 1: Calculate the Entropy of the Entire Dataset

Entropy is defined as the randomness or uncertainty in the given data. The first step in building the decision tree classifier is to calculate the entropy of the entire dataset. This can be done by calculating the entropy of the label (target) column. The formula for this is:

$$\text{Entropy of Dataset} = -\sum_{class=1}^{n} p(class)\,\log_2 p(class)$$

$$\text{where } p(class) = \frac{\text{Number of samples corresponding to the class}}{\text{Total number of samples}}$$
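As a minimal sketch, this formula can be computed directly from a pandas column of labels (the helper name `entropy` is purely illustrative and is reused in later snippets):

import numpy as np
import pandas as pd

def entropy(labels: pd.Series) -> float:
    # -sum over classes of p(class) * log2(p(class))
    p = labels.value_counts(normalize=True)
    return float(-(p * np.log2(p)).sum())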

Step 2: Calculate Entropy of each Feature Column

When calculating the entropy of a feature column, we first calculate the entropy for each unique value in that column: select all the rows containing the selected value and compute the entropy of the label over those rows, using the following equation.

$$\text{Entropy of Unique Value} = -\sum_{class=1}^{n} p(class)\,\log_2 p(class)$$

$$\text{where } p(class) = \frac{\text{Number of samples corresponding to the class and the selected feature value}}{\text{Number of samples with the selected feature value}}$$

After calculating the entropy for each unique value in the feature column, the next step is to calculate the entropy of the feature column as a weighted sum:

$$\text{Entropy of Feature} = \sum_{UniqueVal=1}^{n} p(UniqueVal) \cdot \text{Entropy of UniqueVal}$$

$$\text{where } p(UniqueVal) = \frac{\text{Number of samples corresponding to the UniqueVal}}{\text{Total number of samples}}$$

Repeat this process for each feature column.
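Reusing the `entropy` helper sketched above, the weighted feature entropy could be computed as follows (again, the function name and arguments are illustrative):

def feature_entropy(data: pd.DataFrame, feature: str, target: str) -> float:
    # Weighted sum of the entropy of the target over each unique feature value
    total = len(data)
    weighted = 0.0
    for value, subset in data.groupby(feature):
        weighted += (len(subset) / total) * entropy(subset[target])
    return weighted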

Step 3: Information Gain Index

Information Gain is a measure used to determine which feature column is to be selected next.

$$IG(\text{Target}, \text{Feature}) = \text{Entropy of Dataset} - \text{Entropy of Feature}$$

The feature with the highest Information Gain is selected as the root node (or as the next decision node).
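Putting the two helpers together, Information Gain is just their difference (a sketch, not a library function):

def information_gain(data: pd.DataFrame, feature: str, target: str) -> float:
    # IG(Target, Feature) = Entropy of Dataset - Entropy of Feature
    return entropy(data[target]) - feature_entropy(data, feature, target)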

Working Example

Now let's look at how this works in practice. Consider the following dataset:

| Weather | Temperature | Humidity | Wind | Play Football |
|---------|-------------|----------|------|---------------|
| Sunny   | Hot         | High     | Weak   | No  |
| Sunny   | Hot         | High     | Strong | No  |
| Cloudy  | Hot         | High     | Weak   | Yes |
| Rainy   | Mild        | High     | Weak   | Yes |
| Rainy   | Cool        | Normal   | Weak   | Yes |
| Rainy   | Cool        | Normal   | Strong | No  |
| Cloudy  | Cool        | Normal   | Strong | Yes |
| Sunny   | Mild        | High     | Weak   | No  |
| Sunny   | Cool        | Normal   | Weak   | Yes |
| Rainy   | Mild        | Normal   | Weak   | Yes |
| Sunny   | Mild        | Normal   | Strong | Yes |
| Cloudy  | Mild        | High     | Strong | Yes |
| Cloudy  | Hot         | Normal   | Weak   | Yes |
| Rainy   | Mild        | High     | Strong | No  |

The Entropy of the entire dataset is:

$$Entropy(\text{Play Football}) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.94$$

The next step is to calculate the entropy of each feature column.

  • For Weather: there are three unique values: Sunny, Rainy, and Cloudy.
    $$Entropy(\text{Sunny}) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5} = 0.97$$
    $$Entropy(\text{Rainy}) = -\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5} = 0.97$$
    $$Entropy(\text{Cloudy}) = -\frac{4}{4}\log_2\frac{4}{4} - \frac{0}{4}\log_2\frac{0}{4} = 0$$
    $$IG(\text{Weather}) = Entropy(\text{Play Football}) - \frac{5}{14}\,Entropy(\text{Sunny}) - \frac{5}{14}\,Entropy(\text{Rainy}) - \frac{4}{14}\,Entropy(\text{Cloudy})$$
    $$IG(\text{Weather}) = 0.94 - \frac{5}{14}(0.97) - \frac{5}{14}(0.97) - \frac{4}{14}(0) = 0.247$$
  • For Temperature: there are three unique values: Hot, Cool, and Mild.
    $$Entropy(\text{Hot}) = -\frac{2}{4}\log_2\frac{2}{4} - \frac{2}{4}\log_2\frac{2}{4} = 1.0$$
    $$Entropy(\text{Cool}) = -\frac{3}{4}\log_2\frac{3}{4} - \frac{1}{4}\log_2\frac{1}{4} = 0.81$$
    $$Entropy(\text{Mild}) = -\frac{4}{6}\log_2\frac{4}{6} - \frac{2}{6}\log_2\frac{2}{6} = 0.918$$
    $$IG(\text{Temperature}) = Entropy(\text{Play Football}) - \frac{4}{14}\,Entropy(\text{Hot}) - \frac{4}{14}\,Entropy(\text{Cool}) - \frac{6}{14}\,Entropy(\text{Mild})$$
    $$IG(\text{Temperature}) = 0.94 - \frac{4}{14}(1.0) - \frac{4}{14}(0.81) - \frac{6}{14}(0.918) = 0.03$$
  • For Humidity: there are two unique values: High and Normal.
    $$Entropy(\text{High}) = -\frac{3}{7}\log_2\frac{3}{7} - \frac{4}{7}\log_2\frac{4}{7} = 0.98$$
    $$Entropy(\text{Normal}) = -\frac{6}{7}\log_2\frac{6}{7} - \frac{1}{7}\log_2\frac{1}{7} = 0.59$$
    $$IG(\text{Humidity}) = Entropy(\text{Play Football}) - \frac{7}{14}\,Entropy(\text{High}) - \frac{7}{14}\,Entropy(\text{Normal})$$
    $$IG(\text{Humidity}) = 0.94 - \frac{7}{14}(0.98) - \frac{7}{14}(0.59) = 0.15$$
  • For Wind: there are two unique values: Weak and Strong.
    $$Entropy(\text{Weak}) = -\frac{6}{8}\log_2\frac{6}{8} - \frac{2}{8}\log_2\frac{2}{8} = 0.81$$
    $$Entropy(\text{Strong}) = -\frac{3}{6}\log_2\frac{3}{6} - \frac{3}{6}\log_2\frac{3}{6} = 1.0$$
    $$IG(\text{Wind}) = Entropy(\text{Play Football}) - \frac{8}{14}\,Entropy(\text{Weak}) - \frac{6}{14}\,Entropy(\text{Strong})$$
    $$IG(\text{Wind}) = 0.94 - \frac{8}{14}(0.81) - \frac{6}{14}(1.0) = 0.05$$

Now that we have all the Information Gain values, let's compare them.

| Feature | Information Gain |
|---------|------------------|
| Weather | 0.247 |
| Temperature | 0.03 |
| Humidity | 0.15 |
| Wind | 0.05 |
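These values can be reproduced with the helper functions sketched earlier; the DataFrame below simply re-creates the table above in code:

df = pd.DataFrame({
    "Weather":       ["Sunny", "Sunny", "Cloudy", "Rainy", "Rainy", "Rainy", "Cloudy",
                      "Sunny", "Sunny", "Rainy", "Sunny", "Cloudy", "Cloudy", "Rainy"],
    "Temperature":   ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool",
                      "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"],
    "Humidity":      ["High", "High", "High", "High", "Normal", "Normal", "Normal",
                      "High", "Normal", "Normal", "Normal", "High", "Normal", "High"],
    "Wind":          ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
                      "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"],
    "Play Football": ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                      "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
})

# Information Gain of each feature with respect to the label
for feature in ["Weather", "Temperature", "Humidity", "Wind"]:
    print(feature, round(information_gain(df, feature, "Play Football"), 3))
# Weather 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048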

The feature with the highest Information Gain is selected as the root node. In this case, it is Weather. Hence, the decision tree we get so far is:

Now let's do the further processing for the "Sunny" condition. Consider the following subset of the data:

| Weather | Temperature | Humidity | Wind | Play Football |
|---------|-------------|----------|------|---------------|
| Sunny | Hot  | High   | Weak   | No  |
| Sunny | Hot  | High   | Strong | No  |
| Sunny | Mild | High   | Weak   | No  |
| Sunny | Cool | Normal | Weak   | Yes |
| Sunny | Mild | Normal | Strong | Yes |

$$Entropy(\text{Play Football}) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5} = 0.97$$
  • For Temperature: there are three values: Hot, Mild, and Cool.
    $$Entropy(\text{Hot}) = -\frac{2}{2}\log_2\frac{2}{2} - \frac{0}{2}\log_2\frac{0}{2} = 0$$
    $$Entropy(\text{Mild}) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = 1.0$$
    $$Entropy(\text{Cool}) = -\frac{1}{1}\log_2\frac{1}{1} - \frac{0}{1}\log_2\frac{0}{1} = 0$$
    $$IG(\text{Temperature}) = Entropy(\text{Play Football}) - \frac{2}{5}\,Entropy(\text{Hot}) - \frac{2}{5}\,Entropy(\text{Mild}) - \frac{1}{5}\,Entropy(\text{Cool})$$
    $$IG(\text{Temperature}) = 0.97 - \frac{2}{5}(0) - \frac{2}{5}(1.0) - \frac{1}{5}(0) = 0.57$$
  • For Humidity: there are two values: High and Normal.
    $$Entropy(\text{High}) = -\frac{3}{3}\log_2\frac{3}{3} - \frac{0}{3}\log_2\frac{0}{3} = 0$$
    $$Entropy(\text{Normal}) = -\frac{0}{2}\log_2\frac{0}{2} - \frac{2}{2}\log_2\frac{2}{2} = 0$$
    $$IG(\text{Humidity}) = Entropy(\text{Play Football}) - \frac{3}{5}\,Entropy(\text{High}) - \frac{2}{5}\,Entropy(\text{Normal})$$
    $$IG(\text{Humidity}) = 0.97 - \frac{3}{5}(0) - \frac{2}{5}(0) = 0.97$$
  • For Wind: there are two values: Weak and Strong.
    $$Entropy(\text{Weak}) = -\frac{1}{3}\log_2\frac{1}{3} - \frac{2}{3}\log_2\frac{2}{3} = 0.918$$
    $$Entropy(\text{Strong}) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = 1.0$$
    $$IG(\text{Wind}) = Entropy(\text{Play Football}) - \frac{3}{5}\,Entropy(\text{Weak}) - \frac{2}{5}\,Entropy(\text{Strong})$$
    $$IG(\text{Wind}) = 0.97 - \frac{3}{5}(0.918) - \frac{2}{5}(1.0) = 0.02$$

Let's check all the Information Gain values:

| Feature | Information Gain |
|---------|------------------|
| Temperature | 0.57 |
| Humidity | 0.97 |
| Wind | 0.02 |

Since the highest is Humidity, the Humidity node comes after Sunny.
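This can be verified by restricting the illustrative DataFrame from earlier to the Sunny rows and re-running the same calculation:

# Information Gain recomputed on the Sunny subset only
sunny = df[df["Weather"] == "Sunny"]
for feature in ["Temperature", "Humidity", "Wind"]:
    print(feature, round(information_gain(sunny, feature, "Play Football"), 2))
# Temperature 0.57, Humidity 0.97, Wind 0.02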

From this subset, we can clearly see that when Humidity is High, football is not played, and when it is Normal, football is played. Hence we can insert a leaf node (output node) for each of these.

The resulting decision tree is as follows:

Now let's check the data for Rainy:

| Weather | Temperature | Humidity | Wind | Play Football |
|---------|-------------|----------|------|---------------|
| Rainy | Mild | High   | Weak   | Yes |
| Rainy | Cool | Normal | Weak   | Yes |
| Rainy | Cool | Normal | Strong | No  |
| Rainy | Mild | Normal | Weak   | Yes |
| Rainy | Mild | High   | Strong | No  |

$$Entropy(\text{Play Football}) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5} = 0.97$$
  • For Temperature: there are two values: Mild and Cool.
    $$Entropy(\text{Mild}) = -\frac{2}{3}\log_2\frac{2}{3} - \frac{1}{3}\log_2\frac{1}{3} = 0.92$$
    $$Entropy(\text{Cool}) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = 1.0$$
    $$IG(\text{Temperature}) = Entropy(\text{Play Football}) - \frac{3}{5}\,Entropy(\text{Mild}) - \frac{2}{5}\,Entropy(\text{Cool})$$
    $$IG(\text{Temperature}) = 0.97 - \frac{3}{5}(0.92) - \frac{2}{5}(1.0) = 0.02$$
  • For Humidity: there are two values: High and Normal.
    $$Entropy(\text{High}) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = 1.0$$
    $$Entropy(\text{Normal}) = -\frac{2}{3}\log_2\frac{2}{3} - \frac{1}{3}\log_2\frac{1}{3} = 0.92$$
    $$IG(\text{Humidity}) = Entropy(\text{Play Football}) - \frac{2}{5}\,Entropy(\text{High}) - \frac{3}{5}\,Entropy(\text{Normal})$$
    $$IG(\text{Humidity}) = 0.97 - \frac{2}{5}(1.0) - \frac{3}{5}(0.92) = 0.02$$
  • For Wind: there are two values: Weak and Strong.
    $$Entropy(\text{Weak}) = -\frac{3}{3}\log_2\frac{3}{3} - \frac{0}{3}\log_2\frac{0}{3} = 0$$
    $$Entropy(\text{Strong}) = -\frac{2}{2}\log_2\frac{2}{2} - \frac{0}{2}\log_2\frac{0}{2} = 0$$
    $$IG(\text{Wind}) = Entropy(\text{Play Football}) - \frac{3}{5}\,Entropy(\text{Weak}) - \frac{2}{5}\,Entropy(\text{Strong})$$
    $$IG(\text{Wind}) = 0.97 - \frac{3}{5}(0) - \frac{2}{5}(0) = 0.97$$

Let's check all the Information Gain values:

| Feature | Information Gain |
|---------|------------------|
| Temperature | 0.02 |
| Humidity | 0.02 |
| Wind | 0.97 |

Since the highest is Wind, the Wind node comes after Rainy.

It can be seen that when Wind is Strong, football is not played, and when it is Weak, football is played. Hence we can insert a leaf node (output node) for each of these.

The resulting decision tree is as follows:

Now let's look at the data for cloudy weather!

| Weather | Temperature | Humidity | Wind | Play Football |
|---------|-------------|----------|------|---------------|
| Cloudy | Hot  | High   | Weak   | Yes |
| Cloudy | Cool | Normal | Strong | Yes |
| Cloudy | Mild | High   | Strong | Yes |
| Cloudy | Hot  | Normal | Weak   | Yes |

We can see that all the labels are "Yes" in this data. Hence we can conclude that for cloudy weather, football will be played regardless of temperature, humidity and wind.

The final decision tree constructed will be as follows:

Decision Tree Classifier: Gini Index Method

The Gini Index method is very similar to the Information Gain method. Here are the equations and steps for the Gini Index method. The Gini Index measures impurity rather than entropy and is computationally cheaper because it avoids the logarithm; in practice the two criteria usually produce very similar trees.

Step 1: Calculate Gini-Index of each Feature Column

$$\text{Gini Index of Unique Value} = 1 - \sum_{class=1}^{n} p(class)^2$$

$$\text{where } p(class) = \frac{\text{Number of samples corresponding to the class and the selected feature value}}{\text{Number of samples with the selected feature value}}$$
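As with entropy, this per-value Gini Index can be sketched as a small helper on a pandas Series (the name `gini` is illustrative):

def gini(labels: pd.Series) -> float:
    # 1 - sum over classes of p(class)^2
    p = labels.value_counts(normalize=True)
    return float(1 - (p ** 2).sum())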

After calculating the Gini Index for each unique value in the feature column, the next step is to calculate the Gini Index of the feature column as a weighted sum:

$$\text{Gini Index of Feature} = \sum_{UniqueVal=1}^{n} p(UniqueVal) \cdot \text{Gini Index of UniqueVal}$$

$$\text{where } p(UniqueVal) = \frac{\text{Number of samples corresponding to the UniqueVal}}{\text{Total number of samples}}$$

Repeat this process for each feature column.
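And the weighted Gini Index of a feature column, mirroring the `feature_entropy` sketch above (again purely illustrative):

def feature_gini(data: pd.DataFrame, feature: str, target: str) -> float:
    # Weighted sum of the Gini Index of the target over each unique feature value
    total = len(data)
    return sum((len(subset) / total) * gini(subset[target])
               for _, subset in data.groupby(feature))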

Step 2: Select the Attribute with Lowest Gini-Index as Decision Node

The feature with the lowest Gini Index is used as the decision node, and the process is repeated until a leaf node is reached.

Working Example

Consider the following data (the "Sunny" subset from the earlier example, with the Weather column dropped):

| Temperature | Humidity | Wind | Play Football |
|-------------|----------|------|---------------|
| Hot  | High   | Weak   | No  |
| Hot  | High   | Strong | No  |
| Mild | High   | Weak   | No  |
| Cool | Normal | Weak   | Yes |
| Mild | Normal | Strong | Yes |

  • For Temperature: there are three values: Hot, Mild, and Cool.
    $$Gini(\text{Hot}) = 1 - p(\text{Yes})^2 - p(\text{No})^2$$
    $$Gini(\text{Hot}) = 1 - \left(\frac{0}{2}\right)^2 - \left(\frac{2}{2}\right)^2 = 0$$
    $$Gini(\text{Mild}) = 1 - \left(\frac{1}{2}\right)^2 - \left(\frac{1}{2}\right)^2 = 0.5$$
    $$Gini(\text{Cool}) = 1 - \left(\frac{1}{1}\right)^2 - \left(\frac{0}{1}\right)^2 = 0$$
    $$Gini(\text{Temperature}) = \frac{2}{5}(0) + \frac{2}{5}(0.5) + \frac{1}{5}(0) = 0.2$$
  • For Humidity: there are two values: High and Normal.
    $$Gini(\text{High}) = 1 - \left(\frac{0}{3}\right)^2 - \left(\frac{3}{3}\right)^2 = 0$$
    $$Gini(\text{Normal}) = 1 - \left(\frac{2}{2}\right)^2 - \left(\frac{0}{2}\right)^2 = 0$$
    $$Gini(\text{Humidity}) = \frac{3}{5}(0) + \frac{2}{5}(0) = 0$$
  • For Wind: there are two values: Weak and Strong.
    $$Gini(\text{Weak}) = 1 - \left(\frac{1}{3}\right)^2 - \left(\frac{2}{3}\right)^2 = 0.44$$
    $$Gini(\text{Strong}) = 1 - \left(\frac{1}{2}\right)^2 - \left(\frac{1}{2}\right)^2 = 0.5$$
    $$Gini(\text{Wind}) = \frac{3}{5}(0.44) + \frac{2}{5}(0.5) \approx 0.47$$

| Feature | Gini Index |
|---------|------------|
| Temperature | 0.2 |
| Humidity | 0 |
| Wind | 0.47 |

The lowest Gini Index is that of Humidity. Therefore, Humidity is selected as the decision node.
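Using the `feature_gini` sketch from above and the same five Sunny rows defined earlier, these values can be reproduced:

for feature in ["Temperature", "Humidity", "Wind"]:
    print(feature, round(feature_gini(sunny, feature, "Play Football"), 2))
# Temperature 0.2, Humidity 0.0, Wind 0.47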

Therefore the tree that is formed is:

Python Implementation of Decision Tree Classifier

Following is the Python code for the decision tree classifier:

# Import Required Modules
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import pandas as pd
import matplotlib.pyplot as plt

# Data Reading and Pre-Processing: encode the categorical columns as integers
data = pd.read_excel("data.xlsx")
data['Weather'] = data['Weather'].replace(['Sunny', 'Rainy', 'Cloudy'], [0, 1, 2])
data['Temperature'] = data['Temperature'].replace(['Hot', 'Cool', 'Mild'], [0, 1, 2])
data['Humidity'] = data['Humidity'].replace(['High', 'Normal'], [0, 1])
data['Wind'] = data['Wind'].replace(['Weak', 'Strong'], [0, 1])
data['Play Football'] = data['Play Football'].replace(['No', 'Yes'], [0, 1])

# Separating Features and Label
x = data.drop("Play Football", axis=1)
y = data["Play Football"]

# Model Training
model = DecisionTreeClassifier()
model.fit(x, y)

# Visualize the learned tree
tree.plot_tree(model, feature_names=list(x.columns), class_names=["No", "Yes"], filled=True)
plt.show()

# Confusion Matrix on the training data
cm = confusion_matrix(y, model.predict(x))
ConfusionMatrixDisplay(cm).plot()
plt.show()

# Append predicted values to the original data for display
data = pd.read_excel("data.xlsx")
data["Model Predictions"] = model.predict(x)
data['Model Predictions'] = data['Model Predictions'].replace([0, 1], ["No", "Yes"])
print(data)

The Confusion Matrix produced by this code is as follows:

Confusion Matrix

From this confusion matrix, we can see that all the predictions were correct: there were 5 true negatives and 9 true positives.

This is all about the Decision Tree Classifier. If you have any questions regarding this, feel free to contact me.

Up next, we have the Decision Tree Regressor where we train the Decision Tree Algorithm to work with continuous values. Stay Tuned!
