Decision Trees: The Branching Path to Better Choices

Decision Trees: The Branching Path to Better Choices

# 8

January 26, 2024

Assume that you're a traveller planning your next weekend getaway and have two amazing options: A relaxing beach trip to Goa or a thrilling mountain adventure in Lonavala. But, how do you decide one way or the other? This is where Decision Trees come in. They are like flowcharts, visually representing all the scenarios. Let's look deeper into this example.

Our first question would be if the person is interested in beaches or mountains. Another decisive question could be if the person is fit for a trek or a hike.

Let's look at a visual representation of how a Decision Tree could help in this.

Shown above is a standard representation of a hypothetical situation but there's a lot of math that goes into it. Ready for it? Let's get right in!

Decision Tree Variants

The decision tree is not restricted to just classification or regression. There is a separate arithmetic calculation in each node for each of these variants. The variants are:

  • Decision Tree Classifier
  • Decision Tree Regressor

The Decision Tree evaluates each given feature to find the most relevant one and equates the results for that specific feature first. This is done until all the features are evaluated and the leaf node is encountered. A leaf node is defined as a node to which there are no Childs. This node is used for prediction. As for the tuning, the depth of the decision tree could be restricted, meaning that longest path between the root node and a leaf node is restricted to a specified number.

Decision Tree Classifier

The decision tree classifier is very similar to a flow chart. Each node of decision tree has certain evaluations that go into it. There are two different ways to do this. One is Information Gain method, and the other is Gini Index method. Let's look at the steps involved in building these and the Mathematics and Statistics behind it.

Decision Tree Classifier: Information Gain Method

Step 1: Calculate the Entropy of entire Dataset

Entropy is defined as randomness or uncertainty in the given data or situation. First step in building the decision tree classifier is to calculate the entropy of the entire dataset. This can be done by calculating the entropy of the label or the target column. The formula for this is:

EntropyofDataset=class=1np(class)log2(p(class))Entropy,of,Dataset =-sum_{class=1}^{n}p(class) , log_2,(p(class)) Where  p(class)=NumberofsamplescorrespondingtotheclassTotalnumberofsamplesWhere;p(class) = rac{Number,of,samples,corresponding,to,the,class}{Total,number,of,samples}

Step 2: Calculate Entropy of each Feature Column

When calculating entropy of each feature column, we need to first calculate entropy for each unique value in the selected feature column using the following equation. Select all the rows corresponding to selected value and calculate entropy with respect to those rows.

EntropyofUniqueValue=class=1np(class)log2(p(class))Entropy,of,UniqueValue =sum_{class=1}^{n}p(class) , log_2,(p(class)) Where  p(class)=NumberofsamplescorrespondingtotheclassandselectedfeaturevalueTotalnumberofsamplesWhere;p(class) = rac{Number,of,samples,corresponding,to,the,class,and,selected,feature,value}{Total,number,of,samples}

After calculating the entropy for each unique value in the feature column, next step is to calculate the entropy of the feature column using the following arithmetic equation:

EntropyofFeature=UniqueVal=1np(UniqueVal).EntropyofUniqueValEntropy,of,Feature = sum_{UniqueVal=1}^n p(UniqueVal) . Entropy,of,UniqueVal Where  p(class)=NumberofsamplescorrespondingtotheUniqueValTotalnumberofsamplesWhere;p(class) = rac{Number,of,samples,corresponding,to,the,UniqueVal}{Total,number,of,samples}

Repeat this process for each feature column.

Step 3: Information Gain Index

Information Gain is a measure used to determine which feature column is to be selected next.

IG(Target,Feature)=EntropyofDatasetEntropyofFeatureIG(Target,,Feature) = Entropy,of,Dataset - Entropy,of,Feature

The value with highest Information Gain is selected to be the root node or the next node.

Working Example

Now let's look at how this works in practice. Consider the following dataset:

Weather Temperature Humidity Wind Play Football
Sunny Hot High Weak No
Sunny Hot High Strong No
Cloudy Hot High Weak Yes
Rainy Mild High Weak Yes
Rainy Cool Normal Weak Yes
Rainy Cool Normal Strong No
Cloudy Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rainy Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Cloudy Mild High Strong Yes
Cloudy Hot Normal Weak Yes
Rainy Mild High Strong No

The Entropy of the entire dataset is:

Entropy(PlayFootball)=914log2914514log2514=0.94Entropy(Play,Football) = - rac{9}{14} log_2 rac{9}{14} - rac{5}{14} log_2 rac{5}{14} = 0.94

Next step is to calculate Entropy for each feature column.

  • For Weather: There are three unique values: Sunny, Rainy and Cloudy
EntropyofSunny=25log22535log235=0.97Entropy,of,Sunny=- rac{2}{5}log_2 rac{2}{5}- rac{3}{5}log_2 rac{3}{5} = 0.97 EntropyofRainy=35log23525log225=0.97Entropy,of,Rainy=- rac{3}{5}log_2 rac{3}{5}- rac{2}{5}log_2 rac{2}{5} = 0.97 EntropyofCloudy=44log24404log204=0Entropy,of,Cloudy=- rac{4}{4}log_2 rac{4}{4}- rac{0}{4}log_2 rac{0}{4} = 0 IG(Weather)=Entropy(PlayFootball)514Entropy(Sunny)514Entropy(Rainy)414Entropy(Cloudy)IG(Weather) = Entropy(Play Football) - rac{5}{14} Entropy(Sunny) - rac{5}{14} Entropy(Rainy) - rac{4}{14} Entropy(Cloudy) IG(Weather)=0.945140.975140.974140=0.247IG(Weather) = 0.94 - rac{5}{14} 0.97 - rac{5}{14} 0.97 - rac{4}{14} 0 = 0.247
  • For Temperature: There are three unique values: Hot, Cool and Mild
EntropyofHot=24log22424log224=1.0Entropy,of,Hot=- rac{2}{4}log_2 rac{2}{4}- rac{2}{4}log_2 rac{2}{4} = 1.0 EntropyofCool=34log23414log214=0.81Entropy,of,Cool=- rac{3}{4}log_2 rac{3}{4}- rac{1}{4}log_2 rac{1}{4} = 0.81 EntropyofMild=46log24626log226=0.918Entropy,of,Mild=- rac{4}{6}log_2 rac{4}{6}- rac{2}{6}log_2 rac{2}{6} = 0.918 IG(Temperature)=Entropy(PlayFootball)414Entropy(Hot)414Entropy(Cool)614Entropy(Mild)IG(Temperature) = Entropy(Play Football) - rac{4}{14} Entropy(Hot) - rac{4}{14} Entropy(Cool) - rac{6}{14} Entropy(Mild) IG(Temperature)=0.944141.04140.816140.918=0.03IG(Temperature) = 0.94 - rac{4}{14} 1.0 - rac{4}{14} 0.81 - rac{6}{14} 0.918 = 0.03
  • For Humidity: There are two unique values: High and Normal
EntropyofHigh=37log23747log247=0.98Entropy,of,High=- rac{3}{7}log_2 rac{3}{7}- rac{4}{7}log_2 rac{4}{7} = 0.98 EntropyofNormal=67log26717log217=0.59Entropy,of,Normal=- rac{6}{7}log_2 rac{6}{7}- rac{1}{7}log_2 rac{1}{7} = 0.59 IG(Humidity)=Entropy(PlayFootball)714Entropy(High)714Entropy(Normal)IG(Humidity) = Entropy(Play Football) - rac{7}{14} Entropy(High) - rac{7}{14} Entropy(Normal) IG(Humidity)=0.947140.987140.59=0.15IG(Humidity) = 0.94 - rac{7}{14} 0.98 - rac{7}{14} 0.59 = 0.15
  • For wind: There are two unique values: Weak and Strong
EntropyofWeak=36log23636log236=1.0Entropy,of,Weak=- rac{3}{6}log_2 rac{3}{6}- rac{3}{6}log_2 rac{3}{6} = 1.0 EntropyofStrong=68log26828log228=0.81Entropy,of,Strong=- rac{6}{8}log_2 rac{6}{8}- rac{2}{8}log_2 rac{2}{8} = 0.81 IG(Wind)=Entropy(PlayFootball)614Entropy(Weak)814Entropy(Strong)IG(Wind) = Entropy(Play Football) - rac{6}{14} Entropy(Weak) - rac{8}{14} Entropy(Strong) IG(Wind)=0.946141.08140.81=0.05IG(Wind) = 0.94 - rac{6}{14} 1.0 - rac{8}{14} 0.81 = 0.05

Now that we have all the Information Gain Index, let's analyze them.

Feature Information Gain
Weather 0.247
Temperature 0.03
Humidity 0.15
Wind 0.05

The highest Information Gain is to be selected as root node. In this case, it is Weather. Hence, the decision tree we get now is:

Now let's do the further processing for the "Sunny" Condition. Consider the following dataset:

Weather Temperature Humidity Wind Play Football
Sunny Hot High Weak No
Sunny Hot High Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Sunny Mild Normal Strong Yes
Entropy(PlayFootball)=25log22535log235=0.97Entropy(Play Football) = - rac{2}{5}log_2 rac{2}{5}- rac{3}{5}log_2 rac{3}{5} = 0.97
  • For Temperature: There are three values: Hot, Mild, Cool
EntropyforHot=22log22202log202=0Entropy,for,Hot = - rac{2}{2}log_2 rac{2}{2}- rac{0}{2}log_2 rac{0}{2} = 0 EntropyforMild=12log21212log212=1.0Entropy,for,Mild = - rac{1}{2}log_2 rac{1}{2}- rac{1}{2}log_2 rac{1}{2} = 1.0 EntropyforCool=11log21101log201=0Entropy,for,Cool = - rac{1}{1}log_2 rac{1}{1}- rac{0}{1}log_2 rac{0}{1} = 0 IG(Temperature)=Entropy(PlayFootball)25Entropy(Hot)25Entropy(Mild)15Entropy(Cool)IG(Temperature) = Entropy(Play Football) - rac{2}{5} Entropy(Hot) - rac{2}{5} Entropy(Mild) - rac{1}{5} Entropy(Cool) IG(Temperature)=0.97250251.0150=0.57IG(Temperature) = 0.97 - rac{2}{5} 0 - rac{2}{5} 1.0 - rac{1}{5} 0 = 0.57
  • For Humidity: There are two values: High and Normal
EntropyforHigh=33log23303log203=0Entropy,for,High = - rac{3}{3}log_2 rac{3}{3}- rac{0}{3}log_2 rac{0}{3} = 0 EntropyforNormal=02log20222log222=0Entropy,for,Normal = - rac{0}{2}log_2 rac{0}{2}- rac{2}{2}log_2 rac{2}{2} = 0 IG(Humidity)=Entropy(PlayFootball)35Entropy(High)25Entropy(Normal)IG(Humidity) = Entropy(Play Football) - rac{3}{5} Entropy(High) - rac{2}{5} Entropy(Normal) IG(Humidity)=0.97350250=0.97IG(Humidity) = 0.97 - rac{3}{5} 0 - rac{2}{5} 0 = 0.97
  • For Wind: There are two values: Weak and Strong
EntropyforWeak=13log21323log223=0.918Entropy,for,Weak = - rac{1}{3}log_2 rac{1}{3}- rac{2}{3}log_2 rac{2}{3} = 0.918 EntropyforStrong=12log21212log212=1.0Entropy,for,Strong = - rac{1}{2}log_2 rac{1}{2}- rac{1}{2}log_2 rac{1}{2} = 1.0 IG(Wind)=Entropy(PlayFootball)35Entropy(Weak)25Entropy(Strong)IG(Wind) = Entropy(Play Football) - rac{3}{5} Entropy(Weak) - rac{2}{5} Entropy(Strong) IG(Wind)=0.97350.918251.0=0.02IG(Wind) = 0.97 - rac{3}{5} 0.918 - rac{2}{5} 1.0 = 0.02

Let's check all the Information Gain Indexes:

Feature Information Gain
Temperature 0.57
Humidity 0.97
Wind 0.02

Since the highest is Humidity, the Humidity node comes after Sunny.

From the dataset, we can clearly notice that when Humidity is high, football is not played and when it is normal, football is played. Hence we can insert a leaf node or output node for this.

The resulting decision tree is as follows:

Now let's check the data for Rainy:

Weather Temperature Humidity Wind Play Football
Rainy Mild High Weak Yes
Rainy Cool Normal Weak Yes
Rainy Cool Normal Strong No
Rainy Mild Normal Weak Yes
Rainy Mild High Strong No
Entropy(PlayFootball)=25log22535log235=0.97Entropy(Play Football) = - rac{2}{5}log_2 rac{2}{5}- rac{3}{5}log_2 rac{3}{5} = 0.97
  • For Temperature: There are two values: Mild, Cool
EntropyforMild=23log22313log213=0.92Entropy,for,Mild = - rac{2}{3}log_2 rac{2}{3}- rac{1}{3}log_2 rac{1}{3} = 0.92 EntropyforCool=12log21212log212=1.0Entropy,for,Cool = - rac{1}{2}log_2 rac{1}{2}- rac{1}{2}log_2 rac{1}{2} = 1.0 IG(Temperature)=Entropy(PlayFootball)35Entropy(Mild)25Entropy(Cool)IG(Temperature) = Entropy(Play Football) - rac{3}{5} Entropy(Mild) - rac{2}{5} Entropy(Cool) IG(Temperature)=0.97350.92251.0=0.02IG(Temperature) = 0.97 - rac{3}{5} 0.92 - rac{2}{5} 1.0 = 0.02
  • For Humidity: There are two values: High and Normal
EntropyforHigh=12log21212log212=1.0Entropy,for,High = - rac{1}{2}log_2 rac{1}{2}- rac{1}{2}log_2 rac{1}{2} = 1.0 EntropyforNormal=23log22313log213=0.92Entropy,for,Normal = - rac{2}{3}log_2 rac{2}{3}- rac{1}{3}log_2 rac{1}{3} = 0.92 IG(Humidity)=Entropy(PlayFootball)25Entropy(High)35Entropy(Normal)IG(Humidity) = Entropy(Play Football) - rac{2}{5} Entropy(High) - rac{3}{5} Entropy(Normal) IG(Humidity)=0.97251.0350.92=0.02IG(Humidity) = 0.97 - rac{2}{5} 1.0 - rac{3}{5} 0.92 = 0.02
  • For Wind: There are two values: Weak and Strong
EntropyforWeak=33log23303log203=0Entropy,for,Weak = - rac{3}{3}log_2 rac{3}{3}- rac{0}{3}log_2 rac{0}{3} = 0 EntropyforStrong=22log22222log222=0Entropy,for,Strong = - rac{2}{2}log_2 rac{2}{2}- rac{2}{2}log_2 rac{2}{2} = 0 IG(Wind)=Entropy(PlayFootball)35Entropy(Weak)25Entropy(Strong)IG(Wind) = Entropy(Play Football) - rac{3}{5} Entropy(Weak) - rac{2}{5} Entropy(Strong) IG(Wind)=0.97350250=0.97IG(Wind) = 0.97 - rac{3}{5} 0 - rac{2}{5} 0 = 0.97

Let's check all the Information Gain Indexes:

Feature Information Gain
Temperature 0.02
Humidity 0.02
Wind 0.97

Since the highest is Wind, the Wind node comes after Rainy.

It can be seen that when Wind is Strong, football is not played and when it is Weak, football is played. Hence we can insert a leaf node or output node for this.

The resulting decision tree is as follows:

Now let's look at the data for cloudy weather!

Weather Temperature Humidity Wind Play Football
Cloudy Hot High Weak Yes
Cloudy Cool Normal Strong Yes
Cloudy Mild High Strong Yes
Cloudy Hot Normal Weak Yes

We can see that all the labels are "Yes" in this data. Hence we can conclude that for cloudy weather, football will be played regardless of temperature, humidity and wind.

The final decision tree constructed will be as follows:

Decision Tree Classifier: Gini Index Method

Gini index is very similar to Information Gain Index. Here are the equations and steps with respect to Gini-Index Method. Gini Index is a more powerful measure of entropy in datasets and is hence proven to be more effective than Information Gain.

Step 1: Calculate Gini-Index of each Feature Column

GiniIndexofUniqueValue=1class=1np(class)2Gini,Index,of,UniqueValue =1 - sum_{class=1}^{n}p(class)^2 Where  p(class)=NumberofsamplescorrespondingtotheclassandselectedfeaturevalueTotalnumberofsamplesWhere;p(class) = rac{Number,of,samples,corresponding,to,the,class,and,selected,feature,value}{Total,number,of,samples}

After calculating the entropy for each unique value in the feature column, next step is to calculate the entropy of the feature column using the following arithmetic equation:

EntropyofFeature=UniqueVal=1np(UniqueVal).GiniIndexofUniqueValEntropy,of,Feature = sum_{UniqueVal=1}^n p(UniqueVal) . Gini,Index,of,UniqueVal Where  p(class)=NumberofsamplescorrespondingtotheUniqueValTotalnumberofsamplesWhere;p(class) = rac{Number,of,samples,corresponding,to,the,UniqueVal}{Total,number,of,samples}

Repeat this process for each feature column.

Step 2: Select the Attribute with Lowest Gini-Index as Decision Node

The feature with lowest gini-index is utilized as the decision node and the process is repeated until leaf node is achieved.

Working Example

Consider the following data:

Temperature Humidity Wind Play Football
Hot High Weak No
Hot High Strong No
Mild High Weak No
Cool Normal Weak Yes
Mild Normal Strong Yes
  • For Temperature: There are 3 values: Hot, Mild and Cool
Gini(Hot)=1p(Yes)2p(No)2Gini(Hot) = 1 - p(Yes)^2 - p(No)^2 Gini(Hot)=1(02)2(22)2=0Gini(Hot) = 1 - ( rac{0}{2})^2 - ( rac{2}{2})^2 = 0 Gini(Mild)=1(12)2(12)2=0.5Gini(Mild) = 1 - ( rac{1}{2})^2 - ( rac{1}{2})^2 = 0.5 Gini(Cool)=1(11)2(01)2=0Gini(Cool) = 1 - ( rac{1}{1})^2 - ( rac{0}{1})^2 = 0 Gini(Temperature)=250+250.5+150=0.2Gini(Temperature) = rac{2}{5} 0 + rac{2}{5}0.5 + rac{1}{5}0 = 0.2
  • For Humidity: There are two values: High and Normal
Gini(High)=1(03)2(33)2=0Gini(High) = 1 - ( rac{0}{3})^2 - ( rac{3}{3})^2 = 0 Gini(Normal)=1(22)2(02)2=0Gini(Normal) = 1 - ( rac{2}{2})^2 - ( rac{0}{2})^2 = 0 Gini(Humidity)=350+250=0Gini(Humidity) = rac{3}{5} 0 + rac{2}{5}0 = 0
  • For Wind: There are two values: Weak and Strong
Gini(Weak)=1(13)2(23)2=0.44Gini(Weak) = 1 - ( rac{1}{3})^2 - ( rac{2}{3})^2 = 0.44 Gini(Strong)=1(12)2(12)2=0.5Gini(Strong) = 1 - ( rac{1}{2})^2 - ( rac{1}{2})^2 = 0.5 Gini(Wind)=350.44+250.5=0.464Gini(Wind) = rac{3}{5} 0.44 + rac{2}{5} 0.5 = 0.464
Feature Gini Index
Temperature 0.2
Humidity 0
Wind 0.464

The lowest Gini Index is of Humidity. Therefore Humidity is selected as decision node.

Therefore the tree that is formed is:

Python Implementation of Decision Tree Classifier

Following is the Python code for decision tree classifier:

			# Import Required Modules
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import pandas as pd
import matplotlib.pyplot as plt
 
# Data Reading Pre-Processing
data = pd.read_excel("data.xlsx")
data['Weather'].replace(['Sunny','Rainy','Cloudy'], [0,1,2], inplace=True)
data['Temperature'].replace(['Hot','Cool','Mild'], [0,1,2], inplace=True)
data['Humidity'].replace(['High','Normal'], [0,1], inplace=True)
data['Wind'].replace(['Weak','Strong'], [0,1], inplace=True)
data['Play Football'].replace(['No','Yes'], [0,1], inplace=True)
 
# Separating Features and Label
x = data.drop("Play Football", axis=1)
y = data[["Play Football"]]
 
# Model Training
model = DecisionTreeClassifier()
model.fit(x, y)
 
# Confusion Matrix
cm = confusion_matrix(y, model.predict(x))
ConfusionMatrixDisplay(cm).plot()
 
# Append predicted values to data to display
data = pd.read_excel("data.xlsx")
data["Model Predictions"] = model.predict(x)
data['Model Predictions'].replace([0,1], ["No","Yes"], inplace=True)
	
			# Import Required Modules
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import pandas as pd
import matplotlib.pyplot as plt
 
# Data Reading Pre-Processing
data = pd.read_excel("data.xlsx")
data['Weather'].replace(['Sunny','Rainy','Cloudy'], [0,1,2], inplace=True)
data['Temperature'].replace(['Hot','Cool','Mild'], [0,1,2], inplace=True)
data['Humidity'].replace(['High','Normal'], [0,1], inplace=True)
data['Wind'].replace(['Weak','Strong'], [0,1], inplace=True)
data['Play Football'].replace(['No','Yes'], [0,1], inplace=True)
 
# Separating Features and Label
x = data.drop("Play Football", axis=1)
y = data[["Play Football"]]
 
# Model Training
model = DecisionTreeClassifier()
model.fit(x, y)
 
# Confusion Matrix
cm = confusion_matrix(y, model.predict(x))
ConfusionMatrixDisplay(cm).plot()
 
# Append predicted values to data to display
data = pd.read_excel("data.xlsx")
data["Model Predictions"] = model.predict(x)
data['Model Predictions'].replace([0,1], ["No","Yes"], inplace=True)
	

The Confusion Matrix produced by this code is as follows:

Confusion Matrix

Confusion Matrix

It can be interpreted by this confusion matrix that all the predictions were accurate and there were 5 true negatives and 9 true positives.

This is all about the Decision Tree Classifier. If you have any questions regarding this, feel free to contact me.

Up next, we have the Decision Tree Regressor where we train the Decision Tree Algorithm to work with continuous values. Stay Tuned!

Neuralchemie | © 2025

Made with

svelte-logo By Prabhu Kiran Konda