In this decision tree tutorial blog, we will talk about what a decision tree algorithm is, and we will also mention some interesting decision tree examples. The blog will also highlight how to create a decision tree classification model and a decision tree for regression using the decision tree classifier function and the decision tree regressor function, respectively. Also, it will discuss about decision tree analysis, how to visualize a decision tree algorithm in Machine Learning using Python, Scikit-Learn, and the Graphviz tool.
Decision Tree algorithm is one of the simplest yet powerful Supervised Machine Learning algorithms. Decision Tree algorithm can be used to solve both regression and classification problems in Machine Learning. That is why it is also known as CART or Classification and Regression Trees. As the name suggests, in Decision Tree, we form a tree-like model of decisions and their possible consequences.
Before we dive right into understanding this interesting algorithm, let us take a look at the concepts this blog has to offer.
Without much delay, let’s get started!
Monica’s cousin Marry is visiting Central Park this weekend. Now, Monica needs to make some plans for the weekend, whether to go out for shopping, go for a movie, spend time in the Central Park coffee shop, or just stay in and play a board game. Well, she decides to create a Decision Tree to make things easy.
Here, the interior nodes represent different tests on an attribute (for example, whether to go out or stay in), branches hold the outcomes of those tests, and leaf nodes represent a class label or some decision taken after measuring all attributes. Each path from the root node to the leaf nodes represents a decision tree classification rule.
Rule 1: If it’s not raining and not too sunny, then go out for shopping.
Rule 2: If it’s not raining but too sunny outside, then go for a movie.
Rule 3: If it’s raining outside and the cable has signal, then watch a TV show.
Rule 4: If it’s raining and the cable signal fails, then spend time in the coffee shop downstairs
That’s how a decision tree helps Monica to make the perfect weekend plan with her cousin.
There are two types of decision trees. They are categorized based on the type of the target variable they have. If the decision tree has a categorical target variable, then it is called a ‘categorical variable decision tree’. Similarly, if it has a continuous target variable, it is called a ‘continuous variable decision tree’.
The process of training and predicting the target features using a decision tree in Machine Learning is given below:
That’s it! Your decision tree model is ready.
DecisionTreeClassifier (): It is nothing but the decision tree classifier function to build a decision tree model in Machine Learning using Python. The DecisionTreeClassifier() function looks like this:
DecisionTreeClassifier (criterion = ‘gini’, random_state = None, max_depth = None, min_samples_leaf =1)
Here are a few important parameters:
DecisionTreeRegressio (): It is the decision tree regressor function used to build a decision tree model in Machine Learning using Python. The DecisionTreeRegressor () function looks like this:
DecisionTreeRegressor (criterion = ‘mse’, random_state =None , max_depth=None, min_samples_leaf=1,)
Problem Statement: Use Machine Learning to predict the selling prices of houses based on some economic factors. Build a model using decision tree in Python.
Dataset: Boston Housing Dataset
Model Building
Let us build the regression model of decision tree in Python.
Step 1: Load required packages
Step 2: Load the Boston dataset
Step 3: Visualize the dataset using a scatter plot
Step 4: Define the features and the target
Step 5: Split the dataset into train and test sets
Here, ‘test_size = 0.2’ means that the test set will be 20 percent of the whole dataset and the training set’s size will be 80 percent of the entire dataset.
Step 6: Build the model with the decision tree regressor function
Step 7: Visualize the tree using Graphviz
Step 8: Compare y_test and y_pred
Step 9: Finding the RMSE value