Introduction

I'm learning the basics of AI. I have created a .csv file with random data to test decision trees. I'm currently using R in Jupyter Notebook.

Problem

Temperature, Humidity and Wind are the variables which determine if you are allowed to fly or not.

When I execute ctree(vuelo~., data=vuelo.csv), the output is just a single node, whereas I was expecting a full tree with the variables (Temperatura, Humedad, Viento), as I worked out on paper.

Snippet of the result

The data used is shown in the following table:

   temperatura humedad viento vuelo
1          Hot    High   Weak    No
2          Hot    High Strong    No
3          Hot    High   Weak   Yes
4         Mild    High   Weak   Yes
5         Cool  Normal   Weak   Yes
6         Cool  Normal Strong    No
7         Cool  Normal Strong   Yes
8         Mild    High   Weak    No
9         Cool  Normal   Weak   Yes
10        Mild  Normal   Weak   Yes
11        Mild  Normal Strong   Yes
12        Mild    High Strong   Yes
13         Hot  Normal   Weak   Yes
14        Mild    High Strong    No

I'm not sure if I missed something while importing the data, but what I did is:

test <- read.csv("vuelo.csv")
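
One thing worth checking after the import is whether the columns arrived as factors, as they do in the dput() output further down. Since R 4.0.0, read.csv() leaves strings as character vectors unless stringsAsFactors = TRUE is passed. The sketch below is only an illustration: it writes a few of the rows above to a temporary file (the temporary path stands in for "vuelo.csv") and inspects the column types.

```r
# Hypothetical stand-in for vuelo.csv: a temporary file holding a few rows
# copied from the table above.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "temperatura,humedad,viento,vuelo",
  "Hot,High,Weak,No",
  "Mild,High,Weak,Yes",
  "Cool,Normal,Strong,No"
), csv_path)

# Ask read.csv to convert the string columns to factors explicitly.
test <- read.csv(csv_path, stringsAsFactors = TRUE)

# Inspect the column types; every column should be reported as a factor.
str(test)
all_factors <- all(sapply(test, is.factor))
print(all_factors)
```

If str() reports chr columns on the real file, converting them to factors before fitting the tree is probably the first thing to try.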

Notes

  • I'm using the "party" library for R (I took some ideas from its examples)

EDIT:

Here is the result of dput() as requested

structure(list(temperatura = structure(c(2L, 2L, 2L, 3L, 1L, 
1L, 1L, 3L, 1L, 3L, 3L, 3L, 2L, 3L), .Label = c("Cool", "Hot", 
"Mild"), class = "factor"), humedad = structure(c(1L, 1L, 1L, 
1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L), .Label = c("High", 
"Normal"), class = "factor"), viento = structure(c(2L, 1L, 2L, 
2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L), .Label = c("Strong", 
"Weak"), class = "factor"), vuelo = structure(c(1L, 1L, 2L, 2L, 
2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L), .Label = c("No", "Yes"
), class = "factor")), class = "data.frame", row.names = c(NA, 
-14L))

1 Answer

Answer

ctree only creates splits if those reach statistical significance (see ?ctree for the underlying tests). In your case, none of the splits do so, and therefore no splits are provided.
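
To get a rough feel for why nothing reaches significance, each predictor can be tested against vuelo on its own. The sketch below uses base R's fisher.test() as a stand-in. It is not the permutation-based test that ctree runs internally, so the p-values will not match ctree's criterion exactly. The data frame name vuelo_df is made up for this example, and its values are copied from the dput() output in the question.

```r
# Rebuild the 14 observations from the question (hypothetical name vuelo_df).
vuelo_df <- data.frame(
  temperatura = factor(c("Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool",
                         "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild")),
  humedad = factor(c("High", "High", "High", "High", "Normal", "Normal", "Normal",
                     "High", "Normal", "Normal", "Normal", "High", "Normal", "High")),
  viento = factor(c("Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
                    "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong")),
  vuelo = factor(c("No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                   "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"))
)

# Fisher's exact test of each predictor against the response.
# This is an illustrative substitute, not ctree's own test.
predictors <- setdiff(names(vuelo_df), "vuelo")
p_values <- sapply(predictors, function(v) {
  fisher.test(table(vuelo_df[[v]], vuelo_df$vuelo))$p.value
})

print(round(p_values, 3))
```

With only 14 rows, none of the three p-values falls below 0.05, which is consistent with ctree declining to split.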

You could force a full tree by loosening the controls (see ?ctree and ?ctree_control), e.g. like this:

ctree(vuelo~., data = vuelo.csv, 
      control = ctree_control(minbucket = 0, 
                               minsplit = 0,
                               testtype = "Teststatistic",
                               mincriterion = 0))

However, this does not make sense from a statistical point of view and I would strongly advise against it.

A more appropriate solution would be to include more observations in your dataset. Assuming that there is an underlying association of temperature, humidity, and wind with being allowed to fly or not, you'll find it with more observations.

For completeness, if we use plot on the output then we get the tree with all (not statistically significant) branches:

[Image: plot of the forced full tree]

slamballais
  • Could you please recommend any other option for plotting a decision tree with such a small dataset? – BadProgrammer May 16 '21 at 10:17
  • Well, the code that I provide should plot the full tree, it would just not be a tree with statistically significant branches. If you're asking for a way to find statistical significance in your dataset, then you're likely just out of luck: you'll simply need more observations. Edit: I added what the tree looks like. – slamballais May 16 '21 at 10:20
  • Thanks a lot, that is all I wanted. Answer accepted. – BadProgrammer May 16 '21 at 11:44