How to handle the Nominal Data by Weka J48

Question

When I ran Weka's J48 with the binary splits option, it built the following decision tree:

http://www.fastpic.jp/viewer.php?file=2693704973.jpg

The input explanatory variable is a single nominal attribute made by combining the question ID and the answer ID, with one nominal value per transaction. I'm wondering why the tree grows on only one side.

Is it caused by my data set, my table definition, or the way binary splits work? I'd like the tree to have nodes on both sides.

If you know of such an option, please tell me.

Sample data:

usr,qa,class
A,11,1
A,21,1
A,31,1
B,12,2
B,22,2
B,32,2
C,13,3
C,23,3
C,33,3
D,11,4
D,22,4
D,31,4
E,11,1
E,23,1
E,31,1
F,12,2
F,22,2
F,33,2
G,13,3
G,22,3
G,32,3
H,12,4
H,21,4
H,33,4

Thank you for your interest. I added the sample data; please refer to it. – keita, Nov 12 '14 at 09:23

score 1 · Accepted Answer · answered Nov 12 '14 at 09:40


There's no error in the tree that was built, and no option would really change it. If your question relates to your Akinator project, please reformat your data so that all questions (e.g. 11, 21, 31) are on the same instance/line, with the answer as the target class.
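A minimal plain-Java sketch of that reshaping, assuming (from the sample data) that the first digit of `qa` is the question ID and the remaining digit is the answer ID, so at most nine questions; the class and method names are hypothetical. Each user becomes one line with one column per question followed by the class, and a question the user never answered is left as `?`. The user ID itself is dropped, since a per-user identifier is not a useful feature to split on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PivotAnswers {

    /**
     * Turns long rows {usr, qa, class} into one line per user:
     * answers to questions 1..numQuestions, then the class label.
     * Assumption: qa = one question-ID digit followed by the answer ID.
     */
    public static List<String> pivot(List<String[]> rows, int numQuestions) {
        Map<String, String[]> byUser = new LinkedHashMap<>(); // keeps first-seen order
        for (String[] row : rows) {
            String user = row[0];
            String qa = row[1];
            String label = row[2];
            String[] line = byUser.computeIfAbsent(user, u -> {
                String[] blank = new String[numQuestions + 1];
                Arrays.fill(blank, "?"); // unanswered questions stay missing
                return blank;
            });
            int question = Character.getNumericValue(qa.charAt(0));
            line[question - 1] = qa.substring(1); // answer ID
            line[numQuestions] = label;           // last column = class
        }
        List<String> out = new ArrayList<>();
        for (String[] line : byUser.values()) {
            out.add(String.join(",", line));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] {"A", "11", "1"},
            new String[] {"A", "21", "1"},
            new String[] {"A", "31", "1"},
            new String[] {"D", "11", "4"},
            new String[] {"D", "22", "4"},
            new String[] {"D", "31", "4"});
        pivot(rows, 3).forEach(System.out::println); // prints 1,1,1,1 then 1,2,1,4
    }
}
```

With the reshaped data, J48 sees several attributes (one per question) instead of one attribute whose values it can only peel off one at a time.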

PS: If you import this data as CSV, Weka will treat it as numerical (not nominal). In that case, add a non-digit character (e.g. #1, #2, #3...) so that Weka treats it as nominal.
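Prefixing works; a different technique with the same effect is to write an ARFF file yourself and declare each attribute as nominal, because in ARFF a value list such as `{1,2,3}` makes an attribute nominal even when its values are digits, and `?` marks a missing value. A minimal plain-Java sketch; the relation name `akinator` and the `q1..qN`/`class` attribute names are hypothetical, and it assumes every column has at least one non-missing value (an empty `{}` list would not be a usable declaration).

```java
import java.util.List;
import java.util.TreeSet;

public class ArffWriter {

    /** Builds ARFF text; each row holds numQuestions answers followed by the class. */
    public static String toArff(List<String[]> rows, int numQuestions) {
        StringBuilder sb = new StringBuilder("@relation akinator\n\n");
        for (int col = 0; col <= numQuestions; col++) {
            TreeSet<String> values = new TreeSet<>(); // distinct values, sorted
            for (String[] row : rows) {
                if (!row[col].equals("?")) {           // '?' is missing, not a value
                    values.add(row[col]);
                }
            }
            String name = (col == numQuestions) ? "class" : "q" + (col + 1);
            // A {v1,v2,...} value list declares the attribute as nominal.
            sb.append("@attribute ").append(name)
              .append(" {").append(String.join(",", values)).append("}\n");
        }
        sb.append("\n@data\n");
        for (String[] row : rows) {
            sb.append(String.join(",", row)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] {"1", "1", "1", "1"},
            new String[] {"1", "2", "?", "4"});
        System.out.print(toArff(rows, 3));
    }
}
```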

doxav

Thank you for your answer. Yes, it is related to my Akinator project. I once considered the same approach as your reformatting suggestion, but I'd like users to be able to add related questions in the app themselves. So, to keep it scalable, should I prepare a column for each question, like qa1, qa2, qa3...? – keita Nov 12 '14 at 09:49
It is not a CSV import but a database. Weka seems to infer the data types automatically. – keita Nov 12 '14 at 09:50

Weka detects types automatically, but data consisting entirely of digits will be detected as numerical (not categorical/nominal). If you want users to add questions, you'll need either to use an updateable classifier, in which case you should prepare extra empty questions from the beginning, or to rebuild the classifier for each new question. In the latter case, you just need to add the new questions to the feature list. – doxav Nov 12 '14 at 10:46
However, a tree-based solution is limited compared to the solution proposed here: http://stats.stackexchange.com/questions/6074/akinator-com-and-naive-bayes-classifier . – doxav Nov 12 '14 at 10:49
Yes, my app keeps rebuilding the decision tree question by question. Do you mean I need to prepare blank columns in my database in advance? – keita Nov 12 '14 at 11:13
Not in the database but in the ARFF dataset or Java instances. – doxav Nov 12 '14 at 11:15
Actually, I once tried neural networks with Weka's MultilayerPerceptron, but it was too slow to run in the app... – keita Nov 12 '14 at 11:16
I don't understand who the user is here, but the dataset could look like this: qa1,qa2,qa3,qa4,...,qa1000,class, while being stored differently in the database: iSolutionID, iQuestionID, bQuestionAnswer – doxav Nov 12 '14 at 11:19
OK, I understand the data should be reformatted as a dataset. Can I ask two questions about this? #1. Currently I use the code 'Instances data = query.retrieveInstances();' to load the dataset from the database. How can I re-map it? #2. Different questions could come for each instance/user because this app rebuilds the tree, so the dataset could look like the one at this URL: http://www.fastpic.jp/viewer.php?file=6534076444.png How can I run it correctly despite the blank data? If you have any ideas, please tell me. – keita Nov 12 '14 at 12:03
#1: I would not use the Weka database object, but you can still use it and re-map through the SQL query using "query.setQuery(my_merging_sql_query);" before query.retrieveInstances(). #2: Sorry, but I don't understand the question or the user/instance thing. Are you asking how to handle missing data (which should be replaced by ?)? – doxav Nov 12 '14 at 12:14
Thank you for answering my additional questions. #1. Yes, I've used the setQuery method, but it seems difficult to re-map with it. #2. Yes, I mean the missing data. If I set ? for blank data, can Weka skip or handle it correctly? – keita Nov 12 '14 at 13:03
#1: You can use a conditional subselect. I still think it might be worth managing the DB connection and re-mapping in separate Java code, I mean without using the Weka DB connector. #2: ? should be handled correctly depending on the algorithm, but trees deal with it easily. – doxav Nov 12 '14 at 13:07
#1: Yes, there is no choice. It is difficult for me to code that connector, so I'll try to solve it with SQL only. #2: Thank you for the good solution. Is there any option in J48 that changes blanks to ? automatically? – keita Nov 12 '14 at 13:47
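A rough sketch of the conditional-subselect idea: one correlated subquery per question, so that each solution comes back as a single row (qa1..qaN, then the solution as the class). The resulting string could then be passed to `query.setQuery(...)` before `query.retrieveInstances()`, as described above. Assumptions, all hypothetical: the table is called `answers`, it uses the column layout iSolutionID, iQuestionID, bQuestionAnswer, and there is at most one answer per (solution, question) pair; otherwise a subselect returns several rows and the query fails. A question a solution has no answer for yields NULL.

```java
public class PivotQuery {

    /**
     * Builds one SELECT with a correlated subselect per question ID.
     * Only int IDs are spliced into the SQL text, so no user-supplied
     * strings end up in the query.
     */
    public static String build(int[] questionIds) {
        StringBuilder sql = new StringBuilder("SELECT ");
        for (int id : questionIds) {
            sql.append("(SELECT a.bQuestionAnswer FROM answers a")
               .append(" WHERE a.iSolutionID = s.iSolutionID AND a.iQuestionID = ")
               .append(id)
               .append(") AS qa").append(id).append(", ");
        }
        // The solution is the value to predict, so it goes last as the class column.
        sql.append("s.iSolutionID AS solution")
           .append(" FROM (SELECT DISTINCT iSolutionID FROM answers) s");
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(new int[] {1, 2}));
    }
}
```

Generating the column list from the current question IDs means a rebuilt query automatically picks up questions that users have added since the last build.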
#1: If it's nominal, change blank to ? using a filter like RenameNominalValues. If it's numerical, it should be interpreted as missing. #2: So go for a conditional subselect. – doxav Nov 12 '14 at 14:10
#1: It is nominal, so I'll use a filter, but I could not find RenameNominalValues in my Weka version 3.6.11. I updated the data from null to ?, but Weka treats ? as a data value and the decision tree contains a '?' branch. – keita Nov 12 '14 at 14:35
?: It might be that the DB's default missing-value marker differs from the ARFF format, which uses ?. If you didn't have any branch with a blank character before, the blank was correctly interpreted as a missing value. #1: In 3.7 it is weka.filters.unsupervised.attribute.RenameNominalValues; I don't know about 3.6. – doxav Nov 12 '14 at 14:45
I see. I'll update the version and look for a way to use '?'. Anyway, thank you for your kind support. I'll try re-mapping and build the decision tree correctly, and I'll update the status in this comment after testing. Thank you so much. – keita Nov 12 '14 at 15:39
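If you write the data out yourself (for example as ARFF text) instead of running a filter, the blank-to-`?` replacement can be done before the data ever reaches Weka. A minimal plain-Java sketch; the class name is hypothetical, and this is a preprocessing step of your own, not the RenameNominalValues filter. Note that `?` only means "missing" where the format interprets it that way (as ARFF does); a literal `?` string stored in a database column is just another value.

```java
import java.util.Arrays;

public class MissingValues {

    /** Replaces null, empty, or whitespace-only cells with ARFF's missing marker "?". */
    public static String[] normalize(String[] cells) {
        return Arrays.stream(cells)
                .map(c -> (c == null || c.trim().isEmpty()) ? "?" : c)
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] row = {"1", "", null, " ", "4"};
        System.out.println(String.join(",", normalize(row))); // prints 1,?,?,?,4
    }
}
```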
If you think it is the correct answer, please accept it as the answer. – doxav Nov 13 '14 at 08:10
Is it OK? I re-mapped with an SQL subselect and confirmed it works correctly. Thank you for your help!!! – keita Nov 13 '14 at 11:21
Yes. I'm glad you finally got it. – doxav Nov 13 '14 at 11:26
