Given a classification problem, the training data looks as follows:
input - output
--------------
A 100
B 150
C 170
..............
where A, B, and C are large data sets, each with 6 variables and around 5,000 rows.
The problem: how do I represent each data set as an input so that a classification algorithm can predict the output for new data sets like these?
I tried attaching each data set's label to every one of its rows and training on the rows. For a new data set, I classify each row individually and take the mean (average) of the row predictions as the label for the whole set. But I did not get very good results with naive Bayes.
Should I keep pursuing this method with other classifiers? What other options could I look into?
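For reference, here is a minimal sketch of the approach I tried, assuming scikit-learn. The toy arrays are stand-ins for the real data sets, and the final aggregation uses a majority vote over row predictions instead of the mean (averaging class labels can yield a value that is not a valid class).

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Toy stand-ins for the real data: each "event" is an (n_rows, 6) array
# with a single event-level label. Means are shifted so the classes differ.
train_events = [rng.normal(size=(5000, 6)) + i for i in range(3)]
train_labels = [100, 150, 170]

# Step 1: attach the event label to every row of its event.
X = np.vstack(train_events)
y = np.concatenate([[lab] * len(ev) for ev, lab in zip(train_events, train_labels)])

clf = GaussianNB().fit(X, y)

# Step 2: for a new event, classify each row individually.
new_event = rng.normal(size=(5000, 6)) + 1  # drawn like the label-150 event
row_preds = clf.predict(new_event)

# Step 3: aggregate row predictions to one event-level label
# (majority vote here; the mean of row_preds is the variant I described).
values, counts = np.unique(row_preds, return_counts=True)
event_pred = values[np.argmax(counts)]
print(event_pred)
```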
Edit
Sample data from two events:
     OUT   Var1   Var2     Var3   Var4    Var5   Var6
0     93  209.2   49.4   5451.0  254.0   206.0   37.7
1         344.9  217.6  14590.5  191.7   175.5  106.8
2         663.3   97.2  17069.2  144.4     2.8   59.9
3         147.4  137.7  12367.4  194.1   237.7  116.2
4         231.8  162.2  11938.4   71.3   149.1  116.3

     OUT   Var1   Var2     Var3   Var4    Var5   Var6
964  100   44.5  139.7  10702.5  151.4    36.0   17.9
966         59.8  148.9   3184.9  103.0    96.5   12.8
967        189.7  194.4   7569.6   49.9    82.6   55.2
969        158.5   88.2   2932.4  159.8   232.8  125.2
971        226.4  155.2   3156.3   85.0  4010.5   69.9
For a similar data set, I need to predict the OUT value; I have many samples like these.
Is it correct to apply the same OUT value to all rows of a data set when training?