I have a large CSV file (~90k rows, 355 columns). The first 354 columns indicate the presence of different words (each value is 1 or 0), and the last column holds a numerical value.
E.g.:
table, box, cups, glasses, total
1,0,0,1,30
0,1,1,1,28
1,1,0,1,55
When I use:
d = np.recfromcsv('clean.csv', dtype=None, delimiter=',', names=True)
d.shape
# I get: (89460,)
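To make this reproducible without my file, here is a minimal sketch that feeds the sample rows above through np.genfromtxt with names=True. I am assuming this behaves like my recfromcsv call for this purpose; the in-memory `sample` buffer is only a stand-in for clean.csv.

```python
import io
import numpy as np

# The sample rows from above, held in memory instead of clean.csv.
sample = io.StringIO(
    "table,box,cups,glasses,total\n"
    "1,0,0,1,30\n"
    "0,1,1,1,28\n"
    "1,1,0,1,55\n"
)

# names=True builds a structured array: one record per row, with the
# columns exposed as named fields rather than as a second axis.
d = np.genfromtxt(sample, delimiter=",", names=True, dtype=None, encoding="utf-8")

print(d.shape)        # 1-D: one entry per data row
print(d.dtype.names)  # the header names become field names
print(d["total"])     # a single column is selected by field name
```

So the result seems to be a 1-D array of records, which would explain why the shape has only one dimension.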
So my questions are:
- How do I get a 2D array/matrix instead? Does it matter?
- How can I separate the 'total' column so I can create train, cross_validation and test sets and train a model?
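For reference, this is roughly what I think I am after, sketched on the sample rows above. I am not sure it is the right approach: skipping the header so that genfromtxt returns a plain 2-D array, the 60/20/20 split ratio, the fixed seed, and all variable names are my own assumptions.

```python
import io
import numpy as np

# The sample rows from above, held in memory instead of clean.csv.
sample = io.StringIO(
    "table,box,cups,glasses,total\n"
    "1,0,0,1,30\n"
    "0,1,1,1,28\n"
    "1,1,0,1,55\n"
)

# Skipping the header line (and not using names=True) gives a plain
# 2-D float array instead of a structured array.
data = np.genfromtxt(sample, delimiter=",", skip_header=1)

X = data[:, :-1]  # every column except the last: the 0/1 word indicators
y = data[:, -1]   # the last column: 'total'

# Shuffle the row indices, then cut them into three parts
# (assumed ratio: 60% train, 20% cross-validation, 20% test).
n = len(data)
rng = np.random.default_rng(0)
idx = rng.permutation(n)
train_idx, cv_idx, test_idx = np.split(idx, [int(0.6 * n), int(0.8 * n)])

X_train, y_train = X[train_idx], y[train_idx]
X_cv, y_cv = X[cv_idx], y[cv_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(data.shape, X.shape, y.shape)
```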