I have a CSV file called train.csv with the following contents:
25.3, 12.4, 2.35, 4.89, 1, 2.35, 5.65, 7, 6.24, 5.52, M
20, 15.34, 8.55, 12.43, 23.5, 3, 7.6, 8.11, 4.23, 9.56, B
4.5, 2.5, 2, 5, 10, 15, 20.25, 43, 9.55, 10.34, B
1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, M
I am trying to split this dataset into the samples and their classifications, like this (this is the output I want):
[[25.3, 12.4, 2.35, 4.89, 1, 2.35, 5.65, 7, 6.24, 5.52],
[20, 15.34, 8.55, 12.43, 23.5, 3, 7.6, 8.11, 4.23, 9.56],
[4.5, 2.5, 2, 5, 10, 15, 20.25, 43, 9.55, 10.34],
[1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5]],
[M, B, B, M]
The nested list starting with "[[" is x (the sample data), and the list "[M, B, B, M]" is y (the classification that matches each row of data).
I am trying to write Python code that loads the file and prints the data separated into the samples and their classifications. It is for a linear SVM.
    y_list = []
    x_list = []
    for W in range(0, 100):
        X = data_train.readline()
        y = X.split(",")
        y_list.append(y[10][0])
        print(y_list)
        z_list = []
        for Z in range(0, 10):
            z_list.append(y[Z])
        x_list.append(z_list)
    dataSet = (x_list, y_list)
    print(dataSet)
Note: I know my range is completely wrong. I'm not sure how to choose the range for this kind of example. Could anyone please explain how the range should work in this situation?
Note: I know the append line with "y[10][0]" is wrong as well. Could someone explain how these indexes work?
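To make the two notes above concrete, here is a small sketch of what `split(",")` returns for one line of the file, and what `readline()` returns once the file is exhausted. The sample line is copied from train.csv. The names `line`, `fields`, `label`, and `after_eof` are only for illustration, and `io.StringIO` is a stand-in for the open file.

```python
import io

# One line as readline() would return it (note the trailing newline).
line = "25.3, 12.4, 2.35, 4.89, 1, 2.35, 5.65, 7, 6.24, 5.52, M\n"

fields = line.split(",")      # list of 11 strings, indexes 0..10
print(len(fields))            # 11
print(repr(fields[10]))       # ' M\n'  (leading space and newline kept)
print(repr(fields[10][0]))    # ' '     (first character is the space)

label = fields[10].strip()    # strip() drops surrounding whitespace
print(label)                  # M

# readline() returns '' at end of file, so a fixed range(0, 100) keeps
# splitting empty strings once the real lines are used up.
data_train = io.StringIO(line)   # stand-in for the open file
data_train.readline()            # consumes the only line
after_eof = data_train.readline()
print(repr(after_eof))           # ''
print(after_eof.split(","))      # [''], so index 10 raises IndexError
```

So `y[10]` is the eleventh field as a string, and `y[10][0]` is the first character of that string, which here is the space after the comma rather than the label.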
Overall, I want to get the output shown above. Thanks for the help.
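A minimal sketch of one way to get that output, assuming every line has ten numeric fields followed by one label. The helper name `load_dataset` is hypothetical, and `io.StringIO` stands in for the opened train.csv so the example runs on its own. Iterating over the file object visits each line exactly once, so no `range` over line numbers is needed.

```python
import csv
import io

# Contents of train.csv, inlined so the sketch is self-contained.
SAMPLE = """25.3, 12.4, 2.35, 4.89, 1, 2.35, 5.65, 7, 6.24, 5.52, M
20, 15.34, 8.55, 12.43, 23.5, 3, 7.6, 8.11, 4.23, 9.56, B
4.5, 2.5, 2, 5, 10, 15, 20.25, 43, 9.55, 10.34, B
1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, M
"""

def load_dataset(lines):
    """Split rows into features and labels (hypothetical helper)."""
    x_list, y_list = [], []
    # skipinitialspace drops the blank that follows each comma.
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:               # skip blank lines
            continue
        x_list.append([float(v) for v in row[:-1]])  # all but the last field
        y_list.append(row[-1].strip())               # last field is the label
    return x_list, y_list

x_list, y_list = load_dataset(io.StringIO(SAMPLE))
print(x_list)
print(y_list)   # ['M', 'B', 'B', 'M']
```

With the real file, the same helper can be called on the object returned by `open("train.csv", newline="")`. The values are converted with `float`, so an entry such as 20 prints as 20.0.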