
I have columns I want to extract from my CSV file. The file has one column per quantity, as follows:

train_error,cv_error,test_error
25.4965,25.3198,25.8298
25.7468,25.4954,26.0417
25.8253,25.6034,26.1515
26.3315,26.0611,26.6243
26.1648,25.92,26.4781
26.2192,25.9808,26.5419
26.3396,26.0814,26.6447
26.2184,25.9735,26.5339
26.2553,26.0105,26.5733
26.2683,26.0142,26.5763
26.1743,25.9286,26.4885
26.1716,25.9236,26.4836
26.1836,25.9345,26.495
26.1548,25.9083,26.4684
26.1505,25.9022,26.4617
26.1342,25.887,26.4463
26.1468,25.8994,26.4592
26.1453,25.8976,26.4574
26.1322,25.8854,26.445
26.132,25.8842,26.4438
26.1297,25.8828,26.4423

Everyone suggests that I use pandas. I've looked at its API, but I can't seem to get the columns in a way that actually uses the column names. The only way I've found to get, say, the train_error values is by indexing with 0, as follows:

import pandas as pd

df = pd.read_csv('./tmp_errors/error_file.csv')
print(df.as_matrix()[:, 0])  # column 0 is train_error

Is that really the only way to do it? I was hoping I could use the actual word train_error in my code to make it more readable, or build a dictionary with the column names as keys and arrays of errors as values. If I'm forced to index with 0, 1, 2, etc., then what is even the point of having named columns?

Is there a way to actually use the names of the columns to extract the data?


In the other answer it seems that doing:

df.to_dict()

gives nearly what I want, except that instead of a dictionary whose keys are column names and whose values are arrays, each column maps to another dictionary keyed by row number. As in:

{'cv_error': {0: 25.319800000000001, 1: 25.4954, 2: 25.603400000000001, 3: 26.0611, 4: 25.920000000000002, 5: 25.980799999999999, 6: 26.081399999999999, 7: 25.973500000000001, 8: 26.0105, 9: 26.014199999999999, 10: 25.928599999999999, 11: 25.9236, 12: 25.9345, 13: 25.908300000000001, 14: 25.902200000000001, 15: 25.886999999999997, 16: 25.8994, 17: 25.897600000000001, 18: 25.885400000000001, 19: 25.8842, 20: 25.8828}, 'train_error': {0: 25.496500000000001, 1: 25.7468, 2: 25.825299999999999, 3: 26.331499999999998, 4: 26.1648, 5: 26.219200000000001, 6: 26.339600000000001, 7: 26.218399999999999, 8: 26.255299999999998, 9: 26.2683, 10: 26.174299999999999, 11: 26.171600000000002, 12: 26.183599999999998, 13: 26.154800000000002, 14: 26.150500000000001, 15: 26.1342, 16: 26.146799999999999, 17: 26.145299999999999, 18: 26.132200000000001, 19: 26.131999999999998, 20: 26.1297}, 'test_error': {0: 25.829799999999999, 1: 26.041699999999999, 2: 26.151499999999999, 3: 26.624300000000002, 4: 26.478100000000001, 5: 26.541899999999998, 6: 26.6447, 7: 26.533899999999999, 8: 26.5733, 9: 26.5763, 10: 26.488499999999998, 11: 26.483599999999999, 12: 26.495000000000001, 13: 26.468399999999999, 14: 26.4617, 15: 26.446300000000001, 16: 26.459199999999999, 17: 26.4574, 18: 26.445, 19: 26.4438, 20: 26.442299999999999}}

I wish it would have been something like:

{'train_error':[25.4965,...,26.1297]}

in other words, each value in the dictionary should be an array (or list) of the errors, not another dictionary with row indices as keys.
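For what it's worth, a plain dict comprehension over the columns gives the shape I'm after; this is just a sketch to illustrate what I mean (assuming the same file as above), and I'm hoping pandas has something built in for it:

import pandas as pd

df = pd.read_csv('./tmp_errors/error_file.csv')
errors = {col: df[col].tolist() for col in df.columns}  # {column name: list of values}
print(errors['train_error'])  # [25.4965, 25.7468, ..., 26.1297]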

Charlie Parker

1 Answer


Selection by column name is a basic feature of pandas:

import pandas as pd

df = pd.read_csv('./tmp_errors/error_file.csv')
print(df['train_error'])
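If you want the values as a plain NumPy array or Python list rather than a pandas Series, something along these lines should also work (a minimal sketch, assuming the same DataFrame as above):

train = df['train_error'].values         # NumPy array of that column
train_list = df['train_error'].tolist()  # plain Python list of that column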
ASGM
  • but why do I get some weird message at the end? `Name: train_error, dtype: float64` do you get that too? – Charlie Parker Feb 20 '17 at 19:11
  • That's a built-in feature of pandas, telling you the name and type of the Series you just printed. It's not an extra row, and if you save the data back to a CSV it won't be part of it. – ASGM Feb 20 '17 at 20:24
  • I added one more detail in my question, do you know the answer? – Charlie Parker Feb 20 '17 at 20:59
  • Yes, though you should probably make this a separate question rather than putting it into this existing question. The answer is in the [docs](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_dict.html): use the keyword `orient`: `df.to_dict(orient='list')`. That will make it a list rather than an array, though that matches what you have in your example. – ASGM Feb 21 '17 at 16:22