I am trying to read a CSV file into a structured array using NumPy's genfromtxt. I plan to sort it and then use groupby to separate the rows into groups based on the string values of one of the columns. Finally, I want to slice columns out of each group for additional processing.
Here is a small example where I then want a specific column returned for each of the groups.
    import numpy as np
    from itertools import groupby

    food1 = [[" vegetable", "tomato"], [" vegetable", "spinach"], [" fruit", "watermelon"], [" fruit", "grapes"], [" meat", "beef"]]

    for key, group in groupby(food1, lambda x: x[0]):
        print(key)
        group[:1]
        # The line above raises TypeError: 'itertools._grouper' object is unsubscriptable; I have tried it with food1 or food2
        for thing in group:
            print(key + ": " + thing[1])
        print(" ")
The output I would like is several arrays of the second column's values, grouped by the first column's values,
so vegetable: ["tomato", "spinach"], fruit: ["watermelon", "grapes"], and so on.
I tried to slice the group returned by groupby, but since it is an iterator rather than a sequence, I get TypeError: 'itertools._grouper' object is unsubscriptable.
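For reference, here is a minimal sketch of the kind of result I am after, assuming it is acceptable to materialize each group into a list before slicing. The name `grouped` and the `strip()` call on the key are my own illustrative choices, not something the data requires.

```python
from itertools import groupby
from operator import itemgetter

food1 = [[" vegetable", "tomato"], [" vegetable", "spinach"],
         [" fruit", "watermelon"], [" fruit", "grapes"], [" meat", "beef"]]

# groupby only merges adjacent rows with equal keys, so sort by the key column first.
rows = sorted(food1, key=itemgetter(0))

grouped = {}  # hypothetical name: maps each key to its second-column values
for key, group in groupby(rows, key=itemgetter(0)):
    # group is a one-shot iterator, so build a list to get something sliceable.
    grouped[key.strip()] = [row[1] for row in group]

print(grouped)
```

Once each group is a list, ordinary slicing such as `grouped["vegetable"][:1]` works as expected.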
I know I could slice the data loaded from genfromtxt directly, but it is the combination of grouping first and then slicing that is giving me trouble.
    # skip_header replaces the older skiprows argument in current NumPy releases
    data = np.genfromtxt("file.txt", delimiter=',', skip_header=3)
    # slicing a column from the ndarray read from the csv file
    column2 = data[:, 2]
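One NumPy-only direction I have been considering is sketched below. It is only an illustration under assumptions: the field names `kind` and `name` are hypothetical, and the structured array is built by hand here instead of being read from file.txt. The idea is to sort by the key field, find where each distinct key starts with `np.unique`, and split the other column at those boundaries.

```python
import numpy as np

# Hand-built stand-in for the structured array; field names are assumptions.
data = np.array(
    [("vegetable", "tomato"), ("fruit", "watermelon"), ("vegetable", "spinach"),
     ("meat", "beef"), ("fruit", "grapes")],
    dtype=[("kind", "U16"), ("name", "U16")],
)

# Stable sort by the key field so equal keys are contiguous and keep their original order.
data = data[np.argsort(data["kind"], kind="stable")]

# Index of the first row of each distinct key in the sorted array.
keys, starts = np.unique(data["kind"], return_index=True)

# Split the name column at the group boundaries (skip the leading 0).
chunks = np.split(data["name"], starts[1:])
by_kind = {str(k): chunk.tolist() for k, chunk in zip(keys, chunks)}

print(by_kind)
```

Each entry in `chunks` is a regular ndarray, so it can be sliced like the column taken from genfromtxt above.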
Any other ideas on how I could accomplish this group-then-slice?
Thanks.