
I am trying to read a CSV file into a structured array using numpy genfromtxt. I plan to sort it and then use groupby to separate the rows into groups based on the string values of one of the columns. Finally, I will slice the columns from each group for additional processing.

Here is a small example in which I want a specific column returned for each of the groups.

import numpy as np
from itertools import groupby

food1 = [[" vegetable", "tomato"], [" vegetable", "spinach"], [" fruit", "watermelon"], [" fruit", "grapes"], [" meat", "beef"]]

for key, group in groupby(food1, lambda x: x[0]):
    print(key)
    # The next line raises TypeError: 'itertools._grouper' object is unsubscriptable
    # (I have tried it with food1 or food2)
    group[:1]
    for thing in group:
        print(key + ": " + thing[1])
    print(" ")

The output I would like is several arrays holding the second column's values, grouped by the first column's values:

vegetable: ["tomato", "spinach"], fruit: ["watermelon", "grapes"], etc.

I tried to slice the group returned by groupby, but because it is an iterator, I get TypeError: 'itertools._grouper' object is unsubscriptable.
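Because each group yielded by groupby is a one-shot iterator rather than a sequence, it has to be materialized before it can be sliced. A minimal sketch of two common workarounds, using the food1 list from above (its keys are already adjacent, so no sort is needed here): list() turns each group into a real list, while itertools.islice takes a lazy slice without building the whole list. The names first_rows, second_column, and firsts are illustrative only.

```python
from itertools import groupby, islice

food1 = [[" vegetable", "tomato"], [" vegetable", "spinach"],
         [" fruit", "watermelon"], [" fruit", "grapes"], [" meat", "beef"]]

first_rows = {}
second_column = {}
for key, group in groupby(food1, key=lambda x: x[0]):
    rows = list(group)                     # materialize the one-shot iterator
    first_rows[key.strip()] = rows[:1]     # slicing works on the list
    second_column[key.strip()] = [row[1] for row in rows]

# islice takes a lazy slice of each group without building a full list;
# each group is consumed before groupby advances to the next one.
firsts = [list(islice(group, 1))
          for _, group in groupby(food1, key=lambda x: x[0])]

print(second_column)
# {'vegetable': ['tomato', 'spinach'], 'fruit': ['watermelon', 'grapes'], 'meat': ['beef']}
```

Note that list(group) consumes the iterator, so any further work on that group has to use the list.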

I know I could slice the data loaded from genfromtxt, but it is the combination of grouping first and then slicing that is giving me trouble.

import numpy as np

data = np.genfromtxt("file.txt", delimiter=',', skip_header=3)
# slicing a column from the ndarray read from the csv file
column2 = data[:, 2]
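A sketch of the full read, sort, group, then slice pipeline, under loudly stated assumptions: the CSV text below is a hypothetical stand-in for file.txt (one header row with the made-up field names category, item, weight), and it is read from an io.StringIO instead of disk. Passing dtype=None together with names=True makes genfromtxt return a 1-D structured array, and encoding is passed explicitly so the string fields are read as str. The groupby then runs over the string column only, and the run lengths it reports are used to slice contiguous blocks of whole records.

```python
import io
from itertools import groupby

import numpy as np

# Hypothetical CSV standing in for "file.txt": one header row, then data rows.
csv_text = """category,item,weight
vegetable,tomato,1.5
fruit,watermelon,4.0
vegetable,spinach,0.3
fruit,grapes,0.8
meat,beef,2.2
"""

# dtype=None infers a type per column, giving a 1-D structured array;
# names=True takes the field names from the header row.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                     names=True, dtype=None, encoding="utf-8")

# A stable sort on the string column keeps the file order inside each group.
data = data[np.argsort(data["category"], kind="stable")]

groups = {}
start = 0
for key, run in groupby(data["category"]):
    n = sum(1 for _ in run)              # length of this run of equal keys
    block = data[start:start + n]        # contiguous slice of whole records
    groups[str(key)] = block["item"]     # pick one column from the group
    start += n

for name, items in groups.items():
    print(name, items.tolist())
# fruit ['watermelon', 'grapes']
# meat ['beef']
# vegetable ['tomato', 'spinach']
```

For a real file, the io.StringIO(...) argument would be replaced by the path, with skip_header set to however many lines precede the header row; the layout used here is only an assumption.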

Any other ideas on how I could accomplish this group-then-slice?

Thanks.

frank

1 Answer


I think you are trying to do this:

from itertools import groupby

food1 = [[" vegetable", "tomato"], [" vegetable", "spinach"], [" fruit", "watermelon"], [" fruit", "grapes"], [" meat", "beef"]]

data={}
for key, group in groupby(sorted(food1), key=lambda x: x[0]):
    data[key.strip()]=[v[1] for v in group]

data then is:

{'fruit': ['grapes', 'watermelon'],
 'meat': ['beef'],
 'vegetable': ['spinach', 'tomato']}
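If sorting is undesirable, a different technique (not the one used in this answer) is collections.defaultdict, which collects the second column in a single pass regardless of row order. The sketch below runs on a hypothetical, deliberately unsorted rows list, and also shows why sorting matters when groupby is used, as a later comment points out: groupby only merges adjacent equal keys, so on unsorted input a key can produce several runs and the dict assignment keeps only the last one.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical rows, deliberately out of order: " fruit" appears in two runs.
rows = [[" fruit", "watermelon"], [" vegetable", "tomato"],
        [" fruit", "grapes"], [" meat", "beef"], [" vegetable", "spinach"]]

# Single pass, no sorting: append each value to the list for its key.
by_key = defaultdict(list)
for key, value in rows:
    by_key[key.strip()].append(value)

# groupby without sorting: each run overwrites the previous entry for its key.
lossy = {}
for key, group in groupby(rows, key=lambda x: x[0]):
    lossy[key.strip()] = [v[1] for v in group]

print(dict(by_key))
# {'fruit': ['watermelon', 'grapes'], 'vegetable': ['tomato', 'spinach'], 'meat': ['beef']}
print(lossy)
# {'fruit': ['grapes'], 'vegetable': ['spinach'], 'meat': ['beef']}
```

The defaultdict version also preserves the original row order within each group, whereas sorted(food1) orders the values inside each group alphabetically.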
dawg
  • Thanks, this works. An answer to another question of mine also suggested an alternative way to group the values and select a column without using groupby: http://stackoverflow.com/questions/17560879/python-numpy-split-a-csv-file-by-the-values-of-a-string-column – frank Jul 11 '13 at 08:52
  • Better sort the list before grouping; otherwise you will lose some items. You may use the following code to sort the list: food1.sort(key=lambda x: x[0]) – Ken T May 15 '15 at 03:35
  • @user2720402: indeed. Correction made – dawg May 15 '15 at 04:30
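The linked question's answer is not reproduced here. As one possible groupby-free approach in NumPy itself (an illustrative sketch, not necessarily what that answer does), a stable argsort makes equal keys contiguous, np.unique with return_index=True gives the index of each key's first row, and np.split cuts the value column at those boundaries. The array food is a hypothetical NumPy version of food1 with the leading spaces removed.

```python
import numpy as np

# Hypothetical 2-column string array mirroring food1 (leading spaces removed).
food = np.array([["vegetable", "tomato"], ["vegetable", "spinach"],
                 ["fruit", "watermelon"], ["fruit", "grapes"],
                 ["meat", "beef"]])

# Stable sort by the key column so equal keys become contiguous
# while each group keeps its original row order.
order = np.argsort(food[:, 0], kind="stable")
food_sorted = food[order]

# np.unique returns the sorted unique keys and the index of each key's
# first occurrence; on sorted data those indices are the group boundaries.
keys, first = np.unique(food_sorted[:, 0], return_index=True)
columns = np.split(food_sorted[:, 1], first[1:])

grouped = {str(k): col.tolist() for k, col in zip(keys, columns)}
print(grouped)
# {'fruit': ['watermelon', 'grapes'], 'meat': ['beef'], 'vegetable': ['tomato', 'spinach']}
```

This keeps the grouping vectorized; the only Python-level loop is the final dict comprehension over the groups.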