I need to create a Python list object, or any object, out of a pandas DataFrame object, grouping pieces of values from different rows

Question

My DataFrame has a string in the first column, and a number in the second one:

            GEOSTRING  IDactivity
9     wydm2p01uk0fd2z           2
10    wydm86pg6r3jyrg           2
11    wydm2p01uk0fd2z           2
12    wydm80xfxm9j22v           2
39    wydm9w92j538xze           4
40    wydm8km72gbyuvf           4
41    wydm86pg6r3jyrg           4
42    wydm8mzt874p1v5           4
43    wydm8mzmpz5gkt8           5
44    wydm86pg6r3jyrg           5
45    wydm8w1q8bjfpcj           5
46    wydm8w1q8bjfpcj           5

What I want to do is manipulate this DataFrame to get a list containing one string per distinct "IDactivity" value, where each string is made of the 5th character of every "GEOSTRING" value in that group. In this case I have 3 different "IDactivity" values, so my list will contain 3 strings that look like this:

['2828', '9888','8888']

where, again, the symbols you see in each string are the 5th character of each "GEOSTRING" value.

What I'm asking for is a solution, or an approach, that doesn't involve an overly complicated for loop and is as efficient as possible, since I have to manipulate lots of data. I'd like it to be clean and fast.

I hope it's clear enough.

Rayhane Mama · Accepted Answer · 2017-07-09T10:51:42.713

This can be done as a one-liner (and it is considered to be pretty fast too):

result = df.groupby('IDactivity')['GEOSTRING'].apply(lambda x:''.join(x.str[4])).tolist()

This groups the DataFrame by the values of IDactivity, then, within each group, selects the 5th character (index 4) of every GEOSTRING string and joins those characters into a single string. Finally, the tolist() method returns the output as a list rather than a pandas Series.

output:

['2828', '9888', '8888']
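For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question (the original index labels 9, 10, ... are not reproduced, which is an assumption that they don't matter here) and applies the same one-liner:

```python
import pandas as pd

# Sample data copied from the question; the default 0..11 index is used
# instead of the original labels (assumption: the index is irrelevant here).
df = pd.DataFrame({
    "GEOSTRING": [
        "wydm2p01uk0fd2z", "wydm86pg6r3jyrg", "wydm2p01uk0fd2z", "wydm80xfxm9j22v",
        "wydm9w92j538xze", "wydm8km72gbyuvf", "wydm86pg6r3jyrg", "wydm8mzt874p1v5",
        "wydm8mzmpz5gkt8", "wydm86pg6r3jyrg", "wydm8w1q8bjfpcj", "wydm8w1q8bjfpcj",
    ],
    "IDactivity": [2, 2, 2, 2, 4, 4, 4, 4, 5, 5, 5, 5],
})

# For each IDactivity group, take the 5th character (index 4) of every
# GEOSTRING and concatenate the characters in row order.
result = (
    df.groupby("IDactivity")["GEOSTRING"]
      .apply(lambda s: "".join(s.str[4]))
      .tolist()
)

print(result)  # ['2828', '9888', '8888']
```

Note that groupby sorts the group keys by default, so the list is ordered by IDactivity value; passing sort=False to groupby should keep the order of first appearance instead. Also, any GEOSTRING shorter than 5 characters would produce a missing value from .str[4], which would make the join fail, so this sketch assumes every string is long enough.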

Documentation:

pandas.groupby
pandas.apply

score 1 · Answer 2 · answered Jul 08 '17 at 21:46

Here's a solution involving a temp column, and taking inspiration for the key operation from this answer:

# create a temp column with the character we want from each string
dframe['Temp'] = dframe['GEOSTRING'].apply(lambda x: x[4])

# groupby ID and then concatenate using a sneaky call to .sum()
dframe.groupby('IDactivity')['Temp'].sum().tolist()

Result:

['2828', '9888', '8888']
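If you would rather not add a column to the DataFrame, one possible variant is to extract the characters into a standalone Series and group that Series by the ID column. This is only a sketch under the same assumptions as above (sample data from the question, every string at least 5 characters long); the name fifth_chars is made up for illustration, and "".join is used in place of .sum():

```python
import pandas as pd

# Sample data copied from the question (default index used).
dframe = pd.DataFrame({
    "GEOSTRING": [
        "wydm2p01uk0fd2z", "wydm86pg6r3jyrg", "wydm2p01uk0fd2z", "wydm80xfxm9j22v",
        "wydm9w92j538xze", "wydm8km72gbyuvf", "wydm86pg6r3jyrg", "wydm8mzt874p1v5",
        "wydm8mzmpz5gkt8", "wydm86pg6r3jyrg", "wydm8w1q8bjfpcj", "wydm8w1q8bjfpcj",
    ],
    "IDactivity": [2, 2, 2, 2, 4, 4, 4, 4, 5, 5, 5, 5],
})

# Vectorised extraction of the 5th character; nothing is written back to dframe.
fifth_chars = dframe["GEOSTRING"].str[4]

# Group the extracted characters by the ID column (aligned on the shared index)
# and concatenate each group into one string.
result = fifth_chars.groupby(dframe["IDactivity"]).agg("".join).tolist()

print(result)  # ['2828', '9888', '8888']
```

The result is the same list as before, and dframe keeps only its two original columns.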

cmaher

This is actually useful since I'm learning how to handle DataFrames. Rayhane's answer is probably faster, but you have been helpful too, thanks – zampero Jul 09 '17 at 16:22
