
I have a dataframe where I want to group by the ID field and get the last letter of each value in the GG field. For example, say I have the following:

import pandas as pd

df1 = pd.DataFrame({
    'ID': ['Q'] * 3,
    'GG': ['L3S_0097A', 'L3S_0097B', 'L3S_0097C'],
})

print (df1)
  ID         GG
0  Q  L3S_0097A
1  Q  L3S_0097B
2  Q  L3S_0097C

I am trying to group by the ID column, take only the last letter of each GG value, and collect the results in a defaultdict like this:

{'Q': ['A','B','C']}

Here is the code I tried:

mm = df1.groupby('ID')['GG'].str[-1].apply(list).to_dict()

and I also tried the following:

for i, j in zip(df1.ID, df1.GG):
    mm[i].append(j[-1])

but neither worked. How can I do this?

jezrael
amrutha

1 Answer


Use syntactic sugar: group one Series by another. Take the last character of each GG value with .str[-1], then group that Series by df1['ID']:

mm = df1['GG'].str[-1].groupby(df1['ID']).apply(list).to_dict()
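This works where the question's df1.groupby('ID')['GG'].str[-1] does not because .str is an accessor on a Series, not on the SeriesGroupBy object that groupby returns, so the string slicing has to happen before grouping. A minimal sketch comparing the two orders on the question's sample data (the names grouped and last are just illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': ['Q'] * 3,
    'GG': ['L3S_0097A', 'L3S_0097B', 'L3S_0097C'],
})

# A SeriesGroupBy object has no .str accessor, so slicing after grouping fails.
grouped = df1.groupby('ID')['GG']
print(hasattr(grouped, 'str'))  # False

# Slicing first gives a plain Series of last characters...
last = df1['GG'].str[-1]
# ...which can then be grouped by another Series aligned on the same index.
mm = last.groupby(df1['ID']).apply(list).to_dict()
print(mm)  # {'Q': ['A', 'B', 'C']}
```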

Or assign only the last character back to GG, then group by the column name:

mm = df1.assign(GG = df1['GG'].str[-1]).groupby('ID')['GG'].apply(list).to_dict()

print (mm)
{'Q': ['A', 'B', 'C']}
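With a single ID it is hard to see the grouping at work. Here is a sketch on hypothetical data with two IDs (not from the question) that checks both variants above give the same dictionary; groupby keeps the original row order within each group:

```python
import pandas as pd

# Hypothetical data with two IDs, only to illustrate the grouping.
df = pd.DataFrame({
    'ID': ['Q', 'Q', 'R', 'Q', 'R'],
    'GG': ['L3S_0097A', 'L3S_0097B', 'X1_Y', 'L3S_0097C', 'X2_Z'],
})

# Variant 1: slice first, then group by the ID Series.
via_series = df['GG'].str[-1].groupby(df['ID']).apply(list).to_dict()

# Variant 2: overwrite GG with its last character, then group by column name.
via_assign = (df.assign(GG=df['GG'].str[-1])
                .groupby('ID')['GG']
                .apply(list)
                .to_dict())

print(via_series)                # {'Q': ['A', 'B', 'C'], 'R': ['Y', 'Z']}
print(via_series == via_assign)  # True
```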

Pure Python solution:

from collections import defaultdict

mm = defaultdict(list)
#https://stackoverflow.com/a/10532492
for i, j in zip(df1.ID, df1.GG):
    mm[i].append(j[-1])

print (mm)
defaultdict(<class 'list'>, {'Q': ['A', 'B', 'C']})
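If the defaultdict(<class 'list'>, ...) wrapper in the printed output is unwanted, a defaultdict is a dict subclass and can be copied into a plain dict with dict(). A short sketch (the name plain is illustrative) that also checks the loop agrees with the pandas version:

```python
from collections import defaultdict

import pandas as pd

df1 = pd.DataFrame({
    'ID': ['Q'] * 3,
    'GG': ['L3S_0097A', 'L3S_0097B', 'L3S_0097C'],
})

# Missing keys are created with an empty list on first access.
mm = defaultdict(list)
for i, j in zip(df1['ID'], df1['GG']):
    mm[i].append(j[-1])

# dict() copies the defaultdict into an ordinary dict.
plain = dict(mm)
print(plain)  # {'Q': ['A', 'B', 'C']}

# Same result as the pandas groupby version.
print(plain == df1['GG'].str[-1].groupby(df1['ID']).apply(list).to_dict())  # True
```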
jezrael
  • Thanks, it worked. But how can I do it with the zip statement? When I tried the zip code posted above, I got this error: "AttributeError: 'list' object has no attribute 'str'" – amrutha Nov 20 '18 at 12:07
  • @amrutha - I think you need `defaultdict` for this. – jezrael Nov 20 '18 at 12:14
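Why defaultdict matters for the zip loop: if mm is a plain dict, the first mm[i] lookup for a new key raises KeyError, because nothing has created the list yet. A sketch showing that failure and a dict.setdefault alternative that needs no import. It does not try to reproduce the exact AttributeError from the comment, whose cause isn't shown in the thread:

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': ['Q'] * 3,
    'GG': ['L3S_0097A', 'L3S_0097B', 'L3S_0097C'],
})

# With a plain dict, the first lookup of a new key fails.
mm = {}
try:
    for i, j in zip(df1['ID'], df1['GG']):
        mm[i].append(j[-1])
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 'Q'

# dict.setdefault inserts an empty list the first time a key is seen,
# then returns the stored list so append works on every iteration.
mm = {}
for i, j in zip(df1['ID'], df1['GG']):
    mm.setdefault(i, []).append(j[-1])
print(mm)  # {'Q': ['A', 'B', 'C']}
```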