0

I would appreciate some help with adjusting a bit of code to address a problem I have trying to manipulate data in pandas. I have a very large spreadsheet with two columns (names, notes). Moving down the names column, if the name matches the name from the previous cell value then append the adjacent notes value to the previous notes value. Then delete named row. Continue to next row down.

The code below looks for a static name(currently Jason). I need to have the code look for any names(only alpha chars - maybe a space between names do exist). Once the names do not match, the next name is added to the new name search. I couldn't figure out a wildcard search so I was testing with on name.

To sound this out logically.

 1. Compare name, if not exist, capture name for search. move to row 2.
 2. If name matches previous name(above) then append adjacent note value to the above note value.
 3. Delete current row and shift up.
 4. Compare name with next row down. If match, goto step 2 and repeat. if no match, goto step 1 and 
    capture new name.



 #----Code Begin
 import pandas as pd

 data = {'name': ['Jason', 'Jason', 'Molly Jin', 'Molly Jin', 'Molly Jin',
                 'Tina', 'Tina', 'Tina','Jake Robins', 'Amy','Amy'],
        'notes': ['Are you ok', 'just fine', 'I am here', 'where are you', 'over here',
                  'guest what','what','your funny','I am not here','what flavor','rasberry'] }


 df = pd.DataFrame(data)
 print(df)
 print('')

 s = df['name'].shift(-1)                 #Looks for the row previous names col.
 t = df['notes'].shift(-1)                #Looks for the row revious notes col.
 m = s.str.startswith('Jason', na=False)  #Searches names col for name (needs a wildcard search)
 df.loc[m, 'notes'] += (', ' + t[m])      #grabs adjacent notes col. data and appends to previous 
                                          #notes cell value with comma/space separator.

 print(df) 
 #----Code End


 What I would like:

 name          notes
 Jason         Are you ok, just fine
 Molly Jin     I am here, where are you, over here
 Tina          guest what, what, your funny
 Jake Robins   I am not here
 Amy           what flavor, rasberry

Thanks for any help that you can provide.

Boomer
  • 229
  • 1
  • 9
  • I think you want to group by `name` and `join` the `notes` column, check `df.groupby('name', sort=False)['notes'].agg(','.join)` – Shubham Sharma May 25 '21 at 16:14
  • Hello Shubham, Looking at this. I tried your code by itself and it didn't work. How would I apply this to my df? – Boomer May 25 '21 at 16:23
  • Could you please explain, why it's not working? – Shubham Sharma May 25 '21 at 16:25
  • It's probably because I don't understand, my apologies. I REM out my above four line code and applied your one line code. Then print(df). I get a duplicate df print. Cn you show me how you applied to the df in the sample please? – Boomer May 25 '21 at 16:30
  • You need to save the result of the aggregation operation in some variable, for e.g. check `result = df.groupby('name', sort=False)['notes'].agg(','.join)`. Now try to print `result` – Shubham Sharma May 25 '21 at 16:32
  • That what I didn't get. Once Again, you have saved my A$&*. I thank you. Still learning the ropes. I appreciate your help. Is there a place I can select answered and give you credit? – Boomer May 25 '21 at 16:36
  • Appreciate it! But this question is probably a dupe and had already been asked before so i guess there is no point in adding the additional answer – Shubham Sharma May 25 '21 at 16:39
  • Hey Shubham, Quick question about this. Why does the printing of the result end up with only one column instead of two? It looks like the name column drops when printing to file and to screen. Anyway to keep the name column adjacent to the now combined notes column? Please let me know if possible? – Boomer May 25 '21 at 17:44

0 Answers0