I would appreciate some help with adjusting a bit of code to address a problem I have trying to manipulate data in pandas. I have a very large spreadsheet with two columns (names, notes). Moving down the names column, if the name matches the name from the previous cell value then append the adjacent notes value to the previous notes value. Then delete named row. Continue to next row down.
The code below looks for a static name(currently Jason). I need to have the code look for any names(only alpha chars - maybe a space between names do exist). Once the names do not match, the next name is added to the new name search. I couldn't figure out a wildcard search so I was testing with on name.
To sound this out logically.
1. Compare name, if not exist, capture name for search. move to row 2.
2. If name matches previous name(above) then append adjacent note value to the above note value.
3. Delete current row and shift up.
4. Compare name with next row down. If match, goto step 2 and repeat. if no match, goto step 1 and
capture new name.
#----Code Begin
import pandas as pd
data = {'name': ['Jason', 'Jason', 'Molly Jin', 'Molly Jin', 'Molly Jin',
'Tina', 'Tina', 'Tina','Jake Robins', 'Amy','Amy'],
'notes': ['Are you ok', 'just fine', 'I am here', 'where are you', 'over here',
'guest what','what','your funny','I am not here','what flavor','rasberry'] }
df = pd.DataFrame(data)
print(df)
print('')
s = df['name'].shift(-1) #Looks for the row previous names col.
t = df['notes'].shift(-1) #Looks for the row revious notes col.
m = s.str.startswith('Jason', na=False) #Searches names col for name (needs a wildcard search)
df.loc[m, 'notes'] += (', ' + t[m]) #grabs adjacent notes col. data and appends to previous
#notes cell value with comma/space separator.
print(df)
#----Code End
What I would like:
name notes
Jason Are you ok, just fine
Molly Jin I am here, where are you, over here
Tina guest what, what, your funny
Jake Robins I am not here
Amy what flavor, rasberry
Thanks for any help that you can provide.