python pandas merge two or more lines of text into one line

Question

I have a data frame with text data like the one below:

    name | address                  | number 
1   Bob    bob                        No.56
2          @gmail.com           
3   Carly  carly@world.com            No.90
4   Gorge  greg@yahoo     
5          .com                   
6                                     No.100

and I want to turn it into this frame:

    name | address               | number 
1   Bob    bob@gmail.com           No.56
2   Carly  carly@world.com         No.90                 
3   Gorge  greg@yahoo.com          No.100

I am using pandas to read the file, but I am not sure how to use merge or concat for this.

score 1 · Accepted Answer · answered Feb 15 '17 at 04:19

If the name column consists of unique values:

print(df)

    name          address  number
0    Bob              bob   No.56
1    NaN       @gmail.com     NaN
2  Carly  carly@world.com   No.90
3  Gorge       greg@yahoo     NaN
4    NaN             .com     NaN
5    NaN              NaN  No.100

df['name'] = df['name'].ffill()
print(df.fillna('').groupby(['name'], as_index=False).sum())

    name          address  number
0    Bob    bob@gmail.com   No.56
1  Carly  carly@world.com   No.90
2  Gorge   greg@yahoo.com  No.100
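
If the names are not unique, grouping on the forward-filled name would merge unrelated records. A minimal sketch of one way around that, assuming a new record starts on every row where name is present (the record_id helper and the sample data with a repeated "Bob" are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Bob" appears twice as two separate records.
df = pd.DataFrame({
    "name": ["Bob", np.nan, "Carly", "Bob", np.nan, np.nan],
    "address": ["bob", "@gmail.com", "carly@world.com", "bob2@yahoo", ".com", np.nan],
    "number": ["No.56", np.nan, "No.90", np.nan, np.nan, "No.100"],
})

# A record starts wherever `name` is present; cumsum turns that into 1, 1, 2, 3, 3, 3.
record_id = df["name"].notna().cumsum().rename("record_id")

merged = (
    df.fillna("")            # empty strings so the pieces can be concatenated
      .groupby(record_id)    # group by record, not by name
      .agg("".join)          # glue the fragments of each column together
      .reset_index(drop=True)
)
print(merged)
```

Because the grouping key is the record id rather than the name, the two "Bob" records stay separate.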

To extend the code above to more complicated data, you may need things like ffill(), bfill(), reversing with [::-1], .groupby('name').apply(lambda x: ' '.join(x['address'])), strip(), lstrip(), rstrip(), and replace().
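
As a rough sketch of that kind of extension, the example below keeps the NaN values, strips stray whitespace from each fragment, and picks a different aggregation per column. The join_fragments helper and the extra spaces around "@gmail.com" are assumptions added for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Bob", np.nan, "Carly", "Gorge", np.nan, np.nan],
    "address": ["bob", " @gmail.com ", "carly@world.com", "greg@yahoo", ".com", np.nan],
    "number": ["No.56", np.nan, "No.90", np.nan, np.nan, "No.100"],
})
df["name"] = df["name"].ffill()

def join_fragments(s):
    # Hypothetical helper: drop missing pieces, trim each one, then concatenate.
    return "".join(part.strip() for part in s.dropna())

result = df.groupby("name", as_index=False).agg(
    address=("address", join_fragments),
    number=("number", "first"),   # "first" takes the first non-null value per group
)
print(result)
```

Using "first" for number avoids concatenating anything there, which may be preferable when a column should hold exactly one value per user.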

score 0 · Answer 2 · edited May 23 '17 at 12:08

If you want to collapse a data frame of six rows (with possible NaN entries in each column) into one row per user, there may be no single pandas method that does it directly.

You will need some code to fill in the name column, so that pandas knows that the split rows bob and @gmail.com belong to the same user, Bob.

You can fill each empty entry in the name column with the preceding user's name using the fillna or ffill methods; see pandas dataframe missing data.

df['name'] = df['name'].ffill()

# gives
    name          address  number
0    Bob              bob   No.56
1    Bob       @gmail.com
2  Carly  carly@world.com   No.90
3  Gorge       greg@yahoo
4  Gorge             .com
5  Gorge                   No.100

Then you can use groupby with sum as the aggregation function.

df.groupby(['name']).sum().reset_index()

# gives
    name          address  number
0    Bob    bob@gmail.com   No.56
1  Carly  carly@world.com   No.90
2  Gorge   greg@yahoo.com  No.100

You may find converting between NaN and white space useful; see Replacing blank values (white space) with NaN in pandas and pandas.DataFrame.fillna.
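
A small sketch of that round trip, assuming the file was read with blank or whitespace-only strings instead of NaN (the sample frame below is made up to mimic that):

```python
import numpy as np
import pandas as pd

# Assumed input: missing cells came in as "" or whitespace rather than NaN.
df = pd.DataFrame({
    "name": ["Bob", "  ", "Carly", "Gorge", "", " "],
    "address": ["bob", "@gmail.com", "carly@world.com", "greg@yahoo", ".com", ""],
    "number": ["No.56", "", "No.90", " ", "", "No.100"],
})

# Blank or whitespace-only cells -> NaN, so that ffill() has something to fill.
cleaned = df.replace(r"^\s*$", np.nan, regex=True)
cleaned["name"] = cleaned["name"].ffill()

# NaN -> empty string again, so the string columns can be concatenated by sum().
back = cleaned.fillna("")
print(back.groupby("name", as_index=False).sum())
```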
