How to combine the rows and add columns in dataframe use python?

Question

Here is the data frame:

Name| File amount| other amount | other name
A      123         48             a
B      456         48             a
C      789         49             a
A                  48             b
A                  48             c
B                  48             d

What I want is the following:

Name is the ID and each name must have a column 'file amount', and what I want is group all the lines together based on Name, and put 'other name' on the right side if they have more than 1 column, add on the right side. and leave blank if they don't have the 'other name'.

Name| File amount| other_1| other_1 amount| other_2| other_2 amount| other_3| other_3 amount
A     123          a         48             b            48           c         48
B     456          a         48             d            48
C     789          a         49

`df2 = (df.assign(col=df.groupby('Name').cumcount().add(1)) .pivot(index='Name', columns='col') ) ; df2.columns = df2.columns.map(lambda x: f'{x[0]}_{x[1]}')` — mozway, Jul 11 '22 at 04:06

score 0 · Answer 1 · answered Jul 11 '22 at 02:05

0

df2 = df.groupby('Name').agg({
    'File amount': 'first',
    'other name': list,
    'other amount': list,
})
df3 = pd.concat([df2['File amount'], pd.DataFrame(df2['other name'].tolist(), index=df2.index).rename(columns=lambda x: 'other_' + str(x + 1)),
pd.DataFrame(df2['other amount'].tolist(), index=df2.index).rename(columns=lambda x: 'other_' + str(x + 1) + ' amount')],axis=1)
df3 = df3.reindex(sorted(df3.columns, key=lambda x: x.split(' ')[0]),axis=1)

answered Jul 11 '22 at 02:05

MoRe

2,296
2
3
23

what if add another key 'Name_1', how to do that? groupby(['Name', 'Name_1']) – huo shankou Jul 11 '22 at 03:16
If 'other name' 'other amount' they have null values, when you do the transformation, the null value will also occupy one column, that will incur that extra columns are created that are not needed. – huo shankou Jul 11 '22 at 03:18

score 0 · Answer 2 · answered Jul 11 '22 at 02:51

0

One more solution here!

other_name_cols = sum([[f"other_{i+1}", f"other_{i+1} amount"] for i in range(df.Name.value_counts().max())], list())
df_rs = df[['Name', 'File amount']].dropna().set_index('Name')
df_rs[other_name_cols] = None
for i, name in enumerate(df_rs.index):
    records = [x for y in df[df['Name'] == name][["other amount", "other name"]].values for x in y]
    df_rs.iloc[i, 1: 1+len(records)] = records

answered Jul 11 '22 at 02:51

Lazyer

917
1
6
16

Thanks for answering and I have tried this solution, unfortunately, my sample data is huge and can't run that...more than 100k rows – huo shankou Jul 11 '22 at 03:25
How about use **chunksize** when run **read_csv** – Lazyer Aug 24 '22 at 07:50

How to combine the rows and add columns in dataframe use python?

2 Answers2