I have combined 2 dataframes and I want to make sure that the user id is not repeated and all values are written in 1 line, and not duplicated. This is a ready-made dataframe, but I need to edit it and remove repetitions. I attach the input data:
rzd = pd.DataFrame({
'client_id': [111, 112, 113, 114, 115],
'rzd_revenue': [1093, 2810, 10283, 5774, 981]})
rzd
auto = pd.DataFrame({
'client_id': [113, 114, 115, 116, 117],
'auto_revenue': [57483, 83, 912, 4834, 98]})
auto
air = pd.DataFrame({
'client_id': [115, 116, 117, 118],
'air_revenue': [81, 4, 13, 173]})
air
client_base = pd.DataFrame({
'client_id': [111, 112, 113, 114, 115, 116, 117, 118],
'address': ['Комсомольская 4', 'Энтузиастов 8а', 'Левобережная 1а', 'Мира 14', 'ЗЖБИиДК 1',
'Строителей 18', 'Панфиловская 33', 'Мастеркова 4']})
client_base
frames = [rzd, auto, air]
result = pd.concat(frames)
result
full_result = result.merge(client_base)
full_result
client_id rzd_revenue auto_revenue air_revenue address
0 111 1093.0 NaN NaN Комсомольская 4
1 112 2810.0 NaN NaN Энтузиастов 8а
2 113 10283.0 NaN NaN Левобережная 1а
3 113 NaN 57483.0 NaN Левобережная 1а
4 114 5774.0 NaN NaN Мира 14
5 114 NaN 83.0 NaN Мира 14
6 115 981.0 NaN NaN ЗЖБИиДК 1
7 115 NaN 912.0 NaN ЗЖБИиДК 1
8 115 NaN NaN 81.0 ЗЖБИиДК 1
9 116 NaN 4834.0 NaN Строителей 18
10 116 NaN NaN 4.0 Строителей 18
11 117 NaN 98.0 NaN Панфиловская 33
12 117 NaN NaN 13.0 Панфиловская 33
13 118 NaN NaN 173.0 Мастеркова 4
The numbers are different documentation, by type merge, concat, and so on, but I did not find clear articles on editing.
As a result, it should look like this:
client_id rzd_revenue auto_revenue air_revenue address
0 111 1093.0 0.0 0.0 Комсомольская 4
1 112 2810.0 0.0 0.0 Энтузиастов 8а
2 113 10283.0 57483.0 0.0 Левобережная 1а
3 114 5774.0 83.0 0.0 Мира 14
4 115 981.0 912.0 81.0 ЗЖБИиДК 1