0

I am trying to compare two pandas dataframes containing old and new values. Some of the ids have been deleted in the new data and some have been added, so the indexes are not identical. I am able to concatenate the dataframes using the ids as indexes, columns as the axis, and 'first' and 'second' as the keys, but cannot find a way to preserve the order of both dataframes. The order of data1 is preserved but I would like the order of data2 to also be preserved.

What I tried:

data1={'id':[1, 2, 3, 4, 5],'value':[10, 25, 12, 100, 26]}
data2={'id':[1, 2, 6, 4, 5],'value':[10, 24, 48, 100, 60]}


df = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df_both = pd.concat([df.set_index('id'), df2.set_index('id')], 
               axis='columns', keys=['First', 'Second'])

print(df_both)

Resulting df_both:

    First Second
    value  value
id
1    10.0   10.0
2    25.0   24.0
3    12.0    NaN
4   100.0  100.0
5    26.0   60.0
6     NaN   48.0

id 6 was placed at the bottom of the dataframe because the index did not exist in data1.

What I want as the resulting df_both:

    First Second
    value  value
id
1    10.0   10.0
2    25.0   24.0
3    12.0    NaN
6     NaN   48.0
4   100.0  100.0
5    26.0   60.0

I would like for the deleted and new rows to maintain their position.

  • `df_both = pd.concat([df.set_index('id'), df2.set_index('id')][::-1], axis='columns', keys=['First', 'Second'][::-1]).loc[:, ::-1]` – Ch3steR Feb 08 '23 at 20:08

2 Answers2

2

There is no straightforward way to do that:

idx = pd.Index(pd.concat([df['id'], df2['id']]).sort_index().drop_duplicates())
df_both = (pd.concat([df.set_index('id'),
                      df2.set_index('id')],
                      axis=1, keys=['First', 'Second'])
             .reindex(idx))
print(df_both)

# Output
    First Second
    value  value
id              
1    10.0   10.0
2    25.0   24.0
3    12.0    NaN
6     NaN   48.0
4   100.0  100.0
5    26.0   60.0
Corralien
  • 109,409
  • 8
  • 28
  • 52
0

You can concatenate the dataframes on a temporary column with the desired order and then reset the index to obtain the desired result.

data1 = {'id': [1, 2, 3, 4, 5], 'value': [10, 25, 12, 100, 26]}
data2 = {'id': [1, 2, 6, 4, 5], 'value': [10, 24, 48, 100, 60]}

df = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

df['temp'] = range(len(df))
df2['temp'] = range(len(df), len(df) + len(df2))

df_both = pd.concat([df, df2], ignore_index=True)

df_both = df_both.sort_values(by='temp').drop('temp', axis=1)

df_both = df_both.set_index('id')

df_both.columns = pd.MultiIndex.from_product([['First', 'Second'], ['value']])

print(df_both)
Ch3steR
  • 20,090
  • 4
  • 28
  • 58
Chiron
  • 1
  • 1
  • Hi, I received the error "ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 elements" – Anna Ortega Feb 08 '23 at 20:19