Minus of two dataframes with common multi-index

Question

I have two dataframes, one of them very large, as follows:

import pandas as pd
import numpy as np
import string, random

siz = int(1e10)
random.seed(1234)
a1 = pd.Series((random.choice(string.ascii_uppercase) for _ in range(siz)), name='CatA')
a2 = pd.Series((random.choice(string.ascii_lowercase) for _ in range(siz)), name='CatB')
val1 = pd.Series(np.random.randint(2, high=10, size=siz), name='Value')

df_a = pd.DataFrame([a1, a2, val1]).T.set_index(['CatA', 'CatB'])

siz = 1000
random.seed(4321)
b1 = pd.Series((random.choice(string.ascii_uppercase) for _ in range(siz)), name='CatA')
b2 = pd.Series((random.choice(string.ascii_lowercase) for _ in range(siz)), name='CatB')
val2 = pd.Series(np.random.randint(2, high=10, size=siz), name='Value')

df_b = pd.DataFrame([b1, b2, val2]).T.set_index(['CatA', 'CatB'])

I want to quickly get the difference between the two dataframes based on their index, while keeping the Value column of df_a intact.
- Rows whose (CatA, CatB) index also appears in df_b should be removed from df_a.
- Both dataframes have the same structure. The Value of the remaining df_a rows should be preserved.
- The Value of df_b is not needed and can be dropped.

I tried df_a.sub(df_b.drop('Value', 1)), which doesn't work.

Is there a vectorized way to do this?

The existing question "Python Pandas - Find difference between two data frames" does not cover the multi-index case.

jezrael · Accepted Answer · 2020-07-01T05:55:51.383

I believe you need Index.isin, with the boolean mask inverted by ~:

df = df_a[~df_a.index.isin(df_b.index)]
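To make the behaviour easier to check, here is a minimal sketch on toy data. The small frames and their values are made up for illustration (they are not the data from the question); only the index layout (CatA, CatB) and the Value column follow the question.

```python
import pandas as pd

# Toy stand-ins for df_a and df_b (hypothetical values, same layout as in the question)
df_a = pd.DataFrame(
    {"CatA": ["A", "A", "B", "C", "C"],
     "CatB": ["x", "y", "x", "z", "z"],
     "Value": [2, 3, 4, 5, 6]}
).set_index(["CatA", "CatB"])

df_b = pd.DataFrame(
    {"CatA": ["A", "C", "D"],
     "CatB": ["y", "z", "q"],
     "Value": [9, 9, 9]}
).set_index(["CatA", "CatB"])

# True where the full (CatA, CatB) pair of df_a also occurs in df_b's index
in_b = df_a.index.isin(df_b.index)

# Keep only the rows of df_a whose index pair is absent from df_b
result = df_a[~in_b]
print(result)
```

On a MultiIndex, isin compares whole (CatA, CatB) tuples rather than individual levels, so a row is dropped only when both levels match. Duplicated index pairs in df_a, such as ('C', 'z') here, are all removed together, and the Value column of the remaining rows is left untouched.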

answered Jul 01 '20 at 05:40
