I have two data frames. I want to remove rows where the indexes do not occur in both data frames.
Here is an example of the data frames:
import pandas as pd
data = {'Correlation': [1.000000, 0.607340, 0.348844]}
df = pd.DataFrame(data, columns=['Correlation'])
df = df.rename(index={0: 'GINI'})
df = df.rename(index={1: 'Central government debt, total (% of GDP)'})
df = df.rename(index={2: 'Grants and other revenue (% of revenue)'})
data_2 = {'Correlation': [1.000000, 0.607340, 0.348844, 0.309390, -0.661046]}
df_2 = pd.DataFrame(data_2, columns=['Correlation'])
df_2 = df_2.rename(index={0: 'GINI'})
df_2 = df_2.rename(index={1: 'Central government debt, total (% of GDP)'})
df_2 = df_2.rename(index={2: 'Grants and other revenue (% of revenue)'})
df_2 = df_2.rename(index={3: 'Compensation of employees (% of expense)'})
df_2 = df_2.rename(index={4: 'Central government debt, total (current LCU)'})
I have found this question: How to remove rows in a Pandas dataframe if the same row exists in another dataframe? but was unable to use it as I am trying to remove if the index name is the same.
I also saw this question: pandas get rows which are NOT in other dataframe but removes rows which are equal in both data frames but I also did not find this useful.
What I have thought to do is to transpose then concat the data frames and remove duplicate columns:
df = df.T
df_2 = df_2.T
df3 = pd.concat([df,df_2],axis = 1)
df3.iloc[: , ~df3.columns.duplicated()]
The problem with this is that it only removes one of the columns that is duplicated but I want it to remove both these columns.
Any help doing this would be much appreciated, cheers.