I have a two Pandas dataframes and want to intersect (aka inner join) them on a key. I understand how a Pandas dataframe merge()
works, but the problem I have is that I want to keep the columns from one dataframe, not the columns from both.
For example, suppose I have the following two dataframes, df_users
and df_valid_users
.
df_users df_valid_users
user_id email dob user_id last_login
------- ----- --- ------- ----------
123 ... ... 456 ...
456 ... ... 710 ...
789 ... ...
Desired result:
user_id email dob
------- ----- ---
456 ... ...
I want to get all the columns from df_users
as long as df_users.user_id
is also in df_valid_users
.
I tried to use pandas.merge()
, but the result contains the columns from both dataframes. How do keep only the columns from the left dataframe? My real data has dozens of columns, not just a few as in this example.
import pandas as pd
rows_users = [ ['123', 'foo', '1970-01-01'],
['456', 'bar', '1980-01-01'],
['789', 'baz', '1990-01-01']]
df_users = pd.DataFrame(rows_users, columns=['user_id', 'email', 'dob'])
df_users
# user_id email dob
# 0 123 foo 1970-01-01
# 1 456 bar 1980-01-01
# 2 789 baz 1990-01-01
rows_valid_users = [ ['456', '2018-01-10'],
['710', '1918-01-02']]
df_valid_users = pd.DataFrame(rows_valid_users, columns=['user_id', 'last_login'])
df_valid_users
# user_id last_login
# 0 456 2018-01-10
# 1 710 1918-01-02
pd.merge(df_users, df_valid_users, how='inner', on=['user_id'])
# user_id email dob last_login
# 0 456 bar 1980-01-01 1918-01-02
Any help would be appreciated.