We have two CSV files, a.csv and b.csv. a.csv has three columns: label, item1, item2. b.csv has two columns: item1, item2. If the (item1, item2) pair of a row in a.csv also occurs in b.csv, that is, both files contain a row with the same item1 and item2, then the label of that row in a.csv should be set to 1. How can I do this with pandas?
For example:
a.csv:
label item1 item2
0 123 35
0 342 721
0 876 243
b.csv:
item1 item2
12 35
32 721
876 243
result.csv:
label item1 item2
0 123 35
0 342 721
1 876 243
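A minimal sketch on the example data above, assuming the two files have been loaded into DataFrames named `a` and `b` (built inline here instead of with `pd.read_csv`). It treats each (item1, item2) pair as a key and uses `MultiIndex.isin` to test, row by row, whether the pair in a.csv appears anywhere in b.csv. This is one possible approach, not the only one.

```python
import pandas as pd

# Example data from the question (in practice: a = pd.read_csv("a.csv"), etc.)
a = pd.DataFrame({"label": [0, 0, 0],
                  "item1": [123, 342, 876],
                  "item2": [35, 721, 243]})
b = pd.DataFrame({"item1": [12, 32, 876],
                  "item2": [35, 721, 243]})

keys = ["item1", "item2"]

# One (item1, item2) tuple per row of a, and the list of pairs present in b.
a_pairs = pd.MultiIndex.from_frame(a[keys])
b_pairs = list(zip(b["item1"], b["item2"]))

# Boolean array: True where a's pair also occurs somewhere in b.
in_b = a_pairs.isin(b_pairs)

# Set label to 1 on the matching rows only.
a.loc[in_b, "label"] = 1
print(a["label"].tolist())  # [0, 0, 1]
```

Because `isin` tests membership rather than comparing rows at the same position, it works even when the two files have different lengths or a different row order.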
I tried this on my real data, where user_id and item_id play the roles of item1 and item2, but it doesn't work:
import pandas as pd
df1 = pd.read_csv("~/train_dataset.csv", names=['label', 'user_id', 'item_id', 'behavior_type', 'user_geohash', 'item_category', 'time','sales'], parse_dates=True)
df2 = pd.read_csv("~/train_user.csv", names=['user_id', 'item_id', 'behavior_type', 'user_geohash', 'item_category', 'time', 'sales'], parse_dates=True)
# Intended: set label to 1 where the (user_id, item_id) pair of a df1 row also occurs in df2
df1.loc[(df1['user_id'] == df2['user_id']) & (df1['item_id'] == df2['item_id']), 'label'] = 1
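A sketch of one way to do this with the column names from the attempt, using a left merge with `indicator=True` instead of a direct `==` comparison. The frames below are small hypothetical stand-ins for train_dataset.csv and train_user.csv that keep only the key columns. The `try`/`except` illustrates that `==` between two Series compares rows by index label, so it raises `ValueError` when the frames have different indexes (here, different lengths), and even with matching indexes it would only compare row i with row i rather than checking membership.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files; only the key columns are kept.
df1 = pd.DataFrame({"label": [0, 0, 0],
                    "user_id": [123, 342, 876],
                    "item_id": [35, 721, 243]})
df2 = pd.DataFrame({"user_id": [12, 32, 876, 5],
                    "item_id": [35, 721, 243, 9]})

keys = ["user_id", "item_id"]

# The element-wise comparison from the attempt fails on differently indexed Series.
try:
    df1["user_id"] == df2["user_id"]
except ValueError as err:
    print("comparison failed:", err)

# Left-merge df1's key pairs against the distinct key pairs of df2.
# indicator=True adds a "_merge" column that is "both" when the pair exists in df2;
# drop_duplicates() keeps the merge from multiplying rows when df2 repeats a pair.
matches = df1[keys].merge(df2[keys].drop_duplicates(), on=keys,
                          how="left", indicator=True)

# A left merge keeps df1's row order, so the flag lines up with df1 by position.
df1.loc[(matches["_merge"] == "both").to_numpy(), "label"] = 1
print(df1["label"].tolist())  # [0, 0, 1]
```

Deduplicating the right-hand keys is what guarantees exactly one merged row per row of df1, which makes the positional mask safe.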