I have two really big DataFrames (let's say A and B). My objective is to make a new DataFrame (C) from A and B with an additional Boolean column: True if the row is in B and False if it is not in B.
I got all unique identifiers from the smaller one (B) and stored them in a list (its size is 73739559), then did the element matching with the pandas apply function, but it crashes frequently:
# FullPhoneNumber is the list of unique identifiers taken from B
df['responsive'] = df.apply(lambda row: row.FullPhoneNumber in FullPhoneNumber, axis=1)
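For context, FullPhoneNumber is a plain Python list, so every in check scans up to 73739559 elements. A minimal sketch of the same membership test done with a set and Series.isin instead of apply (the phone_set name is just for illustration):

    phone_set = set(FullPhoneNumber)  # set membership is O(1) per lookup instead of O(n)
    df['responsive'] = df['FullPhoneNumber'].isin(phone_set)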
Now I'm trying the following code:
import csv

f_res = open(main_path + "responsive.csv", 'a')
f_irr = open(main_path + "irresponsive.csv", 'a')
res_writer = csv.writer(f_res)
irr_writer = csv.writer(f_irr)

df['responsive'] = False
for _, row in df.iterrows():
    # check the row's number against the list of unique identifiers from B
    if row['FullPhoneNumber'] in FullPhoneNumber:
        row['responsive'] = True  # modifies the copied row (what gets written), not df itself
        res_writer.writerow(row)
    else:
        irr_writer.writerow(row)
But it's too slow. I'm looking for something faster, as I have more than 10 GB of data.
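For concreteness, this is a rough sketch of the kind of chunked, vectorized approach I'm hoping exists; the input file name "A.csv", the chunk size, and the column names are assumptions carried over from above, not code I actually run:

    import pandas as pd

    phone_set = set(FullPhoneNumber)  # unique identifiers from B, as a set for fast lookup

    # read A in pieces so the 10+ GB file never has to fit in memory at once
    reader = pd.read_csv(main_path + "A.csv", chunksize=1_000_000)
    for i, chunk in enumerate(reader):
        chunk['responsive'] = chunk['FullPhoneNumber'].isin(phone_set)
        chunk[chunk['responsive']].to_csv(main_path + "responsive.csv",
                                          mode='a', header=(i == 0), index=False)
        chunk[~chunk['responsive']].to_csv(main_path + "irresponsive.csv",
                                           mode='a', header=(i == 0), index=False)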