If data and df_row are Series, your solution needs one more step: convert the resulting set to a list and then to a Series:
result = set(data).difference(set(df_row))
pd.Series(list(result)).to_csv("part1left.txt", index=False)
Or write the set to a file in pure Python:
result = set(data).difference(set(df_row))
with open("part1left.txt", 'w') as file_handler:
    for item in result:
        file_handler.write("{}\n".format(item))
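A set has no defined order, so the lines can land in the file in any order. If a stable order matters, one option is to sort before writing. A minimal sketch, assuming made-up sample values and an in-memory buffer in place of the real file:

```python
import io

# Hypothetical sample values, not the data from the question
data = ["12354564", "25345754", "23545454"]
df_row = ["23545454", "11565654"]

result = set(data).difference(df_row)

# io.StringIO stands in for open("part1left.txt", "w") here
buffer = io.StringIO()
for item in sorted(result):  # sorted() gives a deterministic line order
    buffer.write("{}\n".format(item))

written = buffer.getvalue()
print(written)
```

Sorting strings is lexicographic, which matches numeric order only when all values have the same number of digits.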
A pandas-only solution: filter by boolean indexing with Series.isin and invert the mask with ~:
s = data[~data.isin(set(df_row))].drop_duplicates()
s.to_csv("part1left.txt", index=False)
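To see what the mask does step by step, here is a small sketch on made-up values (the two sample Series are assumptions, not the data from the question):

```python
import pandas as pd

# Hypothetical sample values; note the duplicate in data
data = pd.Series(["12354564", "25345754", "23545454", "25345754"])
df_row = pd.Series(["23545454", "11565654"])

# True where a value of data also occurs in df_row
mask = data.isin(set(df_row))

# ~ inverts the mask, keeping only values absent from df_row;
# drop_duplicates mimics the uniqueness of the set-based result
left = data[~mask].drop_duplicates()
print(left.tolist())
```

Unlike the set-based version, this keeps the original order of data and its index labels.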
EDIT:
If you need to create the Series from files:
import pandas as pd
from io import StringIO
temp=u"""12354564
25345754
23545454
11565654
46456456"""
#after testing replace 'StringIO(temp)' with 'filename1.csv'
#pd.compat.StringIO and squeeze=True were removed from pandas; use io.StringIO and .squeeze("columns")
data = pd.read_csv(StringIO(temp), header=None, dtype=str).squeeze("columns")
print (data)
0    12354564
1    25345754
2    23545454
3    11565654
4    46456456
Name: 0, dtype: object
temp=u"""23545454
11565654
46456456"""
#after testing replace 'StringIO(temp)' with 'filename2.csv'
df_row = pd.read_csv(StringIO(temp), header=None, dtype=str).squeeze("columns")
print (df_row)
0    23545454
1    11565654
2    46456456
Name: 0, dtype: object