
Good day everyone! I have two files, a txt and a csv, both containing numbers. I would like to compare them and delete from the first file every number that also appears in the second file. data and df_row are DataFrames.

For example, my first file contains:

12354564
25345754
23545454
11565654
46456456

and my second file contains:

23545454
11565654
46456456

so the result should be:

12354564
25345754

My code:

result = set(data).difference(set(df_row))
result.to_csv("part1left.txt")

but it raises an error: AttributeError: 'set' object has no attribute 'to_csv'

Qwertie
Alex Cam

1 Answer


If data and df_row are Series, your solution needs to convert the output to a list and then to a Series:

result = set(data).difference(set(df_row))
pd.Series(list(result)).to_csv("part1left.txt", index=False)

Or write the set to a file in pure Python:

result = set(data).difference(set(df_row))
with open("part1left.txt", 'w') as file_handler:
    for item in result:
        file_handler.write("{}\n".format(item))
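Note that a set is unordered, so both variants above may write the remaining numbers in any order. If the order of the first file matters, a minimal sketch could look like the following. The helper name `remove_matching` is hypothetical, the values are assumed to be plain strings, and unlike the set difference it does not drop duplicates from the first file:

```python
def remove_matching(first, second):
    """Return the items of `first` that are not in `second`, keeping the order of `first`."""
    to_remove = set(second)  # a set gives fast membership tests
    return [item for item in first if item not in to_remove]


# Sample values taken from the question
first = ["12354564", "25345754", "23545454", "11565654", "46456456"]
second = ["23545454", "11565654", "46456456"]

remaining = remove_matching(first, second)
print(remaining)
```

The resulting list can then be written line by line exactly as in the loop above.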

Pandas-only solution: filter by boolean indexing with Series.isin and invert the mask with ~:

s = data[~data.isin(set(df_row))].drop_duplicates()
s.to_csv("part1left.txt", index=False)
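To see the boolean mask at work, here is a small self-contained sketch using the sample numbers from the question. Building the two Series inline instead of reading them from files is an assumption made for illustration only:

```python
import pandas as pd

# Sample values from the question, stored as strings
data = pd.Series(["12354564", "25345754", "23545454", "11565654", "46456456"])
df_row = pd.Series(["23545454", "11565654", "46456456"])

# True where a value of `data` also occurs in `df_row`
mask = data.isin(df_row)

# ~ inverts the mask, so only values missing from `df_row` are kept
s = data[~mask].drop_duplicates()
print(s.tolist())
```

Unlike the set-based variants, this keeps the remaining values in the order of the first file.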

EDIT:

If you need to create the Series from files:

import pandas as pd
from io import StringIO

temp = u"""12354564
25345754
23545454
11565654
46456456"""
# after testing, replace 'StringIO(temp)' with 'filename1.csv'
# dtype=str keeps the numbers as strings; squeeze("columns") turns the single column into a Series
data = pd.read_csv(StringIO(temp), header=None, dtype=str).squeeze("columns")
print(data)
0    12354564
1    25345754
2    23545454
3    11565654
4    46456456
Name: 0, dtype: object


temp = u"""23545454
11565654
46456456"""
# after testing, replace 'StringIO(temp)' with 'filename2.csv'
df_row = pd.read_csv(StringIO(temp), header=None, dtype=str).squeeze("columns")
print(df_row)
0    23545454
1    11565654
2    46456456
Name: 0, dtype: object
jezrael
  • There is an error: TypeError: 'set' type is unordered – Alex Cam Jul 19 '19 at 06:31
  • Thank you very much! But I have trouble when I use the pandas solution: it writes the numbers as xxx.0 in the "part1left.txt" file. I don't understand how that is possible. – Alex Cam Jul 19 '19 at 06:44
  • I'm so sorry, but could you explain why I can still find some numbers from the second file in the first after we dropped them? The first file is data, the second is df_row, so in the output I have to delete the numbers from the first file that match numbers in the second. – Alex Cam Jul 19 '19 at 07:02
  • @BakhytgulAzhigaliyeva - if you check `print (data.tolist())` and `print (df_row.tolist())`, are there any whitespaces? Another possible problem is that the values in one or both Series are not strings; all values need to be strings. So use `data = data.astype(str)` and `df_row = df_row.astype(str)` – jezrael Jul 19 '19 at 07:07
  • @BakhytgulAzhigaliyeva - Is the data confidential? Is it possible to share it? – jezrael Jul 19 '19 at 07:22
  • `print (data.tolist())` is still running; it has not given any output yet. – Alex Cam Jul 19 '19 at 07:23
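
The symptoms described in the comments (values written as xxx.0, matches that are not removed) are consistent with the two Series holding different types or stray whitespace, although the thread does not confirm the cause. A hedged sketch of normalizing both sides to stripped strings before comparing is shown below. The sample values are made up for illustration and the helper `normalize` is hypothetical:

```python
import pandas as pd


def normalize(series):
    """Convert values to stripped strings and drop a trailing '.0' left by float parsing."""
    as_text = series.astype(str).str.strip()
    return as_text.str.replace(r"\.0$", "", regex=True)


# Assumed situation: the first file was parsed as floats,
# the second as strings with stray whitespace
data = pd.Series([12354564.0, 25345754.0, 23545454.0])
df_row = pd.Series([" 23545454", "25345754 "])

data_norm = normalize(data)
df_row_norm = normalize(df_row)

# Keep only the values of the first Series that are absent from the second
s = data_norm[~data_norm.isin(df_row_norm)]
print(s.tolist())
```

Reading both files with dtype=str, as in the EDIT section of the answer, avoids the float conversion in the first place.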