
Team, I have two files that share some duplicate lines. I want to print, or build a new list of, the lines that are unique. However, my list prints as empty and I am not sure why.

f1 = open(file1, 'r')
f2 = open(file2, 'r')
unique = []
for lineA in f1.readlines():
        for lineB in f2.readlines():
            if lineA != lineB:
                print("lineA not equal to lineB", lineA, lineB)
            else:
                unique.append(lineB)
print(unique)

output

lineA not equal to lineB  node789
  node321

lineA not equal to lineB  node789
 node12345

[]

expected

lineA not equal to lineB  node789
  node321

lineA not equal to lineB  node789
 node12345

[node321,node12345]

Second approach: following the comments, the list now gets populated, but every entry is an empty string instead of the actual lines.

 [~] $ cat  ~/backup/2strings.log
restr1
restr2

 [~] $ cat ~/backup/4strings.log 
restr1
restr2
restr3
restr4


file2 = os.environ.get('HOME') + '/backup/2strings.log'
file1 = os.environ.get('HOME') + '/backup/4strings.log'
f1 = open(file1, 'r')
f2 = open(file2, 'r')
unique = []
for lineA in f1.readlines():
        for lineB in f2.readlines():
            # if lineA.rstrip() != lineB.rstrip():
            if lineA.strip() != lineB.strip():
                print("lineA not equal to lineB", lineA, lineB)
            else:
                print("found uniq")
        unique.append(lineB.rstrip())
print(unique)
print(len(unique))

output

found uniq
lineA not equal to lineB restr1
 restr2

lineA not equal to lineB restr1
 

['', '', '', '', '']
5
AhmFM
  • Do you simply want to check whether the files are the same, or compare each line? If the files don't have the same number of rows, a line-by-line comparison isn't very consistent. – Josip Juros Mar 08 '22 at 07:49
  • Maybe the lines are not matching; try this: `lineA.strip() != lineB.strip()` – deadshot Mar 08 '22 at 07:49
  • Maybe use a containment check, something like `if lineA in f2:`, to see whether the file contains the line rather than whether two lines match? – Josip Juros Mar 08 '22 at 07:52
  • I want to check whether each line from f1 exists anywhere in f2 and, if not, print it. For now each file has only one string per line. – AhmFM Mar 08 '22 at 07:54
  • @deadshot I get a list now, but it only contains one string. My files have about 100 unique lines. Maybe I am missing an indent on the append? – AhmFM Mar 08 '22 at 08:03
  • I think I also found a solution here: https://stackoverflow.com/questions/51192126/get-unique-lines-in-two-text-files – AhmFM Mar 08 '22 at 08:36

2 Answers

I recommend a different, simpler approach: use the set data structure. Link - https://docs.python.org/3/tutorial/datastructures.html#sets

Pseudo code

unique = []
# file1 and file2 are the paths defined in the question
with open(file1) as f1, open(file2) as f2:
    items01 = {line.strip() for line in f1}
    items02 = {line.strip() for line in f2}

# unique items not present in file2
print(list(items01 - items02))
unique += list(items01 - items02)

# unique items not present in file1
print(list(items02 - items01))
unique += list(items02 - items01)

# all unique items
print(unique)

Your code uses file1 as the reference and checks its lines against file2; you need to do the reverse as well. The second problem is time complexity: the nested loops compare every line of one file with every line of the other. Python sets use hashing internally, which makes membership checks fast, so use sets.
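
The two set differences can also be taken in a single step with the symmetric-difference operator `^`. The following is only a minimal sketch, assuming one string per line; the helper names `read_lines` and `unique_lines` and the temporary sample files are hypothetical and only mirror the file contents shown in the question.

```python
import os
import tempfile


def read_lines(path):
    """Return the set of stripped, non-empty lines in the file at `path`."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def unique_lines(path_a, path_b):
    """Return the lines that appear in exactly one of the two files, sorted."""
    return sorted(read_lines(path_a) ^ read_lines(path_b))


# Hypothetical sample data mirroring the two log files in the question.
with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "4strings.log")
    path_b = os.path.join(tmp, "2strings.log")
    with open(path_a, "w") as f:
        f.write("restr1\nrestr2\nrestr3\nrestr4\n")
    with open(path_b, "w") as f:
        f.write("restr1\nrestr2\n")
    result = unique_lines(path_a, path_b)
    print(result)  # ['restr3', 'restr4']
```

Sets do not keep the original line order, which is why the sketch sorts the result before returning it.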

sam

As I see it from what you posted, the only way your expected output deviates from your actual output is that node321 and node12345 are not added to the list `unique`, which is printed at the end. That is hardly surprising: your code appends `lineB` to `unique` only in those cases where `lineA` and `lineB` match, because the append sits in the `else` branch after `if lineA != lineB:`.
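
A second effect is worth noting in the nested loops: `f2.readlines()` consumes the file, so calling it again on the same file object returns an empty list and the inner loop only does any work for the first line of f1. The sketch below illustrates this and one possible fix, reading the second file once into a set and then testing each line of the first file for membership. It uses `io.StringIO` objects as stand-ins for the open files, and the sample contents are hypothetical.

```python
import io

# io.StringIO stands in for the open files; the contents are hypothetical.
f1 = io.StringIO("node789\nnode321\nnode12345\n")
f2 = io.StringIO("node789\n")

first_read = f2.readlines()
second_read = f2.readlines()  # the read position is already at the end
print(first_read, second_read)  # ['node789\n'] []

# Rewind, read the second file once, then check each line of the first file.
f2.seek(0)
seen = {line.strip() for line in f2}
unique = [line.strip() for line in f1 if line.strip() not in seen]
print(unique)  # ['node321', 'node12345']
```

Appending inside the "not found" case, rather than in the `else` branch, is what puts the non-matching lines into the list.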

Schnitte