I'm fairly new to Python and Stack Overflow, so forgive me if I don't explain my question properly.

First file (test1.txt): 

customer    ID    age country  version

 - Alex     #1233  25  Canada     7 
 - James    #1512  30  USA        2 
 - Hassan   #0051  19  USA        9



Second file (test2.txt): 

customer     ID    age country  version

 - Alex     #1233  25  Canada    3 
 - James    #1512  30  USA       7 
 - Bob      #0061  20  USA       2 
 - Hassan   #0051  19  USA       1

The expected output (the line missing from the first file) is:

Bob #0061 20 USA  2

Here is the code:

missing = []

with open('C:\\Users\\yousi\\Desktop\\Work\\Python Project\\test1.txt.txt', 'r') as a_file:
    a_lines = a_file.read().split('\n')

with open('C:\\Users\\yousi\\Desktop\\Work\\Python Project\\test2.txt.txt', 'r') as b_file:
    b_lines = b_file.read().split('\n')

for line_a in a_lines:
    for line_b in b_lines:
        if line_a in line_b:
            break
    else:
        # no line in the second file contains line_a
        missing.append(line_a)

# the with blocks already closed both files, so no close() calls are needed
print(missing)

The problem with this code is that it compares the files using the entire line. I only want to compare the first 3 columns; if they don't match, the entire line should be printed.

New example:

First file (test1.txt)

60122 LX HNN --   4  32.7390  -114.6357     40 Winterlaven - Sheriff Sabstation
60122 LX HNZ --   4  32.7390  -114.6357     40 Winterlaven - Sheriff Sabstation
60122 LX HNE --   4  32.7390  -114.6357     40 Winterlaven - Sheriff Sabstation


Second file (test2.txt)

60122 LX HNN --   4  32.739000   -114.635700   40   Winterlaven - Sheriff Sabstation        
60122 LX HNZ --   4  32.739000   -114.635700   40   Winterlaven - Sheriff Sabstation        
60122 LX HNE --   4  32.739000   -114.635700   40   Winterlaven - Sheriff Sabstation 
itsyoyo

2 Answers

1

If you want to compare the first 3 columns, split each line and keep only the first 3 items:

a_line = 'Alex 1233 25 Canada'  # this is one line from a file

# split the line on whitespace
a_line = a_line.split()
# a_line is now ['Alex', '1233', '25', 'Canada']

# cut out the first 3 columns
a_line = a_line[:3]
# a_line is now ['Alex', '1233', '25']

# then you can compare lists directly
['Alex', '1233', '25', 'Canada'] == ['Alex', '1233', '25', 'Canada']
# True

['Alex', '1233', '25', 'Canada'] == ['Alex', '1233', '25', 'Canada2']
# False

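Instead of using read().split('\n') you could just use readlines(). Note that readlines() keeps the trailing newline on each line, but split() with no arguments discards it.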
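
The steps above could be combined along these lines. This is only a sketch: the helper names `first_columns` and `find_missing` are made up for illustration, and the lines are hard-coded (without the leading " - ") in the shape `readlines()` would return them, so the example runs without the files.

```python
def first_columns(line, n=3):
    """Return the first n whitespace-separated columns of a line as a list."""
    return line.split()[:n]


def find_missing(reference_lines, candidate_lines, n=3):
    """Return candidate lines whose first n columns match no reference line.

    Hypothetical helper: blank lines are skipped on both sides.
    """
    reference_keys = [
        first_columns(line, n) for line in reference_lines if line.strip()
    ]
    return [
        line.rstrip("\n")
        for line in candidate_lines
        if line.strip() and first_columns(line, n) not in reference_keys
    ]


# Lines as readlines() would return them, trailing newlines included.
test1_lines = [
    "Alex     #1233  25  Canada     7\n",
    "James    #1512  30  USA        2\n",
    "Hassan   #0051  19  USA        9\n",
]
test2_lines = [
    "Alex     #1233  25  Canada    3\n",
    "James    #1512  30  USA       7\n",
    "Bob      #0061  20  USA       2\n",
    "Hassan   #0051  19  USA       1\n",
]

missing = find_missing(test1_lines, test2_lines)
for line in missing:
    # Collapse the runs of spaces before printing.
    print(" ".join(line.split()))  # prints: Bob #0061 20 USA 2
```

Because only the first 3 columns form the key, the differing version numbers in the last column no longer make every line look missing.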

-1

If test1.txt and test2.txt contain the text from your question, then this script:

with open('test1.txt', 'r') as f1, open('test2.txt', 'r') as f2:
    i1 = [line.split()[:-1] for line in f1 if line.strip().startswith('-')]
    i2 = (line.split() for line in f2 if line.strip().startswith('-'))
    missing = [line for line in i2 if line[:-1] not in i1]

for _, *line in missing:
    print(' '.join(line))

Prints:

Bob #0061 20 USA 2

EDIT: If the rows in the files don't start with -, then this script:

with open('test1.txt', 'r') as f1, open('test2.txt', 'r') as f2:
    i1 = [line.split()[:-1] for line in f1 if line.strip()]
    i2 = (line.split() for line in f2 if line.strip())
    missing = [line for line in i2 if line[:-1] not in i1]

for line in missing:
    print(' '.join(line))

Prints:

Bob #0061 20 USA 2

EDIT 2: To compare only the first 3 columns, you can use this example (note the [:3]):

with open('file1.txt', 'r') as f1, open('file2.txt', 'r') as f2:
    i1 = [line.split()[:3] for line in f1 if line.strip()]
    i2 = (line.split() for line in f2 if line.strip())
    missing = [line for line in i2 if line[:3] not in i1]

for line in missing:
    print(' '.join(line))

Prints nothing for the new example files you have in the question.
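
For larger files, one possible variation is to store the keys as tuples in a set, so each lookup does not scan a list. The sketch below is an assumption-laden illustration, not part of the script above: the function name `missing_by_key` is hypothetical, and `io.StringIO` objects stand in for the two open files so the example is self-contained. Passing `n=None` compares every column, which shows why the new example fails when whole lines are compared.

```python
import io


def missing_by_key(first_file, second_file, n=3):
    """Return lines of second_file whose first n columns are absent from first_file.

    Hypothetical helper; n=None means "use every column" (slicing with [:None]).
    """
    # Tuples are hashable, so the keys can live in a set.
    seen = {tuple(line.split()[:n]) for line in first_file if line.strip()}
    return [
        line.rstrip("\n")
        for line in second_file
        if line.strip() and tuple(line.split()[:n]) not in seen
    ]


# Stand-ins for the two files from the question's new example.
TEST1 = (
    "60122 LX HNN --   4  32.7390  -114.6357     40 Winterlaven - Sheriff Sabstation\n"
    "60122 LX HNZ --   4  32.7390  -114.6357     40 Winterlaven - Sheriff Sabstation\n"
    "60122 LX HNE --   4  32.7390  -114.6357     40 Winterlaven - Sheriff Sabstation\n"
)
TEST2 = (
    "60122 LX HNN --   4  32.739000   -114.635700   40   Winterlaven - Sheriff Sabstation\n"
    "60122 LX HNZ --   4  32.739000   -114.635700   40   Winterlaven - Sheriff Sabstation\n"
    "60122 LX HNE --   4  32.739000   -114.635700   40   Winterlaven - Sheriff Sabstation\n"
)

# Keyed on the first 3 columns, nothing is missing.
by_key = missing_by_key(io.StringIO(TEST1), io.StringIO(TEST2))
print(by_key)  # prints: []

# Keyed on every column, the differently formatted numbers make all lines differ.
by_all_columns = missing_by_key(io.StringIO(TEST1), io.StringIO(TEST2), n=None)
print(len(by_all_columns))  # prints: 3
```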

Andrej Kesely
  • Thank you. If the file does not contain the "-" and I remove the startswith('-') check, it prints only #0061 20 USA 2, without the "Bob". How would I fix this? – itsyoyo May 27 '20 at 22:25
  • The code works perfectly for the example I gave you, but for some reason it doesn't work for other scenarios. Could you take a look at the new example I posted? The result for the new example should be nothing, because both files share the same values in the first 3 columns. I chose only the first 3 columns because, as you can see, the two files are structured differently: they are not identical, and some lines have extra spaces. I would really appreciate your help with this. Thank you. – itsyoyo May 27 '20 at 23:36