
I have two files. I want to read each line of file1 and check whether it matches any line in file2, repeat this for every line in file1, and print the line number (in file1) of each line that has a match in file2. So far I have the code below. It works for test files of 4-5 lines each, but on large files of over 60k lines I get blank output.

    num=0
    f1 = open('pdataf.txt', 'r')
    f2 = open('splitc.txt', 'r')
    fo = open('op.txt', 'w')
    for line1 in f1:
        for line2 in f2:
            num=num+1
            if line1==line2:
                nummy=str(num)
                fo.write(nummy)
                fo.write('\n')
                break
        continue
    f1.close()
    f2.close()
mkrieger1

2 Answers


Firstly, the original version of your question had a syntax error on line 8; replace that line with print(num). I don't have much information about your problem, so it might be a good idea to clarify it, but I suspect that the lines you read end with a newline character "\n".

To get rid of this character, you can use the rstrip() method as discussed in this topic.
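A small illustration of why the newline matters, assuming the common case where the last line of a file has no trailing newline (the strings here are made up for the example):

```python
# Lines read from a file keep their trailing "\n"; the last line of a file
# often has none, so otherwise identical lines compare unequal.
line_from_file1 = "apple\n"
last_line_of_file2 = "apple"

print(line_from_file1 == last_line_of_file2)                    # False
print(line_from_file1.rstrip() == last_line_of_file2.rstrip())  # True
```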

So, I would suggest replacing your code with:

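    num = 0
    with open('s.txt', 'r') as f1, open('p.txt', 'r') as f2:
        # Read file2 only once: a file object is exhausted after one pass,
        # so scanning it again for every line of file1 would find nothing.
        # A set also keeps each lookup fast, even for 60k+ lines.
        lines2 = {line2.rstrip() for line2 in f2}
        for line1 in f1:
            num = num + 1  # current line number in file1
            line1 = line1.rstrip()
            if line1 in lines2:
                print(num)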
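A likely reason nested loops like these print nothing for most lines of file1: a file object is an iterator, so the inner for line2 in f2 loop consumes file2 completely during the first outer iteration and yields nothing afterwards. The sketch below demonstrates this with in-memory io.StringIO objects standing in for the real files, and shows one way around it: read file2 once into a set and number the lines of file1 with enumerate. The function name matching_line_numbers is hypothetical.

```python
import io

def matching_line_numbers(file1, file2):
    """Return 1-based line numbers of lines in file1 that also occur in file2.

    file2 is read exactly once into a set, so it is not exhausted by
    repeated scanning, and each membership test is a hash lookup.
    """
    lines2 = {line.rstrip('\n') for line in file2}
    return [num for num, line in enumerate(file1, 1)
            if line.rstrip('\n') in lines2]

# The pitfall: a file-like object is consumed by the first full pass.
f2 = io.StringIO("b\nc\n")
print(list(f2))  # ['b\n', 'c\n']
print(list(f2))  # [] (nothing is left for a second pass)

f1 = io.StringIO("a\nb\nc\nd\n")
f2 = io.StringIO("c\nb\n")
print(matching_line_numbers(f1, f2))  # [2, 3]
```

Design note: rstrip('\n') removes only the newline, so unlike rstrip() trailing spaces still count as part of a line; use rstrip() if they should be ignored. With a set, each line of file1 costs one hash lookup instead of a scan of all of file2, which matters at 60k+ lines.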
Gaboik1
  • Hey, thanks! I just modified the question. The code seems to work for small files but gives nothing for large files. Can you please read the question again and help out? –  Sep 24 '16 at 05:14
  • I would really need a more specific description of your problem. Did you debug your program? At which line does it crash? What is the error? What do you mean by "it works for small files but gives nothing for large files"? – Gaboik1 Sep 24 '16 at 05:17
  • I tried my modified code on files with 5-6 lines and it gives the correct output, but for large files of 60k+ lines, it is crashing. I tried to debug. This is what I got: The program finished and will be restarted –  Sep 24 '16 at 05:20
  • I'm sorry, but I'm afraid I can't help you much more than that if you don't provide me with specific information. This message "The program finished and will be restarted", where do you get it? Do you run the program in a console? Do you use an IDE like JetBrains, Eclipse or Visual Studio? Does the program execute and terminate normally? – Gaboik1 Sep 24 '16 at 05:27
  • I am running the program on Ubuntu terminal. The program runs fine for some time, then exits with this error: "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 2064: invalid continuation byte" . My text files contain some German characters. I need to process the German characters as they are. Any ideas how to make it work? –  Sep 24 '16 at 05:30
  • You might want to take a look at this topic : http://stackoverflow.com/questions/11086752/read-a-text-file-with-non-ascii-characters-in-an-unknown-encoding – Gaboik1 Sep 24 '16 at 05:35
  • Do you know what character encoding your file is encoded with? – Gaboik1 Sep 24 '16 at 05:35
  • uchardet tells me it is in UTF-8. –  Sep 24 '16 at 05:38
  • And does the error occur when you're trying to read or when you're trying to print the character on the console? – Gaboik1 Sep 24 '16 at 05:43
  • While reading. Decoding fails. This is the error message: UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 0: invalid continuation byte –  Sep 24 '16 at 05:46
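On the UnicodeDecodeError from the comments: 0xc3 is the lead byte of the UTF-8 encoding of U+00C0 to U+00FF, which includes ß, ä, ö and ü, and "invalid continuation byte" means the byte after it is not a valid continuation byte. So at least part of the file is not valid UTF-8, whatever uchardet reports (uchardet only guesses). If you only need exact line matches, one way to sidestep decoding entirely is to open both files in binary mode and compare lines as bytes, which keeps German characters byte for byte. This is a sketch under that assumption; the function name matching_line_numbers_bytes and the sample file contents are hypothetical.

```python
import os
import tempfile

def matching_line_numbers_bytes(path1, path2):
    # Binary mode: lines are bytes and nothing is decoded, so invalid
    # UTF-8 cannot raise UnicodeDecodeError and every byte is kept as-is.
    with open(path2, 'rb') as f2:
        lines2 = {line.rstrip(b'\r\n') for line in f2}
    with open(path1, 'rb') as f1:
        return [num for num, line in enumerate(f1, 1)
                if line.rstrip(b'\r\n') in lines2]

with tempfile.TemporaryDirectory() as tmp:
    p1 = os.path.join(tmp, 'file1.txt')
    p2 = os.path.join(tmp, 'file2.txt')
    # 'Straße' as valid UTF-8, 'Müller' as Latin-1 (0xfc is never valid UTF-8)
    with open(p1, 'wb') as f:
        f.write(b'Stra\xc3\x9fe\nM\xfcller\nfoo\n')
    with open(p2, 'wb') as f:
        f.write(b'M\xfcller\nStra\xc3\x9fe')

    # Reading the same file in text mode as UTF-8 fails.
    try:
        with open(p1, encoding='utf-8') as f:
            f.read()
    except UnicodeDecodeError as exc:
        print('text mode fails:', exc.reason)

    print(matching_line_numbers_bytes(p1, p2))  # [1, 2]
```

The trade-off: byte comparison only treats lines as equal when they are stored in the same encoding in both files. If you need the lines as text, open(..., encoding='utf-8', errors='surrogateescape') is another option; it decodes invalid bytes to lone surrogates that are written back unchanged with the same error handler.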
  1. Open both files and assign the file objects f1 and f2 to them.
  2. Initialise the counter k, which is printed for each pair of lines that does not match.
  3. Apply rstrip() to the lines from both files and compare them.

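    with open('file1.txt') as f1, open('file2.txt') as f2:
        k = 0
        similar = True
        # zip pairs line k of file1 with line k of file2 and stops
        # at the end of the shorter file
        for line1, line2 in zip(f1, f2):
            k += 1  # 1-based number of the current line pair
            if line1.rstrip() != line2.rstrip():
                print(k, line1.rstrip(), '-->', line2.rstrip())
                similar = False
        if similar:
            print("Both files are similar")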
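Note that this approach compares line k of file1 only with line k of file2, which is stricter than the question's "matches any line in file2". If positional comparison is what you want, one limitation remains: zip stops at the end of the shorter file, so extra lines in the longer file go unreported. A sketch using itertools.zip_longest, which pads the shorter input with None, could look like this (report_differences is a hypothetical name, and io.StringIO stands in for the real files):

```python
import io
from itertools import zip_longest

def report_differences(f1, f2):
    """Return (line_number, line1, line2) for every position where the files differ.

    zip_longest pads the shorter file with None, so extra trailing lines in
    either file are reported instead of being silently dropped as with zip().
    """
    diffs = []
    for num, (line1, line2) in enumerate(zip_longest(f1, f2), 1):
        a = line1.rstrip() if line1 is not None else None
        b = line2.rstrip() if line2 is not None else None
        if a != b:
            diffs.append((num, a, b))
    return diffs

f1 = io.StringIO("one\ntwo\nthree\n")
f2 = io.StringIO("one\nTWO\n")
diffs = report_differences(f1, f2)
for num, a, b in diffs:
    print(num, a, '-->', b)   # 2 two --> TWO, then 3 three --> None
if not diffs:
    print("Both files are similar")
```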
Azhar Khan