Removal of duplicate lines from a text file using python

Question

Earlier I wrote code that extracts a specific string from multiple files and stores the results in a separate file. That file now contains duplicate results, which I need to remove.

import glob
import os.path
import re

path = r"H:\sample"
file_array = glob.glob(os.path.join(path, '*.txt'))
with open("aiq_hits.txt", "w") as out_file:
    for input_filename in file_array:
        with open(input_filename) as in_file:
            for line in in_file:
                # findall needs the string to search: the current line
                match = re.findall(r"""(?<=')[^']*\.aiq(?=')|(?<=")[^"]*\.aiq(?=")""", line)
                for item in match:
                    out_file.write("%s\n" % item)
# The with block closes out_file, so no explicit close() is needed

The resulting out_file contains duplicate lines, which I need to remove, and the deduplicated result should be written back to the same file.
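One way to avoid the duplicates in the first place is to remember which hits have already been written and skip repeats while extracting. This is a minimal sketch, assuming the same regular expression as the question and that hits should keep their first-seen order; the function name `extract_unique_hits` is hypothetical, and it takes the list of input paths (such as the question's `file_array`) as an argument.

```python
import re

# Same pattern as in the question: a single- or double-quoted token ending in .aiq
AIQ_PATTERN = re.compile(r"""(?<=')[^']*\.aiq(?=')|(?<=")[^"]*\.aiq(?=")""")


def extract_unique_hits(input_paths, out_path):
    """Write each .aiq hit to out_path once, in first-seen order (hypothetical helper)."""
    seen = set()
    with open(out_path, "w") as out_file:
        for input_filename in input_paths:
            with open(input_filename) as in_file:
                for line in in_file:
                    for item in AIQ_PATTERN.findall(line):
                        if item not in seen:  # skip hits that were already written
                            seen.add(item)
                            out_file.write("%s\n" % item)
    return seen
```

With `import glob` and `import os.path`, it could be called as `extract_unique_hits(glob.glob(os.path.join(r"H:\sample", "*.txt")), "aiq_hits.txt")`, so aiq_hits.txt never contains a repeated hit and no second pass over the file is needed.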

The same question was asked and answered before: http://stackoverflow.com/questions/1215208/how-might-i-remove-duplicate-lines-from-a-file. You could try searching for what you want before posting a question. – koukouviou Apr 15 '15 at 05:55

score 5 · Answer 1 · edited Dec 24 '22 at 21:55

1. Open the input file.
2. Read the input file by lines: `readlines()` returns a list of the lines in the file.
3. Create a new, empty list, `new_lines`.
4. Iterate over every line in that list.
5. Strip the surrounding whitespace from the line.
6. Check whether the line is already present in `new_lines`.
7. If not, append the line to `new_lines`.
8. Write `new_lines` to the output file.

Demo:

input_file = "input.txt"
with open(input_file, "r") as fp:
    lines = fp.readlines()
    new_lines = []
    for line in lines:
        # Strip surrounding whitespace, including the trailing newline
        line = line.strip()
        if line not in new_lines:
            new_lines.append(line)

output_file = "output.txt"
with open(output_file, "w") as fp:
    fp.write("\n".join(new_lines))
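The demo checks `line not in new_lines`, which scans the whole list for every line, so the running time grows quadratically with the number of lines. A minimal variant, assuming the same strip-then-compare rule, tracks already-seen lines in a set, where each membership check is constant time on average; the name `dedupe_lines` is hypothetical.

```python
def dedupe_lines(lines):
    """Return stripped lines with duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for line in lines:
        line = line.strip()  # same normalisation as the demo
        if line not in seen:  # set lookup instead of scanning a list
            seen.add(line)
            unique.append(line)
    return unique
```

For example, `dedupe_lines(["a\n", "b\n", "a\n"])` returns `["a", "b"]`. On Python 3.7 and later, where dicts are guaranteed to preserve insertion order, `list(dict.fromkeys(line.strip() for line in lines))` gives the same result in one expression.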

edited Dec 24 '22 at 21:55 by Lulucmy

answered Apr 15 '15 at 06:02 by Vivek Sable

Thanks for the suggestion, but I need the output in the input file itself. – goshanky Apr 15 '15 at 06:22
Also, the result is all on one horizontal line. – goshanky Apr 15 '15 at 06:22

1. Use the input file name when writing the file, i.e. `with open(input_file, "w") as fp:`. 2. I don't follow: the lines are written on separate lines. You can share your code with us on Stack Overflow or email me at vivekbsable@gmail.com. – Vivek Sable Apr 15 '15 at 06:30
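Writing back to the input file name works because the demo reads every line into memory before reopening the file; opening the same path with `"w"` truncates it immediately. A minimal sketch of the in-place variant, assuming stripped lines are compared as in the demo, writes to a temporary file in the same directory and then swaps it in with `os.replace`, so an interrupted run does not leave a half-written original. The name `dedupe_file_in_place` is hypothetical.

```python
import os
import tempfile


def dedupe_file_in_place(path):
    """Rewrite path so each stripped line appears once, in first-seen order."""
    with open(path) as fp:
        # dict keys keep insertion order (Python 3.7+), so repeats are dropped
        unique = list(dict.fromkeys(line.strip() for line in fp))

    # Write the result next to the original, then swap it in with one rename
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as out:
            for line in unique:
                out.write(line + "\n")  # one entry per line, newline-terminated
        os.replace(tmp_path, path)  # overwrites the original file
    except BaseException:
        os.remove(tmp_path)  # clean up the temporary file on failure
        raise
```

Calling `dedupe_file_in_place("aiq_hits.txt")` after the extraction loop leaves each hit on its own line in the same file. One caveat: `mkstemp` creates the file with owner-only permissions on POSIX systems, so the original file's mode is not preserved; the rename is atomic on POSIX but that is not guaranteed on Windows.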
