
Earlier I wrote code that extracts a specific string from multiple files and stores the results in a separate file. That file now contains duplicate results, which I need to remove.

import glob
import re
import os.path

path = r"H:\sample"
file_array = glob.glob(os.path.join(path, '*.txt'))
with open("aiq_hits.txt", "w") as out_file:
    for input_filename in file_array:
        with open(input_filename) as in_file:
            for line in in_file:
                # Find quoted strings ending in .aiq (single or double quotes)
                match = re.findall(r"""(?<=')[^']*\.aiq(?=')|(?<=")[^"]*\.aiq(?=")""", line)
                for item in match:
                    out_file.write("%s\n" % item)
# No explicit close() is needed: the with block closes out_file.

This output file contains duplicate results that I need to remove, and the deduplicated result should be written back to the same file.

goshanky

1 Answer

  1. Open the input file.
  2. Read the input file line by line. `readlines` returns a list of the lines in the file.
  3. Create a new list, new_lines.
  4. Iterate over every line in lines.
  5. Strip the surrounding whitespace from the line.
  6. Check whether the line is already present in new_lines.
  7. If it is not, append the line to new_lines.
  8. Write new_lines to the output file.

Demo:

input_file = "input.txt"
with open(input_file, "r") as fp:
    lines = fp.readlines()
    new_lines = []
    for line in lines:
        #- Strip white spaces
        line = line.strip()
        if line not in new_lines:
            new_lines.append(line)

output_file = "output.txt"
with open(output_file, "w") as fp:
    fp.write("\n".join(new_lines))
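
Since the question asks for the result to end up in the same file, the same idea can be adapted to write back to the file it read from. The following is a minimal sketch, not a definitive implementation, assuming Python 3.7+ (where `dict` preserves insertion order) and a file small enough to read into memory. The function name `dedupe_file_in_place` is hypothetical; `aiq_hits.txt` is the output file named in the question.

```python
def dedupe_file_in_place(filename):
    """Remove duplicate lines from filename, keeping the first occurrence of each."""
    # Read everything first, so the file is closed before it is reopened for writing.
    with open(filename, "r") as fp:
        stripped = [line.strip() for line in fp]

    # dict.fromkeys keeps one key per distinct line, in first-seen order.
    unique_lines = list(dict.fromkeys(stripped))

    # Overwrite the same file, one entry per line.
    with open(filename, "w") as fp:
        fp.writelines(line + "\n" for line in unique_lines)

    return unique_lines

# Example call (assumes the file exists):
# dedupe_file_in_place("aiq_hits.txt")
```

Using `dict.fromkeys` instead of a list membership check avoids rescanning the list for every line, which may matter for large result files. Each entry is written with its own trailing newline, so the results should appear one per line.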
Lulucmy
Vivek Sable
  • Thanks for the suggestion, but I need the output written to the input file itself – goshanky Apr 15 '15 at 06:22
  • Also, the result comes out on one horizontal line – goshanky Apr 15 '15 at 06:22
  • 1. Use the input file name when writing the file, i.e. `with open(input_file, "wb") as fp:`. 2. I don't follow; the lines are written on separate lines. You can share your code with us on Stack Overflow or email me at vivekbsable@gmail.com – Vivek Sable Apr 15 '15 at 06:30