-1

I have two text files. Each line of the first file is the absolute path of an image.

The first two lines of the first text file read

/home/picture/I10045.jpg
/home/picture/I10056.jpy

The first two lines of the second text file read

Cat, Dog
Mouse, Mouse, Mouse

How would you read in the two separate files, delete the duplicates within each line of the second file, and then merge them line by line into a third file?

Output in the third text file should read

/home/picture/I10045.jpg Cat, Dog
/home/picture/I10056.jpg Mouse
Ken

3 Answers

2

This assumes that in your current working directory file1.txt contains:

/home/picture/I10045.jpg
/home/picture/I10056.jpy

and file2.txt contains

Cat, Dog
Mouse, Mouse, Mouse

It also assumes that we don't care about the order of the elements in each line of file2.txt, since it uses set to remove duplicates. If you need that order, I'd consider using a for loop instead of a comprehension and manually building a list while checking membership with in, or making some unconventional use of OrderedDict. There are more details on how to do that here: Removing duplicates in lists
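If the order does matter, the for loop approach mentioned above could look roughly like this. It is only a sketch: dedupe_keep_order is a made-up helper name, and it assumes the items are separated by ", " exactly as in the sample file2.txt.

```python
def dedupe_keep_order(line):
    """Return the items of one line with repeats dropped, keeping first-seen order."""
    unique = []
    for item in line.strip("\n").split(", "):
        if item not in unique:  # membership check with `in`
            unique.append(item)
    return unique


print(", ".join(dedupe_keep_order("Dog, Cat, Dog\n")))        # Dog, Cat
print(", ".join(dedupe_keep_order("Mouse, Mouse, Mouse\n")))  # Mouse
```

The membership check on a list is linear, which is fine for a handful of items per line; for very long lines a set of already-seen items alongside the list would be faster.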

#!/usr/bin/env python3

with open("file1.txt") as file1, open("file2.txt") as file2:
    file1_lines = [line.strip("\n") for line in file1]
    file2_lines = [set(line.strip("\n").split(", ")) for line in file2]

with open("file3.txt", "w") as file3:
    for line1, line2 in zip(file1_lines, file2_lines):
        print(line1, ", ".join(line2), file=file3)

The contents of file3.txt (the order of the items within a line may differ between runs, since sets are unordered):

/home/picture/I10045.jpg Dog, Cat
/home/picture/I10056.jpy Mouse

An explanation of what's happening:

We open both input files using with, which is usually recommended because it closes the files for us, even if an error occurs.

We run a list comprehension on the open file1 object, which just removes the newline from each line. This will help when we join the lines together later.

We run another list comprehension over our open file2 object, which removes the newlines and then splits each line on ", " into a set. This removes any duplicates and leaves us with a list of sets.
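To see what that comprehension produces without touching any files, here is a small sketch that runs it over two sample strings. The sample_lines list just stands in for the open file2 object, and the sorted() call is an optional extra (not in the script above) for when you want the same output on every run.

```python
# Sample lines standing in for the open file2 object
sample_lines = ["Cat, Dog\n", "Mouse, Mouse, Mouse\n"]

file2_lines = [set(line.strip("\n").split(", ")) for line in sample_lines]
print(file2_lines == [{"Cat", "Dog"}, {"Mouse"}])  # True

# Sets have no defined order; sorting before joining makes the output repeatable
rebuilt = [", ".join(sorted(items)) for items in file2_lines]
print(rebuilt)  # ['Cat, Dog', 'Mouse']
```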

We open file3.txt for writing and use zip to iterate over both of the lists we just made. We use join to rebuild the lines of file2.txt, comma-separated, from the sets in file2_lines. We don't have to do anything special to the lines from file1.txt.

We use print with the file= argument to write to our file. It's worth noting that file= won't work in Python 2 without importing print_function from __future__. If you're using Python 2, you should probably just use file3.write() instead.
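For reference, the write() variant could look something like the sketch below. The two lists are hard-coded stand-ins for file1_lines and file2_lines (plain lists rather than sets, so the order is fixed), and the output goes to a temporary directory purely so the example doesn't overwrite anything.

```python
import os
import tempfile

# Stand-ins for the lists built from file1.txt and file2.txt
file1_lines = ["/home/picture/I10045.jpg", "/home/picture/I10056.jpy"]
file2_lines = [["Cat", "Dog"], ["Mouse"]]

# Hypothetical output location inside a fresh temp directory
out_path = os.path.join(tempfile.mkdtemp(), "file3.txt")

with open(out_path, "w") as file3:
    for line1, line2 in zip(file1_lines, file2_lines):
        # Unlike print, write() adds neither the space nor the newline
        file3.write(line1 + " " + ", ".join(line2) + "\n")

with open(out_path) as file3:
    written = file3.read()
print(written)
```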

Zhenhir
0
#Function to remove the duplicates (each item is kept once, in first-seen order)
def remove_dup(s):
    # Thinking that the second file only has the tags
    seen = set()
    unique = []
    for tag in s.split(','):
        tag = tag.strip()       # Drop the space that follows each comma
        if tag not in seen:
            seen.add(tag)
            unique.append(tag)

    # Constructing the string
    return ", ".join(unique)

#Reading in the files
with open('test1.txt','r') as file1:
    text1 = [i.rstrip() for i in file1]

with open('test2.txt','r') as file2:
    dup_text2 = [i.rstrip() for i in file2]

# Removing duplicates
text2 = [remove_dup(i) for i in dup_text2]

# Adding the content
text3 = [text1[i]+" "+text2[i] for i in range(0,len(text1))]

# Writing to the file
with open('test3.txt','w') as f:
    for line in text3:
        f.write("%s\n" % line)

I hope this helps

0
with open('file3.txt', 'w') as outfile:
    with open('file1.txt', 'r') as file1, open('file2.txt', 'r') as file2:
        file2lines = file2.readlines()
        for i, line in enumerate(file1):
            # Split the animals into a list, then use set to drop duplicates
            animals = set(file2lines[i].strip().replace(', ', ',').split(','))
            # Join the set back into a string; str(set) would write "{'Cat', 'Dog'}"
            outfile.write(line.strip() + ' ' + ', '.join(animals) + '\n')

It opens both files, then uses file1 to drive the main for loop. Most of the code is text clean-up (removing spaces, newlines, etc.). I then used split to convert the animals into a list, used set to eliminate duplicates, and converted the result back to a string.
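One thing to watch for with any of these approaches is the two files having different numbers of lines: indexing into file2 by position raises an IndexError if it is shorter, while zip silently stops at the shorter input. A rough sketch of one way to catch that, using itertools.zip_longest, is below. merge_lines is a made-up name, and sorting the animals is my own choice to get repeatable output.

```python
from itertools import zip_longest


def merge_lines(paths, labels):
    """Pair each path with its de-duplicated labels; raise if the lengths differ."""
    missing = object()  # sentinel that cannot be confused with a real line
    merged = []
    for path, label_line in zip_longest(paths, labels, fillvalue=missing):
        if path is missing or label_line is missing:
            raise ValueError("the two inputs have different numbers of lines")
        animals = sorted(set(label_line.strip().replace(", ", ",").split(",")))
        merged.append(path.strip() + " " + ", ".join(animals))
    return merged


print(merge_lines(["/home/picture/I10045.jpg\n"], ["Dog, Cat, Dog\n"]))
# ['/home/picture/I10045.jpg Cat, Dog']
```

The function takes any two iterables of lines, so open file objects can be passed in directly.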

Baris Tasdelen