I have a text file in the following format:

this is some text __label__a
this is another line __label__a __label__b
this is third line __label__x
this is fourth line __label__a __label__x __label__z

and a separate list of labels:

list_labels = ['__label__x','__label__y','__label__z']

Each line can contain multiple labels from the list. What is the best way to replace all labels from the list in each line with a single "__label__no"?

example:

this is third line __label__no
this is fourth line __label__a __label__no

The real text file has many more lines and labels, so I am wondering what the fastest way to achieve this is.

shreyy
  • Well, you can load the entire content of the file like this: `content = content_file.read()` and then simply loop over all elements of your `list_labels` and do a replace – StegSchreck Nov 15 '17 at 19:14
  • I am not sure if replace would help me with my fourth line, which has multiple labels to be replaced by one "__label__no" – shreyy Nov 15 '17 at 19:25
  • Alright, I first thought it would be a one-to-one replacement. In that case I would do the following: (1) go over each line of the file, (2) check whether an element of `list_labels` is present, (3) save that state to a variable, (4) replace the matching elements with blank space, (5) add `label_no` to the line. But this will only work if the position of the `label_no` is not relevant afterwards – StegSchreck Nov 15 '17 at 19:41
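
The steps from the last comment could be sketched roughly as follows. This is only an illustration: the function name `replace_listed_labels` and its `replacement` parameter are made up for this sketch, and it assumes labels are separated by whitespace and that it is fine for `__label__no` to end up at the end of the line.

```python
def replace_listed_labels(line, labels, replacement="__label__no"):
    """Drop every token found in `labels`; append `replacement` once if any was dropped."""
    label_set = set(labels)                # set lookup instead of scanning the list
    tokens = line.split()
    kept = [tok for tok in tokens if tok not in label_set]
    found = len(kept) != len(tokens)       # the "state" saved in step (3)
    if found:
        kept.append(replacement)
    return " ".join(kept)


list_labels = ['__label__x', '__label__y', '__label__z']
print(replace_listed_labels("this is fourth line __label__a __label__x __label__z", list_labels))
```

Because the line is split and re-joined, runs of whitespace collapse to single spaces. Whole tokens are compared, so a label such as `__label__xy` would not be touched by `__label__x`.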

1 Answer

This probably isn't the fastest way to do it, but depending on the length of your text file, it may be good enough:

list_labels = ['__label__x', '__label__y', '__label__z']

with open('text.txt', 'r') as f:
    # Strip the trailing newline and surrounding whitespace from each line
    fcontents = [l.strip() for l in f]

def remove_duplicates(l):
    # Keep the first occurrence of each word, preserving order
    temp = []
    for x in l:
        if x not in temp:
            temp.append(x)
    return temp

for line in fcontents:
    for ll in list_labels:
        if ll in line:
            # Replace the label, then collapse repeated '__label__no' tokens
            replaced = line.replace(ll, '__label__no')
            line = ' '.join(remove_duplicates(replaced.split()))

    print(line)

output:

this is some text __label__a
this is another line __label__a __label__b
this is third line __label__no
this is fourth line __label__a __label__no

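The remove_duplicates function is borrowed (there called unique_list) from this question: How can I remove duplicate words in a string with Python? Note that it drops every repeated word in a line, not only repeated labels, so a line containing the same ordinary word twice would lose the second occurrence.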
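
If repeated ordinary words must be preserved, a token-based variant may be worth trying. The following is a minimal sketch, not a benchmarked solution: `collapse_labels` is a hypothetical helper name, the sample lines are the ones from the question, and it assumes labels are whitespace-separated tokens. It makes one pass per line, keeps `__label__no` at the position of the first replaced label, and uses a set so each lookup does not scan the whole label list.

```python
def collapse_labels(line, labels, replacement="__label__no"):
    """Replace the first token found in `labels` with `replacement`; drop later ones."""
    out = []
    replaced = False
    for tok in line.split():
        if tok in labels:
            if not replaced:           # emit the replacement only once per line
                out.append(replacement)
                replaced = True
        else:
            out.append(tok)            # ordinary words are kept, duplicates included
    return " ".join(out)


label_set = frozenset(['__label__x', '__label__y', '__label__z'])

lines = [
    "this is some text __label__a",
    "this is another line __label__a __label__b",
    "this is third line __label__x",
    "this is fourth line __label__a __label__x __label__z",
]

for line in lines:
    print(collapse_labels(line, label_set))
```

For a large file, the same function could be applied while iterating over the open file object line by line instead of reading everything into a list first.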