I have working code that merges the words inside the parentheses, but I need to remove the duplicates from the result.

My code:

import re
import collections

class Group:
    def __init__(self):
        self.members = set()
        self.text = []

with open('text1.txt') as f:
    groups = collections.defaultdict(Group)
    group_pattern = re.compile(r'^(\S+)\((.*)\)$')
    current_group = None
    for line in f:
        line = line.strip()
        m = group_pattern.match(line)
        if m:    # this is a group definition line
            group_name, group_members = m.groups()
            groups[group_name].members |= set(group_members.split(','))
            current_group = group_name
        else:
            if (current_group is not None) and (len(line) > 0):
                groups[current_group].text.append(line)

for group_name, group in groups.items():
    print "%s(%s)" % (group_name, ','.join(set(group.members)))
    print '\n'.join(group.text)
    print

My text file:

 Car(skoda,audi,benz,bmw)
 The above mentioned cars are sedan type and gives long rides efficient
 ......

Car(audi,Rangerover,Hummer)
SUV cars are used for family time and spacious.

Output:

Car(skoda,benz,bmw,Rangerover,Hummer,audi)
The above mentioned cars are sedan type and gives long rides efficient
......
SUV cars are used for family time and spacious.

Expected output:

Car(skoda,audi,benz,bmw,Rangerover,Hummer)
The above mentioned cars are sedan type and gives long rides efficient
......
SUV cars are used for family time and spacious.

Here audi is the duplicate. It was removed from the output, but the remaining audi appears in the last position instead of the second. Please help!

1 Answer

Sets are unordered, so your set will not preserve insertion order. If you need the order maintained, use sorted to sort on the order of the original list:

members = ["skoda","audi","benz","bmw","audi","Rangerover","Hummer"]

print ','.join(sorted(set(members),key=lambda x: members.index(x)))
skoda,audi,benz,bmw,Rangerover,Hummer
  1. set(members) removes the duplicates.
  2. sorted then builds a new sorted list from the set.
  3. The key, key=lambda x: members.index(x), sorts each element by the index of its first occurrence in the members list.
  4. Once sorted, audi is placed according to the index it had in the original members list, so it is back in the list as the second entry.
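
The steps above can be traced one at a time. This is a small sketch using the same members list; the names unique, positions and ordered are only for illustration.

```python
members = ["skoda", "audi", "benz", "bmw", "audi", "Rangerover", "Hummer"]

# Step 1: the set drops the second "audi"; its iteration order is arbitrary.
unique = set(members)

# Steps 2-3: list.index returns the position of the FIRST occurrence,
# so each unique name maps to where it first appeared in members.
positions = dict((name, members.index(name)) for name in unique)

# Step 4: sorting by that position restores the original order.
ordered = sorted(unique, key=members.index)
result = ','.join(ordered)
print(result)
```

Each members.index call scans the list, so this is quadratic in the worst case, which is fine for short lists like this one.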

Because you are using sets from the start, you lose the order, and it cannot be regained without some structure that maintains the original order to sort against.

You can change your sets to lists if you want to maintain order, and use a set at the end to remove duplicates, so the last step would be something like:

','.join(sorted(set(self.members),key=lambda x: self.members.index(x)))

where self.members is now a list and we use its order to sort the items in the set back into their original order.

Without a container that preserves order, there is no way to keep the elements in their original order.
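
One such container in the standard library is collections.OrderedDict (Python 2.7 and later), whose keys are unique and keep their insertion order. This is a minimal sketch of that alternative, not what the code in this answer uses; dedupe_keep_order is a hypothetical helper name.

```python
from collections import OrderedDict

def dedupe_keep_order(items):
    # Hypothetical helper: OrderedDict keys are unique and keep the
    # order in which they were first inserted.
    return list(OrderedDict.fromkeys(items))

members = ["skoda", "audi", "benz", "bmw", "audi", "Rangerover", "Hummer"]
print(','.join(dedupe_keep_order(members)))
```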

import re
import collections

class Group:
    def __init__(self):
        self.members = []
        self.text = []

with open('text1.txt') as f:
    groups = collections.defaultdict(Group)
    group_pattern = re.compile(r'^(\S+)\((.*)\)$')
    current_group = None
    for line in f:
        line = line.strip()
        m = group_pattern.match(line)
        if m:    # this is a group definition line
            group_name, group_members = m.groups()
            groups[group_name].members += filter(lambda x: x not in groups[group_name].members , group_members.split(','))
            current_group = group_name
        else:
            if (current_group is not None) and (len(line) > 0):
                groups[current_group].text.append(line)

for group_name, group in groups.items():
    print "%s(%s)" % (group_name, ','.join(group.members))
    print '\n'.join(group.text)
    print

The filter code is equivalent to [x for x in group_members.split(',') if x not in groups[group_name].members]
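
Note that the list comprehension builds its whole result before anything is appended, so it checks each name only against the members collected from earlier lines. A name repeated within a single line (for example Car(audi,audi)) would still be added twice. This is a sketch of a loop that also covers that case, assuming that behaviour is wanted; add_unique is a hypothetical helper name.

```python
def add_unique(members, new_items):
    # Hypothetical helper: append each new item only if it is not
    # already present, checking against the list as it grows.
    for item in new_items:
        if item not in members:
            members.append(item)
    return members

members = []
add_unique(members, "skoda,audi,benz,bmw".split(','))
add_unique(members, "audi,Rangerover,Hummer".split(','))
print(','.join(members))
```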

Padraic Cunningham