Find repeated element in a nested list

Question

I have a nested list of elements:

employee_list =  [
    ['Name', '=', 'John'],
    ['Age', '=', '32'],
    ['Weight', '=', '60'],
    ['Name', '=', 'Steve'],
    ['Weight', '=', '85']
]

I want to create two lists: one containing the repeated elements and another containing the unique elements. The repeated list should keep every occurrence, as many times as it appears in the original:

unique_list = [['Age', '=', '32']]

repeated_list = [
    ['Name', '=', 'John'],
    ['Weight', '=', '60'],
    ['Name', '=', 'Steve'],
    ['Weight', '=', '85']
]

Uniqueness or repetition is determined by the first element of every sub list. For example: 'Name', 'Weight'. If there are two sub lists where the first element is 'Name' I consider it as repetition.

Can anyone suggest an easy way to do this?

@benvc I don't think it is an exact duplicate; the OP does not want to remove duplicates, they want to "partition" the elements. However, the first answer already provides a big hint to one possible efficient solution. – Giacomo Alzetta, Sep 17 '18 at 14:47
By "repeated", do you mean the entire sublist `['Weight', '=', '60']` is identical, or just one of the significant elements (like `'Weight'`)? – user2390182, Sep 17 '18 at 14:47
@GiacomoAlzetta fair point, let me instead suggest that OP check out the link I posted in my previous comment and they will find what they need there (despite the fact that this question is a bit different). – benvc, Sep 17 '18 at 14:49
Do you want to retain the original ordering? How many elements will your list contain (10, 100, 1000, 1 billion)? And most important: what have you tried that didn't work? – bruno desthuilliers, Sep 17 '18 at 14:50
@RahulAgarwal I tried looping over it, adding the elements to a variable temporarily and then pushing them to another list, a lot like swapping. I know it is not the optimal way, but I am not finding a good way to do it. – john, Sep 17 '18 at 14:51
@Deep It would be really helpful if you provided your attempts with the code to solve the problem. – Giacomo Alzetta, Sep 17 '18 at 14:52
@benvc I do not want to remove duplicates. I think what I am asking is a bit different from the link you have shared. Thanks anyway. – john, Sep 17 '18 at 14:52
@Deep Read the answers anyway; they still show how to efficiently check whether the current element is a duplicate, and then it's easy to change the action you take according to that condition. – Giacomo Alzetta, Sep 17 '18 at 14:54
@schwobaseggl I am sorry I did not make it clear in the question. I am talking about uniqueness or repetition of elements like 'Name', 'Weight' or 'Age'. – john, Sep 17 '18 at 14:54
@brunodesthuilliers No, the order is not important. The list will contain hundreds of elements at most. I tried looping over it, adding elements temporarily to another list, like we do in swapping. Not the best way to do it. – john, Sep 17 '18 at 14:57
@Deep so to be clear: you want to construct two lists, one for the unique sublists and one for the duplicated elements? – Linkx_lair, Sep 17 '18 at 14:57
@Deep Do the other elements of the lists matter? Are `['Name', '=', 'John']` and `['Name', '=', 'Jane']` duplicates of one another? – Patrick Haugh, Sep 17 '18 at 14:59
@PatrickHaugh I am sorry, Patrick. I did not make it clear. I am concerned with uniqueness or repetition of just the first element in every sublist. For example: 'Name', 'Age', 'Weight'. – john, Sep 17 '18 at 15:01
@Linkx_lair Yes, I want two lists. But in the one where I keep the repeated elements, I want to keep the sublists as many times as they appear in the original list. – john, Sep 17 '18 at 15:07
@benvc thanks for the suggestion. I have edited the question and tried to make it more understandable. – john, Sep 17 '18 at 15:11

user2390182 · Answer 1 · 2018-09-17T16:12:11.100

6

You can use a collections.Counter to count the significant first elements, then build the two lists with comprehensions based on those counts:

from collections import Counter

c = Counter(l[0] for l in employee_list)
# Counter({'Name': 2, 'Weight': 2, 'Age': 1})

uniq = [l for l in employee_list if c[l[0]] == 1]
# [['Age', '=', '32']]

rept = [l for l in employee_list if c[l[0]] > 1]
# [['Name', '=', 'John'],
#  ['Weight', '=', '60'],
#  ['Name', '=', 'Steve'],
#  ['Weight', '=', '85']]

Update: split rept by "key"

d = {}
for l in rept:
    d.setdefault(l[0], []).append(l)
list(d.values())
# [[['Name', '=', 'John'], ['Name', '=', 'Steve']],
#  [['Weight', '=', '60'], ['Weight', '=', '85']]]
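
A possible single-pass variant of the same idea, sketched under the assumption that the input is the employee_list from the question: group every row by its first element once, then split the groups by size. The names groups, uniq and rept_by_key are illustrative only and are not part of the original answer.

```python
from collections import defaultdict

employee_list = [
    ['Name', '=', 'John'],
    ['Age', '=', '32'],
    ['Weight', '=', '60'],
    ['Name', '=', 'Steve'],
    ['Weight', '=', '85'],
]

# Group rows by their first element (dicts keep insertion order in Python 3.7+).
groups = defaultdict(list)
for row in employee_list:
    groups[row[0]].append(row)

# Keys seen exactly once go to uniq; the others stay grouped per key.
uniq = [rows[0] for rows in groups.values() if len(rows) == 1]
rept_by_key = {key: rows for key, rows in groups.items() if len(rows) > 1}

print(uniq)
print(rept_by_key)
```

Keeping the repeated rows in a dict answers the follow-up request directly: rept_by_key['Name'] holds only the 'Name' rows.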

edited Sep 17 '18 at 16:12

answered Sep 17 '18 at 14:55

user2390182


Thanks for the solution. Just out of curiosity, what if I want to create a list for every set of repeated elements? How can that be achieved? For example: rept1 = [['Name', '=', 'John'], ['Name', '=', 'Steve']] and similarly for the rest. – john Sep 17 '18 at 15:48
@Deep I would build a dict using the first elements as keys and the lists of lists as values. – user2390182 Sep 17 '18 at 15:52
I meant the output we are getting in the list 'rept'. It still has duplication ('Name' and 'Weight' both appear twice). I would like to break it down into a list of just 'Name' and a list of just 'Weight', that is, keep splitting until each list holds only one type of element. – john Sep 17 '18 at 15:59
@Deep That is exactly how I understood it. I added some code to the answer. – user2390182 Sep 17 '18 at 16:12

score 0 · Answer 2 · answered Sep 17 '18 at 14:54

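You cannot pass a list of lists to Counter directly; it raises

TypeError: unhashable type: 'list'

So the sublists first need to be converted to tuples. Note that this counts whole sublists rather than first elements; the output below corresponds to input in which the 'Name' and 'Weight' rows are repeated verbatim.

employee_tuple = list(map(tuple, employee_list))
# then count the tuples with Counter
from collections import Counter
d = Counter(employee_tuple)

l = list(map(d.get, employee_tuple))  # frequency of each row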
l
Out[372]: [2, 1, 2, 2, 2]

# then filter the rows by frequency with compress
from itertools import compress
list(compress(employee_list, map(lambda x: x == 1, l)))
Out[380]: [['Age', '=', '32']]


list(compress(employee_list, map(lambda x: x != 1, l)))
Out[384]: 
[['Name', '=', 'John'],
 ['Weight', '=', '60'],
 ['Name', '=', 'John'],
 ['Weight', '=', '60']]
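
Since the question defines repetition by the first element only, the same Counter and compress technique can be keyed on that element instead of on the whole row. The following is a minimal sketch under that assumption, using the question's data; key_counts, freq, unique_rows and repeated_rows are illustrative names.

```python
from collections import Counter
from itertools import compress

employee_list = [
    ['Name', '=', 'John'],
    ['Age', '=', '32'],
    ['Weight', '=', '60'],
    ['Name', '=', 'Steve'],
    ['Weight', '=', '85'],
]

# Strings are hashable, so the first elements can be counted directly
# without converting the rows to tuples.
key_counts = Counter(row[0] for row in employee_list)

# Frequency of each row's key, aligned with employee_list.
freq = [key_counts[row[0]] for row in employee_list]

# compress keeps the rows whose selector is truthy.
unique_rows = list(compress(employee_list, (n == 1 for n in freq)))
repeated_rows = list(compress(employee_list, (n > 1 for n in freq)))

print(unique_rows)
print(repeated_rows)
```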

score 0 · Answer 3 · answered Sep 17 '18 at 15:06

There are a variety of solutions you could use, including list comprehensions and filters. You can also use a set to get the unique elements and convert it back into a list, as shown in the link provided by benvc. Once you have the unique elements, you can filter them out of the original list to get the resulting list of duplicates (if any).

See python tips on filter
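
One way the set-and-filter idea might look for the edited question, where only the first element of each sublist matters. This is a sketch rather than the answerer's own code; seen and dup_keys are hypothetical helper names.

```python
employee_list = [
    ['Name', '=', 'John'],
    ['Age', '=', '32'],
    ['Weight', '=', '60'],
    ['Name', '=', 'Steve'],
    ['Weight', '=', '85'],
]

seen = set()      # keys encountered so far
dup_keys = set()  # keys encountered more than once
for row in employee_list:
    key = row[0]
    if key in seen:
        dup_keys.add(key)
    seen.add(key)

# filter keeps the rows for which the predicate returns True.
unique_list = list(filter(lambda row: row[0] not in dup_keys, employee_list))
repeated_list = list(filter(lambda row: row[0] in dup_keys, employee_list))

print(unique_list)
print(repeated_list)
```

Set membership tests are constant time on average, so this stays linear in the number of rows.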

answered Sep 17 '18 at 15:06

Linkx_lair


OP has edited the question after lots of comments clarifying what they are really after, making my initial suggestion referenced here less than helpful in resolving the issue. – benvc Sep 17 '18 at 15:13

score 0 · Answer 4 · answered Sep 17 '18 at 16:06

If you create a test_list that contains all of the items in employee_list, you can use the built-in count method to count the appearances of each employee_list[i][0] in that list. If the count is 1, the entire item is appended to unique_list; otherwise it is appended to repeated_list.

employee_list =  [
    ['Name', '=', 'John'],
    ['Age', '=', '32'],
    ['Weight', '=', '60'],
    ['Name', '=', 'Steve'],
    ['Weight', '=', '85']
]

unique_list = []
repeated_list = [] 
test_list = []

for i in employee_list:
    for j in i:
        test_list.append(j)

for i in employee_list:
    if test_list.count(i[0]) == 1:
        unique_list.append(i)
    else:
        repeated_list.append(i)

print(f"Repeated: {repeated_list}")
print(f"Unique: {unique_list}")

(xenial)vash@localhost:~/python/stack_overflow$ python3.7 unique.py 
Repeated: [['Name', '=', 'John'], ['Weight', '=', '60'], ['Name', '=', 'Steve'], ['Weight', '=', '85']]
Unique: [['Age', '=', '32']]
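
One caveat with the flattened test_list: it also contains the values, so a key would be counted as repeated if the same string happened to appear as a value elsewhere. The sketch below uses a hypothetical 'Alias' row (not in the original data) to show the collision, and counts first elements only to avoid it; first_items is an illustrative name.

```python
employee_list = [
    ['Name', '=', 'John'],
    ['Age', '=', '32'],
    ['Alias', '=', 'Age'],  # hypothetical row: the value 'Age' collides with a key
]

# Flattening every element, as in the answer above, counts 'Age' twice.
flat = [item for row in employee_list for item in row]
print(flat.count('Age'))

# Counting only the first element of each row avoids the false match.
first_items = [row[0] for row in employee_list]

unique_list = []
repeated_list = []
for row in employee_list:
    if first_items.count(row[0]) == 1:
        unique_list.append(row)
    else:
        repeated_list.append(row)

print(f"Repeated: {repeated_list}")
print(f"Unique: {unique_list}")
```

list.count rescans the list on every call, which is fine for the few hundred rows mentioned in the comments but grows quadratically for larger inputs.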

score 0 · Answer 5 · answered Sep 17 '18 at 16:34

I would go with a pure NumPy solution (I added one more row to make it more general).

Let's say that this is our data:

import numpy as np

data = np.array(data).astype(str)

data: array([['Name', '=', 'John'],
       ['Age', '_', '32'],
       ['Weight', '=', '60'],
       ['Name', '=', 'John'],
       ['Weight', '=', '60'],
       ['TT', '=', 'EE']], dtype='<U6')

The next step is to grab the unique rows:

uniq = np.unique(data, axis=0)
uniq: array([['Age', '_', '32'],
       ['Name', '=', 'John'],
       ['TT', '=', 'EE'],
       ['Weight', '=', '60']], dtype='<U6')

Now we want to find the rows that occur exactly once (this is the answer for the unique rows):

only_once = np.array([row for row in uniq if sum(np.all(row==data, axis=1)) == 1])
only_once:
array([['Age', '_', '32'],
       ['TT', '=', 'EE']], dtype='<U6')

To get the indices of the rows that occur only once:

idx = []
for row in only_once:
    lst = np.all(data == row, axis=1)
    idx.append(np.where(lst)[0])  # append without overwriting the idx list
idx:
[array([1]), array([5])]

Deleting those rows leaves the matrix of only the repeated rows:

result = np.delete(data, idx, axis=0)
result:
array([['Name', '=', 'John'],
       ['Weight', '=', '60'],
       ['Name', '=', 'John'],
       ['Weight', '=', '60']], dtype='<U6')
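
The steps above compare whole rows. For the edited question, where only the first column decides repetition, a NumPy sketch could look like the following. It assumes the question's employee_list as input; keys, row_counts, unique_rows and repeated_rows are illustrative names.

```python
import numpy as np

employee_list = [
    ['Name', '=', 'John'],
    ['Age', '=', '32'],
    ['Weight', '=', '60'],
    ['Name', '=', 'Steve'],
    ['Weight', '=', '85'],
]

data = np.array(employee_list)
keys = data[:, 0]  # first column only

# inverse maps each row to its position among the sorted unique keys;
# counts holds how often each unique key occurs.
_, inverse, counts = np.unique(keys, return_inverse=True, return_counts=True)
row_counts = counts[inverse]  # occurrence count of each row's key

# Boolean masks select the rows without any explicit loop.
unique_rows = data[row_counts == 1]
repeated_rows = data[row_counts > 1]

print(unique_rows)
print(repeated_rows)
```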
