Delete dictionary in a list with conditions

Question

I have the list of dictionaries below, and I need to delete the dictionaries that share the same received_on and customer_group values, keeping one randomly chosen item from each such group.

data = [
    {
        'id': '16e26a4a9f97fa4f',
        'received_on': '2019-11-01 11:05:51',
        'customer_group': 'Life-time Buyer'
    },
    {
        'id': '16db0dd4a42673e2',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    },
    {
        'id': '16db0dd4199f5897',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    }
]

Expected output:

[
    {
        'id': '16e26a4a9f97fa4f',
        'received_on': '2019-11-01 11:05:51',
        'customer_group': 'Life-time Buyer'
    },
    {
        'id': '16db0dd4199f5897',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    }
]

You can use if/else and a for loop, right? – Aaroosh Pandoh, Nov 14 '19 at 05:01
Add unique ones, don't delete duplicate ones. – FatihAkici, Nov 14 '19 at 05:02
What have you tried so far? – MisterMiyagi, Nov 14 '19 at 05:34
@MisterMiyagi I posted an answer a while ago. – Nov 14 '19 at 05:37

score 1 · Answer 1 · answered Nov 14 '19 at 05:03

Here's one way to keep the first item for each unique received_on value. If you want a random item instead, you can shuffle the list first.

data = [
    {
        'id': '16e26a4a9f97fa4f',
        'received_on': '2019-11-01 11:05:51',
        'customer_group': 'Life-time Buyer'
    },
    {
        'id': '16db0dd4a42673e2',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    },
    {
        'id': '16db0dd4199f5897',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    }
]

seen = set()  # received_on values already kept (avoids shadowing the datetime module name)
result = []
for d in data:
    dt = d['received_on']
    if dt not in seen:
        result.append(d)
        seen.add(dt)
print(result)

Output:

[{'id': '16e26a4a9f97fa4f',
  'received_on': '2019-11-01 11:05:51',
  'customer_group': 'Life-time Buyer'},
 {'id': '16db0dd4a42673e2',
  'received_on': '2019-10-09 14:12:29',
  'customer_group': 'Lead'}]
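
A minimal sketch of the shuffle-first idea, extended to compare both received_on and customer_group. The helper name dedupe_random and its keys and rng parameters are made up for illustration and are not part of the answer above. Note that the result comes back in shuffled order, not in the original order.

```python
import random

def dedupe_random(items, keys=("received_on", "customer_group"), rng=random):
    """Keep one randomly chosen dict per unique combination of `keys`."""
    shuffled = list(items)   # copy, so the caller's list is not reordered
    rng.shuffle(shuffled)    # after shuffling, "first seen" is a random pick
    seen = set()
    result = []
    for d in shuffled:
        key = tuple(d[k] for k in keys)
        if key not in seen:
            seen.add(key)
            result.append(d)
    return result

data = [
    {'id': '16e26a4a9f97fa4f', 'received_on': '2019-11-01 11:05:51', 'customer_group': 'Life-time Buyer'},
    {'id': '16db0dd4a42673e2', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
    {'id': '16db0dd4199f5897', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
]

result = dedupe_random(data)
print(result)
```

Passing a seeded random.Random instance as rng makes the pick reproducible, which can help when testing.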

score 1 · Answer 2 · 2019-11-14T05:40:31.113

Using some of the ideas above, I added customer_group as a second condition alongside received_on, and I got my expected result.

conditions, result = [], []
for d in data:
    condition = (d['received_on'], d['customer_group'])
    if condition not in conditions:
        result.append(d)
        conditions.append(condition)
print(len(result))

edited Nov 14 '19 at 05:40

answered Nov 14 '19 at 05:32

Consider using a `set` for `conditions` – Mad Physicist Nov 14 '19 at 05:43
Also, this does not fulfill your own criterion of random selection. – Mad Physicist Nov 14 '19 at 05:44
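
A sketch of what the set suggestion might look like. The tuples of strings are hashable, so they can go into a set, and membership tests then take roughly constant time on average instead of scanning a list. The name seen is an arbitrary choice. As the second comment points out, this still keeps the first item of each group, not a random one.

```python
data = [
    {'id': '16e26a4a9f97fa4f', 'received_on': '2019-11-01 11:05:51', 'customer_group': 'Life-time Buyer'},
    {'id': '16db0dd4a42673e2', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
    {'id': '16db0dd4199f5897', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
]

seen = set()   # (received_on, customer_group) pairs already kept
result = []
for d in data:
    condition = (d['received_on'], d['customer_group'])
    if condition not in seen:
        seen.add(condition)
        result.append(d)

print(len(result))  # 2
```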

score 1 · Answer 3 · answered Nov 14 '19 at 05:35

Here's an idea:

import random

data = [
    {
        'id': '16e26a4a9f97fa4f',
        'received_on': '2019-11-01 11:05:51',
        'customer_group': 'Life-time Buyer'
    },
    {
        'id': '16db0dd4a42673e2',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    },
    {
        'id': '16db0dd4199f5897',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    }
]


r_data = data.copy()
random.shuffle(r_data)
# Iterate over the shuffled copy so the id kept for each key is a random one
unique_data = {(elem['received_on'], elem['customer_group']): elem['id']
               for elem in r_data}
new_data = [{'id': val, 'received_on': key[0], 'customer_group': key[1]}
            for key, val in unique_data.items()]
new_data = sorted(new_data, key=lambda x: data.index(x))  # if you need the original order
print(new_data)

Output (one possible result):

[{'id': '16e26a4a9f97fa4f', 'received_on': '2019-11-01 11:05:51', 'customer_group': 'Life-time Buyer'}, {'id': '16db0dd4199f5897', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'}]

score 0 · Answer 4 · answered Nov 14 '19 at 05:07

I think adding the dictionaries whose received_on has not been seen so far is easier than filtering out the ones with duplicate received_on values:

result = []
receivedList = []
for d in data:
    if d['received_on'] not in receivedList:
        result.append(d)
        receivedList.append(d['received_on'])

print(result)
[{'customer_group': 'Life-time Buyer',
  'id': '16e26a4a9f97fa4f',
  'received_on': '2019-11-01 11:05:51'},
 {'customer_group': 'Lead',
  'id': '16db0dd4a42673e2',
  'received_on': '2019-10-09 14:12:29'}]

score 0 · Answer 5 · answered Nov 14 '19 at 05:20

A better way is to append the unique items to a new list:

data = [
    {
        'id': '16e26a4a9f97fa4f',
        'received_on': '2019-11-01 11:05:51',
        'customer_group': 'Life-time Buyer'
    },
    {
        'id': '16db0dd4a42673e2',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    },
    {
        'id': '16db0dd4199f5897',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    }
]
seen = set()  # (received_on, customer_group) pairs already kept
unique_data = []
for i in data:
    pair = (i['received_on'], i['customer_group'])
    # Compare the pair as a whole so items sharing only one of the two values are kept
    if pair not in seen:
        unique_data.append(i)
        seen.add(pair)

print(unique_data)

Output:

[
    {
        'id': '16e26a4a9f97fa4f',
        'received_on': '2019-11-01 11:05:51',
        'customer_group': 'Life-time Buyer'
    },
    {
        'id': '16db0dd4a42673e2',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    }
]

Mad Physicist · Answer 6 · 2019-11-14T05:41:53.417

You can sort by a custom key and then use random.choice on each group returned by itertools.groupby.

Sorting the list:

keyfunc = lambda x: (x['received_on'], x['customer_group'])
data.sort(key=keyfunc)

Grouping:

g = itertools.groupby(data, keyfunc)

Selecting random elements requires you to turn each group iterator into a sequence:

result = [random.choice(list(group)) for k, group in g]

Normally, I would keep the key function separate, especially since it's used twice, and only combine the last two steps into:

result = [random.choice(list(group)) for k, group in itertools.groupby(data, keyfunc)]

However, you can use sorted to write a monstrous, redundant one-liner:

result = [random.choice(list(group)) for k, group in itertools.groupby(sorted(data, key=lambda x: (x['received_on'], x['customer_group'])), key=lambda x: (x['received_on'], x['customer_group']))]
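
Putting the pieces together, here is a self-contained sketch with the imports the snippets above rely on. It uses sorted instead of data.sort so the original list is left untouched, and a named function in place of the lambda; both are stylistic choices, not requirements. The result comes back ordered by the sort key, not in the original order.

```python
import itertools
import random

data = [
    {'id': '16e26a4a9f97fa4f', 'received_on': '2019-11-01 11:05:51', 'customer_group': 'Life-time Buyer'},
    {'id': '16db0dd4a42673e2', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
    {'id': '16db0dd4199f5897', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
]

def keyfunc(d):
    """Grouping key: dicts with equal keys count as duplicates."""
    return (d['received_on'], d['customer_group'])

# groupby only merges adjacent items with equal keys, so sort by the same key first
ordered = sorted(data, key=keyfunc)

# Each group is a one-shot iterator; random.choice needs a sequence, hence list(group)
result = [random.choice(list(group))
          for _, group in itertools.groupby(ordered, key=keyfunc)]

print(result)
```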
