
I have the list of dictionaries below, and I need to delete dictionaries that have the same received_on and customer_group values, keeping one random item from each group of duplicates.

data = [
    {
        'id': '16e26a4a9f97fa4f',
        'received_on': '2019-11-01 11:05:51',
        'customer_group': 'Life-time Buyer'
    },
    {
        'id': '16db0dd4a42673e2',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    },
    {
        'id': '16db0dd4199f5897',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    }
]

Expected output:

[
    {
        'id': '16e26a4a9f97fa4f',
        'received_on': '2019-11-01 11:05:51',
        'customer_group': 'Life-time Buyer'
    },
    {
        'id': '16db0dd4199f5897',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    }
]
Mad Physicist

6 Answers


Here's one way to keep the first item for each unique received_on datetime. If you want a random item instead, you can shuffle the list first, as described here.

data = [
    {
        'id': '16e26a4a9f97fa4f',
        'received_on': '2019-11-01 11:05:51',
        'customer_group': 'Life-time Buyer'
    },
    {
        'id': '16db0dd4a42673e2',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    },
    {
        'id': '16db0dd4199f5897',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    }
]

seen = set()  # received_on values already kept (avoids shadowing the datetime module)
result = []
for d in data:
    dt = d['received_on']
    if dt not in seen:
        result.append(d)
        seen.add(dt)
print(result)

Output:

[{'id': '16e26a4a9f97fa4f',
  'received_on': '2019-11-01 11:05:51',
  'customer_group': 'Life-time Buyer'},
 {'id': '16db0dd4a42673e2',
  'received_on': '2019-10-09 14:12:29',
  'customer_group': 'Lead'}]
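A minimal sketch of the shuffle idea, assuming you want to dedupe on both received_on and customer_group as the question asks. It shuffles a list of indices rather than the data itself, keeps the first index visited for each key pair, and then restores the original order. The function name `pick_random_unique` and its `seed` parameter are hypothetical, added only so the pick can be made reproducible.

```python
import random

def pick_random_unique(items, seed=None):
    """Keep one random item per (received_on, customer_group) pair, in input order."""
    rng = random.Random(seed)          # private generator; seed is only for reproducibility
    order = list(range(len(items)))
    rng.shuffle(order)                 # shuffle indices, leaving `items` untouched
    first_index = {}
    for i in order:
        d = items[i]
        key = (d['received_on'], d['customer_group'])
        first_index.setdefault(key, i)  # the first index visited for a key wins
    # Sorting the kept indices restores the original order of the survivors
    return [items[i] for i in sorted(first_index.values())]

data = [
    {'id': '16e26a4a9f97fa4f', 'received_on': '2019-11-01 11:05:51', 'customer_group': 'Life-time Buyer'},
    {'id': '16db0dd4a42673e2', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
    {'id': '16db0dd4199f5897', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
]
print(pick_random_unique(data))
```

Shuffling indices instead of the dictionaries means the caller's list keeps its order and no equality comparison between dictionaries is needed to put the results back in place.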
ExplodingGayFish

Using some of the ideas above, I also included customer_group as a second condition alongside received_on, and got my expected result.

conditions, result = set(), []
for d in data:
    condition = (d['received_on'], d['customer_group'])
    if condition not in conditions:  # set lookup instead of scanning a list
        result.append(d)
        conditions.add(condition)
print(len(result))
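The same first-seen approach generalizes to any combination of key fields. A minimal sketch, with `unique_by` as a hypothetical helper name: `operator.itemgetter` called with several field names returns a tuple of their values, which is hashable and can go straight into a set.

```python
from operator import itemgetter

def unique_by(items, *fields):
    """Return the first item for each distinct combination of `fields`."""
    get_key = itemgetter(*fields)  # a tuple of values when several fields are given
    seen = set()
    result = []
    for d in items:
        key = get_key(d)
        if key not in seen:        # average O(1) membership test, unlike a list
            seen.add(key)
            result.append(d)
    return result

data = [
    {'id': '16e26a4a9f97fa4f', 'received_on': '2019-11-01 11:05:51', 'customer_group': 'Life-time Buyer'},
    {'id': '16db0dd4a42673e2', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
    {'id': '16db0dd4199f5897', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
]
print([d['id'] for d in unique_by(data, 'received_on', 'customer_group')])
# ['16e26a4a9f97fa4f', '16db0dd4a42673e2']
```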

Here's an idea:

import random

data = [
    {
        'id': '16e26a4a9f97fa4f',
        'received_on': '2019-11-01 11:05:51',
        'customer_group': 'Life-time Buyer'
    },
    {
        'id': '16db0dd4a42673e2',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    },
    {
        'id': '16db0dd4199f5897',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    }
]


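r_data = data.copy()
random.shuffle(r_data)  # shuffle a copy so the kept duplicate is random
# Later entries overwrite earlier ones for the same key, so the last item
# in the shuffled copy wins for each (received_on, customer_group) pair
unique_data = {(elem['received_on'], elem['customer_group']): elem['id']
               for elem in r_data}
new_data = [{'id': val, 'received_on': key[0], 'customer_group': key[1]}
            for key, val in unique_data.items()]
new_data = sorted(new_data, key=lambda x: data.index(x))  # restore the original order, if you need it
print(new_data)

Output (one possible result, since the kept duplicate is chosen at random):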

[{'id': '16e26a4a9f97fa4f', 'received_on': '2019-11-01 11:05:51', 'customer_group': 'Life-time Buyer'}, {'id': '16db0dd4199f5897', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'}]
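Because assigning to an existing dict key overwrites its value but keeps its original position (dicts preserve insertion order since Python 3.7), a dict comprehension keeps the last item per key, placed where that key first appeared. A minimal sketch that stores the whole dictionary as the value, so nothing has to be rebuilt from the key; the helper name `group_key` is hypothetical.

```python
import random

data = [
    {'id': '16e26a4a9f97fa4f', 'received_on': '2019-11-01 11:05:51', 'customer_group': 'Life-time Buyer'},
    {'id': '16db0dd4a42673e2', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
    {'id': '16db0dd4199f5897', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
]

def group_key(d):
    return (d['received_on'], d['customer_group'])

# Deterministic: the last item for each key wins
last_per_key = list({group_key(d): d for d in data}.values())
print([d['id'] for d in last_per_key])  # ['16e26a4a9f97fa4f', '16db0dd4199f5897']

# Random: build from a shuffled copy, then sort back into the original order
shuffled = random.sample(data, k=len(data))  # shuffled copy; data is untouched
random_per_key = sorted({group_key(d): d for d in shuffled}.values(), key=data.index)
print(random_per_key)
```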
Sayandip Dutta

I think adding the dictionaries whose received_on hasn't been seen yet is easier than filtering out the ones with duplicate received_on values:

result = []
receivedList = []
for d in data:
    if d['received_on'] not in receivedList:
        result.append(d)
        receivedList.append(d['received_on'])

print(result)

Output:

[{'customer_group': 'Life-time Buyer',
  'id': '16e26a4a9f97fa4f',
  'received_on': '2019-11-01 11:05:51'},
 {'customer_group': 'Lead',
  'id': '16db0dd4a42673e2',
  'received_on': '2019-10-09 14:12:29'}]
FatihAkici

Another way is to append each first-seen item to a new list:

data = [
    {
        'id': '16e26a4a9f97fa4f',
        'received_on': '2019-11-01 11:05:51',
        'customer_group': 'Life-time Buyer'
    },
    {
        'id': '16db0dd4a42673e2',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    },
    {
        'id': '16db0dd4199f5897',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    }
]
seen = set()  # (received_on, customer_group) pairs already kept
unique_data = []
for i in data:
    key = (i['received_on'], i['customer_group'])
    # Check the pair, not each field separately; separate checks would also
    # drop items that share only a customer_group or only a received_on
    if key not in seen:
        unique_data.append(i)
        seen.add(key)

print(unique_data)

Output:

[
    {
        'id': '16e26a4a9f97fa4f',
        'received_on': '2019-11-01 11:05:51',
        'customer_group': 'Life-time Buyer'
    },
    {
        'id': '16db0dd4a42673e2',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    }
]
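Checking received_on and customer_group against separate seen-lists is not the same as checking the pair: with a nested per-field check, an item is skipped as soon as its customer_group has been seen, even if its received_on is new. A small sketch comparing the two; the function names and the `extra` records are made up for illustration.

```python
def per_field(items):
    # Nested per-field check: the customer_group test alone can reject an item
    seen_received, seen_group, out = [], [], []
    for i in items:
        if i['customer_group'] not in seen_group:
            if i['received_on'] not in seen_received:
                out.append(i)
                seen_received.append(i['received_on'])
            seen_group.append(i['customer_group'])
    return out

def per_pair(items):
    # Check the combined key, so only true duplicates are dropped
    seen, out = set(), []
    for i in items:
        k = (i['received_on'], i['customer_group'])
        if k not in seen:
            seen.add(k)
            out.append(i)
    return out

# Hypothetical records: two Lead items with different timestamps are NOT duplicates
extra = [
    {'id': 'a', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
    {'id': 'b', 'received_on': '2019-10-10 09:00:00', 'customer_group': 'Lead'},
]
print([i['id'] for i in per_field(extra)])  # ['a']
print([i['id'] for i in per_pair(extra)])   # ['a', 'b']
```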

You can sort by a custom key and then apply random.choice to each group returned by itertools.groupby. The sort matters because groupby only groups consecutive items with equal keys.

Sorting the list:

import itertools
import random

keyfunc = lambda x: (x['received_on'], x['customer_group'])
data.sort(key=keyfunc)  # sorts in place, so items with equal keys become adjacent

Grouping:

g = itertools.groupby(data, keyfunc)

Selecting random elements requires you to turn each group iterator into a sequence:

result = [random.choice(list(group)) for k, group in g]
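The list() call is needed because random.choice relies on len() and indexing, while each group that groupby yields is a lazy, one-pass iterator. A small check on a throwaway list (`rows` is made up for illustration):

```python
import itertools
import random

rows = ['a', 'a', 'b']
groups = itertools.groupby(rows)
_, first_group = next(groups)  # a lazy iterator over the run of 'a' values

raised = False
try:
    random.choice(first_group)  # fails: the iterator has no len()
except TypeError:
    raised = True

picked = random.choice(list(first_group))  # works once materialized as a list
print(raised, picked)  # True a
```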

Normally, I would keep the key function separate, especially since it's used twice, and combine only the last two steps into:

result = [random.choice(list(group)) for k, group in itertools.groupby(data, keyfunc)]

However, you can use sorted to write a monstrous, redundant one-liner:

result = [random.choice(list(group)) for k, group in itertools.groupby(sorted(data, key=lambda x: (x['received_on'], x['customer_group'])), key=lambda x: (x['received_on'], x['customer_group']))]
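A self-contained version of the groupby approach, as a sketch: `operator.itemgetter` builds the same (received_on, customer_group) tuple as the lambda, and sorted() leaves the original list untouched, unlike data.sort(). Using a separate `random.Random` instance is an addition so a seed can be passed for repeatable picks. Note that the result comes out in key order, not input order.

```python
import itertools
import random
from operator import itemgetter

data = [
    {'id': '16e26a4a9f97fa4f', 'received_on': '2019-11-01 11:05:51', 'customer_group': 'Life-time Buyer'},
    {'id': '16db0dd4a42673e2', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
    {'id': '16db0dd4199f5897', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
]

keyfunc = itemgetter('received_on', 'customer_group')  # same tuple key as the lambda
rng = random.Random()  # e.g. random.Random(0) for repeatable picks

# groupby only merges consecutive equal keys, hence the sort first
result = [rng.choice(list(group))
          for _, group in itertools.groupby(sorted(data, key=keyfunc), key=keyfunc)]
print(result)
```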
Mad Physicist