Get a random sample of a dict

Question

I'm working with a big dictionary and for some reason I also need to work on small random samples from that dictionary. How can I get this small sample (for example of length 2)?

Here is a toy model:

dy={'a':1, 'b':2, 'c':3, 'd':4, 'e':5}

I need to perform some task on dy which involves all the entries. Let us say, to simplify, I need to sum together all the values:

s=0
for key in dy.keys():
    s=s+dy[key]

Now, I also need to perform the same task on a random sample of dy; for that I need a random sample of the keys of dy. The simple solution I can imagine is

sam=list(dy.keys())[:2]

In that way I have a list of two keys of the dictionary which are somehow random. So, going back to my task, the only change I need in the code is:

s=0
for key in sam:
    s=s+dy[key]

The point is that I do not fully understand how dy.keys() is constructed, so I can't foresee any future issues.
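On the question of how dy.keys() is built: since Python 3.7, dicts are guaranteed to preserve insertion order, so list(dy.keys())[:2] always yields the first two keys that were inserted rather than a random pair. A minimal sketch contrasting that slice with random.sample on the toy dict:

```python
import random

dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

# Slicing the key list is deterministic: dicts keep insertion order
# (guaranteed since Python 3.7), so this is always the first two keys.
first_two = list(dy)[:2]
print(first_two)  # ['a', 'b']

# random.sample draws 2 distinct keys uniformly at random.
# It needs a sequence, so the keys are converted to a list first.
sam = random.sample(list(dy), 2)
s = sum(dy[key] for key in sam)
print(sam, s)
```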

score 8 · Answer 1 · answered Jan 20 '20 at 20:32

import random

def sample_from_dict(d, sample=10):
    keys = random.sample(list(d), sample)
    values = [d[k] for k in keys]
    return dict(zip(keys, values))
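A short explanation with a usage sketch (the function is repeated so the snippet runs on its own): random.sample(list(d), sample) picks sample distinct keys without replacement, and dict(zip(...)) pairs them back with their values. Because the default sample=10 is larger than the question's five-key dict, the call below passes sample=2; asking for more keys than the dict holds raises ValueError.

```python
import random

def sample_from_dict(d, sample=10):
    # Draw `sample` distinct keys; list(d) gives the keys as a sequence.
    keys = random.sample(list(d), sample)
    values = [d[k] for k in keys]
    return dict(zip(keys, values))

dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
small = sample_from_dict(dy, sample=2)
print(small)  # a random 2-item sub-dict of dy

try:
    sample_from_dict(dy)  # default sample=10 exceeds the 5 keys
except ValueError as exc:
    print("ValueError:", exc)
```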

J-Mourad

Thank you for this code snippet, which might provide some limited, immediate help. A [proper explanation](https://meta.stackexchange.com/q/114762/349538) would greatly improve its long-term value by showing why this is a good solution to the problem and would make it more useful to future readers with other, similar questions. Please edit your answer to add some explanation, including the assumptions you’ve made. – Ismael Padilla Jan 20 '20 at 21:16

Jon Clements · Accepted Answer · 2016-10-12T15:58:13.207

Given your example of:

dy = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}

Then the sum of all the values is more simply put as:

s = sum(dy.values())

Then if it's not memory prohibitive, you can sample using:

import random

values = list(dy.values())
s = sum(random.sample(values, 2))

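Or, since random.sample takes any sequence, sample from a list of the keys (older Python versions also accepted a set-like view such as dy.keys() directly, but that was deprecated in Python 3.9 and removed in 3.11):

from operator import itemgetter
import random

s = sum(itemgetter(*random.sample(list(dy), 2))(dy))

Or just use:

s = sum(dy[k] for k in random.sample(list(dy), 2))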
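One caveat with the itemgetter version: operator.itemgetter called with a single key returns the bare value rather than a 1-tuple, so sum(...) fails when the sample size is 1, and itemgetter() with no keys raises TypeError. A minimal sketch of a helper (the name sum_of_sample is hypothetical) that handles both edge cases:

```python
import random
from operator import itemgetter

def sum_of_sample(d, k):
    """Sum the values of k randomly chosen keys of d (hypothetical helper)."""
    if k == 0:
        # itemgetter() needs at least one key, so handle the empty sample here.
        return 0
    keys = random.sample(list(d), k)
    picked = itemgetter(*keys)(d)
    # With exactly one key, itemgetter returns the value itself, not a tuple;
    # wrap it so sum() always receives an iterable.
    if k == 1:
        picked = (picked,)
    return sum(picked)

dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
print(sum_of_sample(dy, 2))
print(sum_of_sample(dy, 1))
```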

An alternative is to use the heapq module, e.g.:

import heapq
import random

s = sum(heapq.nlargest(2, dy.values(), key=lambda L: random.random()))
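Why the heapq trick works: each element gets an independent random.random() key, so the k elements with the largest keys form a uniformly random subset of size k. heapq.nlargest scans the iterable and keeps only about k items in its heap, so this also avoids copying every value into a list first. A sketch applying the same idea to (key, value) pairs to get a sub-dict (the function name sample_items_streaming is hypothetical):

```python
import heapq
import random

def sample_items_streaming(d, k):
    """Return a dict of k random items of d (hypothetical helper).

    Every item is tagged with a random key; heapq.nlargest keeps the k items
    with the largest tags while iterating over d.items().
    """
    return dict(heapq.nlargest(k, d.items(), key=lambda item: random.random()))

dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
sub = sample_items_streaming(dy, 2)
print(sub, sum(sub.values()))
```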

edited Oct 12 '16 at 15:58

answered Oct 12 '16 at 15:33

Jon Clements

Why not just create a list via random.sample? What's the advantage of using itemgetter? – user2988577 Oct 12 '16 at 15:52
@user2988577 `itemgetter` can sometimes be quicker - otherwise - just write it as `sum(dy[k] for k in random.sample(dy.keys(), 2))` – Jon Clements Oct 12 '16 at 15:53
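Whether itemgetter is actually quicker depends on the Python version, the dictionary and the sample size, so no timings are claimed here. A rough timeit sketch for checking on your own machine; the keys are sampled once so both variants sum the same values:

```python
import random
import timeit
from operator import itemgetter

dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
keys = random.sample(list(dy), 2)  # sample once so both variants see the same keys

def with_itemgetter():
    return sum(itemgetter(*keys)(dy))

def with_generator():
    return sum(dy[k] for k in keys)

# Both variants must agree before timing them.
assert with_itemgetter() == with_generator()

for fn in (with_itemgetter, with_generator):
    seconds = timeit.timeit(fn, number=100_000)
    print(f"{fn.__name__}: {seconds:.3f} s")
```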

score 1 · Answer 3 · answered May 30 '18 at 08:55

Replace range(10) with a random sample of indices (for example, one drawn with numpy):

{v:rows[v] for v in [list(rows.keys())[k] for k in range(10)]}
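A minimal sketch of that suggestion, assuming rows is a dict with at least 10 keys (the example data below is hypothetical). It swaps in the standard library's random.sample over index positions instead of numpy (numpy's Generator.choice(n, size=10, replace=False) would play the same role) and builds the key list once rather than on every iteration:

```python
import random

# Hypothetical example data standing in for the answer's `rows`.
rows = {f"row{i}": i * i for i in range(100)}

keys = list(rows)                          # build the key list once
idx = random.sample(range(len(keys)), 10)  # 10 distinct random positions
subset = {keys[k]: rows[keys[k]] for k in idx}
print(subset)
```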

MajorDaxx

score 0 · Answer 4 · answered Feb 02 '21 at 21:43

This should be quicker than creating a new dict and checking if the keys are part of the sample:

import random

# input_dict is the dictionary to sample from; it needs at least sample_n items.
sample_n = 1000
output_dict = dict(random.sample(list(input_dict.items()), sample_n))
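A usage sketch of this approach on the question's toy dictionary: random.sample returns (key, value) pairs that dict() turns straight back into a dictionary. The helper name sample_dict and the min(...) cap are assumptions added here, so that a requested size larger than the dict returns every item instead of raising ValueError:

```python
import random

def sample_dict(input_dict, sample_n):
    """Return a random sub-dict with up to sample_n items (hypothetical helper)."""
    # Cap the sample size (added safeguard, not part of the original answer).
    n = min(sample_n, len(input_dict))
    # random.sample needs a sequence, so materialize the items view first.
    return dict(random.sample(list(input_dict.items()), n))

dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
print(sample_dict(dy, 2))
print(sample_dict(dy, 1000))  # capped: returns all 5 items
```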

muwnd

score -1 · Answer 5 · answered Jun 27 '19 at 13:12

import random
origin_dict =  {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}
sample_rate = 0.3
random_keys = random.sample(list(origin_dict.keys()), int(sample_rate * len(origin_dict)))
random_values = [origin_dict[k] for k in random_keys]

sample_dict = dict(zip(random_keys, random_values))

Example output (random; note that int(0.3 * 5) is 1, so with this toy dict only one key is drawn):

{'d': 4}
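Because int() truncates, small dictionaries or low rates can produce fewer keys than expected, or none at all (int(0.1 * 5) is 0). A sketch that rounds instead and keeps at least one key for a non-empty dict; the rounding rule and the helper name sample_by_rate are assumptions, not part of the answer:

```python
import random

def sample_by_rate(d, rate):
    """Return a random sub-dict holding roughly `rate` of d's items (hypothetical helper)."""
    if not d:
        return {}
    # round() instead of int(), so 0.3 * 5 (1.5) gives 2 keys; never fewer than 1.
    n = max(1, round(rate * len(d)))
    # Never ask for more keys than the dict holds (e.g. rate > 1).
    n = min(n, len(d))
    keys = random.sample(list(d), n)
    return {k: d[k] for k in keys}

dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
print(sample_by_rate(dy, 0.3))
```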

Dustin Michels · Answer 6 · 2022-11-28T18:16:02.557

Similar to @J-Mourad's nice answer, but using a dictionary comprehension:

import random

def sample_from_dict(d, n=10):
    keys = random.sample(list(d), n)
    return {k: d[k] for k in keys}
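If the same sample has to be drawn again (for example, to re-run the task on identical data), one option is to pass a dedicated random.Random instance with a fixed seed. A sketch extending the function above with a hypothetical rng parameter:

```python
import random

def sample_from_dict(d, n=10, rng=None):
    # Use the module-level generator unless a seeded instance is supplied.
    rng = random if rng is None else rng
    keys = rng.sample(list(d), n)
    return {k: d[k] for k in keys}

dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
first = sample_from_dict(dy, 2, rng=random.Random(42))
again = sample_from_dict(dy, 2, rng=random.Random(42))
print(first == again)  # True: same seed and same key order give the same sample
```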

edited Nov 28 '22 at 18:16

answered Nov 15 '22 at 19:02

Dustin Michels
