
Let's say I have a dictionary:

{
    'us': {
        'male': {'given_names': ['Alex', 'Bob', 'Charlie']},
        'female': {'given_names': ['Alice', 'Betty', 'Claire']},
    },
    'uk': {
        'male': {'given_names': ['aaa', 'Bbb', 'cc']},
        'female': {'given_names': ['ppp', 'ddd', 'sss']},
    },
}

Now let's say I want to draw names so that 60% are US names and 40% are UK names, with a 50/50 split between male and female names.

How can I do it?

My current approach: I was trying to think of something similar to this, but I guess it is more complex than that.

I was thinking of collecting all the names first and then applying a distribution to them, but that does not make logical sense to me. Can someone help?

        # all_possible_names = [
        #     name
        #     for list_of_names in [
        #         self.library[area][gender][
        #             "given_names"
        #         ]
        #         for gender in self.genders
        #         for area in self.name_areas
        #     ]
        #     for name in list_of_names
        # ]
        # print(all_possible_names)

Thanks.

Ahmad Anis

2 Answers

2

Use `random.choices` with weights to pick the country, and `random.choice` to split evenly between male and female. Assuming your dictionary is named `d` and `N` is the total number of names you'd like:

from random import choice, choices

N = 3

names = [
    choice(d[country][choice(['male', 'female'])]['given_names'])
    for country in choices(['us', 'uk'], weights=[0.6, 0.4])  # k defaults to 1, so a single country is drawn
    for _ in range(N)
]
Jon Clements
  • This seems great (will test in a bit). Just a quick question: if I want to change the distribution of male and female, I simply have to set weights for them, right? I.e. `choice(['male', 'female'], weights=[0.4, 0.6])`? – Ahmad Anis Jun 10 '22 at 11:43
  • @AhmadAnis `choice` doesn't accept weights... use `choices` instead – Jon Clements Jun 10 '22 at 11:44
  • I hope the above code will work with `choices(['male', 'female'], weights=[0.4, 0.6])` – Ahmad Anis Jun 10 '22 at 11:45
  • @Ahmad it should - just try it out :) – Jon Clements Jun 10 '22 at 11:52
  • I do have a question here. With `for _ in range(N)`, I am getting a single country for all N iterations, so I suppose the random choice is made only once. Can this behaviour be changed so that a choice is made in each of the N iterations? – Ahmad Anis Jun 10 '22 at 12:11
  • I removed the `for _ in range(N)` part and added `k=N` to the `choices(['us', 'uk'])` call. I hope it does not affect the logic I want. – Ahmad Anis Jun 10 '22 at 12:30
  • @AhmadAnis well... that's for you to decide I guess? :p – Jon Clements Jun 10 '22 at 12:41
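The change discussed in the comments above can be sketched roughly as follows. Passing `k=N` to `choices` draws a country independently for each name, and the gender is taken as `choices(...)[0]` because `choices` returns a list even for a single draw, and a list cannot be used as a dictionary key. The helper name `sample_names` and its parameters are assumptions made for illustration, not part of the original answer.

```python
from random import choice, choices

# Example data, same shape as the dictionary in the question.
d = {
    "us": {
        "male": {"given_names": ["Alex", "Bob", "Charlie"]},
        "female": {"given_names": ["Alice", "Betty", "Claire"]},
    },
    "uk": {
        "male": {"given_names": ["aaa", "Bbb", "cc"]},
        "female": {"given_names": ["ppp", "ddd", "sss"]},
    },
}


def sample_names(d, n, country_weights=(0.6, 0.4), gender_weights=(0.5, 0.5)):
    """Draw n names, picking country and gender independently for each one.

    Hypothetical helper: the weights are given in the order
    ("us", "uk") and ("male", "female").
    """
    # k=n gives one independent country draw per name.
    countries = choices(["us", "uk"], weights=country_weights, k=n)
    names = []
    for country in countries:
        # choices() returns a list, so take its single element.
        gender = choices(["male", "female"], weights=gender_weights)[0]
        names.append(choice(d[country][gender]["given_names"]))
    return names


names = sample_names(d, 10)
```

Note that the weights only set the expected proportions; a small sample can deviate noticeably from 60/40 or 50/50.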
1

You can use NumPy's `random.choice` to do the weighted selection of countries:

from numpy.random import choice as npchoice
from random import choice


some_dict = {
    "us": {
        "male": {"given_names": ["Alex", "Bob", "Charlie"]},
        "female": {"given_names": ["Alice", "Betty", "Claire"]},
    },
    "uk": {
        "male": {"given_names": ["aaa", "Bbb", "cc"]},
        "female": {"given_names": ["ppp", "ddd", "sss"]},
    },
}


possible_choices = ["us", "uk"]
probability_distribution = [0.6, 0.4]
number_of_items_to_pick = 200
countries = list(
    npchoice(possible_choices, number_of_items_to_pick, p=probability_distribution)
)
print(countries)


names = []
females = 0
males = 0
for country in countries:
    gender = choice(["male", "female"])
    if gender == "female":
        females += 1
    else:
        males += 1
    name = choice(some_dict[country][gender]["given_names"])
    names.append(name)
    print(f"{country} | {gender:.1} | {name}")


print(f"\nF: {females}  | M: {males}")
print(f"US: {countries.count('us')} | UK: {countries.count('uk')}")

I added some extra logic above for testing and to check the distribution.
It can be shortened to the following:

from numpy.random import choice as npchoice
from random import choice

names = [
    choice(some_dict[country][choice(["male", "female"])]["given_names"])
    for country in npchoice(["us", "uk"], 200, p=[0.6, 0.4])
]
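As a possible standard-library-only variant, the country and gender can also be drawn together by weighting each (country, gender) pair with the product of its two weights. This is a minimal sketch that assumes country and gender are chosen independently; the names `country_weights`, `gender_weights`, and `draw_names` are made up for illustration.

```python
from random import choice, choices

some_dict = {
    "us": {
        "male": {"given_names": ["Alex", "Bob", "Charlie"]},
        "female": {"given_names": ["Alice", "Betty", "Claire"]},
    },
    "uk": {
        "male": {"given_names": ["aaa", "Bbb", "cc"]},
        "female": {"given_names": ["ppp", "ddd", "sss"]},
    },
}

# Assumed target proportions from the question.
country_weights = {"us": 0.6, "uk": 0.4}
gender_weights = {"male": 0.5, "female": 0.5}

# Joint weight of each (country, gender) pair, assuming independence.
pairs = [(c, g) for c in country_weights for g in gender_weights]
weights = [country_weights[c] * gender_weights[g] for c, g in pairs]


def draw_names(n):
    """Draw n names with one weighted pick of (country, gender) per name."""
    picked = choices(pairs, weights=weights, k=n)
    return [choice(some_dict[c][g]["given_names"]) for c, g in picked]


names = draw_names(200)
```

As with the versions above, the proportions hold only in expectation, so the actual counts vary from run to run.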
Edo Akse