
I have the following DataFrame in pandas:

A  B
1  23
43 446
197 5
99 12
....

What I want is another DataFrame with the same columns A and B, filled with random elements (0 < A_i < A_max, 0 < B_i < B_max), such that no (A, B) combination in any of its rows already exists in the first DataFrame.

user495490
  • Can you describe what you tried so far; what the limits are on the combinations; from what distribution you wish to sample (I take it uniform?); and maybe if you are feeling up to it, why using a DataFrame is imperative? – Uvar Jun 07 '18 at 12:38
  • In short, these could be the edges of some graph (A and B are user IDs). I just want to generate new edges for the same graph and save them in a new DataFrame. I don't want any repetitions between the two sets of edges. – user495490 Jun 07 '18 at 12:41
  • For a non-brute-force approach, take a look at https://stackoverflow.com/questions/12581437/python-random-sample-with-a-generator-iterable-iterator. Then proceed along the lines of: `elems_1 = df.iloc[:,0]; elems_2 = df.iloc[:,1]; gdoe = set(zip(elems_1.values, elems_2.values)); import itertools as it; possibs = it.product(elems_1.values, elems_2.values)` The brute-force method is then to create all the elements: `all_p = set(possibs); all_p.difference_update(gdoe)` From the result you can randomly sample items and transform them into a DataFrame with the desired columns. – Uvar Jun 07 '18 at 13:08

1 Answer


If you don't care about the distribution, you can simply draw uniformly distributed values with Python's random module.

Assuming the original DataFrame is named df and you want a random_df of the same length:

from random import random
import pandas as pd

A_max = df['A'].max()
B_max = df['B'].max()

rows = []  # accepted pairs; DataFrame.append was removed in pandas 2.0

while len(rows) < len(df):
    # int(random() * n) yields a uniform integer in [0, n - 1]
    A_random = int(random() * A_max)
    B_random = int(random() * B_max)

    # Check that the combination does not exist in the original DataFrame
    if len(df[(df['A'] == A_random) & (df['B'] == B_random)]) == 0:
        rows.append({'A': A_random, 'B': B_random})

random_df = pd.DataFrame(rows, columns=df.columns)
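
A possible variant of the loop above, sketched under a few assumptions: the IDs are integers, the bounds are strict as in the question (0 < A_i < A_max, 0 < B_i < B_max), and the new pairs should also be unique among themselves. The helper name `sample_new_pairs` and its `seed` and `max_tries` parameters are illustrative, not part of pandas. It keeps the existing pairs in a set, so each membership check is a hash lookup instead of a filter over the whole DataFrame.

```python
import random

import pandas as pd


def sample_new_pairs(df, n, seed=None, max_tries=1_000_000):
    """Draw n distinct (A, B) pairs that do not occur in df.

    Hypothetical helper: assumes integer columns 'A' and 'B' and
    strict bounds 0 < A < A_max, 0 < B < B_max.
    """
    rng = random.Random(seed)
    a_max = int(df["A"].max())
    b_max = int(df["B"].max())
    if a_max < 2 or b_max < 2:
        raise ValueError("no integer lies strictly between 0 and the maximum")

    # Pairs that must not be produced: the original ones to start with,
    # plus every new pair as soon as it is accepted.
    seen = set(zip(df["A"].tolist(), df["B"].tolist()))
    rows = []
    tries = 0

    while len(rows) < n:
        if tries >= max_tries:
            raise RuntimeError("could not find enough unused pairs")
        tries += 1
        # randrange(1, m) returns an integer in [1, m - 1]
        pair = (rng.randrange(1, a_max), rng.randrange(1, b_max))
        if pair not in seen:
            seen.add(pair)
            rows.append(pair)

    return pd.DataFrame(rows, columns=["A", "B"])


# Example with the rows shown in the question
df = pd.DataFrame({"A": [1, 43, 197, 99], "B": [23, 446, 5, 12]})
new_df = sample_new_pairs(df, n=len(df), seed=0)
```

Rejection sampling like this works well while the existing pairs cover only a small fraction of all possible combinations, which is typical for sparse graph edges. The `max_tries` guard is there for the opposite case, where almost every combination is already taken and the loop would otherwise run for a very long time.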
pdn