7

I have a DataFrame like this:

 Sou  Des
  1    3
  1    4
  2    3
  2    4
  3    1
  3    2
  4    1
  4    2

I need to assign a random value between 0 and 1 to each pair, but mirrored pairs such as "1-3" and "3-1" must receive the same random value. I'm expecting a result DataFrame like this:

 Sou  Des   Val
  1    3    0.1
  1    4    0.6
  2    3    0.9
  2    4    0.5
  3    1    0.1
  3    2    0.9
  4    1    0.6
  4    2    0.5

How can I assign the same random value to mirrored pairs like "A-B" and "B-A" in pandas?

4 Answers

6

Let's first create a helper DataFrame in which each row is sorted (axis=1), so that mirrored pairs become identical:

In [304]: x = pd.DataFrame(np.sort(df, axis=1), df.index, df.columns)

In [305]: x
Out[305]:
   Sou  Des
0    1    3
1    1    4
2    2    3
3    2    4
4    1    3
5    2    3
6    1    4
7    2    4

Now we can group by its columns and draw one random value per group:

In [306]: df['Val'] = (x.assign(c=1)
                        .groupby(x.columns.tolist())
                        .transform(lambda x: np.random.rand(1)))

In [307]: df
Out[307]:
   Sou  Des       Val
0    1    3  0.989035
1    1    4  0.918397
2    2    3  0.463653
3    2    4  0.313669
4    3    1  0.989035
5    3    2  0.463653
6    4    1  0.918397
7    4    2  0.313669
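
For readers who want to run this outside an interactive session, the same sort-then-group idea can be sketched as a standalone script. This is only one possible spelling, not the answer's exact code: it numbers the groups with `ngroup()` and draws the values with `np.random.default_rng()`, and the helper column names `a` and `b` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Sou": [1, 1, 2, 2, 3, 3, 4, 4],
                   "Des": [3, 4, 3, 4, 1, 2, 1, 2]})

# Sort each row so that (3, 1) and (1, 3) become the same key (1, 3).
# "a" and "b" are hypothetical helper column names.
key = pd.DataFrame(np.sort(df[["Sou", "Des"]].to_numpy(), axis=1),
                   index=df.index, columns=["a", "b"])

# Number the distinct unordered pairs 0..k-1.
pair_id = key.groupby(["a", "b"]).ngroup()

# Draw one uniform value in [0, 1) per distinct pair (unseeded, so it
# changes on every run), then broadcast it back to the rows.
rng = np.random.default_rng()
values = rng.random(pair_id.max() + 1)
df["Val"] = values[pair_id.to_numpy()]

print(df)
```

Passing a seed to `default_rng` would make the values reproducible if that is needed.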
MaxU - stand with Ukraine
2

Here is another way, using a symmetric matrix of random numbers:

s=pd.crosstab(df.Sou,df.Des)

# np.random.random_integers is deprecated; randint excludes the upper bound, hence 2001
b = np.random.randint(-2000, 2001, size=(len(s), len(s)))
sy = (b + b.T)/2

s.mul(sy).replace(0,np.nan).stack().reset_index()

Out[292]: 
   Sou  Des       0
0    1    3   -60.0
1    1    4  -867.0
2    2    3   269.0
3    2    4  1152.0
4    3    1   -60.0
5    3    2   269.0
6    4    1  -867.0
7    4    2  1152.0
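
The output above is not in the 0 to 1 range the question asks for. A possible variant of the symmetric-matrix idea that stays in [0, 1) is sketched below. It is an assumption-laden sketch rather than the answer's method: it mirrors the upper triangle instead of averaging `b` with `b.T` (an average of two uniform draws is no longer uniform), and it looks values up by position instead of going through `crosstab`. The names `labels`, `sym`, `i` and `j` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Sou": [1, 1, 2, 2, 3, 3, 4, 4],
                   "Des": [3, 4, 3, 4, 1, 2, 1, 2]})

# Every label that occurs in either column, sorted and de-duplicated.
labels = np.union1d(df["Sou"], df["Des"])
n = len(labels)

rng = np.random.default_rng()
b = rng.random((n, n))                 # uniform floats in [0, 1)

# Mirror the upper triangle onto the lower one: sym[i, j] == sym[j, i].
sym = np.triu(b) + np.triu(b, 1).T

# labels is sorted and contains every value, so searchsorted maps each
# label to its row/column position in the matrix.
i = np.searchsorted(labels, df["Sou"].to_numpy())
j = np.searchsorted(labels, df["Des"].to_numpy())
df["Val"] = sym[i, j]

print(df)
```

The matrix has one entry per possible pair of labels, so this only makes sense when the number of distinct labels is modest.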
BENY
0

The trick here is to do a bit of work away from the dataframe. You can break this down into three steps:

  • assemble a list of all tuples (a,b)
  • assign a random value to each pair so that (a,b) and (b,a) have the same value
  • fill in the new column

Assuming your dataframe is called df, we can make a list of all the pairs ordered so that a <= b. I think this will be easier than trying to keep track of both (a,b) and (b,a).

pairs = set([(a,b) if a <= b else (b,a) 
             for a, b in df.itertuples(index=False,name=None))

It's simple enough to assign a random number to each of these pairs and store it in a dictionary, so I'll leave that to you. Call it pair_dict.
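
For completeness, one way to build that dictionary might look like the sketch below. It assumes the sample data from the question and uses the standard library's `random.random()`, which returns a float in [0.0, 1.0); any other source of random numbers would do.

```python
import random

import pandas as pd

df = pd.DataFrame({"Sou": [1, 1, 2, 2, 3, 3, 4, 4],
                   "Des": [3, 4, 3, 4, 1, 2, 1, 2]})

# Unordered pairs, stored with the smaller element first.
pairs = {(a, b) if a <= b else (b, a)
         for a, b in df.itertuples(index=False, name=None)}

# One independent draw per unordered pair.
pair_dict = {pair: random.random() for pair in pairs}

print(pair_dict)
```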

Now we just have to look up the values. We'll ultimately want to write

df['Val'] = df.apply(<some function>, axis=1)

where our function looks up the appropriate value in pair_dict.

Rather than try to cram it into a lambda (though we could), let's write it separately.

def func(row):
    if row['Sou'] <= row['Des']:
        key = (row['Sou'], row['Des'])
    else:
        key = (row['Des'], row['Sou'])
    return pair_dict[key]
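
Put together, the whole approach could run end to end roughly as follows. The way `pair_dict` is filled here is a stand-in for the step the answer leaves open, so treat it as an assumption rather than part of the original recipe.

```python
import random

import pandas as pd

df = pd.DataFrame({"Sou": [1, 1, 2, 2, 3, 3, 4, 4],
                   "Des": [3, 4, 3, 4, 1, 2, 1, 2]})

# Stand-in for the dictionary step: keys are (smaller, larger) tuples and
# setdefault keeps the first value stored for each key.
pair_dict = {}
for a, b in df.itertuples(index=False, name=None):
    pair_dict.setdefault((min(a, b), max(a, b)), random.random())

def func(row):
    # Order the two endpoints so both directions hit the same key.
    if row['Sou'] <= row['Des']:
        key = (row['Sou'], row['Des'])
    else:
        key = (row['Des'], row['Sou'])
    return pair_dict[key]

df['Val'] = df.apply(func, axis=1)
print(df)
```

Row-wise `apply` is easy to read but slow on large frames; for big inputs a vectorized key (for example, sorting the two columns with NumPy) is likely a better fit.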
hoyland
0

If you are OK with the "random" value coming from the built-in hash() function, you can achieve this with frozenset(), which ignores the order of its elements:

df = pd.DataFrame([[1,1,2,2,3,3,4,4],[3,4,3,4,1,2,1,2]]).T
df.columns = ['Sou','Des']
df['Val']= df.apply(lambda x: hash(frozenset([x["Sou"],x["Des"]])),axis=1)
print(df)

which gives:

   Sou  Des         Val
0    1    3  1580307032
1    1    4 -1736016661
2    2    3   741508915
3    2    4 -1930135584
4    3    1  1580307032
5    3    2   741508915
6    4    1 -1736016661
7    4    2 -1930135584

reference: Why aren't Python sets hashable?
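
If the values need to fall between 0 and 1 as in the question, the hash could be folded into that range. The sketch below is one hypothetical way to do it; `BUCKETS` and `pair_value` are names invented here. The result is deterministic rather than random, and two different pairs can end up with the same value.

```python
import pandas as pd

df = pd.DataFrame({"Sou": [1, 1, 2, 2, 3, 3, 4, 4],
                   "Des": [3, 4, 3, 4, 1, 2, 1, 2]})

BUCKETS = 10**6  # hypothetical resolution of the resulting values

def pair_value(row):
    # frozenset ignores order, so (1, 3) and (3, 1) hash identically.
    h = hash(frozenset([int(row["Sou"]), int(row["Des"])]))
    # % with a positive modulus is never negative in Python,
    # so the quotient lands in [0, 1).
    return (h % BUCKETS) / BUCKETS

df["Val"] = df.apply(pair_value, axis=1)
print(df)
```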

Dickster