I want to generate 6 random numbers (weights) that always sum to 1 and multiply them by the columns of data I have imported from a CSV file. I then want to store the row sums in another column (the weighted average) and find the difference between the max and min of the new column (the range). I want to repeat the process 1000000 times and get the least range, together with the set of random numbers (weights) that produced it.

Here is what I have done so far:

1. Generate 6 random numbers.
2. Import the data from the CSV file.
3. Multiply the random numbers by the data from the CSV file and sum each row (weighted average).
4. Save the weighted average in a new column, F(x).
5. Find the range.
6. Repeat this 1000000 times and get the random numbers that give the least range.

Here is some data from the file:

     A    B      C    D      E    F    F(x)
 0  4.9  3.9    6.3  3.4    7.3  3.4    0.0
 1  4.1  3.7    7.7  2.8    5.5  3.9    0.0
 2  6.0  6.0    4.0  3.1    3.7  4.3    0.0
 3  5.6  6.3    6.6  4.6    8.3  4.6    0.0

Currently I am getting 0.0 for every F(x) value, which should not be the case.

import numpy as np
import pandas as pd

arr = np.array(np.random.dirichlet(np.ones(6), size=1))

arr=pd.DataFrame(arr)

ar=(arr.iloc[0])

df = pd.read_csv('weit.csv')

df['F(x)']=df.mul(ar).sum(1)
df

df['F(x)'].max() - df['F(x)'].min()

I am getting 0 for all my weighted averages, but I need the actual weighted average.

I also can't work out how to loop the code so that it runs 1000000 times and gives me the least range.
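
One likely explanation for the zeros, sketched here and not confirmed against the actual `weit.csv`: `ar` is a pandas Series labeled 0 to 5, and `df.mul(ar)` matches those labels against the column names A to F. Nothing matches, so every product is NaN, and `sum(1)` skips NaN and returns 0.0. Passing the weights as a plain NumPy array multiplies by position instead. The sample frame, the `default_rng` seed, and the variable names below are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; the real file has more rows.
df = pd.DataFrame({
    "A": [4.9, 4.1, 6.0, 5.6],
    "B": [3.9, 3.7, 6.0, 6.3],
    "C": [6.3, 7.7, 4.0, 6.6],
    "D": [3.4, 2.8, 3.1, 4.6],
    "E": [7.3, 5.5, 3.7, 8.3],
    "F": [3.4, 3.9, 4.3, 4.6],
})

rng = np.random.default_rng(0)       # seed chosen arbitrarily (assumption)
weights = rng.dirichlet(np.ones(6))  # shape (6,), entries sum to 1

# Series labeled 0..5 vs columns A..F: no label matches, so every
# product is NaN and the row sums come out as 0.0.
misaligned = df.mul(pd.Series(weights)).sum(axis=1)

# Plain array: multiplied by position, one weight per column.
cols = ["A", "B", "C", "D", "E", "F"]
df["F(x)"] = df[cols].to_numpy() @ weights
value_range = df["F(x)"].max() - df["F(x)"].min()

print(misaligned.tolist())
print(df["F(x)"].round(4).tolist(), value_range)
```

Selecting the six data columns explicitly also keeps an existing F(x) column out of the product if the CSV already contains one.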

winfred adrah
  • Can you create a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve)? Maybe [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) will also help. – jezrael Mar 30 '19 at 05:53
  • Is it possible to add some sample data from the file? – jezrael Mar 30 '19 at 06:11
  • I have added some data to it @jezrael – winfred adrah Mar 30 '19 at 06:35
  • Thank you. I am also not sure whether it is necessary to use `random.dirichlet`; would it be possible to use [random.choice](https://stackoverflow.com/a/25174767/2901002)? – jezrael Mar 30 '19 at 06:42

1 Answer

If I understand correctly what you need:

#data from file
print (df)
     A    B    C    D    E    F
0  4.9  3.9  6.3  3.4  7.3  3.4
1  4.1  3.7  7.7  2.8  5.5  3.9
2  6.0  6.0  4.0  3.1  3.7  4.3
3  5.6  6.3  6.6  4.6  8.3  4.6

np.random.seed(3434)

Generate a 2d array with 6 'columns' and N 'rows' filled with random numbers, where each row sums to 1:

N = 10
#in real data
#N = 1000000
arr = np.array(np.random.dirichlet(np.ones(6), size=N))
print (arr)
[[0.07077773 0.08042978 0.02589592 0.03457833 0.53804634 0.25027191]
 [0.22174594 0.22673581 0.26136526 0.04820957 0.00976747 0.23217594]
 [0.01202493 0.14247592 0.3411326  0.0239181  0.08448841 0.39596005]
 [0.09354759 0.54989312 0.08893737 0.22051801 0.03850101 0.00860291]
 [0.09418778 0.33345217 0.11721214 0.33480462 0.11894247 0.00140081]
 [0.04285476 0.04531546 0.38105815 0.04316535 0.46902838 0.0185779 ]
 [0.00441747 0.08044848 0.33383453 0.09476135 0.37568431 0.11085386]
 [0.14613552 0.11260451 0.10421495 0.27880266 0.28994218 0.06830019]
 [0.50747802 0.15704797 0.04410511 0.07552837 0.18744306 0.02839746]
 [0.00203448 0.13225783 0.43042505 0.33410145 0.08385366 0.01732753]]

Then convert the values of the DataFrame to a 2d numpy array:

b = df.values
#pandas 0.24+
#b = df.to_numpy()
print (b)
[[4.9 3.9 6.3 3.4 7.3 3.4]
 [4.1 3.7 7.7 2.8 5.5 3.9]
 [6.  6.  4.  3.1 3.7 4.3]
 [5.6 6.3 6.6 4.6 8.3 4.6]]

Finally, multiply both arrays together into a 3d array and sum along axis 2. To subtract the minimum from the maximum, use numpy.ptp, which here gives one range per row of df:

c = np.ptp((arr * b[:, None]).sum(axis=2), axis=1)
print (c)

[2.19787892 2.08476765 1.2654273  1.45134533]

Another solution with numpy.einsum:

c = np.ptp(np.einsum('ik,jk->jik', arr, b).sum(axis=2), axis=1)
print (c)
[2.19787892 2.08476765 1.2654273  1.45134533]
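
Both versions above build an intermediate of shape (rows, N, 6) before summing. As a sketch of a lower-memory variant (not part of the original answer), the sum over the six columns can be folded into `einsum` itself or written as a matrix product, and either gives the same (rows, N) table of weighted averages. The generator and seed below are assumptions, so the random values will differ from the printed output above.

```python
import numpy as np

# Data rows copied from the answer.
b = np.array([
    [4.9, 3.9, 6.3, 3.4, 7.3, 3.4],
    [4.1, 3.7, 7.7, 2.8, 5.5, 3.9],
    [6.0, 6.0, 4.0, 3.1, 3.7, 4.3],
    [5.6, 6.3, 6.6, 4.6, 8.3, 4.6],
])

rng = np.random.default_rng(3434)        # assumed seed
N = 10
arr = rng.dirichlet(np.ones(6), size=N)  # shape (N, 6)

# Approach from the answer: 3d intermediate of shape (rows, N, 6), then sum.
via_broadcast = (arr * b[:, None]).sum(axis=2)

# Contract over k inside einsum; no 3d array is built.
via_einsum = np.einsum('ik,jk->ji', arr, b)

# The same contraction written as a matrix product.
via_matmul = b @ arr.T

print(via_broadcast.shape, via_einsum.shape, via_matmul.shape)
```

Applying `np.ptp(..., axis=1)` to any of the three results reproduces the per-row ranges computed in the answer.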

A loop solution for comparison, but it is slow with large N:

out = []
for row in df.values:
#    print (row)
    a = np.ptp((row * arr).sum(axis=1))
    out.append(a)
print (out)
[2.197878921892329, 2.0847676512823052, 1.2654272959079576, 1.4513453259898297]   
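
The results above give one range per row of `df`, taken across the weight sets. If the goal is instead what the question describes (for each weight set, build the F(x) column, take its max minus min down the rows, then keep the weight set with the smallest range), a minimal sketch could look like the following. It also checks that every Dirichlet row sums to 1. The variable names, the generator, and the seed are assumptions, not part of the original answer.

```python
import numpy as np

# Data rows copied from the answer.
b = np.array([
    [4.9, 3.9, 6.3, 3.4, 7.3, 3.4],
    [4.1, 3.7, 7.7, 2.8, 5.5, 3.9],
    [6.0, 6.0, 4.0, 3.1, 3.7, 4.3],
    [5.6, 6.3, 6.6, 4.6, 8.3, 4.6],
])

rng = np.random.default_rng(3434)            # assumed seed
N = 10                                       # the question uses 1000000
weights = rng.dirichlet(np.ones(6), size=N)  # shape (N, 6)

# Each row of a Dirichlet sample is a set of weights summing to 1.
assert np.allclose(weights.sum(axis=1), 1.0)

# fx[r, n] is F(x) for data row r under weight set n.
fx = b @ weights.T                           # shape (rows, N)

# Range of the F(x) column for each weight set: max - min down the rows.
ranges = np.ptp(fx, axis=0)                  # shape (N,)

best = int(ranges.argmin())                  # index of the best weight set
best_weights = weights[best]
least_range = ranges[best]
print(best, least_range, best_weights)
```

With N = 1000000 and a file of 37 rows (the row count suggested by the error message in the comments), `fx` would hold 37 million floats, roughly 300 MB, so generating the weights in chunks and tracking the running minimum may be preferable.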
jezrael
  • This is the error I am getting: shape must be (37, 6): given (1, 6) – winfred adrah Mar 30 '19 at 06:54
  • What does `print (df.info())` show after `df = pd.read_csv('weit.csv')`? – jezrael Mar 30 '19 at 06:56
  • Thanks, that works. But two questions: 1. How do I force each set of 6 random numbers to sum to 1? 2. How do I multiply each set of random numbers with df and get the least range of the 10 random sets generated? Forgive me, but I am a PHP developer trying to do some work with Python – winfred adrah Mar 30 '19 at 07:57
  • `1.` Use `arr = np.array(np.random.dirichlet(np.ones(6), size=N))`, but if `N = 1000000` I am not sure whether some values are duplicated. `2.` A loop is not necessary: multiplying the 2d arrays into a 3d array and summing along axis 2 replaces looping N times, which improves performance. `3.` No problem, good luck with Python. :) – jezrael Mar 30 '19 at 08:02
  • Unfortunately, when I use arr = np.array(np.random.dirichlet(np.ones(6), size=N)), the random numbers are not the numbers multiplying the data in df, so my weighted average is always wrong – winfred adrah Mar 30 '19 at 09:53
  • @winfredadrah - hmm, is it possible to create some sample data with the expected final output? – jezrael Mar 30 '19 at 09:54
  • Yeah, I can. I have only one problem now: any time I use arr = np.array(np.random.dirichlet(np.ones(6), size=N)), print c prints out only 0.0 for all its output – winfred adrah Mar 30 '19 at 21:39
  • @winfredadrah - It seems to be a data-related problem; for me it works fine. – jezrael Mar 30 '19 at 22:17