I am running np.random.choice as shown below.

record = np.random.choice(data, size=6, p=prob)
maxv = max(record)
minv = min(record)
val = record

From this I find the min and the max. I want to put all three into a pandas DataFrame. Below is my desired output:

Min,Max,value
1,5,2
1,5,3
1,5,3
1,5,5
1,5,1
1,5,3

This is an example of the output I would like from one simulation. Keep in mind that I am running this simulation many times, so I would like to be able to keep adding rows to the same DataFrame. Each simulation will have its own min and max. I also want the min and max repeated on every row of the output (which is why 1 and 5 appear on each line of the example).

user3609179

3 Answers

1

I'd create the df with the initial data column 'Val' and then just add the new columns in a one liner:

In [242]:
df = pd.DataFrame({'Val':np.random.randint(1,6,6)})
df['Min'], df['Max'] = df['Val'].min(), df['Val'].max()
df

Out[242]:
   Val  Min  Max
0    4    2    5
1    5    2    5
2    5    2    5
3    4    2    5
4    5    2    5
5    2    2    5
EdChum
0

This is how I solve it:

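import numpy as np
import pandas as pd

record = np.random.choice(data, size=6, p=prob)
# Repeat the max and min so that every row carries them
maxv = [max(record)] * len(record)
minv = [min(record)] * len(record)

# list() so this also works on Python 3, where zip returns an iterator
new_data = list(zip(minv, maxv, record))

df = pd.DataFrame(new_data, columns=['Min', 'Max', 'val'])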
jxu
  • Sorry for the late response but if I have the np.random.choice within a loop to produce a bunch of outputs how can I append them all to one dataframe? – user3609179 Jul 22 '15 at 16:45
  • if you get a chance please look at how I can append this from a loop – user3609179 Jul 22 '15 at 16:58
  • I don't quite get your problem here. But if you produce multiple np.random.choice results, you can use np.concatenate to join them first. However, in that case, I think EdChum's version will be much better. – jxu Jul 22 '15 at 18:10
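
One way to read the np.concatenate suggestion above, combined with EdChum's idea of adding Min and Max as columns, is sketched below. It is only a sketch under assumptions: the values of `data` and `prob`, the number of simulations, and the extra `sim` column are all made up for illustration, since the question does not show them.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs; the question does not show `data` or `prob`.
data = [1, 2, 3, 4, 5]
prob = [0.1, 0.2, 0.4, 0.2, 0.1]
n_sims, size = 3, 6

# One array per simulation, joined into a single flat array at the end
records = [np.random.choice(data, size=size, p=prob) for _ in range(n_sims)]
values = np.concatenate(records)

df = pd.DataFrame({
    'sim': np.repeat(np.arange(n_sims), size),  # which simulation each row came from
    'val': values,
})

# Per-simulation min and max, repeated on every row of that simulation
df['Min'] = df.groupby('sim')['val'].transform('min')
df['Max'] = df.groupby('sim')['val'].transform('max')
df = df[['sim', 'Min', 'Max', 'val']]
```

The `sim` column is not part of the requested output; it is kept here only so each simulation's min and max can be computed after the arrays are joined, and it can be dropped afterwards with `df.drop(columns='sim')`.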
0

Simply iterate through the simulation output and append the values to the dataframe:

# CREATE DATA FRAME STRUCTURE
df = pd.DataFrame(columns=['Min', 'Max', 'val'])

# RUN SIMULATION IN LOOP ITERATION
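record = np.random.choice(data, size=6, p=prob)

# MIN AND MAX ARE THE SAME FOR EVERY ROW, SO COMPUTE THEM ONCE
minv = np.min(record)
maxv = np.max(record)

for i in range(len(record)):
    val = record[i]

    # APPEND ROW (ORDER MATCHES THE COLUMNS: Min, Max, val)
    df.loc[len(df)] = [minv, maxv, val]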
Parfait
  • I believe that is an inefficient approach, though a common one. DataFrames, like arrays, occupy contiguous memory and it is very expensive to append to them. It's always better to append to a list (which is designed for that) and convert to a dataframe at the end. Also, you don't need the 0 in range, and you should use vectorized np.max and np.min on the whole record instead of individually on the rows. Just my two cents. – cxrodgers Jul 22 '15 at 02:39
  • Excellent points @cxrodgers! Indeed, dataframes are intended to be loaded all at once, not appended to. Only recently did pandas allow `df.loc[i]` as a row append. And this [SO post](http://stackoverflow.com/questions/10715965/add-one-row-in-a-pandas-dataframe) shows the popularity of the row append. Plus, the OP mentioned running simulations many times. Feel free to downvote, but you'll get the upvote. – Parfait Jul 22 '15 at 03:26
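
For reference, the list-then-convert approach that cxrodgers describes might look roughly like the sketch below. The values of `data` and `prob` and the simulation count are assumptions made up for illustration; the question does not show them.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs; the question does not show `data` or `prob`.
data = [1, 2, 3, 4, 5]
prob = [0.1, 0.2, 0.4, 0.2, 0.1]
n_sims = 1000

rows = []  # a plain Python list is cheap to append to
for _ in range(n_sims):
    record = np.random.choice(data, size=6, p=prob)
    # Vectorized min and max over the whole record, computed once per simulation
    minv, maxv = record.min(), record.max()
    rows.extend((minv, maxv, val) for val in record)

# Build the DataFrame once, after all simulations have run
df = pd.DataFrame(rows, columns=['Min', 'Max', 'val'])
```

This yields the Min,Max,value layout from the question, with six rows per simulation, and avoids growing the DataFrame one row at a time.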