Create pandas DataFrame from NumPy array

Question

I am running a np.random.choice like the one below.

record = np.random.choice(data, size=6, p=prob)
maxv = max(record)
minv = min(record)
val = record

From this I am finding the min and the max. I want to put these into a pandas dataframe. Below is my desired output:

Min,Max,value
1,5,2
1,5,3
1,5,3
1,5,5
1,5,1
1,5,3

This is an example of the output I would like from one simulation. Keep in mind that I am performing this simulation many times, so I would like to be able to keep adding to the dataframe that is created. Each simulation will have its own min and max. I would also like to keep the min and max in the output (which is why 1 and 5 appear in the example output).

How do I create the desired output above from the example code as a pandas dataframe? – user3609179, Jul 22 '15 at 01:51
Basically, how do I create a dataframe with the constant min and max in the first two columns and the other values in the third column? – user3609179, Jul 22 '15 at 01:53

score 1 · Accepted Answer · answered Jul 22 '15 at 08:14

I'd create the df with the initial data column 'Val' and then just add the new columns in a one-liner:

In [242]:
df = pd.DataFrame({'Val':np.random.randint(1,6,6)})
df['Min'], df['Max'] = df['Val'].min(), df['Val'].max()
df

Out[242]:
   Val  Min  Max
0    4    2    5
1    5    2    5
2    5    2    5
3    4    2    5
4    5    2    5
5    2    2    5
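
The same idea can be repeated for several simulations while keeping the asker's column order. The following is only a minimal sketch: the values of `data` and `prob` are placeholders (the question does not show them), and `simulate` is a hypothetical helper name. Each run builds its own small frame, and the frames are joined once at the end with `pd.concat`.

```python
import numpy as np
import pandas as pd

# Placeholder inputs: the question does not show `data` or `prob`.
data = [1, 2, 3, 4, 5]
prob = [0.2, 0.2, 0.2, 0.2, 0.2]

def simulate(size=6):
    """Run one simulation and return a frame with Min, Max and value columns."""
    record = np.random.choice(data, size=size, p=prob)
    df = pd.DataFrame({'value': record})
    # The scalar min and max are broadcast down the whole column.
    df['Min'], df['Max'] = df['value'].min(), df['value'].max()
    # Reorder the columns to match the desired output in the question.
    return df[['Min', 'Max', 'value']]

# Build one frame per simulation, then concatenate once at the end.
result = pd.concat([simulate() for _ in range(3)], ignore_index=True)
```

Each block of six rows in `result` carries the min and max of its own simulation.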

score 0 · Answer 2 · answered Jul 22 '15 at 02:12

This is how I solve it:

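import numpy as np
import pandas as pd

record = np.random.choice(data, size=6, p=prob)
# Repeat the scalar max and min so they line up with every value
maxv = [max(record)] * len(record)
minv = [min(record)] * len(record)

# list() materializes the rows, since zip returns an iterator on Python 3
new_data = list(zip(minv, maxv, record))

df = pd.DataFrame(new_data, columns=['Min', 'Max', 'val'])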

answered Jul 22 '15 at 02:12 by jxu

Sorry for the late response, but if I have the np.random.choice within a loop to produce a bunch of outputs, how can I append them all to one dataframe? – user3609179 Jul 22 '15 at 16:45
If you get a chance, please look at how I can append this from a loop. – user3609179 Jul 22 '15 at 16:58
I don't quite get your problem here. But if you produce multiple np.random.choice results, you can use np.concatenate to concatenate the results first. However, in that case, I think EdCum's version will be much better. – jxu Jul 22 '15 at 18:10
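
One possible reading of the np.concatenate suggestion is sketched below. It is an illustration only: `data`, `prob` and the number of simulations are placeholder assumptions. The draws are collected in a list, the per-simulation min and max are repeated to the length of each draw, and everything is concatenated before a single DataFrame is built.

```python
import numpy as np
import pandas as pd

# Placeholder inputs: the question does not show `data` or `prob`.
data = [1, 2, 3, 4, 5]
prob = [0.1, 0.2, 0.4, 0.2, 0.1]

# Run the simulation several times and keep every draw.
records = [np.random.choice(data, size=6, p=prob) for _ in range(4)]

# Concatenate the values, and repeat each draw's min/max to match its length.
values = np.concatenate(records)
mins = np.concatenate([np.full(len(r), r.min()) for r in records])
maxs = np.concatenate([np.full(len(r), r.max()) for r in records])

df = pd.DataFrame({'Min': mins, 'Max': maxs, 'val': values})
```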

Parfait · Answer 3 · 2015-07-22T03:26:33.917

Simply iterate through the simulation and append the values to the dataframe:

# CREATE DATA FRAME STRUCTURE
df = pd.DataFrame(columns=['Min', 'Max', 'val'])

# RUN SIMULATION IN LOOP ITERATION
record = np.random.choice(data, size=6, p=prob)

# COMPUTE MIN AND MAX ONCE FOR THE WHOLE RECORD
minv = np.min(record)
maxv = np.max(record)

for val in record:
    # APPEND ROW (ORDER MATCHES THE COLUMNS: Min, Max, val)
    df.loc[len(df)] = [minv, maxv, val]

answered Jul 22 '15 at 02:27 · edited Jul 22 '15 at 03:26

I believe that is an inefficient approach, though a common one. DataFrames, like arrays, occupy contiguous memory and it is very expensive to append to them. It's always better to append to a list (which is designed for that) and convert to a dataframe at the end. Also, you don't need the 0 in range, and you should use vectorized np.max and np.min on the whole record instead of individually on the rows. Just my two cents. – cxrodgers Jul 22 '15 at 02:39
Excellent points @cxrodgers! Indeed, dataframes are intended to be loaded all at once, not appended to. Only recently did pandas allow `df.loc[i]` as a row append. And this [SO post](http://stackoverflow.com/questions/10715965/add-one-row-in-a-pandas-dataframe) shows the popularity of the row append. Plus, the OP mentioned running simulations many times. Feel free to downvote, but you'll get the upvote. – Parfait Jul 22 '15 at 03:26
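
The list-based approach described in these comments could look roughly like this. It is a sketch, not code from either commenter: `data`, `prob` and the simulation count are placeholder assumptions. Rows are accumulated in a plain Python list and converted to a dataframe once, after the loop.

```python
import numpy as np
import pandas as pd

# Placeholder inputs: the question does not show `data` or `prob`.
data = [1, 2, 3, 4, 5]
prob = [0.2, 0.2, 0.2, 0.2, 0.2]

rows = []  # appending to a list is cheap compared with growing a DataFrame
for _ in range(100):
    record = np.random.choice(data, size=6, p=prob)
    # Vectorized min and max over the whole record, computed once per simulation.
    minv, maxv = record.min(), record.max()
    rows.extend((minv, maxv, val) for val in record)

# Convert to a dataframe once, at the end.
df = pd.DataFrame(rows, columns=['Min', 'Max', 'val'])
```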
