Dynamic pandas dataframe generation

Question

Here is the code I wrote to generate a dataframe that contains 4 columns:

from random import randint
import pandas as pd

num_rows = 10

df = pd.DataFrame({
    'id_col': [x+1 for x in range(num_rows)],
    'c1': [randint(0, 9) for x in range(num_rows)],
    'c2': [randint(0, 9) for x in range(num_rows)],
    'c3': [randint(0, 9) for x in range(num_rows)],
})
df

print(df) renders:

   id_col  c1  c2  c3
0       1   3   1   5
1       2   0   2   4
2       3   1   2   5
3       4   0   5   6
4       5   0   0   1
5       6   6   5   8
6       7   1   6   8
7       8   5   8   8
8       9   1   5   2
9      10   2   9   2

I've set the number of rows to be generated dynamically via the num_rows variable.

How can I dynamically generate 1000 columns, each named with the prefix 'c'? That is, columns c1, c2, c3, ..., c1000 should be generated, each containing 10 rows.

jezrael · Accepted Answer · 2018-07-27T13:24:30.957

For better performance, I suggest creating the DataFrame with the NumPy function numpy.random.randint and then renaming the columns with a list comprehension. To add the new column at a given position, use DataFrame.insert:

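import numpy as np
import pandas as pd

np.random.seed(458)  # fixed seed so the output below is reproducible

N = 15  # number of random columns
M = 10  # number of rows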
df = pd.DataFrame(np.random.randint(10, size=(M, N)))
df.columns = ['c{}'.format(x+1) for x in df.columns]
df.insert(0, 'idcol', np.arange(M))

print(df)
   idcol  c1  c2  c3  c4  c5  c6  c7  c8  c9  c10  c11  c12  c13  c14  c15
0      0   8   2   1   6   2   1   0   9   7    8    0    5    5    6    0
1      1   0   2   5   0   0   2   5   2   9    2    1    0    0    5    0
2      2   5   1   3   5   4   5   3   0   2    1    7    8    9    5    4
3      3   8   7   7   0   1   3   6   7   5    8    8    9    8    5    5
4      4   2   8   1   7   3   7   4   6   0    7    0    9    4    0    4
5      5   9   2   1   6   1   9   5   6   7    4    6    1    7    3    7
6      6   1   9   3   9   7   7   2   7   9    8    2    7    2    5    5
7      7   7   6   6   6   4   2   9   0   6    5    7    0    0    4    9
8      8   6   4   2   1   3   1   7   0   4    3    0    5    4    7    7
9      9   1   3   5   7   2   2   1   5   6    1    9    5    9    6    3
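
As a rough sketch of how the same approach scales to the 1000 columns asked about in the question, the snippet below wraps it in a small helper. The function name make_frame and its parameters are hypothetical (not part of pandas or NumPy), and it uses a local np.random.RandomState instead of the global seed, which is an assumption made here to keep the helper self-contained. The ids start at 1 to match the question rather than the output above.

```python
import numpy as np
import pandas as pd


def make_frame(num_rows, num_cols, seed=None):
    """Hypothetical helper: id_col plus columns c1..c<num_cols> of random digits."""
    rng = np.random.RandomState(seed)  # seed=None gives different values on each run
    values = rng.randint(10, size=(num_rows, num_cols))  # integers from 0 to 9
    df = pd.DataFrame(values,
                      columns=['c{}'.format(i + 1) for i in range(num_cols)])
    df.insert(0, 'id_col', np.arange(1, num_rows + 1))  # ids start at 1, as in the question
    return df


df = make_frame(10, 1000, seed=458)
print(df.shape)  # (10, 1001): id_col plus c1..c1000
```

Passing the column names straight to the DataFrame constructor avoids the separate renaming step; either way should give the same column labels.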

Another solution uses numpy.hstack to stack the id column in front of the 2D array:

np.random.seed(458)

arr = np.hstack([np.arange(M)[:, None], np.random.randint(10, size=(M, N))])
df = pd.DataFrame(arr)
df.columns = ['idcol'] + ['c{}'.format(x) for x in df.columns[1:]]
print(df)
   idcol  c1  c2  c3  c4  c5  c6  c7  c8  c9  c10  c11  c12  c13  c14  c15
0      0   8   2   1   6   2   1   0   9   7    8    0    5    5    6    0
1      1   0   2   5   0   0   2   5   2   9    2    1    0    0    5    0
2      2   5   1   3   5   4   5   3   0   2    1    7    8    9    5    4
3      3   8   7   7   0   1   3   6   7   5    8    8    9    8    5    5
4      4   2   8   1   7   3   7   4   6   0    7    0    9    4    0    4
5      5   9   2   1   6   1   9   5   6   7    4    6    1    7    3    7
6      6   1   9   3   9   7   7   2   7   9    8    2    7    2    5    5
7      7   7   6   6   6   4   2   9   0   6    5    7    0    0    4    9
8      8   6   4   2   1   3   1   7   0   4    3    0    5    4    7    7
9      9   1   3   5   7   2   2   1   5   6    1    9    5    9    6    3

@pyd Sure, it is useful if you want the same output from np.random on every run, so it is used for testing. If you remove it, you get different values each time, which is better in real code. You can also check [this](https://stackoverflow.com/q/21494489/2901002). - jezrael, Jul 27 '18 at 17:56
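
A minimal sketch of what the seed does, assuming the same legacy np.random.seed interface used in the answer: re-seeding with the same value restarts the same sequence, while drawing again without re-seeding simply continues it. The variable names first, second, and third are only for illustration.

```python
import numpy as np

np.random.seed(458)
first = np.random.randint(10, size=(3, 4))

np.random.seed(458)  # re-seeding restarts the same pseudo-random sequence
second = np.random.randint(10, size=(3, 4))

# No re-seed here: this draw continues the sequence, so it will
# generally not repeat the values above.
third = np.random.randint(10, size=(3, 4))

print(np.array_equal(first, second))  # True
```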

score 1 · Answer 2 · answered Jul 27 '18 at 12:59

IIUC, use str.format and a dict comprehension:

from random import randint
import pandas as pd

num_rows = 10
num_cols = 10

df = pd.DataFrame({'c{}'.format(n): [randint(0, 9) for x in range(num_rows)] for n in range(num_cols)},
                  index=[x+1 for x in range(num_rows)])
print(df)
    c0  c1  c2  c3  c4  c5  c6  c7  c8  c9
1   1   6   2   1   3   1   8   8   2   0
2   2   6   2   2   5   7   4   1   6   2
3   1   2   6   8   7   5   5   7   2   2
4   5   5   3   3   4   7   8   1   8   6
5   7   2   8   6   5   6   2   0   0   4
6   8   2   4   4   6   3   0   1   0   2
7   5   6   8   5   1   0   4   8   4   7
8   1   5   4   5   2   4   4   6   2   7
9   5   7   7   8   5   0   2   7   3   2
10  4   8   5   3   3   7   5   1   5   1

score 0 · Answer 3 · answered Jul 27 '18 at 13:05

You can use np.random.randint to create a full array of random values, f-strings (Python 3.6+) with a list comprehension to name the columns, and pd.DataFrame.assign with np.arange to define "id_col":

import pandas as pd, numpy as np

rows = 10
cols = 5
minval, maxval = 0, 10

df = pd.DataFrame(np.random.randint(minval, maxval, (rows, cols)),
                  columns=[f'c{i}' for i in range(1, cols+1)])\
       .assign(id_col=np.arange(1, rows+1))

print(df)

   c1  c2  c3  c4  c5  id_col
0   8   4   6   0   8       1
1   8   3   5   9   0       2
2   1   3   3   6   2       3
3   6   4   1   1   7       4
4   3   7   0   9   5       5
5   4   6   8   8   6       6
6   0   3   9   9   7       7
7   0   6   1   2   4       8
8   3   7   1   2   0       9
9   6   6   0   5   8      10
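
Note that assign appends the new column at the end, as the output shows. If id_col should come first, as in the question, one possible sketch is to reorder the columns afterwards; this reordering step is an addition here, not part of the answer above.

```python
import numpy as np
import pandas as pd

rows, cols = 10, 5
df = pd.DataFrame(np.random.randint(0, 10, (rows, cols)),
                  columns=[f'c{i}' for i in range(1, cols + 1)])
df = df.assign(id_col=np.arange(1, rows + 1))  # id_col is appended last

# Move id_col to the front by selecting the columns in the desired order
df = df[['id_col'] + [c for c in df.columns if c != 'id_col']]
print(list(df.columns))  # ['id_col', 'c1', 'c2', 'c3', 'c4', 'c5']
```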
