Better way to add constant column to pandas data frame

Question

Currently, when I need to add a constant column to an existing DataFrame, I do the following. It doesn't seem very elegant to me (particularly the part where I multiply by the length of the DataFrame). Is there a better way of doing this?

import pandas as pd

testdf = pd.DataFrame({'categories': ['bats', 'balls', 'paddles'],
                       'skus': [50, 5000, 32],
                       'sales': [500, 700, 90]})

testdf['avg_sales_per_sku'] = [testdf.sales.sum() / testdf.skus.sum()] * len(testdf)

score 19 · Accepted Answer · answered Mar 30 '15 at 01:54


You can fill the column implicitly by assigning a single scalar value; pandas repeats it for every row.

testdf['avg_sales_per_sku'] = testdf.sales.sum() / testdf.skus.sum()

From the documentation:

When inserting a scalar value, it will naturally be propagated to fill the column
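As a quick check of the behaviour quoted above, here is a minimal sketch that compares the scalar assignment with the explicit list from the question. The names `ratio`, `explicit` and `same` are only illustrative and are not part of the original post.

```python
import pandas as pd

testdf = pd.DataFrame({'categories': ['bats', 'balls', 'paddles'],
                       'skus': [50, 5000, 32],
                       'sales': [500, 700, 90]})

# Aggregate ratio: total sales divided by total SKUs (a single number).
ratio = testdf['sales'].sum() / testdf['skus'].sum()

# Scalar assignment: pandas fills every row with the same value.
testdf['avg_sales_per_sku'] = ratio

# The explicit version from the question, built by repeating the value.
explicit = [ratio] * len(testdf)

# Both approaches should yield the same column contents.
same = testdf['avg_sales_per_sku'].tolist() == explicit
print(same)
```

The scalar form also avoids having to keep the list length in sync with the frame.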


Geeklhem


score 2 · Answer 2 · answered Mar 30 '15 at 02:10

It seems confusing to me to store the aggregate average under a name that suggests a per-category average. You could keep both, under distinct names:

testdf['avg_sales_per_sku'] = testdf.sales / testdf.skus
testdf['avg_agg_sales_per_agg_sku'] = testdf.sales.sum() / float(testdf.skus.sum())  # float() forces true division on Python 2; not needed on Python 3

>>> testdf
  categories  sales  skus  avg_sales_per_sku  avg_agg_sales_per_agg_sku
0       bats    500    50            10.0000                   0.253837
1      balls    700  5000             0.1400                   0.253837
2    paddles     90    32             2.8125                   0.253837
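If you would rather not modify the original frame, one possible variant is `DataFrame.assign`, which returns a new frame with the extra columns. This is only a sketch and is not from the answers above; it assumes Python 3, where `/` is already true division, and the name `result` is illustrative.

```python
import pandas as pd

testdf = pd.DataFrame({'categories': ['bats', 'balls', 'paddles'],
                       'skus': [50, 5000, 32],
                       'sales': [500, 700, 90]})

# assign() returns a copy with the new columns; testdf itself is unchanged.
result = testdf.assign(
    # Per-row ratio: one value per category.
    avg_sales_per_sku=lambda d: d['sales'] / d['skus'],
    # Aggregate ratio: a scalar, repeated for every row.
    avg_agg_sales_per_agg_sku=lambda d: d['sales'].sum() / d['skus'].sum(),
)
print(result)
```

The callables receive the frame being built, so the column definitions stay next to each other and the source frame is left as it was.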
