Create N number of rows depeding on INT in another column

Question

I have a data frame that looks like below.

unique_key   #rows_needed  avalue
a                 2          1000
b                 1          1000
c                 5          2500
d                18          10000

I am trying to create a dateframe that looks like this based on the number of rows_needed and calculates the avg(avalue) depending on the rows_needed column.

unique_key   #rows_needed   avg(avalue)
a                 2          500
a                 2          500
b                 1          1000
c                 5          500
c                 5          500
c                 5          500
c                 5          500
c                 5          500
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556

I took a look at the melt function in pandas, but I do not believe that will work. Any help is appreciated.

How about making a list of number like: 5 -> [5,5,5,5,5]? https://stackoverflow.com/questions/14059094/i-want-to-multiply-two-columns-in-a-pandas-dataframe-and-add-the-result-into-a-n (any lambda will work) and then you can unnest this https://stackoverflow.com/questions/48197234/unnest-explode-a-pandas-series and finally apply the division? — Ajay, Nov 18 '20 at 22:45

teepee · Accepted Answer · 2020-11-19T18:34:07.907

2

Update:

The explode method is a simple way to do this:

import numpy as np
import pandas as pd
df = pd.DataFrame({'key': ['a', 'b'], 'rowsneeded': [3, 4], 'val': [100,600]})
df['eachval'] = df.val/df.rowsneeded
df['key'] = [[k] * r for k, r in zip(df.key, df.rowsneeded)] 
df = df.explode('key')

Original Answer:

You can make another table with the repeated values and merge it to the original:

import numpy as np
import pandas as pd
df = pd.DataFrame({'key': ['a', 'b'], 'rowsneeded': [3, 4], 'val': [100,600]})

repeated_vals = [[k] * r for k, r in zip(df.key, df.rowsneeded)] 
flat_list = [item for sublist in repeated_vals  for item in sublist]
df['eachval'] = df.val/df.rowsneeded
df = pd.DataFrame(flat_list, columns=['key']).merge(df, how='left', on='key')

edited Nov 19 '20 at 18:34

answered Nov 18 '20 at 22:29

teepee

2,620
2
22
47

Hmm if the key is more than one character ['aa','bb'] this does not work. Its in the right direction and I appreciate it :) – Bigmoose70 Nov 18 '20 at 22:59
repeated_vals = [[s] * n for s, n in zip(df2.key.values, df2.rowsneeded.values)] this is what I used to overcome the obstacle I ran into. – Bigmoose70 Nov 19 '20 at 00:46
I was just about to say I fixed my answer to do that same thing; see above. – teepee Nov 19 '20 at 00:48
I actually learned an even better technique using explode that I would recommend instead. See my edits above. – teepee Nov 19 '20 at 18:32

Create N number of rows depeding on INT in another column

1 Answers1