1

I have a data frame that looks like below.

unique_key   #rows_needed  avalue
a                 2          1000
b                 1          1000
c                 5          2500
d                18          10000

I am trying to create a dateframe that looks like this based on the number of rows_needed and calculates the avg(avalue) depending on the rows_needed column.

unique_key   #rows_needed   avg(avalue)
a                 2          500
a                 2          500
b                 1          1000
c                 5          500
c                 5          500
c                 5          500
c                 5          500
c                 5          500
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556
d                18          555.556

I took a look at the melt function in pandas, but I do not believe that will work. Any help is appreciated.

Bigmoose70
  • 453
  • 6
  • 15
  • crap I am editing the post. – Bigmoose70 Nov 18 '20 at 22:07
  • How about df['avg'] = df['avalue']/df['rowsneeded'] ? – Ajay Nov 18 '20 at 22:08
  • @Ajay I edited the post – Bigmoose70 Nov 18 '20 at 22:14
  • How about making a list of number like: 5 -> [5,5,5,5,5]? https://stackoverflow.com/questions/14059094/i-want-to-multiply-two-columns-in-a-pandas-dataframe-and-add-the-result-into-a-n (any lambda will work) and then you can unnest this https://stackoverflow.com/questions/48197234/unnest-explode-a-pandas-series and finally apply the division? – Ajay Nov 18 '20 at 22:45

1 Answers1

2

Update:

The explode method is a simple way to do this:

import numpy as np
import pandas as pd
df = pd.DataFrame({'key': ['a', 'b'], 'rowsneeded': [3, 4], 'val': [100,600]})
df['eachval'] = df.val/df.rowsneeded
df['key'] = [[k] * r for k, r in zip(df.key, df.rowsneeded)] 
df = df.explode('key')

Original Answer:

You can make another table with the repeated values and merge it to the original:

import numpy as np
import pandas as pd
df = pd.DataFrame({'key': ['a', 'b'], 'rowsneeded': [3, 4], 'val': [100,600]})

repeated_vals = [[k] * r for k, r in zip(df.key, df.rowsneeded)] 
flat_list = [item for sublist in repeated_vals  for item in sublist]
df['eachval'] = df.val/df.rowsneeded
df = pd.DataFrame(flat_list, columns=['key']).merge(df, how='left', on='key')
teepee
  • 2,620
  • 2
  • 22
  • 47
  • Hmm if the key is more than one character ['aa','bb'] this does not work. Its in the right direction and I appreciate it :) – Bigmoose70 Nov 18 '20 at 22:59
  • repeated_vals = [[s] * n for s, n in zip(df2.key.values, df2.rowsneeded.values)] this is what I used to overcome the obstacle I ran into. – Bigmoose70 Nov 19 '20 at 00:46
  • I was just about to say I fixed my answer to do that same thing; see above. – teepee Nov 19 '20 at 00:48
  • I actually learned an even better technique using explode that I would recommend instead. See my edits above. – teepee Nov 19 '20 at 18:32