
I am trying to fill missing values per group in data with the columns (area_code, shop_name, item_name, week_date, sales_amount). For each group, I need sales_amount filled for 52 weeks of data. I need to create two new columns, one with a forward fill (df.ffill()) and one with a backward fill (df.bfill()), and then compute (ffill + bfill) / 2 from the two new columns to obtain my result.

area_code   shop_name   item_name   week_date   sales_amount
101 Global Market   Mango Fruits    6/3/2018    5.13
101 Global Market   Mango Fruits    6/10/2018   nan
101 Global Market   Mango Fruits    6/17/2018   7.13
101 Global Market   Chips   6/3/2018    5
101 Global Market   Chips   6/10/2018   nan
102 Global Market   Mango Fruits    6/3/2018    10.34
102 Global Market   Mango Fruits    6/10/2018   nan
102 Global Market   Chips   6/10/2018   nan
102 Global Market   Chips   6/17/2018   nan
102 Global Market   Chips   6/24/2018   nan
102 Global Market   Potato  6/24/2018   nan


After

area_code   shop_name   item_name   week_date   sales_amount
101 Global Market   Mango Fruits    6/3/2018    5.13
101 Global Market   Mango Fruits    6/10/2018   6.13
101 Global Market   Mango Fruits    6/17/2018   7.13
101 Global Market   Chips   6/3/2018    5
101 Global Market   Chips   6/10/2018   5
102 Global Market   Mango Fruits    6/3/2018    10.34
102 Global Market   Mango Fruits    6/10/2018   10.34
102 Global Market   Chips   6/10/2018   Value available before this week for this group
102 Global Market   Chips   6/17/2018   Value available before this week for this group
102 Global Market   Chips   6/24/2018   Value available before this week for this group
102 Global Market   Potato  6/24/2018   Value available before this week for this group

For example:
Week 1 10
Week 2 nan
Week 3 nan

"Value available before this week for this group" means that week 2 and week 3 get the same value as week 1. Otherwise, if week 1 and week 3 both have data, week 2 is filled from the ffill and bfill values. If a value exists on only one side, as in this example, simply fill with ffill or bfill within each group.
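A minimal sketch of the fill described above, assuming pandas and the sample rows from the first table. The column names `sales_ffill`, `sales_bfill` and `sales_filled` are made up for illustration. When only one side has a value, the sketch falls back to that side, and groups with no data at all stay NaN.

```python
import numpy as np
import pandas as pd

# Sample rows from the question (dates written in ISO form).
df = pd.DataFrame({
    "area_code": [101] * 5 + [102] * 6,
    "shop_name": ["Global Market"] * 11,
    "item_name": (["Mango Fruits"] * 3 + ["Chips"] * 2
                  + ["Mango Fruits"] * 2 + ["Chips"] * 3 + ["Potato"]),
    "week_date": pd.to_datetime([
        "2018-06-03", "2018-06-10", "2018-06-17",
        "2018-06-03", "2018-06-10",
        "2018-06-03", "2018-06-10",
        "2018-06-10", "2018-06-17", "2018-06-24",
        "2018-06-24",
    ]),
    "sales_amount": [5.13, np.nan, 7.13, 5, np.nan,
                     10.34, np.nan, np.nan, np.nan, np.nan, np.nan],
})

keys = ["area_code", "shop_name", "item_name"]

# Fill in chronological order within each group; the results align back
# to df by index, so the original row order is kept.
grouped = df.sort_values("week_date").groupby(keys)["sales_amount"]
df["sales_ffill"] = grouped.ffill()
df["sales_bfill"] = grouped.bfill()

# Where only one direction produced a value, reuse it for the other side
# (assumption: this matches "simply fill by ffill or bfill").
ff = df["sales_ffill"].fillna(df["sales_bfill"])
bf = df["sales_bfill"].fillna(df["sales_ffill"])
df["sales_filled"] = (ff + bf) / 2

print(df[keys[::2] + ["week_date", "sales_filled"]])
```

No explicit loop over the groups is needed here: `groupby(...).ffill()` and `groupby(...).bfill()` never carry a value across a group boundary.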

  1. How do I iterate over a DataFrame?
  2. How do I iterate over each group and fill the values?
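For reference, a small sketch of both kinds of iteration in pandas, assuming a toy frame with only three of the columns. `itertuples` walks rows and iterating a `groupby` object yields one (key, sub-frame) pair per group. The dictionary `group_sizes` is only there for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "area_code": [101, 101, 102],
    "item_name": ["Chips", "Chips", "Potato"],
    "sales_amount": [5.0, np.nan, np.nan],
})

# 1. Row by row: each row arrives as a namedtuple.
for row in df.itertuples(index=False):
    print(row.area_code, row.item_name, row.sales_amount)

# 2. Group by group: key is a tuple of the grouping values,
#    group is the sub-DataFrame holding that group's rows.
group_sizes = {}
for key, group in df.groupby(["area_code", "item_name"]):
    group_sizes[key] = len(group)
    print(key, len(group))
```

Explicit loops like these are usually slower than the vectorised group-wise methods, so they are mostly useful for inspecting the data.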

I tried the approach from another question (linked below) but had no luck.

The weekly data that needs to be filled starts at 6/3/2018 and ends at the week of 6/3/2019.
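One way to give every group a row for each week is sketched below. It assumes 52 weekly dates starting on 2018-06-03 (the exact end date and the 7-day spacing are assumptions) and a pandas version that supports `how="cross"` in `merge` (1.2 or later). The names `weeks`, `groups` and `full` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "area_code": [101] * 5,
    "shop_name": ["Global Market"] * 5,
    "item_name": ["Mango Fruits"] * 3 + ["Chips"] * 2,
    "week_date": pd.to_datetime([
        "2018-06-03", "2018-06-10", "2018-06-17",
        "2018-06-03", "2018-06-10",
    ]),
    "sales_amount": [5.13, np.nan, 7.13, 5, np.nan],
})

keys = ["area_code", "shop_name", "item_name"]

# Assumed calendar: 52 weekly dates, 7 days apart.
weeks = pd.DataFrame(
    {"week_date": pd.date_range("2018-06-03", periods=52, freq="7D")}
)

# Every (group, week) combination, then the observed sales joined on;
# weeks without an observation get NaN in sales_amount.
groups = df[keys].drop_duplicates()
full = groups.merge(weeks, how="cross")
full = full.merge(df, on=keys + ["week_date"], how="left")

print(full.groupby(keys).size())
```

The resulting frame has one row per group per week, so a group-wise ffill/bfill can then be applied to `full` instead of the original data.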

Related question I tried: Pandas: filling missing values by mean in each group

Code run
  • I am not clear about your expected output. Could you please elaborate on the line `102 Global Market Chips 6/10/2018 Value available before this week for this group` and the following rows? I don't see any data for the group at all, so should it be `null`? – anky Jul 28 '20 at 16:27
  • I have updated the question, please check @anky – Code run Jul 28 '20 at 17:06
  • Yes, if data is not available then null – Code run Jul 28 '20 at 17:07
  • Are you looking for an answer in Spark/PySpark, or do you want this in pandas? A Google search would help you in that case – dsk Jul 28 '20 at 17:33
  • Either way will answer the question. I was just looking for good links to increase my understanding of this – Code run Jul 28 '20 at 17:56

0 Answers