-1

i have a small python problem i’m trying to solve;

so i have a dataframe in this format;

date, product, store, sales_amt
1/1/2019, A, A, 200
1/1/2019,A,B,120
1/2/2019, A, A, 75
1/3/2019,A,A,69
1/3/2019,A,B,23
----
----
----
1/31/2019,A,B,49

i have a scenario in which i have a dataframe which contains 4 columns ( date, product, store and sales_amt)

the dates are supposed to span a whole month ( eg in this case, january 2019) but there are some missing days in the dataframe.

does anyone have any tips on python code that can loop through the dates for a particular month and add a new row to the dataframe with the missing date, product/store combination and a sales_amt of zero?

For example, there is no entry for the product/store combination of A/B on 1/2/2019

Goal at the end is to have an entry for every day of that month for every product/store combination.

how best can i do this? Looks like something for resample but not really sure.

Any ideas on how to go about this would be really appreciated.

thanks

Prune
  • 76,765
  • 14
  • 60
  • 81
femi
  • 974
  • 2
  • 13
  • 44
  • 1
    Welcome to StackOverflow. [On topic](https://stackoverflow.com/help/on-topic), [how to ask](https://stackoverflow.com/help/how-to-ask), and ... [the perfect question](https://codeblog.jonskeet.uk/2010/08/29/writing-the-perfect-question/) apply here. StackOverflow is a knowledge base for *specific* programming problems -- not a coding or tutorial resource. – Prune Nov 19 '19 at 18:57
  • There are many posted questions on SO, and many other tutorial sites that show you how to compare consecutive values in a chosen column. Use those. When you've made an honest attempt at a solution and get stuck, *then* you have a reasonable question for this site. – Prune Nov 19 '19 at 18:58
  • Related: https://stackoverflow.com/questions/19324453/add-missing-dates-to-pandas-dataframe – masotann Nov 19 '19 at 18:59
  • Thanks @Nagia, a slight difference between this and my question is that i have an additional store/product combination to group on. i think thats the key thing i 'm really struggling with. – femi Nov 20 '19 at 22:10

1 Answers1

0

One way to do this is create a dataframe with all the necessary row combinations, and then use merge with your partially filled dataframe.

# initialize a dataframe with all the rows, with 0 values
dates = pd.date_range('01-01-2019', '01-31-2019')
products = ['a','b']
stores = ['a','b']

df_default = pd.DataFrame(list(itertools.product(dates, products, stores)))

#changing to string type so can join with second dataframe easier
df_default[0] = df_default.astype(str)
df_default[3] = 0
print(df_default)
             0  1  2  3
0   2019-01-01  a  a  0
1   2019-01-01  a  b  0
2   2019-01-01  b  a  0
3   2019-01-01  b  b  0
4   2019-01-02  a  a  0

# make a partially filled dataframe
df = pd.DataFrame([('2019-01-01', 'a','a', 1),
               ('2019-01-01', 'b', 'a',2)])

# use merge to get the values of df into df_default
print(pd.merge(df_default,s, how='left', on=[0,1,2]).drop('3_x',axis=1).fillna(0).head(5))

            0  1  2  3_y
0  2019-01-01  a  a  1.0
1  2019-01-01  a  b  0.0
2  2019-01-01  b  a  2.0
3  2019-01-01  b  b  0.0
4  2019-01-02  a  a  0.0

hope that helps

masotann
  • 901
  • 10
  • 29