Adding specific days in python table

Question

I have a dataset (Product_ID,date_time, Sold) which has products sold on various dates. The dates are not consistent and are given for 9 months with random 13 days or more from a month. I have to segregate the data in a such a way that the for each product how many products were sold on 1-3 given days, 4-7 given days, 8-15 given days and >16 given days. . So how can I code this in python using pandas and other packages

PRODUCT_ID DATE_LOCATION Sold 0E4234 01-08-16 0:00 2 0E4234 02-08-16 0:00 7 0E4234 04-08-16 0:00 3 0E4234 08-08-16 0:00 1 0E4234 09-08-16 0:00 2 . . (same product for 9 months sold data) . 0G2342 02-08-16 0:00 1 0G2342 03-08-16 0:00 2 0G2342 06-08-16 0:00 1 0G2342 09-08-16 0:00 1 0G2342 11-08-16 0:00 3 0G2342 15-08-16 0:00 3 . . .(goes for 64 products each with 9 months of data) .

I don't know even how to code for this in python The output needed is

PRODUCT_ID      Days   Sold
0E4234          1-3      9
                4-7      3
                8-15     16
                 >16     (remaing values sum)
0G2342          1-3      3
                4-7      1
                8-15     7
                 >16    (remaing values sum)
.
.(for 64 products)
.

Would be happy if at least someone posted a link to where to start

That's an entire program that needs to be written, and SO is not a code writing service. You should take some python / pandas tutorials, test some stuff out, and ask questions here where you encounter SPECIFIC problems — Ofer Sadan, May 31 '18 at 05:53
How are the dates inconsistent? The rows you showed seem to all have the same format... — Samuel Dion-Girardeau, May 31 '18 at 06:01

jezrael · Accepted Answer · 2018-05-31T07:42:48.763

2

You can first convert dates to dtetimes and get days by dt.day:

df['DATE_LOCATION'] = pd.to_datetime(df['DATE_LOCATION'], dayfirst=True)
days = df['DATE_LOCATION'].dt.day

Then binning by cut:

rng = pd.cut(days, bins=[0,3,7,15,31], labels=['1-3', '4-7','8-15', '>=16'])
print (rng)
0      1-3
1      1-3
2      4-7
3     8-15
4     8-15
5      1-3
6      1-3
7      4-7
8     8-15
9     8-15
10    8-15
Name: DATE_LOCATION, dtype: category
Categories (4, object): [1-3 < 4-7 < 8-15 < >=16]

And aggregate sum by product and binned Series:

df = df.groupby(["PRODUCT_ID",rng])['Sold'].sum()
print (df)
PRODUCT_ID  DATE_LOCATION
0E4234      1-3              9
            4-7              3
            8-15             3
0G2342      1-3              3
            4-7              1
            8-15             7
Name: Sold, dtype: int64

If need also count per years:

df = df.groupby([df['DATE_LOCATION'].dt.year.rename('YEAR'), "PRODUCT_ID",rng])['Sold'].sum()
print (df)

YEAR  PRODUCT_ID  DATE_LOCATION
2016  0E4234      1-3              9
                  4-7              3
                  8-15             3
      0G2342      1-3              3
                  4-7              1
                  8-15             7
Name: Sold, dtype: int64

edited May 31 '18 at 07:42

answered May 31 '18 at 06:17

jezrael

822,522
95
1,334
1,252

I am getting a different output does the date starting from August of 2016 and ending at May of 2017 affect it since it involves 2 yrs? – san May 31 '18 at 07:25
@san - Can you explain more? I working only with data from your question, with real data get different output. – jezrael May 31 '18 at 07:31
If need alsou get output per years, added tested solution to output. – jezrael May 31 '18 at 07:45
Is there any way to add one more subdivision "Mean" above 1-3 days? – san May 31 '18 at 07:48
@san - Do you think all groups? Then change `['Sold'].sum()` to `['Sold'].agg(['sum','mean'])` – jezrael May 31 '18 at 07:49
No like the overall mean of the number of products sold for that product like every day avg of x number of products where sold – san May 31 '18 at 08:08
@san - Can you add some data sample and excpected output? I dont understand it, sorry. – jezrael May 31 '18 at 08:10
pd.pivot_table(df4,index=["PRODUCT_ID"],values=["Sold"],aggfunc=[np.mean] (gives the avg value of no of products sold for each product)i. I need this value to be placed above the "1-3" row of each product so first row is avg the second row is 1-3 and so on – san May 31 '18 at 08:44
@san - I think [this](https://stackoverflow.com/a/39966738) answer is what need, only change `sum` to `mean` – jezrael May 31 '18 at 08:48
that gives the mean value for 1-3,4-7,8-15,>16 separately Is there any way I could get the mean value for all the value of a single product(1-3+4-7+8-15+>16 ) – san May 31 '18 at 08:55
So need `df = df.mean(level=1)` or `df = df.mean(level=2)` if `year`s? – jezrael May 31 '18 at 08:58

score 0 · Answer 2 · answered May 31 '18 at 06:11

Assume your dataframe named df.

df["DATE_LOCATION"] = pd.to_datetime(df.DATE_LOCATION)
df["DAY"] = df.DATE_LOCATION.dt.day

def flag(x):
    if 1<=x<=3:
        return '1-3'
    elif 4<=x<=7:
        return '4-7'
    elif 8<=x<=15:
        return '8-15'
    else:
        return '>16' # maybe you mean '>=16'.

df["Days"] = df.DAY.apply(flag)

df.groupby(["PRODUCT_ID","Days"]).Sold.sum()

Adding specific days in python table

2 Answers2

Linked