1

I want to convert month wise(BP,RE) values into day wise(BP,RE) values by dividing month wise value by corresponding number of days in that month in a dataframe except for one category named Quality Parameters.I want to show day wise value for the category named Quality Parameters as the same value of month wise value(eg: if January month value is 45, I want to show day wise values from first of January to 31st of January as 45.Similarly for all months).

I used the below code for splitting month wise data into day wise.In this exception is not handled(category named Quality Parametes):

distillery_excel_data_upper = distillery_excel_data.apply(lambda x: x.astype(pd.StringDtype()).str.upper())
distillery_excel_data_upper['Date'] = pd.to_datetime(distillery_excel_data_upper['Year'] + distillery_excel_data_upper['Month'], format='%Y%B')
excel_date = pd.to_datetime(distillery_excel_data_upper['Date'])
daywise_excel_data = (distillery_excel_data_upper.assign(Daily=[pd.date_range(*x, freq='D') for x in zip(excel_date, excel_date.add(pd.offsets.MonthEnd()))]).explode("Daily"))
daywise_excel_data["BP"] = daywise_excel_data["BP"].apply(pd.to_numeric)
daywise_excel_data["RE"] = daywise_excel_data["RE"].apply(pd.to_numeric)

daywise_excel_data[['BP', 'RE']] = daywise_excel_data[['BP', 'RE']].div(
daywise_excel_data.groupby(["Date", "Category", "Parameter", "Region", "Factory", "UOM"])['Date'].transform("count"), axis=0)
daywise_excel_data = daywise_excel_data.drop(['Date', 'Year', 'Month'], axis=1)

daywise_excel_data.rename(columns={'Daily': 'date'}, inplace=True)

Above code is applicable for all categories.In this BP,RE,Category,Parameter,Region,Factory,UOM,Year and Month are dataframe columns.

Below shows the input dataframe:

    Year  Month   Region    Factory  Category         Parameter UOM    BP       RE
0   2023  April  Tamil Nadu   A     Production            ac     L    30.0     60.0
1   2023  April  Tamil Nadu   B     Quality Parameter     acc    L    45.0     20.0
2   2023  April  Karnataka    C     Quality Parameter      bc    L    23.0     50.0
3   2023  May    Karnataka    C     Quality Parameter      bc    L     0.0     43.0
4   2023  June   Karnataka    C     Quality Parameter      bc    L    10.0     56.0
5   2023  June   Karnataka    C     Production             bc    L    60.0     90.0

Desired output:

  Year  Month    Date     Region    Factory  Category         Parameter UOM      BP     RE
0 2023  April 2023-04-01 Tamil Nadu   A     Production            ac     L       1.0   2.0
0 2023  April 2023-04-02 Tamil Nadu   A     Production            ac     L       1.0   2.0
.. ..   ..       ..       ..          ..       ..     .. .. .. .. 
.. ..   ..       ..       ..          ..       ..     .. .. .. .. 
0 2023  April 2023-04-30 Tamil Nadu   A     Production            ac     L       1.0   2.0
1 2023  April 2023-04-01 Tamil Nadu   B     Quality Parameter     acc    L       45.0  20.0
1 2023  April 2023-04-02 Tamil Nadu   B     Quality Parameter     acc    L       45.0  20.0
.. ..   ..       ..       ..          ..       ..     .. .. .. .. 
.. ..   ..       ..       ..          ..       ..     .. .. .. ..

1 2023  April 2023-04-30 Tamil Nadu   B     Quality Parameter     acc    L       45.0  20.0
2 2023  April 2023-04-01 Karnataka    C     Quality Parameter     acc    L       23.0  50.0
2 2023  April 2023-04-02 Karnataka    C     Quality Parameter     acc    L       23.0  50.0
.. ..   ..       ..       ..          ..       ..     .. .. .. .. 
.. ..   ..       ..       ..          ..       ..     .. .. .. ..
2 2023  April 2023-04-30 Karnataka    C     Quality Parameter     acc    L       23.0  50.0
3 2023  May   2023-05-01 Karnataka    C     Quality Parameter     acc    L       0.0   43.0
3 2023  May   2023-05-02 Karnataka    C     Quality Parameter     acc    L       0.0   43.0
.. ..   ..       ..       ..          ..       ..     .. .. .. .. 
.. ..   ..       ..       ..          ..       ..     .. .. .. ..
3 2023  May   2023-05-31 Karnataka    C     Quality Parameter     acc    L       0.0   43.0
4 2023  June  2023-05-01 Karnataka    C     Quality Parameter     acc    L       10.0  56.0
4 2023  June  2023-05-02 Karnataka    C     Quality Parameter     acc    L       10.0  56.0
.. ..   ..       ..       ..          ..       ..     .. .. .. .. 
.. ..   ..       ..       ..          ..       ..     .. .. .. ..
4 2023  June  2023-05-30 Karnataka    C     Quality Parameter     acc    L       10.0  56.0
5 2023  June  2023-05-01 Karnataka    C     Production        acc    L       2.0    3.0
5 2023  June  2023-05-02 Karnataka    C     Production        acc    L       2.0    3.0
.. ..   ..       ..       ..          ..       ..     .. .. .. .. 
.. ..   ..       ..       ..          ..       ..     .. .. .. ..
5 2023  June  2023-05-30 Karnataka    C     Production        acc    L       2.0    3.0

Can anyone suggest a solution to achieve this result.

AbinBenny
  • 59
  • 6

1 Answers1

1

Use:

#create datetime column like 3rd column
df.insert(2, 'Date', pd.to_datetime(df['Year'].astype(str) + df['Month'], format='%Y%B'))

#get number of rows per month
df['days'] = df['Date'].dt.days_in_month

#repeat rows by months
df1 = df.loc[df.index.repeat(df['days'])]

#add counter to days
df1['Date'] += pd.to_timedelta(df1.groupby(level=0).cumcount(), 'd')

#create default index
df1 = df1.reset_index(drop=True)

#test if not equal Quality Parameter
m = df1['Category'].ne('Quality Parameter')

#divide number of days for all rows match condition
df1.loc[m, ['BP', 'RE']] = df1[['BP', 'RE']].div(df1.pop('days'), axis=0)

print (df1.head())
   Year  Month       Date      Region Factory    Category Parameter UOM   BP  \
0  2023  April 2023-04-01  Tamil Nadu       A  Production        ac   L  1.0   
1  2023  April 2023-04-02  Tamil Nadu       A  Production        ac   L  1.0   
2  2023  April 2023-04-03  Tamil Nadu       A  Production        ac   L  1.0   
3  2023  April 2023-04-04  Tamil Nadu       A  Production        ac   L  1.0   
4  2023  April 2023-04-05  Tamil Nadu       A  Production        ac   L  1.0   

    RE  
0  2.0  
1  2.0  
2  2.0  
3  2.0  

print (df1.iloc[50:56])
    Year  Month       Date      Region Factory           Category Parameter  \
50  2023  April 2023-04-21  Tamil Nadu       B  Quality Parameter       acc   
51  2023  April 2023-04-22  Tamil Nadu       B  Quality Parameter       acc   
52  2023  April 2023-04-23  Tamil Nadu       B  Quality Parameter       acc   
53  2023  April 2023-04-24  Tamil Nadu       B  Quality Parameter       acc   
54  2023  April 2023-04-25  Tamil Nadu       B  Quality Parameter       acc   
55  2023  April 2023-04-26  Tamil Nadu       B  Quality Parameter       acc   

   UOM    BP    RE  
50   L  45.0  20.0  
51   L  45.0  20.0  
52   L  45.0  20.0  
53   L  45.0  20.0  
54   L  45.0  20.0  
55   L  45.0  20.0  

EDIT: In your solution is possible use:

m = daywise_excel_data['Category'].ne('Quality Parameter')
daywise_excel_data.loc[m, ['BP', 'RE']] = daywise_excel_data[['BP', 'RE']]
                           .div(daywise_excel_data.df['Date'].dt.days_in_month, axis=0)
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Can you modify my code to achieve the desired output.`distillery_excel_data`is the input dataframe.@jezrael – AbinBenny Apr 28 '23 at 07:10
  • @AbinBenny - Instead `df` use `distillery_excel_data` – jezrael Apr 28 '23 at 07:23
  • should i use the `m = daywise_excel_data['Category'].ne('Quality Parameter') daywise_excel_data.loc[m, ['BP', 'RE']] = daywise_excel_data[['BP', 'RE']] .div(daywise_excel_data.df['Date'].dt.days_in_month, axis=0)` below `daywise_excel_data[['BP', 'RE']] = daywise_excel_data[['BP', 'RE']].div( daywise_excel_data.groupby(["Date", "Category", "Parameter", "Region", "Factory", "UOM"])['Date'].transform("count"), axis=0)` in my code@jezrael – AbinBenny Apr 28 '23 at 07:28
  • can you add the updated code of my code.so it will easier to understand@jezrael – AbinBenny Apr 28 '23 at 07:31
  • is there any way to achieve the same by modifying my code@jezrael – AbinBenny Apr 28 '23 at 07:35
  • can you add it in mode at correct position and show it in answer section@jezrael – AbinBenny Apr 28 '23 at 07:47
  • @AbinBenny So if replace `daywise_excel_data[['BP', 'RE']] = daywise_excel_data[['BP', 'RE']].div( daywise_excel_data.groupby(["Date", "Category", "Parameter", "Region", "Factory", "UOM"])['Date'].transform("count"), axis=0) ` in your code by EDIT paragraph - `m = daywise_excel_data['Category'].ne('Quality Parameter') daywise_excel_data.loc[m, ['BP', 'RE']] = daywise_excel_data[['BP', 'RE']] .div(daywise_excel_data.df['Date'].dt.days_in_month, axis=0)` there is some problem for you? – jezrael Apr 28 '23 at 07:51
  • then how can i solve this issue?can you suggest a solution@jezrael – AbinBenny Apr 28 '23 at 07:53
  • @AbinBenny - OK, first solution in my answer not working? Or there is another issue? – jezrael Apr 28 '23 at 07:54
  • Its working.I need to convert all the values in the dataframe to uppercase before`df1 = df1.reset_index(drop=True)` and convert all the dataframe columns to string type at the last@jezrael – AbinBenny Apr 28 '23 at 08:00
  • 1
    @AbinBenny - so need `cols = df1.select_dtypes(object).columns` and then `df1[cols] = df1[cols].apply(lambda x: x.str.upper())` ? – jezrael Apr 28 '23 at 08:02