
I want to distribute the numbers present in a list across a whole month.

A) Given a holiday list, I want to dynamically assign 1 to each holiday date and 0 to each working day.

For example:

Holiday_List = ['2020-01-01','2020-01-05','2020-01-12','2020-01-19','2020-01-26']
Start_date = datetime.datetime(year=2020, month=1, day=1)
end_date = datetime.datetime(year=2020, month=1, day=28)

Below is the output I am looking for as a DataFrame, where 'Date' and 'Holiday' are columns:

Date        Holiday
01-01-2020  1
02-01-2020  0
03-01-2020  0
04-01-2020  0
05-01-2020  1
06-01-2020  0
07-01-2020  0
08-01-2020  0
09-01-2020  0
10-01-2020  0
11-01-2020  0
12-01-2020  1
13-01-2020  0
14-01-2020  0
15-01-2020  0
16-01-2020  0
17-01-2020  0
18-01-2020  0
19-01-2020  1
20-01-2020  0
21-01-2020  0
22-01-2020  0
23-01-2020  0
24-01-2020  0
25-01-2020  0
26-01-2020  1
27-01-2020  0
28-01-2020  0

B) Given a list of numbers like [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18], I want to break it into 3 equal parts and store them in 3 different lists:

a=[1,2,3,4,5,6], b=[7,8,9,10,11,12], c=[13,14,15,16,17,18]
The order must be kept: the first 6 elements go in 'a', the second 6 in 'b' and the third 6 in 'c'.

C) I want to distribute the lists a, b and c across the whole month so that there is a gap of exactly 8 days between the first elements of a, b and c (in the output below, 1 falls on 02-01, 7 on 11-01 and 13 on 20-01), and similarly for the other numbers. One constraint: I cannot assign any number to a holiday.

Below is the final output I am looking for, where the list values are assigned in the column "Values". Here I have assigned the dummy value 'NW' to keep a gap of 8 days between the lists.

Date        Holiday  Values
01-01-2020  1        Holiday
02-01-2020  0        1
03-01-2020  0        2
04-01-2020  0        3
05-01-2020  1        Holiday
06-01-2020  0        4
07-01-2020  0        5
08-01-2020  0        6
09-01-2020  0        NW
10-01-2020  0        NW
11-01-2020  0        7
12-01-2020  1        Holiday
13-01-2020  0        8
14-01-2020  0        9
15-01-2020  0        10
16-01-2020  0        11
17-01-2020  0        12
18-01-2020  0        NW
19-01-2020  1        Holiday
20-01-2020  0        13
21-01-2020  0        14
22-01-2020  0        15
23-01-2020  0        16
24-01-2020  0        17
25-01-2020  0        18
26-01-2020  1        Holiday
27-01-2020  0        NW
28-01-2020  0        NW
jaco0646
  • Where is the code you are getting stuck on? – Tim Stack Apr 22 '20 at 09:17
  • What have you tried so far? – Avishka Dambawinna Apr 22 '20 at 09:18
  • I am new to Python and have not done much work with it, so I can't think of the logic for this problem. Kindly help me. – Balbinder Singh Kohtra Apr 22 '20 at 09:48
  • (1) Create a column with zeros in all cells (that should be easy for you) and later use `Holiday_List` to put `1` in some cells. You may have to use the module `datetime` to convert the strings in `Holiday_List` into `datetime` objects. – furas Apr 22 '20 at 09:52
  • use [pandas.date_range()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html) to generate dates – furas Apr 22 '20 at 09:54
  • You have to try it yourself first, otherwise you will not learn anything by copy-pasting. Just do some research on the matter on Google. [DateTime](https://www.pythonforbeginners.com/basics/python-datetime-time-examples) [timedelta](https://www.guru99.com/date-time-and-datetime-classes-in-python.html): refer to those and try to print the dates for the given date range. [Similar question](https://stackoverflow.com/questions/7274267/print-all-day-dates-between-two-dates/7274316): give it a try first and share it with your question; then I think everyone will help you solve your problems. Cheers mate! – Avishka Dambawinna Apr 22 '20 at 10:00
  • (2) `d = [1,2,3 ...]`, `l = len(d)//3`, `print(d[ 0*l : 1*l ], d[ 1*l : 2*l ], d[ 2*l : 3*l ])`. But there may be a function for this in the `collections`, `functools` or `itertools` modules. – furas Apr 22 '20 at 10:02
  • (3) First create a column with `NW` in all cells. Next use the column `Holiday` to get only the rows with `1` and assign `"Holiday"`. In a similar way you can use the column `Holiday` to get the rows with `0`, which makes it easier to assign values (and skip holidays). – furas Apr 22 '20 at 10:08
  • Hint: `strftime()` method [here](https://www.tutorialspoint.com/python/time_strftime.htm) – Avishka Dambawinna Apr 22 '20 at 10:10
  • For the array-splitting part, you can use `numpy.split()` [refer](https://www.tutorialspoint.com/numpy/numpy_split.htm) or build your own algorithm from scratch. – Avishka Dambawinna Apr 22 '20 at 10:31
  • Thanks Avishka for the help. Can you help me with part C, please? I am stuck there. – Balbinder Singh Kohtra Apr 22 '20 at 11:08
  • @Balbinder Keep a count for each iteration; while `count < 8`, print the elements of the list (e.g. a), and if that list has fewer than 8 elements, print the dummy value to fill the remaining slots... – Avishka Dambawinna Apr 22 '20 at 12:22
  • @BalbinderSinghKohtra The answer furas provided is the correct one. Mine has an issue with the 8-day gap in Part C, so please accept that answer as the correct one. – Avishka Dambawinna Apr 25 '20 at 11:32
  • It's solved, but please consider furas's answer as the accepted one. – Avishka Dambawinna Apr 25 '20 at 13:02

2 Answers


A) You can use `date_range()` to create a column with dates:

df = pd.DataFrame()

df['Date'] = pd.date_range(start_date, end_date)

Next you can create a column `Holiday` with zeros in all cells:

df['Holiday'] = 0

Then you can replace some of the values:

for item in holiday_list:
    item = datetime.datetime.strptime(item, '%Y-%m-%d')
    # one .loc call writes into df itself (chained df['Holiday'][...] = 1 may hit a copy)
    df.loc[df['Date'] == item, 'Holiday'] = 1

but this part may be simpler with `isin()`:

mask = df['Date'].dt.strftime('%Y-%m-%d').isin(holiday_list)

df.loc[mask, 'Holiday'] = 1

or using numpy.where()

import numpy as np

mask = df['Date'].dt.strftime('%Y-%m-%d').isin(holiday_list)

df['Holiday'] = np.where(mask, 1, 0)

or simply keep it as True/False instead of 1/0

df['Holiday'] = df['Date'].dt.strftime('%Y-%m-%d').isin(holiday_list)

import pandas as pd
import datetime

holiday_list = ['2020-01-01','2020-01-05','2020-01-12','2020-01-19','2020-01-26']
start_date = datetime.datetime(year=2020, month=1, day=1)
end_date = datetime.datetime(year=2020, month=1, day=28)

df = pd.DataFrame()

df['Date'] = pd.date_range(start_date, end_date)

df['Holiday'] = 0
mask = df['Date'].dt.strftime('%Y-%m-%d').isin(holiday_list)
df.loc[mask, 'Holiday'] = 1

print(df)
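A slightly shorter sketch of part A, assuming pandas can parse the strings in `holiday_list` (it can for the ISO `YYYY-MM-DD` format used here). It compares datetimes directly with `isin()` instead of formatting every date with `strftime()`, and turns the booleans into 1/0 with `astype(int)`, so no chained assignment is needed. This is an alternative, not the code from the answer.

```python
import datetime

import pandas as pd

holiday_list = ['2020-01-01', '2020-01-05', '2020-01-12', '2020-01-19', '2020-01-26']
start_date = datetime.datetime(year=2020, month=1, day=1)
end_date = datetime.datetime(year=2020, month=1, day=28)

# One row per day, both ends included
df = pd.DataFrame({'Date': pd.date_range(start_date, end_date)})

# Parse the holiday strings once and compare datetimes directly
holidays = pd.to_datetime(holiday_list)
df['Holiday'] = df['Date'].isin(holidays).astype(int)

print(df.head(6))
```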

B) You can use `[start:start+size]` to split the list:

numbers = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]

size = len(numbers)//3

print(numbers[size*0:size*1], numbers[size*1:size*2], numbers[size*2:size*3])

or

print(numbers[:size], numbers[size:size*2], numbers[size*2:])

In a similar way you can split the DataFrame (after filtering out "Holiday") to work with 8 days at a time, `[start:start+8]`, but I will use that in (C).
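For part B, a small helper can generalize the slicing above to any number of parts. `split_equal` is a hypothetical name used only for illustration, and, like the question, it assumes the length is divisible by the number of parts.

```python
def split_equal(items, parts):
    """Split `items` into `parts` consecutive chunks of equal length.

    Assumes len(items) is divisible by `parts`, as in the question.
    """
    if len(items) % parts != 0:
        raise ValueError('length must be divisible by the number of parts')
    size = len(items) // parts
    return [items[i * size:(i + 1) * size] for i in range(parts)]


numbers = list(range(1, 19))
a, b, c = split_equal(numbers, 3)
print(a, b, c)
# [1, 2, 3, 4, 5, 6] [7, 8, 9, 10, 11, 12] [13, 14, 15, 16, 17, 18]
```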


C) You can create a column `Values` with `NW` in all cells:

df['Values'] = 'NW'

Next you can use the previous mask to assign "Holiday":

mask = df['Date'].dt.strftime('%Y-%m-%d').isin(holiday_list)

df.loc[mask, 'Values'] = 'Holiday'

Using `~` you can negate the mask to reverse the selection, i.e. to select the cells without "Holiday":

selected = df.loc[~mask, 'Values'].copy()

and now I can try to assign the numbers (`.iloc` slices by position):

for a, b in zip(range(0, len(selected), 8), range(0, len(numbers), size)):
    selected.iloc[a:a+size] = numbers[b:b+size]

df.loc[~mask, 'Values'] = selected

but maybe it can be done in a simpler way, perhaps with `groupby()` or `rolling()`?


import pandas as pd
import datetime

holiday_list = ['2020-01-01','2020-01-05','2020-01-12','2020-01-19','2020-01-26']
start_date = datetime.datetime(year=2020, month=1, day=1)
end_date = datetime.datetime(year=2020, month=1, day=28)

df = pd.DataFrame()

# ---

df['Date'] = pd.date_range(start_date, end_date)

mask = df['Date'].dt.strftime('%Y-%m-%d').isin(holiday_list)

df['Holiday'] = 0
df.loc[mask, 'Holiday'] = 1

# ---

df['Values'] = 'NW'
df.loc[mask, 'Values'] = 'Holiday'

numbers = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]
size = len(numbers)//3

selected = df.loc[~mask, 'Values'].copy()

for a, b in zip(range(0, len(selected), 8), range(0, len(numbers), size)):
    selected.iloc[a:a+size] = numbers[b:b+size]

df.loc[~mask, 'Values'] = selected
print(df)

EDIT:

I created new code for part C.

The main problem was that slicing sometimes creates a copy of the data, so values are changed in that copy but not in the original DataFrame; that is why I use masks instead of slicing.

It may display a warning that it changes values in a copy of the data (not in the original DataFrame), but in the end it gave me the correct result.

Following "Returning a view versus a copy" in the pandas documentation, assigning with a single `df.loc[mask, column] = ...` call should remove this warning.

import pandas as pd
import datetime

holiday_list = [
    '2020-01-01','2020-01-05', 
    #'2020-01-10','2020-01-11', # add more to test when there are fewer than 7 NW
    '2020-01-12','2020-01-19','2020-01-26'
]
start_date = datetime.datetime(year=2020, month=1, day=1)
end_date = datetime.datetime(year=2020, month=1, day=28)

df = pd.DataFrame()

# ---

df['Date'] = pd.date_range(start_date, end_date)

mask = df['Date'].dt.strftime('%Y-%m-%d').isin(holiday_list)

df['Holiday'] = 0
df.loc[mask, 'Holiday'] = 1

# ---

df['Values'] = 'NW'
df.loc[mask, 'Values'] = 'Holiday'

numbers = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]
size = len(numbers)//3

start = 0
for b in range(0, len(numbers), size):
    # select the NW rows that may be replaced (needs `start` to keep a few NW at the end of the previous 8-day gap)
    mask = (df['Values'] == 'NW') & (df.index >= start)

    # reduce size if there are fewer than 7 `NW` left
    print('NW:', sum(mask)) # sum() counts all `True` in mask
    if sum(mask) <= size:
        left = size - sum(mask)
        size = sum(mask)
        print('shorter:', size, left)

    # first and last NW to replace
    start = df[ mask ].index[0]
    end   = df[ mask ].index[size-1]
    print('start, end:', start, end)

    # use a new mask to select and replace the values
    # (slicing like [0:6] doesn't work because it creates a copy of the data
    #  and doesn't replace values in the original DataFrame)
    mask = mask & (df.index >= start) & (df.index <= end)
    df.loc[mask, 'Values'] = numbers[b:b+size]

    # the next list may start 9 rows later at the earliest (8 days in between)
    start += 8+1

print(df)
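The EDIT above mentions the copy-versus-view problem. A small sketch (not part of the original answer) of the pattern the pandas documentation recommends: one `.loc` call with a boolean row mask and a column label, which writes into the DataFrame itself instead of into a temporary copy. The column is created with `dtype=object` on purpose, so it can hold both strings and numbers.

```python
import pandas as pd

# An object column, so it can hold both strings and numbers
df = pd.DataFrame({'Values': pd.Series(['NW'] * 5, dtype=object)})
mask = pd.Series([True, False, True, False, False])

# One .loc call (row mask, column label) writes into df itself, whereas
# chained indexing like df['Values'][mask] = ... may act on a temporary copy
df.loc[mask, 'Values'] = 'Holiday'

# A list works too, as long as its length equals the number of selected rows
df.loc[~mask, 'Values'] = [1, 2, 3]

print(df['Values'].tolist())
```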
furas
  • @furas Thank you so much for the help. Could you help me with part C? I am trying to achieve a gap of 8 days between each list, as shown in the output above, and I am stuck there. – Balbinder Singh Kohtra Apr 22 '20 at 11:03
  • Part C is the biggest problem. I added code which uses `range(len())` to create the ranges `[start:start+8]` because I don't know another method for this. – furas Apr 22 '20 at 11:16
  • Thanks so much. I need to maintain a gap of 8 days between each list. In your code the gap between no. 1 and 7 is 8, but the gap between 7 and 13 is 9 days. I only want an 8-day gap. Thanks, see the output below. – Balbinder Singh Kohtra Apr 22 '20 at 11:51
  • It can be an even more complex problem. The current version first creates `selected` and later calculates `a` inside `selected`; a new version would have to first calculate the next position (`a`) and later create `selected`. – furas Apr 22 '20 at 16:30
  • Is there any way to solve this? I have been trying to solve it for a long time, please. – Balbinder Singh Kohtra Apr 22 '20 at 16:37
  • At this moment I don't know a solution with pandas functions. You can try to do it with a normal `for`-loop (and `df.iterrows()`): inside the loop, check the value in column `Holiday` and count rows to create the 8-day gap. It will need some variables to count the 8 days and to control which number to put in `Values`. – furas Apr 22 '20 at 17:15
  • Sure, thanks for the help, really appreciated. Please do let me know if you find the solution. – Balbinder Singh Kohtra Apr 22 '20 at 17:22
  • @furas Thank you so much, it worked. Really helpful; I appreciate your time and effort. – Balbinder Singh Kohtra Apr 23 '20 at 08:27
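The comments above suggest a plain `for`-loop that counts rows. A minimal sketch of that idea (not the code from either answer), assuming the rule implied by the expected output in the question: each list starts on the first working day at least 9 rows after the previous list's first number (8 days in between), its numbers fill consecutive working days, and every other working day gets 'NW'. The names `gap`, `earliest`, `first` and `row` are illustrative.

```python
import datetime

import pandas as pd

holiday_list = ['2020-01-01', '2020-01-05', '2020-01-12', '2020-01-19', '2020-01-26']
start_date = datetime.datetime(year=2020, month=1, day=1)
end_date = datetime.datetime(year=2020, month=1, day=28)
numbers = list(range(1, 19))
parts = 3
size = len(numbers) // parts
gap = 8  # assumed: days strictly between the first numbers of consecutive lists

dates = pd.date_range(start_date, end_date)
is_holiday = dates.isin(pd.to_datetime(holiday_list))  # numpy bool array

# Start with 'Holiday' on holidays and the dummy 'NW' on every working day
values = ['Holiday' if h else 'NW' for h in is_holiday]

earliest = 0  # first row where the current list is allowed to start
for p in range(parts):
    chunk = numbers[p * size:(p + 1) * size]
    first = None  # row of the first number of this chunk
    row = earliest
    for value in chunk:
        # numbers are never assigned to a holiday, so skip holiday rows
        while row < len(dates) and is_holiday[row]:
            row += 1
        if row >= len(dates):
            break  # month is full; the remaining numbers are not placed
        values[row] = value
        if first is None:
            first = row
        row += 1
    if first is None:
        break  # nothing could be placed, so later lists will not fit either
    # next list: at least gap + 1 rows after this list's first number,
    # and never overlapping the numbers just placed
    earliest = max(first + gap + 1, row)

df = pd.DataFrame({
    'Date': dates.strftime('%d-%m-%Y'),
    'Holiday': is_holiday.astype(int),
    'Values': values,
})
print(df.to_string(index=False))
```

Working on a plain list and building the DataFrame at the end sidesteps the copy-versus-view issue entirely; the `max(...)` guard only matters when many holidays stretch one list past the 9-row window.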

I hope you have solved it by now :) Anyway, this is my approach to the problem.

First of all, here are the assumptions I made when writing the code:

  • The length of the given array of integers is <= 18 and divisible by 3, which makes the length of each of the arrays a, b, c <= 8

First, we need to divide the given array into three equal parts, and if the split arrays have fewer than 8 elements, fill them with the dummy value NW so that each array has length 8.

To do that easily we can use a `numpy.array`. The array needs to be split and also has to hold the string value NW, so we use `object` as the dtype of the array (see also `numpy.chararray`). Here is an application:

arr = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18], dtype=object)  

Then we split the array into three equal parts:

arr = np.split(arr,3)

Those arrays need to be filled up with `np.insert` if their length is < 8:

for i in range(len(arr[0]), 8):
    arr = np.insert(arr, i, dummy, axis=1)  # fill remaining slots of arrays with dummy value(NW)
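The `np.insert` loop above pads one column per iteration. A sketch of the same padding done in a single step with `np.vstack`, `np.full` and `np.hstack` (an alternative, not the answer's code; `width = 8` is the padded length the answer assumes):

```python
import numpy as np

dummy = 'NW'
width = 8  # padded length of each list, as assumed in the answer

arr = np.array(list(range(1, 19)), dtype=object)
parts = np.split(arr, 3)           # three object arrays of 6 elements each

grid = np.vstack(parts)            # shape (3, 6)
filler = np.full((grid.shape[0], width - grid.shape[1]), dummy, dtype=object)
padded = np.hstack([grid, filler]) # shape (3, 8): 6 numbers, then 2 'NW'

print(padded)
```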

Then we need to consider:

Part A

We need the number of days between the two dates (`delta`; this calculation can also go inside the `for` statement), and then we get each date in that range by iteration, with the help of the `datetime` module (documentation page "Basic date and time types"):

delta = end_date - Start_date
for i in range(delta.days + 1):
    day = Start_date + timedelta(days=i)

We can use `.strftime()` to define the date format we need:

day.strftime("%d-%m-%Y")
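Putting the two snippets above together, a runnable sketch that builds every date in the range and formats it (variable names follow the question):

```python
import datetime
from datetime import timedelta

Start_date = datetime.datetime(year=2020, month=1, day=1)
end_date = datetime.datetime(year=2020, month=1, day=28)

delta = end_date - Start_date  # a timedelta of 27 days
days = [Start_date + timedelta(days=i) for i in range(delta.days + 1)]

print(len(days))                     # 28: both ends are included
print(days[0].strftime('%d-%m-%Y'))  # 01-01-2020
print(days[-1].strftime('%d-%m-%Y')) # 28-01-2020
```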

Finally, we check whether the current date from the iteration is in `Holiday_List`. If it is, we print 1 and Holiday next to the date. If not, we print 0 and the next element from the arrays next to the date. We also need to keep a gap of 8 days between the lists and fill empty day slots with the dummy value NW.

count = 0
for i in range(delta.days + 1):
    day = Start_date + timedelta(days=i)
    if day.strftime("%Y-%m-%d") in Holiday_List:
        print("{}\t{}\t{}".format(day.strftime("%d-%m-%Y"), 1, hDay))
    else:
        print("{}\t{}\t{}".format(day.strftime("%d-%m-%Y"), 0, arr[count//8][count%8]))
        count += 1

Here `count//8` decides which array's elements to print, and `count%8` chooses which element to print.
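The `count//8` and `count%8` pair is exactly what `divmod(count, 8)` returns. A short sketch of the mapping for a few counter values, assuming the three lists are padded to 8 slots each. Note that this fixed mapping always consumes every `NW` slot, which is what causes the 9-day gap discussed in the EDIT and comments below.

```python
width = 8  # slots per padded list

# counter of working days -> (which list, which element in that list)
mapping = {count: divmod(count, width) for count in (0, 7, 8, 15, 16, 23)}

for count, (list_no, position) in mapping.items():
    print(count, '-> list', list_no, 'position', position)
```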


So the full program:

import datetime
import numpy as np
from datetime import timedelta

Holiday_List = ['2020-01-01','2020-01-05','2020-01-12','2020-01-19','2020-01-26']
Start_date = datetime.datetime(year=2020, month=1, day=1)
end_date = datetime.datetime(year=2020, month=1, day=28)

delta = end_date - Start_date
print(delta)
hDay = "Holiday"
dummy = "NW"

# --- numpy array ---
arr = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18], dtype=object)  # assumes the array length is always divisible by 3

arr = np.split(arr,3)   # splits the array into three equal parts

for i in range(len(arr[0]), 8):
    arr = np.insert(arr, i, dummy, axis=1)  # fill remaining slots with dummy value (NW)

print("{}\t{}\t{}".format("Date", "Holiday", "Values"))

count = 0

for i in range(delta.days + 1):
    day = Start_date + timedelta(days=i)
    if day.strftime("%Y-%m-%d") in Holiday_List:
        print("{}\t{}\t{}".format(day.strftime("%d-%m-%Y"), 1, hDay))
    else:
        print("{}\t{}\t{}".format(day.strftime("%d-%m-%Y"), 0, arr[count//8][count%8]))
        count += 1

EDIT:

The above code has an issue in the last part, which determines the gap and places the dummy value NW.

"When there are no holidays then you would need 3 NW, so I would add 3 NW to every list ('a', 'b', 'c'), and then I would work with every list separately. I would use an external for-loop like for data in arr: instead of arr[count//8], and I would count the gap to skip the last element if the gap is 8 and the element is 'NW' (BTW: if you add more holidays then you have to create a gap bigger than 8). – @furas"

So, with the help of @furas (thanks to him), I was able to solve the issue :). Excess dummy values NW are skipped while iterating through each list:

import datetime
import numpy as np
from datetime import timedelta

Holiday_List = ['2020-01-01','2020-01-05','2020-01-12','2020-01-19','2020-01-26']

Start_date = datetime.datetime(year=2020, month=1, day=1)
end_date = datetime.datetime(year=2020, month=1, day=28)

delta = end_date - Start_date
print(delta)

hDay = "Holiday"
dummy = "NW"

# --- numpy array ---

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18], dtype=object)  # assumes the array length is always divisible by 3

arr = np.split(arr, 3)  # splits the array into three equal parts

for i in range(len(arr[0]), 9):  # add 3 'NW' instead of 2 'NW'
    arr = np.insert(arr, i, dummy, axis=1)  # fill remaining slots with dummy value(NW)

print("{}\t{}\t{}".format("Date", "Holiday", "Values"))

# ---

i = 0

for numbers in arr:

    gap = 0
    numbers_index = 0
    numbers_count = len(numbers) - 3  # count numbers without 3 `NW`

    while i < delta.days + 1:
        day = Start_date + timedelta(days=i)
        i += 1

        if day.strftime("%Y-%m-%d") in Holiday_List:
            print("{}\t{}\t{}".format(day.strftime("%d-%m-%Y"), 1, hDay))
            if numbers_index > 0:  # don't count a Holiday before displaying the first number from the list `numbers` (i.e. '2020-01-01')
                gap += 1
        else:
            value = numbers[numbers_index]
            # always put a number (!= 'NW'), or put 'NW' while the gap is still too small (< 9)
            if value != 'NW' or gap < 9:
                print("{}\t{}\t{}".format(day.strftime("%d-%m-%Y"), 0, value))
                numbers_index += 1
                gap += 1
            # IDEA: maybe it could use `else:` to put `NW` without adding `NW` to `arr`

        # exit loop if all numbers are displayed and gap is big enough
        if numbers_index >= numbers_count and gap >= 9:
            break

The answer provided by @furas is less messy; you should study that one. Cheers mate, I learned a lot from this!

Avishka Dambawinna
  • Yeah Avishka, @furas helped me to solve this issue. Thank you so much for taking the time to create the solutions. Really helpful, appreciated. – Balbinder Singh Kohtra Apr 24 '20 at 14:58
  • @BalbinderSinghKohtra Glad to help, mate. Always try your best to solve problems on your own. Cheers mate! :) – Avishka Dambawinna Apr 24 '20 at 15:02
  • This gives the wrong result: the gap between 7 and 13 is 9 instead of 8. It puts the number `13` at `21-01-2020`, but it should be at `20-01-2020`. The problem is that you always use 2 `NW` at the end, but sometimes there is a `Holiday` and then it should skip an `NW`. – furas Apr 25 '20 at 10:55
  • BTW: if you remove the holiday `'2020-01-05'` from the list, you will create a gap of 7 instead of 8 between 1 and 7. – furas Apr 25 '20 at 11:02
  • @furas, yeah, I saw it yesterday. Your answer is the correct one. I tried to delete my answer, but it's not allowed. I was trying to debug it but am still unable to skip that `NW`. Any solution in mind? – Avishka Dambawinna Apr 25 '20 at 11:28
  • @furas If you are able to solve the issue in my code, I'd much appreciate it. Please :) – Avishka Dambawinna Apr 25 '20 at 11:35
  • The last part is complicated; in my answer I tried three times to create working code. I already tried to change your code, so far without result, because I would have to make a bigger change. When there are no holidays you would need 3 `NW`, so I would add 3 `NW` to every list ('a', 'b', 'c') and then work with every list separately. I would use an external `for`-loop like `for data in arr:` instead of `arr[count//8]`, and I would count `gap` to skip the last element if the gap is 8 and the element is 'NW' (BTW: if you add more holidays, you have to create a gap bigger than 8). – furas Apr 25 '20 at 11:44
  • @furas Yeah, I was trying the whole day to solve that issue; apparently I need to change the whole structure of the code. I can't delete the answer because it's accepted, so the only thing left is to solve the issue and edit it. – Avishka Dambawinna Apr 25 '20 at 11:50
  • @furas The full count of 9 is the key :) I tried to do the same thing with 8 the whole day. Thank you very much, mate! :) – Avishka Dambawinna Apr 25 '20 at 12:37
  • Thank you so much, furas and Avishka, for all the effort. Really appreciated, guys. – Balbinder Singh Kohtra Apr 25 '20 at 15:03