1

I have a dataframe that has a list of One Piece manga, which currently looks like this:

0           #                                     Title Pages
Date                                                         
1997-07-19  1  Romance Dawn - The Dawn of the Adventure    53
1997-07-28  2               That Guy, "Straw Hat Luffy"    23
1997-08-04  3          Introducing "Pirate Hunter Zoro"    21
1997-08-11  4          Marine Captain "Axe-Hand Morgan"    19
1997-08-25  5          Pirate King and Master Swordsman    19
1997-09-01  6                     The First Crew Member    23
1997-09-08  7                                   Friends    20
1997-09-13  8                          Introducing Nami    19

Although every episode is to be issued weekly, sometimes they are delayed or on break, resulting in an irregular interval in the dates. What I would like to do is to add a missing date. For example, between 1997-08-11 and 1997-08-25, there should be 1997-08-18 (7 days from 1997-08-11) where the episode was not issued. Could you help me out with how to operate this code?

Thank you.

Jisung-P91
  • 13
  • 2
  • 2
    Does this answer your question? [Add missing dates to pandas dataframe](https://stackoverflow.com/questions/19324453/add-missing-dates-to-pandas-dataframe) – Mateo Torres Jun 09 '20 at 04:05
  • I don't think so because I need to fill in dates for an interval > = 2 weeks. re-index or re-sample doesn't provide the frequency. I think I would need some sort of function for this. – Jisung-P91 Jun 09 '20 at 04:21
  • Let me see if I understand this correctly. You want to fill-in the dates, but only when the gap is >= 14 days? and the date in the middle should be in 1 week intervals? There seems to be a couple of corner cases that might be difficult to properly determine. Such as a gap of 22 days. Filling in 7D intervals will leave you with a 1 day interval (in the end). Is this the desired output? – Mateo Torres Jun 09 '20 at 04:48
  • The next record is 9 days free from the first record. Are there any new rules as the date interval is not necessarily every week? – r-beginners Jun 09 '20 at 05:06

2 Answers2

0

I used relativedelta and list comprehension to get a 14-day interval per row and .shift(1) to compare to another row with .np.where() with a 1 returning a row where we would want to insert a row before. Then, I looped through the dataframe and appended the relevant rows to another dataframe. Then, I used pd.concat to bring the two dataframes together, sorted by date, deleted the helper columns and reset the index.

There may be some gaps as others have mentioned like 22 days+ but this should get you in the right direction. Perhaps you could turn it into a function and run it multiple times, which is why I added .reset_index(drop=True) at the end. Obviously, you could just make this more advanced, but I hope this helps.

from dateutil.relativedelta import relativedelta
import pandas
from datetime import datetime

df = pd.DataFrame({'Date': {0: '1997-07-19',
  1: '1997-07-28',
  2: '1997-08-04',
  3: '1997-08-11',
  4: '1997-08-25',
  5: '1997-09-01',
  6: '1997-09-08',
  7: '1997-09-13'},
 '#': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8},
 'Title': {0: 'Romance Dawn - The Dawn of the Adventure',
  1: 'That Guy, "Straw Hat Luffy"',
  2: 'Introducing "Pirate Hunter Zoro"',
  3: 'Marine Captain "Axe-Hand Morgan"',
  4: 'Pirate King and Master Swordsman',
  5: 'The First Crew Member',
  6: 'Friends',
  7: 'Introducing Nami'},
 'Pages': {0: 53, 1: 23, 2: 21, 3: 19, 4: 19, 5: 23, 6: 20, 7: 19}})

df['Date'] = pd.to_datetime(df['Date'])
df['Date2'] = [d - relativedelta(days=-14) for d in df['Date']]
df['Date3'] = np.where((df['Date'] >= df['Date2'].shift(1)), 1 , 0)
df1 = pd.DataFrame({})
n=0
for j in (df['Date3']):
    n+=1
    if j == 1:
        new_row = pd.DataFrame({"Date": df['Date'][n-1] - relativedelta(days=7)}, index=[n])
        df1=df1.append(new_row)
df = pd.concat([df, df1]).sort_values('Date').drop(['Date2', 'Date3'], axis=1).reset_index(drop=True)
df

Output:

    Date        #    Title                                      Pages
0   1997-07-19  1.0  Romance Dawn - The Dawn of the Adventure   53.0
1   1997-07-28  2.0  That Guy, "Straw Hat Luffy"                23.0
2   1997-08-04  3.0  Introducing "Pirate Hunter Zoro"           21.0
3   1997-08-11  4.0  Marine Captain "Axe-Hand Morgan"           19.0
4   1997-08-18  NaN  NaN                                        NaN
5   1997-08-25  5.0  Pirate King and Master Swordsman           19.0
6   1997-09-01  6.0  The First Crew Member                      23.0
7   1997-09-08  7.0  Friends                                    20.0
8   1997-09-13  8.0  Introducing Nami                           19.0
David Erickson
  • 16,433
  • 2
  • 19
  • 35
  • I think this is close to what I would like to achieve. I am having a hard time understanding your np.where function and loop. Could you explain what these lines try to achieve? – Jisung-P91 Jun 10 '20 at 05:36
  • The np.where is comparing Date to Date2 and returns a 1 if there is a 14 day or more gap. Then, later in the loop `for j in (df['Date3']): n+=1 if j == 1:'` The code `if j == 1:` is using that logic ressult of 1 from the earlier np.where to insert a row. Make sense? – David Erickson Jun 10 '20 at 05:54
0

You sould use the shift builtin function.

df['day_between'] =  df['Date'].shift(-1) - df['Date']

output of print(df[['Date', 'day_between']]) is then:

        Date day_between
0 1997-07-19      9 days
1 1997-07-28      7 days
2 1997-08-04      7 days
3 1997-08-11     14 days
4 1997-08-25      7 days
5 1997-09-01      7 days
6 1997-09-08      5 days
7 1997-09-13         NaT
pyOliv
  • 1,253
  • 1
  • 6
  • 21