0

I need to subtract dates with integer as the return so I can reference that integer fo another calculated column.

Orignal "io" csv file provides columns "ID", "CreatedDate", and "IFSPDate". I need to add two calculated columns.

Calculated Column 1: "Days". I used the code below to subtract dates in a calculated column

io['CreatedDate'] = pd.to_datetime(io.CreatedDate)

io['IFSPDate'] = pd.to_datetime(io.IFSPDate)

for ind, row in io.iterrows():
    dataset.loc[ind,"Days to Table"] = row['CreatedDate'] - row['IFSPDate']

I get this outcome

| ID | CreatedDate | IFSPDate   | Days             |
|----|-------------|------------|------------------|
| 1  | 2021-09-17  | 2021-09-17 | 0 days 00:00:00  |
| 2  | 2021-08-05  | 2021-01-13 | 204 days 00:00:00|
| 3  | 2021-09-03  | 2041-08-31 | 3 days 00:00:00  |
| 4  | 2021-09-16  | 2021-07-27 | 51 days 00:00:00 |

Calculated Column 2: "Timeliness" In this column I want to answer the question "is cell in days column <= 4" then return 1 if yes and 0 if no.

The problem: Using the current code, the outcome of "Days" column is not an integer, so I can't use <= operator in "Timeliness" column to return 0 or 1.

The outcome I want

| ID | CreatedDate | IFSPDate   | Days | Timeliness |
|----|-------------|------------|------|------------|
| 1  | 2021-09-17  | 2021-09-17 | 0    | 1          |
| 2  | 2021-08-05  | 2021-01-13 | 204  | 0          |
| 3  | 2021-09-03  | 2041-08-31 | 3    | 1          |
| 4  | 2021-09-16  | 2021-07-27 | 51   | 0          |

UPDATE:

Using io["Days"] = (io["Created_Date"] - io["IFSP_Date"]).dt.days returned an integer when subtracting the dates

Using io["Timeliness"] = np.where(io['Days'] <= 4, '1', '0') created the new column and returned a 1 or 0.

ddejohn
  • 8,775
  • 3
  • 17
  • 30
WriteJP
  • 3
  • 2
  • you could convert the result of subtraction to a timestamp, and then figure out does it have enough milliseconds so that it is longer than 4 days – tsamridh86 Sep 20 '21 at 02:32
  • Please see [how to ask Pandas questions on SO](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – ddejohn Sep 20 '21 at 03:35
  • Is that IFSPDate for ID 3 supposed to be 2041 or is that a typo? If it's supposed to 2041, then this gets quite a bit trickier. How should days be calculated when one of the years has a leap year? – ddejohn Sep 20 '21 at 22:09

1 Answers1

1

No need to iterate. This should work, but I can't confirm without sample data:

io["Days"] = (io["CreatedDate"] - io["IFSPDate"]).dt.days
io["Timelines"] = (io["Days"] <= 4).astype(int)

I assume your "created" and "ifsp" columns are proper Pandas Timestamp or datetime columns.

ddejohn
  • 8,775
  • 3
  • 17
  • 30
  • I updated with source data. hope this helps. – WriteJP Sep 20 '21 at 12:24
  • @WriteJP this solution works assuming your date columns are proper datetimes (not `str`) and also assuming that the 2041 in your sample data is just a typo. – ddejohn Sep 20 '21 at 22:11
  • It worked! I used io["Days"] = (io["Created_Date"] - io["IFSP_Date"]).dt.days to get the results in "Days to be an integer" and I used io["Timeliness"] = np.where(io['Days'] <= 4, '1', '0') to return the 1 or 0. Thank you SO much! – WriteJP Sep 21 '21 at 01:27
  • Sure thing, glad to help! Although it sounds like you gave the `Timelines` column string values of 1 and 0, not integers. Is that intentional? – ddejohn Sep 21 '21 at 01:53
  • Unfortunately, I have no idea. I'm coming from excel trying to lean to clean and modify csv files because excel bogs down my machine. As a result, I haven't learned the fundamentals, I'm piecing together when I see here and on YouTube to accomplish specific tasks. – WriteJP Sep 21 '21 at 09:03