-1

I am starting off with Python and using Pandas.

I have 2 CSVs i.e

CSV1

Date         Col1   Col2
2021-01-01    20     15
2021-01-02    22     12
2021-01-03    30     18
.
.
2021-12-31    125    160
so on and so forth...

CSV2

Start_Date   End_Date      Sunday  Monday  Tuesday Wednesday Thursday Friday Saturday
2021-01-01   2021-02-25      15      25      35       45       30       40     55
2021-02-26   2021-05-31      25      30      44       35       50       45     66
.
.
2021-09-01   2021-0-25       44      25      65       54       24       67     38

Desired result

Date         Col1   Col2   New_Col3   New_Col4
2021-01-01    20     15       Fri        40
2021-01-02    22     12       Sat        55
2021-01-03    30     18       Sun        15
.
.
2021-12-31    125    160      Fri        67
  • New_Col3 is the weekday abbreviation of Date
  • New_Col4 is the cell in CSV2 where the Date falls between Start_Date and End_Date row-wise, and from the corresponding weekday column-wise.
Bill Huang
  • 4,491
  • 2
  • 13
  • 31

2 Answers2

0
# Convert date column to datetime
df1['Date'] = pd.to_datetime(df1['Date'])
df2['Start_Date'] = pd.to_datetime(df2['Start_Date'])
df2['End_Date'] = pd.to_datetime(df2['End_Date'])

# Get abbreviated weekday name
df1['New_Col3'] = df1['Date'].apply(lambda x: x.strftime('%a'))

New_Col4 = []

# Iterate over df1
for i in range(len(df1)):
    # If df1['date'] is in between df2['Start_Date'] and df2['End_Date']
    # Get the value according to df1['date'] weekday name
    for j in range(len(df2)):
        if df2.loc[j, 'Start_Date'] <= df1.loc[i, 'Date'] <= df2.loc[j, 'End_Date']:
            day_name = df1.loc[i, 'Date'].strftime('%A')
            New_Col4.append(df2.loc[j, day_name])

# Assign the result to a new column
df1['New_Col4'] = New_Col4
# print(df1)

        Date  Col1  Col2 New_Col3  New_Col4
0 2021-01-01    20    15      Fri        40
1 2021-01-02    22    12      Sat        55
2 2021-01-03    30    18      Sun        15
3 2021-03-03    40    18      Wed        35
Ynjxsjmh
  • 28,441
  • 6
  • 34
  • 52
0

Keys

  1. Construct datetime and interval indexes to enable pd.IntervalIndex.get_indexer(pd.DatetimeIndex) for efficient row-matching. (reference post)
  2. Apply a value retrieval function from df2 on each row of df1 for New_Col4.

With this approach, an explicit double for-loop search can be avoided in row-matching. However, a slow .apply() is still required. Maybe there is a fancy way to combine these two steps, but I will stop here for the time being.

Data

Typo in the last End_Date is changed.

import pandas as pd
import io

df1 = pd.read_csv(io.StringIO("""
Date         Col1   Col2
2021-01-01    20     15
2021-01-02    22     12
2021-01-03    30     18
2021-12-31    125    160
"""), sep=r"\s+", engine='python')

df2 = pd.read_csv(io.StringIO("""
Start_Date   End_Date      Sunday  Monday  Tuesday Wednesday Thursday Friday Saturday
2021-01-01   2021-02-25      15      25      35       45       30       40     55
2021-02-26   2021-05-31      25      30      44       35       50       45     66
2021-09-01   2022-01-25       44      25      65       54       24       67     38
"""), sep=r"\s+", engine='python')

df1["Date"] = pd.to_datetime(df1["Date"])
df2["Start_Date"] = pd.to_datetime(df2["Start_Date"])
df2["End_Date"] = pd.to_datetime(df2["End_Date"])

Solution

# 1. Get weekday name
df1["day_name"] = df1["Date"].dt.day_name()
df1["New_Col3"] = df1["day_name"].str[:3]

# 2-1. find corresponding row in df2
df1.set_index("Date", inplace=True)
idx = pd.IntervalIndex.from_arrays(df2["Start_Date"], df2["End_Date"], closed="both")
df1["df2_row"] = idx.get_indexer(df1.index)

# 2-2. pick out the value from df2
def f(row):
    """Get (#row, day_name) in df2"""
    return df2[row["day_name"]].iloc[row["df2_row"]]

df1["New_Col4"] = df1.apply(f, axis=1)

Result

print(df1.drop(columns=["day_name", "df2_row"]))

Out[319]: 
            Col1  Col2 New_Col3  New_Col4
Date                                     
2021-01-01    20    15      Fri        40
2021-01-02    22    12      Sat        55
2021-01-03    30    18      Sun        15
2021-12-31   125   160      Fri        67
Bill Huang
  • 4,491
  • 2
  • 13
  • 31