-1

I've two dfs. I wanted to assign df1.date = df2.start_date if df1.date <= df2.end_date.

df1 = {"date": ['2020-12-23 18:20:37', '2021-08-20 12:17:41.487'], "result": [ 'pass', 'fail']}

df2 = {"start_date": ['2021-08-19 12:17:41.487','2021-08-12 12:17:41.487', '2021-08-26 12:17:41.487'],
"end_date": ['2021-08-26 12:17:41.487', '2021-08-19 12:17:41.487', '2021-09-02 12:17:41.487']}

I just give two rows while in real I'm doing this query on 100,000 rows. How do I achieve this?

Deep
  • 1
  • 1

2 Answers2

1

assuming im understanding your question correctly and that both your dataframes line up with each other. you could loop through each row and do a compare across to the other df. however if you have thousands of records this could take some time.

    df1 = pd.DataFrame({"date": [datetime.date(2014, 12, 29), datetime.date(2015, 1, 26), datetime.date(2015, 2, 26), datetime.date(2015, 3, 8), datetime.date(2015, 4, 10)], 
"result": ['pass', 'fail', 'fail', 'pass', 'pass']})

df2 = pd.DataFrame({'start_date': [datetime.date(2015, 1, 1), datetime.date(2015, 2, 1), datetime.date(2015, 3, 1), datetime.date(2015, 4, 1), datetime.date(2015, 5, 1)], 
                    'end_date': [datetime.date(2015, 1, 25), datetime.date(2015, 2, 20), datetime.date(2015, 3, 15), datetime.date(2015, 4, 24), datetime.date(2015, 5, 23)]})

for i in range(len(df1)):
    if (df1.date[i] <= df2.end_date[i]):
        df1.date[i] = df2.start_date[i]

but again this is assuming that both data frames have the same length and its a direct compare across

Jeremy
  • 116
  • 4
0

We can make use of numpy's where.

# df1.date = df2.start_date if df1.date <= df2.end_date
import numpy as np 
df1.date = np.where(df1.date <= df2.end_date, df2.start_date, df1.date)

New df1

    date    result
0   2015-01-01  pass
1   2015-02-01  pass
2   2015-03-01  fail
3   2015-04-01  fail
4   2015-05-01  pass

Data used

df1 = {"date": [ '2014-12-29', '2015-01-26', '2015-02-26', '2015-03-08', '2015-04-10' ], 
"result": [ 'pass', 'pass', 'fail', 'fail', 'pass' ]}

df2 = {"start_date": [ '2015-01-01', '2015-02-01', '2015-03-01', '2015-04-01', '2015-05-01' ],
"end_date": [ '2015-01-25', '2015-02-20', '2015-03-15', '2015-04-24', '2015-05-23' ]}

df1 = pd.DataFrame(data = df1)
df2 = pd.DataFrame(data = df2)
NelsonGon
  • 13,015
  • 7
  • 27
  • 57
  • I'm getting "ValueError: Can only compare identically-labeled Series objects" – Deep Jul 22 '22 at 19:28
  • Could you share your data like I have done in my answer? It is quite unusual to have `**` in column names for instance. – NelsonGon Jul 22 '22 at 19:34
  • I've modified data with actual values I'm using, I just give two rows while in real I'm doing this query on 100,000 rows. – Deep Jul 22 '22 at 19:45
  • I cannot reproduce the error even with the new data. It works fine for me. Are you running the code as provided here? See also https://stackoverflow.com/questions/18548370/pandas-can-only-compare-identically-labeled-dataframe-objects-error – NelsonGon Jul 22 '22 at 19:48