
I'm trying to compare two values in the same row to each other to see which is greater. The two columns are dates like this:

PheneVisit  PheneDate   ER Date
phchp010v3  9/22/06 
phchp010v3  9/22/06     5/1/20
phchp101v1  9/14/09 
phchp101v1  9/14/09 

I want to put the later of the two dates in a new column called "dates" using the code below, but it does not catch the blank/NaN values in the ER Date column and simply produces more NaN values if there is no ER Date.

Here is the code I have currently:

for index, row in data.iterrows():
    if (pd.to_datetime(row['PheneDate']) > pd.to_datetime(row['ER Date'])) or (row['ER Date'] == 0):
        print('1')
        data["date order"] = data['PheneDate']
    elif pd.to_datetime(row['ER Date']) > pd.to_datetime(row['PheneDate']) or [row['PheneDate'] == 0]:
        print(2)
        data["date order"] = data['ER Date']

which produces the following output:

date order

5/1/20

The desired output would be

9/22/06
5/1/20
9/14/09
9/14/09

I simply need to check each cell for NaN and, if it is empty, take the non-empty date. I also tried == 'nan' and == 0.

  • Check if it's just blank, that is, try == '' – darth baba Jul 08 '21 at 16:38
  • @darthbaba Sadly, this did not work either. I've tried many things, such as `or (data['ER Date'][index] == "NaN")` and "nan", as well as .isnull.any(). When I print the empty cell I get nan, and for its type I get float, which is why I tried == 0 – Guitarman045 Jul 08 '21 at 16:41

2 Answers


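You can use pandas.to_datetime to convert both columns to the datetime type and then take the row-wise maximum. DataFrame.max skips missing values (NaT) by default, so no explicit NaN check is needed. The ER Date column is called ER here: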

df['PheneDate'] = pd.to_datetime(df['PheneDate'])
df['ER'] = pd.to_datetime(df['ER'])
df['Date'] = df[['PheneDate', 'ER']].max(axis=1)

output:

   PheneVisit  PheneDate         ER       Date
0  phchp010v3 2006-09-22        NaT 2006-09-22
1  phchp010v3 2006-09-22 2020-05-01 2020-05-01
2  phchp101v1 2009-09-14        NaT 2009-09-14
3  phchp101v1 2009-09-14        NaT 2009-09-14
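
For anyone who wants to reproduce this end to end, here is a minimal sketch under a few assumptions: the sample frame is rebuilt by hand from the question, the column keeps its original name `ER Date`, and the dates are assumed to be month/day/two-digit-year, which is why an explicit `format` is passed.

```python
import pandas as pd

# Sample data rebuilt from the question; None marks the empty ER Date cells.
df = pd.DataFrame({
    "PheneVisit": ["phchp010v3", "phchp010v3", "phchp101v1", "phchp101v1"],
    "PheneDate": ["9/22/06", "9/22/06", "9/14/09", "9/14/09"],
    "ER Date": [None, "5/1/20", None, None],
})

# Assumption: dates are month/day/two-digit year. Missing cells become NaT.
for col in ["PheneDate", "ER Date"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%y")

# Row-wise maximum; NaT is skipped because skipna defaults to True.
df["Date"] = df[["PheneDate", "ER Date"]].max(axis=1)

print(df)
```

Because the missing values are skipped rather than compared, this works whichever of the two columns is empty in a given row.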

If you need to convert back to string with a specific regional formatting, use the following:

>>> df['Date'].dt.strftime('%m/%d/%y')

0    09/22/06
1    05/01/20
2    09/14/09
3    09/14/09

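Finally, if you really want to avoid leading zeros in the month/day formatting (the %-m and %-d flags are platform-dependent: they work on Linux and macOS, while Windows uses %#m and %#d instead):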

>>> df['Date'].dt.strftime('%-m/%-d/%y')

0    9/22/06
1     5/1/20
2    9/14/09
3    9/14/09
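
As for why the checks in the question never matched: an empty cell is stored as NaN, a float that compares unequal to everything, including 0, the string 'nan', and itself. In addition, `or [row['PheneDate'] == 0]` wraps the comparison in a list, and a non-empty list is always truthy, so that branch fires whenever the first one does not. Each branch also assigns a whole column to `data["date order"]`, so only the last row's branch survives, which would explain the output being a copy of ER Date. A small sketch of these points (the variable names are only for illustration):

```python
import numpy as np
import pandas as pd

missing = np.nan  # what pandas holds for an empty cell read as float

checks = {
    "== 0": missing == 0,            # False
    "== 'nan'": missing == "nan",    # False
    "== itself": missing == missing, # False: NaN is never equal to anything
    "pd.isna": pd.isna(missing),     # True: the reliable test for a missing cell
}
print(checks)

# A non-empty list is truthy regardless of what it contains.
list_condition = bool([missing == 0])
print(list_condition)  # True
```

So if a per-cell check is ever needed, `pd.isna(value)` is the one to reach for.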
mozway
  • Perfect, I guess I was overdoing it with the iterrows and all that, though I thought what I had would work anyway. I guess not. Thank you! – Guitarman045 Jul 08 '21 at 16:59
  • It is very often more efficient and more concise to use vectorized operations rather than manually iterating over the rows. – mozway Jul 08 '21 at 17:03
  • is what you provided a vector operation? – Guitarman045 Jul 08 '21 at 18:27
  • The [pandas.DataFrame.max](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.max.html) is vectorized – mozway Jul 08 '21 at 18:41

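Try numpy.where, after converting both columns with pd.to_datetime so that they compare as dates rather than as strings: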

df['new_col'] = np.where(df['ER Date'] > df['PheneDate'], df['ER Date'], df['PheneDate'])
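
A runnable sketch of the same idea, assuming two rows taken from the question and a month/day/two-digit-year format. Any comparison against NaT evaluates to False, so rows without an ER Date fall through to PheneDate.

```python
import numpy as np
import pandas as pd

# Two sample rows from the question, already parsed as datetimes.
df = pd.DataFrame({
    "PheneDate": pd.to_datetime(["9/22/06", "9/22/06"], format="%m/%d/%y"),
    "ER Date": pd.to_datetime([None, "5/1/20"], format="%m/%d/%y"),
})

# Where ER Date is later, take it; otherwise (including NaT) take PheneDate.
df["new_col"] = np.where(df["ER Date"] > df["PheneDate"], df["ER Date"], df["PheneDate"])

print(df)
```

One caveat with this form: if PheneDate were the missing value, the comparison would also be False and the result would be NaT, so it only covers gaps in ER Date.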