0

For the entire data frame I need to compare 4 dates all on the same row. Find the latest date and highlight it. The highlighted cell is the highest between comp1 - comp4.

The output I need will look like this: enter image description here

I started by making sure all comps were date times dtypes and I even tried making them objects and comparing them before writing this but with no luck.

Here is what I have tried/searched on line for but none of these work:

checks.style.highlight_max(color= 'yellow', axis=0)

Nothing gets highlighted

I also tried to use subsets but for some reason no matter if check the dtypes on each comp they non not stay a date time or an object but instead become a float for some odd reason

checks.style.highlight_max(color= 'yellow', axis=0, subset=['CAC Clearance', 'ASB Results Received','Arch Assessment','Bio Assessment'])

This is the error i get but I have them all as date times before I run it.

TypeError: '>=' not supported between instances of 'float' and 'datetime.date'

Lastly I tried to do a groupby the ID and even that way I cant not seem to get it to work.

example date using print(checks.head().to_records())/print(checks.head().to_dict())

outputs (only can give certain info for now timestamps)

TypeError Traceback (most recent call last) in ----> 1 print(checks.head().to_records())/print(checks.head().to_dict())

TypeError: unsupported operand type(s) for /: 'NoneType' and 'NoneType'

1st print example:

'2021-10-13T00:00:00.000000000', '2021-10-13T00:00:00.000000000')

2nd print example:

Timestamp('2021-10-13 00:00:00'), 4: Timestamp('2021-10-13 00:00:00')}, 'Bio Assessment': {0: Timestamp('2021-10-13 00:00:00'), 1: Timestamp('2021-10-14 00:00:00'), 2: Timestamp('2021-10-13 00:00:00'), 3: Timestamp('2021-10-13 00:00:00'), 4: Timestamp('2021-10-13 00:00:00')}}

JQTs
  • 142
  • 2
  • 11
  • 1
    Please do not link or embed external images of source code or data. Images make it difficult to efficiently assist you as they cannot be copied and offer poor usability as they cannot be searched. See: [Why not upload images of code/errors when asking a question?](https://meta.stackoverflow.com/q/285551/15497888) If you need assistance formatting a small sample of your DataFrame as a copyable piece of code for SO see [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888). – Henry Ecker Nov 06 '21 at 22:22
  • 1
    Presumably you have missing Data meaning you have `NaN` inside a row with dates and ordering is not possible. If you provide a sample of the DataFrame `print(checks.head().to_records())`/`print(checks.head().to_dict())` this question would be a lot easier to answer. Since it is not possible to derive the underlying dtypes of your DataFrame from just this image. – Henry Ecker Nov 06 '21 at 22:23
  • Wow that was fast. Forgive me for the picture. I tried to just copy past an example form excel and it uploaded like that. Will just to figure that out. Also, will use try to show an example of the data in the original post will have to edit some of that though. – JQTs Nov 06 '21 at 23:32
  • and yup there at NaTs for sure. Ill change that around and try to format the dates as well just to show year-month-day and see if that works – JQTs Nov 06 '21 at 23:39

1 Answers1

0

I figured it out.

  • first had to copy my df to stop copy warning

  • then use this code to make all my date time string and fill in the NaT with "0"s. This was the only way I could compare with out a str/int to datetime/timestamp error:

    checks['comp1'] = checks['comp1'].dt.strftime('%Y-%m-%d').fillna("0")

  • I tried to use the highlight style above in the original post but only a few dates would highlight so I made this function that out puts a new columns with the info I need.

     def find_lastest_date(df, comp1, comp2, comp3, comp4, comp5):
      # Debris not finished
      if df[comp1] == "0":
          return 'No "comp1" Date'
    
    
    
      # compares comp1 to all other comps
      if ((df[comp1] > df[comp2]) & (df[comp1] > df[comp3]) & (df[comp1] > df[comp4]) & (df[comp1] > df[comp5])):
          return '"comp1" is greater than Latest Date'
      # compares comp2 to all other comps
      elif ((df[comp2] > df[comp1]) & (df[comp2] >= df[comp3]) & (df[comp2] >= df[comp4]) & (df[comp2] >= df[comp5])):
          return "'comp2' Latest/has other matching dates"
      # compares comp3 to all other comps
      elif  ((df[comp3] > df[comp1]) & (df[comp3] >= df[comp2]) & (df[comp3] >= df[comp4]) & (df[comp3] >= df[comp5])):
          return '"comp3" Latest/has other matching dates'
      # compares comp4 to all other comps
      elif ((df[comp4] > df[comp1]) & (df[comp4] >= df[comp2]) & (df[comp4] >= df[comp3]) & (df[comp4] >= df[comp5])):
          return '"comp4" Latest/has other matching dates'
      elif((df[comp5] > df[comp1]) & (df[comp5] >= df[comp2]) & (df[comp5] >= df[comp3]) & (df[comp5] >= df[comp4])):
           return '"comp45" Latest/has other matching dates'
    
      # Comp matches
    
      # All comps == "0" leave blank
      elif ((df[comp1] == "0") & (df[comp2] == "0") & (df[comp3] == "0") & (df[comp4] == "0") & (df[comp5] == "0")):
          return ""
    
    
      # All comps macth
      elif ((df[comp1] == df[comp2]) & (df[comp1] == df[comp3]) & (df[comp1] == df[comp4]) & (df[comp1] == df[comp5])):
          return "'comp1' matches Latest Date"
    
      # comp1 match to another other comp
      elif ((df[comp1] == df[comp2]) | (df[comp1] == df[comp3]) | (df[comp1] == df[comp4]) | (df[comp1] == df[comp5])):
          return "'comp1' matches Latest Date"
      else:
          return ""
    

Here is the way I called this function which make a new column and inserts it into the column index I wanted like so:

df.insert(df.columns.get_loc('comp1'), 'Output column name', df.apply(find_lastest_date, comp1='comp1', comp2='2comp2', comp3='comp3, comp4='comp4',comp5='comp5',axis=1))
JQTs
  • 142
  • 2
  • 11