0

I want to flag the anomalies in the desired_columns (desired_D to L). Here, an anomaly is defined as any value <1500 and >400000 in each row.

See below for the dataset

import pandas as pd
  
# intialise data of lists
data = { 
         'A':['L1', 'L2', 'L3', 'L4', 'L5'],
         'B':[1,1,1,1,1],
         'C':[1,2,3,5,9],
         'desired_D':[12005, 18190, 1021, 13301, 31119],
        'desired_E':[11021, 19112, 19021, 15, 24509 ],
        'desired_F':[10022,19910, 19113,449999, 25519],
        'desired_G':[14029, 29100, 39022, 24509, 412271],
        'desired_H':[52119,32991,52883,69359,57835],
         'desired_J':[41218, 52991,55121,69152,79355],
         'desired_K': [43211,7672991,56881,211,77342],
          'desired_L': [31211,42901,53818,62158,69325],
        }
  
# Create DataFrame
df = pd.DataFrame(data)
  
# Print the output.
df

Currently, my code flags columns B, and C inclusively (I want to exclude them).

The revised code looks like this:

# function to flag the anomaly in each row- this flags columns B and C as well (I want to exclude these columns)

dont_format_cols = ['B','C']

def flag_outliers(s, dont_format_cols):
    if s.name in dont_format_cols:
        return '' # or None, or whatever df.style() needs
    else:
        s = pd.to_numeric(s, errors='coerce')
        indexes = (s<1500)|(s>400000)
        return ['background-color: red' if v else '' for v in indexes]

styled = df.style.apply(flag_outliers, axis=1)
styled

The error after edits

enter image description here

Desired output: should exclude B and C,refer to the image below.

enter image description here

smci
  • 32,567
  • 20
  • 113
  • 146
RayX500
  • 247
  • 2
  • 10
  • [`df.apply(..., axis=1)`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html) applies your outlier-styling function (column-wise) to ***all*** of df. If you only want to apply it to some columns, then inspect the series name `s.name` inside the styling function, as in [Pandas style function to highlight specific columns](https://stackoverflow.com/a/41655004/202229) – smci Apr 05 '21 at 00:45
  • Does this answer your question? [What does the subset argument do in pandas.io.formats.style.Styler.format?](https://stackoverflow.com/questions/59203022/what-does-the-subset-argument-do-in-pandas-io-formats-style-styler-format) – smci Apr 09 '21 at 20:24
  • Please state your question accurately. *"How to flag anomalies..."* is a red herring (you're not actually looking for an anomaly-detection heuristic), you're trying to selectively apply styling. – smci Apr 09 '21 at 20:25

2 Answers2

1

df.style.apply(..., axis=1) applies your outlier-styling function (column-wise) to all of df's columns. If you only want to apply it to some columns, use the subset argument.


EDIT: I wasn't aware df.style.apply() had a subset argument, I had proposed these hacky approaches:

1: inspect the series name s.name inside the styling function, like the solution Pandas style function to highlight specific columns.

### Hack solution just hardwire it into the body of `flag_outliers()` without adding in an extra arg `dont_format_cols`    
def flag_outliers(s):
    dont_format_cols = ['B','C']
    if s.name in dont_format_cols:
        return '' # or None, or whatever df.style() needs
    else:
        # code to apply formatting

2: Another hack approach: add a second arg dont_format_cols to your function flag_outliers(s, dont_format_cols). Now you have to pass it in in the apply call, so you'll need a lambda:

styled = df.style.apply(lambda s: flag_outliers(s, dont_format_cols), axis=1)

and:

def flag_outliers(s, dont_format_cols):
    if s.name in dont_format_cols:
        return '' # or None, or whatever df.style() needs
    else:
        # code to apply formatting
 
smci
  • 32,567
  • 20
  • 113
  • 146
  • Sadly, this didn't work for me, I think I am not getting something. I used your code (pls refer to my revised code) and got an error `TypeError: flag_outliers() missing 1 required positional argument: 'dont_format_cols' `. – RayX500 Apr 05 '21 at 01:36
  • @RickyTricky: it does work. Obviously since we added a second arg `dont_format_cols`, you either have to pass it, or default it. Or as. a quick hack, you can just hardwire it into the body of `flag_outliers()` without adding in an extra arg `dont_format_cols` – smci Apr 05 '21 at 01:43
  • Revised the answer to show that. Please read about pandas `apply()` and look at some examples of when we declare lambda functions to pass in missing/extra parameters. – smci Apr 05 '21 at 01:48
  • works now, after changing `axis=1` to `axis = 0`. Thanks – RayX500 Apr 05 '21 at 02:00
  • @RickyTricky: that might be wrong, `axis = 0` means apply the function row-wise. – smci Apr 05 '21 at 02:03
  • `axis = 1`, it still captured `B` and `C` in my code, which i don't want. can you share your replace your pseudocode with the full code and share the image of your output. Thanks – RayX500 Apr 05 '21 at 02:24
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/230742/discussion-between-rickytricky-and-smci). – RayX500 Apr 05 '21 at 03:53
  • why are you ignoring the `subset` argument? – Attack68 Apr 09 '21 at 07:57
  • 1
    @Attack68: I wasn't 'ignoring' it, aware it existed. I updated the link in the answer – smci Apr 09 '21 at 20:20
0

Use the subset argument. That is precisely its purpose to isolate styles to only specific regions.

i.e df.style.apply(flag_outliers, axis=1, subset=<list of used columns>)

You can see examples in the pandas Styler user guide documentation entitled finer slicing.

Attack68
  • 4,437
  • 1
  • 20
  • 40