Compare decimal places in numpy.where

Question

I have a DataFrame in which I need to check, row by row, the number of decimal places (together with another condition) using numpy.where, and assign a value to a column. The values are stored as strings.

Wherever the count of decimal places is not 2 and the value is not between -100 and 200, I have to put the value 'issues' in a new column; otherwise the new column should be blank.

I don't want to use apply; I want to use numpy.where.

Output:

please provide your input in a reproducible format, not an image — mozway, Aug 29 '23 at 13:21
Also what is the expected output? None of your values have 2 significant decimal digits, did you mean non-zero digits? Are your values floats or strings? — mozway, Aug 29 '23 at 13:24
Assuming `df` your DataFrame, [edit] your question with the output of `df.to_dict('list')`. Also read [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — mozway, Aug 29 '23 at 13:26
*count of decimal is not 2 and value is not in between -100 to 200* - but `-0.11` has 2 decimal digits and is between -100 and 200. Why did you mark it "issue"? — RomanPerekhrest, Aug 29 '23 at 13:29
`np.where(c,a,b)` is only as useful as its 3 arguments (arrays or Series). `c`, the condition, has to be a Series that is true or false for each row of that frame, evaluated as a whole, not in an iterative sense. `where` is not an iterator. — hpaulj, Aug 29 '23 at 15:02

mozway · Answer 1 · 2023-08-29T13:35:38.017

The exact logic is unclear because your description and example do not quite match, but for the general logic, assuming you want to flag values that have exactly 2 decimal digits (ignoring trailing zeros) or are not between -100 and 200, you can use pandas.to_numeric + between and a regex with str.extract + str.len:

import numpy as np
import pandas as pd

# is the numeric value between -100 and 200?
m1 = pd.to_numeric(df['name2'], errors='coerce').between(-100, 200)
# is the count of decimal digits (ignoring trailing zeros) not 2?
m2 = df['name2'].str.extract(r'\.(\d*[1-9]+)', expand=False).str.len().ne(2)

df['Error'] = np.where(m1&m2, 'No Issue', 'Issue')

Output:

  Name      name2     Error
0    A      0.029  No Issue
1    B          0  No Issue
2    V          2  No Issue
3    D  -0.000029  No Issue
4    E      -0.11     Issue

Reproducible input:

df = pd.DataFrame({'Name': list('ABVDE'),
                   'name2': ['0.029', '0', '2', '-0.000029', '-0.11']
                  })

Intermediates:

  Name      name2    m1     m2  m1&m2     Error
0    A      0.029  True   True   True  No Issue
1    B          0  True   True   True  No Issue
2    V          2  True   True   True  No Issue
3    D  -0.000029  True   True   True  No Issue
4    E      -0.11  True  False  False     Issue
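To make the `m2` step easier to follow, here is a minimal sketch of what the regex captures on its own. The sample strings (including `'1.50'`) and the names `digits`, `n_decimals` and `not_two` are only illustrative assumptions, not part of the original data. Filling missing matches with 0 is also an added step for readability; the answer's `m2` leaves them as NaN, which already compares as "not 2".

```python
import pandas as pd

# Hypothetical sample strings, chosen to show each case
s = pd.Series(['0.029', '0', '2', '-0.000029', '-0.11', '1.50'])

# Capture the digits after the dot, up to the last non-zero digit
# (so trailing zeros such as the 0 in '1.50' are not counted)
digits = s.str.extract(r'\.(\d*[1-9]+)', expand=False)

# Strings without a fractional part give no match; count them as 0 digits
n_decimals = digits.str.len().fillna(0).astype(int)

# True where the number of counted decimal digits differs from 2
not_two = n_decimals.ne(2)

print(pd.DataFrame({'value': s, 'digits': digits,
                    'n_decimals': n_decimals, 'not_two': not_two}))
```

Only `'-0.11'` ends up with exactly 2 counted digits here, which is why it is the only row where `m2` is False in the table above.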

score 0 · Answer 2 · answered Aug 29 '23 at 16:47

This uses only np.where to get the answer you are asking for, and the code is easy to read. It assumes name2 holds numeric values (floats), as in the output below.

# flag rows that have exactly 2 digits after the dot or a value outside -100 to 200
cond = (df['name2'].astype(str).str.split('.').str[1].str.len() == 2) | (~df['name2'].between(-100, 200))
df['flag'] = np.where(cond, "issue", "no issue")

  name     name2      flag
0    A  0.029000  no issue
1    B  0.000000  no issue
2    C  2.000000  no issue
3    D -0.000029  no issue
4    E -0.110000     issue
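For comparison, the rule exactly as worded in the question (decimal count is not 2 AND value is not between -100 and 200 gives 'issues', otherwise blank) could be sketched as follows. This is only one reading of an ambiguous requirement, as the comments point out. The sample values and the names `values`, `out_of_range`, `n_decimals` and `wrong_places` are hypothetical, and here every digit after the dot is counted, including trailing zeros.

```python
import numpy as np
import pandas as pd

# Hypothetical string data; not the original poster's values
df = pd.DataFrame({'name2': ['0.029', '-0.11', '250.5', '300.25', '-150']})

# Numeric view of the strings; unparsable entries would become NaN
values = pd.to_numeric(df['name2'], errors='coerce')

# True where the value lies outside the inclusive range -100 to 200
out_of_range = ~values.between(-100, 200)

# Number of digits after the dot; strings without a dot count as 0
n_decimals = (df['name2'].str.extract(r'\.(\d+)$', expand=False)
              .str.len().fillna(0).astype(int))
wrong_places = n_decimals.ne(2)

# Both conditions must hold for a row to be marked
df['flag'] = np.where(wrong_places & out_of_range, 'issues', '')
print(df)
```

With this data only `'250.5'` and `'-150'` are marked: `'300.25'` is out of range but has 2 decimal places, and the first two rows are inside the range.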
