
I have a DataFrame in which I need to count the decimal places of each row's value (together with another condition) using numpy.where, and assign the result to a new column. The values are stored as strings.

[image: input DataFrame]

Wherever the decimal count is not 2 and the value is not between -100 and 200, I have to put 'issues' in a new column; otherwise the new column should be blank.

I don't want to use apply; I want to use numpy.where.

Output: [image: expected output]

jared
sadaa
  • please provide your input in a reproducible format, not an image – mozway Aug 29 '23 at 13:21
  • Also what is the expected output? None of your values have 2 significant decimal digits, did you mean non-zero digits? Are your values floats or strings? – mozway Aug 29 '23 at 13:24
  • I'm not sure how to do that – sadaa Aug 29 '23 at 13:24
  • Assuming `df` is your DataFrame, [edit] your question with the output of `df.to_dict('list')`. Also read [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – mozway Aug 29 '23 at 13:26
  • Values are in string format – sadaa Aug 29 '23 at 13:27
  • *count of decimal is not 2 and value is not in between -100 to 200* - but `-0.11` has 2 decimal digits and is between -100 and 200. Why did you mark it "issue"? – RomanPerekhrest Aug 29 '23 at 13:29
  • `np.where(c,a,b)` is only as useful as its 3 arguments (arrays or Series). `c`, the condition, has to be a Series that is true or false for each row of that frame, as a whole, not in an iterative sense. `where` is not an iterator. – hpaulj Aug 29 '23 at 15:02
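
To illustrate the last comment: `np.where` takes a whole boolean array as its condition and chooses element-wise between the two alternatives. A minimal sketch, using a made-up numeric Series `s` that is not from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data, only for illustration
s = pd.Series([-150.0, 0.5, 250.0])

# One boolean per row, computed for the whole Series at once
cond = s.between(-100, 200)

# Element-wise choice: '' where cond is True, 'issue' elsewhere
flags = np.where(cond, '', 'issue')
print(flags.tolist())
```

No loop over rows is involved; the condition has to be built beforehand as a full boolean Series.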

2 Answers


The exact logic is unclear, since your description and example do not quite match. For the general approach, assuming you want to flag values that have exactly 2 decimal digits or are not between -100 and 200, you can combine pandas.to_numeric + between with a regex via str.extract + str.len:

import numpy as np
import pandas as pd

# is the numeric value between -100 and 200?
m1 = pd.to_numeric(df['name2'], errors='coerce').between(-100, 200)
# is the count of the decimal not 2?
m2 = df['name2'].str.extract(r'\.(\d*[1-9]+)', expand=False).str.len().ne(2)

df['Error'] = np.where(m1&m2, 'No Issue', 'Issue')

Output:

  Name      name2     Error
0    A      0.029  No Issue
1    B          0  No Issue
2    V          2  No Issue
3    D  -0.000029  No Issue
4    E      -0.11     Issue

Reproducible input:

df = pd.DataFrame({'Name': list('ABVDE'),
                   'name2': ['0.029', '0', '2', '-0.000029', '-0.11']
                  })

Intermediates:

  Name      name2    m1     m2  m1&m2     Error
0    A      0.029  True   True   True  No Issue
1    B          0  True   True   True  No Issue
2    V          2  True   True   True  No Issue
3    D  -0.000029  True   True   True  No Issue
4    E      -0.11  True  False  False     Issue
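
For convenience, the pieces above can be assembled into one script that rebuilds the intermediates table. This is only a sketch that reuses the answer's own expressions; the `out` variable and the use of `assign` are illustrative additions, not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': list('ABVDE'),
                   'name2': ['0.029', '0', '2', '-0.000029', '-0.11']})

# m1: the numeric value lies in [-100, 200]
m1 = pd.to_numeric(df['name2'], errors='coerce').between(-100, 200)
# m2: the number of decimal digits (up to the last non-zero one) is not 2
m2 = df['name2'].str.extract(r'\.(\d*[1-9]+)', expand=False).str.len().ne(2)

# Collect the masks and the final flag next to the input columns
out = df.assign(**{'m1': m1, 'm2': m2, 'm1&m2': m1 & m2,
                   'Error': np.where(m1 & m2, 'No Issue', 'Issue')})
print(out)
```

Values without a decimal point (such as `'0'` and `'2'`) yield no regex match, so their length is NaN and `ne(2)` evaluates to True for them.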
mozway

This uses only np.where to get the result you are asking for, and the code is easy to read.

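# Flag rows with exactly 2 characters after the dot, or a value outside [-100, 200]
num = pd.to_numeric(df['name2'], errors='coerce')  # also works when name2 holds strings
cond = (df['name2'].astype(str).str.split('.').str[1].str.len() == 2) | (~num.between(-100, 200))
df['flag'] = np.where(cond, "issue", "no issue")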

  name     name2      flag
0    A  0.029000  no issue
1    B  0.000000  no issue
2    C  2.000000  no issue
3    D -0.000029  no issue
4    E -0.110000     issue
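
One thing to watch with the `astype(str)` approach: when `name2` holds floats, small values are rendered in scientific notation, so the text after the dot is not a plain run of decimal digits. A small sketch of the difference, using the question's values once as strings and once as floats (the helper `chars_after_dot` and the variable names are illustrative, not from the answer):

```python
import pandas as pd

values = ['0.029', '0', '2', '-0.000029', '-0.11']
as_str = pd.Series(values)
as_float = as_str.astype(float)

def chars_after_dot(s):
    # Length of whatever follows the first '.'; NaN when there is no '.'
    return s.astype(str).str.split('.').str[1].str.len()

print(chars_after_dot(as_str).tolist())
print(chars_after_dot(as_float).tolist())
print(str(-0.000029))  # the float is rendered as '-2.9e-05'
```

With string input, `'-0.000029'` counts 6 characters after the dot; with float input the same value becomes `'-2.9e-05'` and counts 5 (`'9e-05'`). If the column really is stored as strings, as the question states, the count reflects the digits as written.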
ragas