Multiple outcomes from numpy where statement

Question

I have an excel sheet that has 4 column, (Column 1,2,3,4).

The data in the columns is binary 1 or 0. however, sometimes in the column there are X's. when an X appears in one row of a column, it will appear at that row over all columns.

What i am trying to do is to sum all the columns across by rows and attribute a Pass, Fail or X status.

Max sum across the rows can be 4, if it is 4 it is a Fail. If it is less than 4 its a pass. If it is an X i want it to remain as an X in the new df created

What i have done is:
criteria = df['Column 1'] + df['Column 2'] + df['Column 3 '] + df['Column 4']
df['criteria'] = np.where(criteria == 4, 'Fail', 'Pass')

The problem is that this marks the Xs as a pass.

sushant_padha · Answer 1 · 2021-01-30T09:39:04.720

First of all, in the code you are adding all the values. This won't work as you intend because all the values may be strings, or worse, a combination of ints and strings. This will either return a string, or raise a TypeError.

Hereafter, the values of df['Column 1'], df['Column 2'], df['Column 3'], df['Column 4'] are considered strings. (you may have to convert them to strings, if they are not already)

First, find 'X' in the df columns, and set df['criteria'] to 'X' if it does.

Second, convert the strings to numbers and add the numbers, and then do the comparison with 4.

Use this:

# make `np.array` of the concerned columns
slice = np.array((
    df['Column 1'], df['Column 2'], df['Column 3'], df['Column 4']
))

# `slice` is a transpose of `df`, so transpose again to make it same
slice = np.transpose(slice)

for i in range(len(slice)):
    row = slice[i]
    if np.array(np.where(row == 'X')).size > 0:
        df['criteria'][i] = 'X'
    else:
        row = row.astype('int')
        if sum(row) == 4:
            df['criteria'][i] = 'Fail'
        elif sum(row) < 4:
            df['criteria'][i] = 'Pass'

NOTES:

This assumes that a column named 'criteria' already exists and has the appropriate size to store all the values.
For more information about the first if statement in the for loop (line 11), see How can I check whether a numpy array is empty or not?

This code was tested with the following value of df

# Note: This `DataFrame` looks different.
# Note: The columns are horizontal instead of vertical
df = pd.DataFrame.from_dict(data={
    'Column 1': ['0', '1', 'X', '1'],
    'Column 2': ['1', '1', 'X', '1'],
    'Column 3': ['0', '1', 'X', '0'],
    'Column 4': ['0', '1', 'X', '1'],
    'Expected': ['Pass', 'Fail', 'X', 'Pass'],
    'criteria': ['', '', '', '']
}, dtype='str')

And the expected result (df['Expected']) matched df['criteria']

@fountainhead, yes I have. I provided some notes about it as well. — sushant_padha, Jan 30 '21 at 09:41

Multiple outcomes from numpy where statement

1 Answers1