0

I have an excel sheet that has 4 column, (Column 1,2,3,4).

The data in the columns is binary 1 or 0. however, sometimes in the column there are X's. when an X appears in one row of a column, it will appear at that row over all columns.

What i am trying to do is to sum all the columns across by rows and attribute a Pass, Fail or X status.

Max sum across the rows can be 4, if it is 4 it is a Fail. If it is less than 4 its a pass. If it is an X i want it to remain as an X in the new df created

What i have done is:
criteria = df['Column 1'] + df['Column 2'] + df['Column 3 '] + df['Column 4']
df['criteria'] = np.where(criteria == 4, 'Fail', 'Pass')

The problem is that this marks the Xs as a pass.

kitchen800
  • 197
  • 1
  • 12
  • 36

1 Answers1

0

First of all, in the code you are adding all the values. This won't work as you intend because all the values may be strings, or worse, a combination of ints and strings. This will either return a string, or raise a TypeError.

Hereafter, the values of df['Column 1'], df['Column 2'], df['Column 3'], df['Column 4'] are considered strings. (you may have to convert them to strings, if they are not already)

First, find 'X' in the df columns, and set df['criteria'] to 'X' if it does.

Second, convert the strings to numbers and add the numbers, and then do the comparison with 4.

Use this:

# make `np.array` of the concerned columns
slice = np.array((
    df['Column 1'], df['Column 2'], df['Column 3'], df['Column 4']
))

# `slice` is a transpose of `df`, so transpose again to make it same
slice = np.transpose(slice)

for i in range(len(slice)):
    row = slice[i]
    if np.array(np.where(row == 'X')).size > 0:
        df['criteria'][i] = 'X'
    else:
        row = row.astype('int')
        if sum(row) == 4:
            df['criteria'][i] = 'Fail'
        elif sum(row) < 4:
            df['criteria'][i] = 'Pass'

NOTES:

  • This assumes that a column named 'criteria' already exists and has the appropriate size to store all the values.
  • For more information about the first if statement in the for loop (line 11), see How can I check whether a numpy array is empty or not?
  • This code was tested with the following value of df
    # Note: This `DataFrame` looks different.
    # Note: The columns are horizontal instead of vertical
    df = pd.DataFrame.from_dict(data={
        'Column 1': ['0', '1', 'X', '1'],
        'Column 2': ['1', '1', 'X', '1'],
        'Column 3': ['0', '1', 'X', '0'],
        'Column 4': ['0', '1', 'X', '1'],
        'Expected': ['Pass', 'Fail', 'X', 'Pass'],
        'criteria': ['', '', '', '']
    }, dtype='str')
    
    And the expected result (df['Expected']) matched df['criteria']
sushant_padha
  • 159
  • 1
  • 8