0

This is such a simple thing yet I'm spending hours trying to figure out, when I could easily do this in SQL or Tableau.

So imagine I have an ID column and a Item1 and Item2 column. In SQL, I'd write:

CASE
WHEN Item1 IS NULL AND Item2 IS NULL
THEN '0'
WHEN Item1 IS NOT NULL AND ITEM2 IS NOT NULL
THEN '2'
WHEN Item1 IS NULL AND Item2 is NOT NULL
THEN '1'
WHEN Item1 IS NOT NULL AND Item2 IS NULL
THEN '1'
END

Any ideas how I can replicate this in pandas? To clarify, this has to be a NEW column with values either being 0, 1 or 2.

wtt85
  • 19
  • 3
  • check out [this answer](https://stackoverflow.com/a/18194448/5992438) which covers a similar case – bunji Dec 06 '17 at 01:00

2 Answers2

2

It sounds like you just want the count of non-null entries in each row. So something along the lines of

df[['Item 1', 'Item 2']].notnull().sum(axis=1)

should work: you just compute whether each element is null and sum up by row. You may have to use something like np.isnan() instead of isnotnull() if you want to detect NaN numerical values instead.

You can then assign this result to a new column in the DataFrame in the usual way.

David Z
  • 128,184
  • 27
  • 255
  • 279
  • Thanks for the response! However, I do need more than the count of non-null. I actually need a new column in the dataset to be either 0, 1, or 2 based on the conditions above! – wtt85 Dec 06 '17 at 01:04
  • 1
    Sure, you can just assign the result of this to a new column in the usual way. – David Z Dec 06 '17 at 01:12
0

Save new data in a existing column name 'item3':

for index, row in df.iterrows():

    if row['item1'] is None and row['item2'] is None:
        df.set_value(index, 'item3', 0)
    elif row['item1'] is None and row['item2'] is not None:
        df.set_value(index, 'item3', 1)
    elif row['item1'] is not None and row['item2'] is None :
        df.set_value(index, 'item3', 2)
    elif row['item1'] is not None and row['item2'] is not None:
        df.set_value(index, 'item3', 1)

More compact:

for index, row in df.iterrows():

if row['item1'] is None and row['item2'] is None:
    df.set_value(index, 'item3', 0)
elif row['item2'] is not None:
    df.set_value(index, 'item3', 1)
else:
    df.set_value(index, 'item3', 2)
Manuel
  • 698
  • 4
  • 8