All this is asking me to do is write code that shows whether there are any missing values in rows that are not the customer's first order. I have provided the DataFrame info below. Should I use the 'order_number' column instead? Is my code wrong?

I named the DataFrame df_orders.

I thought my code would find the rows that have missing values and an order number greater than 1.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 478967 entries, 0 to 478966
Data columns (total 6 columns):
     #   Column                  Non-Null Count   Dtype  
    ---  ------                  --------------   -----  
     0   order_id                478967 non-null  int64  
     1   user_id                 478967 non-null  int64  
     2   order_number            478967 non-null  int64  
     3   order_dow               478967 non-null  int64  
     4   order_hour_of_day       478967 non-null  int64  
     5   days_since_prior_order  450148 non-null  float64
    dtypes: float64(1), int64(5)
    memory usage: 21.9 MB
    None


# Are there any missing values where it's not a customer's first order?
m_v_fo = df_orders[df_orders['days_since_prior_order'].isna() > 1]
print(m_v_fo.head())



Empty DataFrame
Columns: [order_id, user_id, order_number, order_dow, order_hour_of_day, 
days_since_prior_order]
Index: []
    Hello! Please give more context of the problem, like what you're trying to accomplish, what you've tried so far, error messages, etc – netotz Oct 31 '22 at 17:49
    What is `df_orders['days_since_prior_order'].isna() > 1` supposed to do? – wwii Oct 31 '22 at 17:54
    Please read [mre]. [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – wwii Oct 31 '22 at 17:55

1 Answer

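When you call .isna() you get back a Series of True/False values. True counts as 1 and False as 0, so none of them is ever > 1. That makes your mask all False, which is why you get an empty DataFrame.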
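To make that concrete, here is a minimal sketch on a made-up column (the values and the names `days`, `mask`, `never_true`, and `n_missing` are illustrative assumptions, not taken from your data):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the days_since_prior_order column (values are made up).
days = pd.Series([np.nan, 7.0, np.nan, 30.0], name="days_since_prior_order")

# isna() gives a boolean Series: True where the value is missing.
mask = days.isna()

# Comparing booleans with > 1 treats True as 1 and False as 0,
# so no element can ever satisfy the comparison.
never_true = mask > 1

# Summing a boolean Series counts the True entries, i.e. the missing values.
n_missing = int(mask.sum())

print(mask.tolist())
print(never_true.any())
print(n_missing)
```

Indexing a DataFrame with a mask like `never_true` selects no rows, which matches the empty result in the question.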

Instead, try this:

m_v_fo = df_orders[df_orders['days_since_prior_order'].isna()]

If that doesn't solve the problem, then I'm not sure - try editing your question to add more detail and I can try again. :)

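Update: I read your question again, and I think the filter needs two conditions at once: the order is not the customer's first (order_number > 1) and days_since_prior_order is missing. Combine them with &, keeping each comparison in its own parentheses:

m_v_fo = df_orders[(df_orders['order_number'] > 1) & (df_orders['days_since_prior_order'].isna())]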
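As a rough, self-contained sketch of what the question is after (missing values on orders that are not a customer's first), one option is to build two boolean masks and combine them with `&`. The frame `df_demo` and its values are invented for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of df_orders; the rows are made up for the demo.
df_demo = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "user_id": [10, 10, 20, 20, 20],
    "order_number": [1, 2, 1, 2, 3],
    "days_since_prior_order": [np.nan, 7.0, np.nan, np.nan, 14.0],
})

# Mask 1: the order is not the customer's first.
not_first = df_demo["order_number"] > 1

# Mask 2: days_since_prior_order is missing.
missing = df_demo["days_since_prior_order"].isna()

# Rows that satisfy both conditions at once.
missing_not_first = df_demo[not_first & missing]

# A plain yes/no answer to "are there any such rows?"
has_any = not missing_not_first.empty

print(missing_not_first)
print(has_any)
```

On the real data, an empty result here would suggest that every missing value belongs to a first order, which is what you would expect if the column is only undefined when there is no prior order.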
Vincent Rupp