I have the code below (sorry, I cannot share the dataframe for confidentiality reasons), and I may be missing something here:

new_df = None
new_fn = None
prev_df = None
prev_fn = None
while 1:
    msg = conn.recv()
    if len(msg) > 1:
        df = msg[0]
        file_name = msg[1]

        df['col2'] = ''
        df['col2'] = df['col2'].apply(pd.to_numeric).astype('Int64')

        if new_fn is None:
            new_df = df
            new_fn = file_name
            new_df['col2'] = new_df['col1']
        else:
            prev_df = new_df
            prev_fn = new_fn
            new_df = df
            new_fn = file_name

            new_df = prev_df.merge(new_df, on='main', how='outer', suffixes=('_prev', '_new'))
            new_df = new_df.assign(**{col: new_df[col].fillna(new_df[col.replace("_new", "_prev")])
                                      for col in new_df.columns if "_new" in col})

Then the code reaches the block below, which works on random dataframes I've tested with the same characteristics, but not when combined with the code above:

np.where(new_df['col2_new'].isna(),
                     new_df['col2_new'].fillna(new_df['col1_new']), new_df['col2_new'])

For some reason the fillna doesn't work and leaves col2_new with many NA values:

print(new_df.isna().sum())                        print(new_df.dtypes)

main                            0          main                          object
col1_prev                     158          col1_prev                     Int64
col2_prev                     158          col2_prev                     Int64
col1_new                        0          col1_new                      Int64
col2_new                      158          col2_new                      Int64

dtype: int64                               dtype: object

I've also run into some issues with isna/isnull, which seem to be the problem:

df = pd.DataFrame({'col1': str(randint(10, 100)), 'col2': randint(10, 100), 'col3': ""}, index=range(0, 3))
np.where(df['col3'].isna, df['col3'].fillna(df['col1']), df['col3'])

It was giving a correct output until just recently, but now it feels like something has broken:

print(df.count())            print(df.isna().sum())         print(df)

col1    3                    col1    0                        col1  col2 col3
col2    3                    col2    0                      0   33    38
col3    3                    col3    0                      1   33    38
dtype: int64                 dtype: int64                   2   33    38

Is it just me? Am I doing something wrong? Is it the interpreter?

I appreciate any help, Thanks!

Tomer Poliakov

2 Answers

np.where() doesn't change anything in-place. The result needs to be assigned back to new_df['col2_new']:

new_df['col2_new'] = np.where(
    new_df['col2_new'].isna(),
    new_df['col2_new'].fillna(new_df['col1_new']),
    new_df['col2_new'])
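
To make the difference concrete, here is a minimal sketch on made-up data (the real dataframe was not shared, so the values and the helper name `filled` are assumptions for illustration only). It counts the NA values in col2_new after calling np.where() without assignment, and again after assigning the result back:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the merged frame (the real data was not shared).
new_df = pd.DataFrame({
    "col1_new": pd.array([10, 20, 30], dtype="Int64"),
    "col2_new": pd.array([1, None, None], dtype="Int64"),
})


def filled(frame):
    """Build the filled values as a new array; `frame` itself is not modified."""
    return np.where(
        frame["col2_new"].isna(),
        frame["col2_new"].fillna(frame["col1_new"]),
        frame["col2_new"],
    )


filled(new_df)  # result is computed and then discarded
na_without_assign = int(new_df["col2_new"].isna().sum())

new_df["col2_new"] = filled(new_df)  # result is stored in the column
na_with_assign = int(new_df["col2_new"].isna().sum())

print(na_without_assign, na_with_assign)
```

np.where() returns a plain NumPy array, so the column may no longer have the nullable Int64 dtype after the assignment.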

Also, I believe you can simplify this to fillna() alone:

new_df['col2_new'] = new_df['col2_new'].fillna(new_df['col1_new'])
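
A short sketch of the fillna()-only version on made-up data (the frames below are illustrative assumptions, not the asker's data). The second half relates to the col3 snippet in the question: an empty string is an ordinary value, so isna() does not flag it and fillna() has nothing to fill; one possible workaround is to compare against "" explicitly with mask(). Note also that `.isna` without parentheses is a bound method rather than a boolean mask, so it is always truthy.

```python
import pandas as pd

# Hypothetical toy data mirroring the question's column names.
new_df = pd.DataFrame({
    "col1_new": pd.array([10, 20, 30], dtype="Int64"),
    "col2_new": pd.array([1, None, None], dtype="Int64"),
})

# fillna() returns a new Series, so assign it back to the column.
new_df["col2_new"] = new_df["col2_new"].fillna(new_df["col1_new"])
remaining_na = int(new_df["col2_new"].isna().sum())
dtype_name = str(new_df["col2_new"].dtype)  # nullable dtype is kept

# Empty strings are not missing values, so isna() reports none of them.
df = pd.DataFrame({"col1": ["33", "33", "33"], "col3": ["", "", ""]})
na_count = int(df["col3"].isna().sum())

# Replace the empty strings explicitly instead of relying on fillna().
df["col3"] = df["col3"].mask(df["col3"] == "", df["col1"])

print(remaining_na, dtype_name, na_count, df["col3"].tolist())
```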
tdy

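You can also pass inplace=True to change the data in place. Be aware that this relies on new_df['col2_new'] being a view of new_df, which newer pandas versions with Copy-on-Write enabled do not guarantee: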

new_df['col2_new'].fillna(new_df['col1_new'], inplace=True)
Sahit

Note that `inplace` is considered harmful and will be deprecated in the near future: https://stackoverflow.com/a/60020384/13138364 – tdy Aug 03 '21 at 09:06