I am running an experiment to observe the impact of missing values on query results, using Python and pandas. Suppose I have a DataFrame df that holds the complete data. My real data has many columns and thousands of rows.

I made a copy of df called df_copy. I run the experiment on df_copy and treat df as the ground truth. I then replaced some values in df_copy with NaN at random.
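For concreteness, here is a minimal sketch of how such a copy could be produced. The small frame, the seed, and the 30% missing rate are placeholder assumptions, not my real setup.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the complete data (the ground truth).
df = pd.DataFrame({"a": ["x", "y", "z"], "b": [2, 5, 8], "c": [3, 6, 9]})

rng = np.random.default_rng(0)       # fixed seed, only for repeatability
hide = rng.random(df.shape) < 0.3    # True where a cell will be blanked (rate is arbitrary)

# mask() returns a new frame with NaN wherever `hide` is True; df is left untouched.
df_copy = df.mask(hide)
```

Keeping the boolean array `hide` around also records exactly which cells were blanked.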

I have some ideas for fixing the missing values in df_copy using heuristics. Currently I can do this easily with row operations in pandas. For instance, to fix a row in df_copy, I can look the row up by its id, drop it, and replace it with the corresponding row from df.
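A rough sketch of that row-level repair, assuming df and df_copy share the same index labels. It overwrites the affected rows in place rather than literally dropping and re-adding them, and the tiny frames are placeholders.

```python
import numpy as np
import pandas as pd

# Placeholder ground truth and a damaged copy with one missing cell.
df = pd.DataFrame({"a": ["x", "y", "z"], "b": [2.0, 5.0, 8.0], "c": [3.0, 6.0, 9.0]})
df_copy = df.copy()
df_copy.loc[1, "c"] = np.nan

# Index labels of the rows that contain at least one missing value.
bad_rows = df_copy.index[df_copy.isna().any(axis=1)]

# Overwrite those whole rows with the matching ground-truth rows.
df_copy.loc[bad_rows] = df.loc[bad_rows]
```

This restores every column of the affected rows, including cells that were never missing, which is why it is too coarse for what I want.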

My question is: how can I do this at the cell level in pandas? For instance, how can I get the index (x, y) of every missing value, so that when I want to fix a missing cell I can replace just that cell with the ground-truth value by referring to its index (x, y)?

Example:

df

import numpy as np
import pandas as pd
df = pd.DataFrame([["x", 2, 3], ["y", 5, 6], ["z", 8, 9]],
                  columns=['a', 'b', 'c'])


    a   b   c
0   x   2   3
1   y   5   6
2   z   8   9

df_copy

# Built from a list of lists: np.array() would coerce every value, including np.nan, to a string.
df_copy = pd.DataFrame([["x", np.nan, 3], ["y", 5, np.nan], [np.nan, 8, 9]],
                       columns=['a', 'b', 'c'])


    a   b   c
0   x   NaN 3.0
1   y   5.0 NaN
2   NaN 8.0 9.0
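To make the goal concrete, here is a sketch of the kind of cell-level operation I am after, using the example frames above. It is one possible approach, not necessarily the best one. The names `missing` and `labels` are my own, and it assumes both frames have identical shape, index, and columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([["x", 2, 3], ["y", 5, 6], ["z", 8, 9]], columns=["a", "b", "c"])
df_copy = pd.DataFrame([["x", np.nan, 3], ["y", 5, np.nan], [np.nan, 8, 9]],
                       columns=["a", "b", "c"])

# Positional (row, column) pairs of every missing cell, in row-major order.
rows, cols = np.where(df_copy.isna().to_numpy())
missing = [(int(r), int(c)) for r, c in zip(rows, cols)]

# Repair one chosen cell by position, copying the value from the ground truth.
r, c = missing[0]
df_copy.iloc[r, c] = df.iloc[r, c]

# The same idea with labels: stack the boolean mask and keep the True entries.
stacked = df_copy.isna().stack()
labels = list(stacked[stacked].index)   # (row label, column label) pairs still missing
for row, col in labels:
    df_copy.at[row, col] = df.at[row, col]
```

The positional form (`iloc`) matches the (x, y) idea directly, while the label form (`at`) keeps working if the index is not a simple 0..n-1 range.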