Getting NaN values while iterating to fill data in a data frame from another data frame

Question

I have a data frame, df1, that contains zero values. Another data frame, df2, is a groupby of df1 on a time basis. When I try to fill the zero values of df1 with values from df2, I get NaN in the final data frame.

I am using the following code:

for x in df2['time']:
    df1.loc[(df1['i1'] == 0) & (df1['time'] == x), 'i1'] = df2[df2['time'] == x]['i1']

Hi and welcome to Stack Overflow! Please read through [this article](https://stackoverflow.com/help/how-to-ask) on how to ask and format your question. For questions on `pandas`, please do not provide example dataframes in pictures, but in reproducible format. If possible, add a start dataframe, and a result dataframe and explain the transformation you are trying to do. This way, people can easily help you out by copying working code. Example of a dataframe: `df = pd.DataFrame({'column1': ['val1', 'val2'], 'column2': ['val3', 'val4']})`. Thank you! — JarroVGIT, Feb 11 '23 at 16:59
Also see [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391). — AlexK, Feb 12 '23 at 03:35

marwan · Accepted Answer · 2023-02-11T17:05:44.003

This

df1.loc[(df1['i1'] == 0) & (df1['time']== x),'i1'] = df2[df2['time']==x]['i1']

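returns NaNs because the indices of df2 and df1 don't align: when a Series is assigned through .loc, pandas matches it to the target rows by index label, not by position.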
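To see the misalignment concretely, here is a minimal sketch with made-up data. Only the column names `time` and `i1` come from the question; the values and the names `broken` and `fixed` are assumptions for illustration. It also shows one possible way to keep the loop, by assigning a scalar instead of a Series.

```python
import pandas as pd

# Made-up data; only the column names `time` and `i1` come from the question.
df1 = pd.DataFrame({"time": [1, 1, 2, 2], "i1": [0.0, 5.0, 0.0, 7.0]})
df2 = pd.DataFrame({"time": [1, 2], "i1": [5.0, 7.0]})  # row labels 0 and 1

# The loop from the question, run on a copy.
broken = df1.copy()
for x in df2["time"]:
    target = (broken["i1"] == 0) & (broken["time"] == x)
    source = df2[df2["time"] == x]["i1"]  # a Series that keeps df2's row label
    broken.loc[target, "i1"] = source     # aligned on label, not on position

# time == 1: target label 0, source label 0 -> filled, but only by coincidence
# time == 2: target label 2, source label 1 -> no matching label, so NaN

# Same loop, assigning a scalar instead of a Series.
fixed = df1.copy()
for x in df2["time"]:
    target = (fixed["i1"] == 0) & (fixed["time"] == x)
    # A scalar has no index, so it is broadcast to every selected row.
    fixed.loc[target, "i1"] = df2.loc[df2["time"] == x, "i1"].iloc[0]
```

The scalar version assumes df2 has exactly one row per time value, which holds when df2 comes from a groupby on time. It still loops in Python, so it will be slow on large data.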

Advice on asking technical questions: don't provide screenshots; instead, provide the code to build df1 and df2. That makes it much easier for someone who is trying to help you to reproduce your issue.

That being said, here is my best attempt at an answer for you:

In [1]: import pandas as pd

In [2]: df1 = pd.DataFrame({
   ...:     "time": [
   ...:         pd.Timestamp("2018-01-01 00:00:00"),
   ...:         pd.Timestamp("2018-01-01 00:00:00"),
   ...:         pd.Timestamp("2018-01-01 00:00:00"),
   ...:         pd.Timestamp("2010-01-01 00:00:10"),
   ...:         pd.Timestamp("2010-01-01 00:00:10"),
   ...:         pd.Timestamp("2010-01-01 00:00:10"),
   ...:     ],
   ...:     "indicator": [0, 1, 2, 0, 1, 2],
   ...: })

In [3]: df1
Out[3]: 
                 time  indicator
0 2018-01-01 00:00:00          0
1 2018-01-01 00:00:00          1
2 2018-01-01 00:00:00          2
3 2010-01-01 00:00:10          0
4 2010-01-01 00:00:10          1
5 2010-01-01 00:00:10          2

In [4]: df2 = df1.groupby("time").mean().reset_index()

In [5]: df2
Out[5]: 
                 time  indicator
0 2010-01-01 00:00:10        1.0
1 2018-01-01 00:00:00        1.0

In [6]: out = df1.merge(df2, on="time", suffixes=("_df1", "_df2")) # we merge to align the indices

In [7]: out
Out[7]: 
                 time  indicator_df1  indicator_df2
0 2018-01-01 00:00:00              0            1.0
1 2018-01-01 00:00:00              1            1.0
2 2018-01-01 00:00:00              2            1.0
3 2010-01-01 00:00:10              0            1.0
4 2010-01-01 00:00:10              1            1.0
5 2010-01-01 00:00:10              2            1.0

In [8]: out["indicator"] = out["indicator_df1"]

In [9]: mask = out["indicator_df1"] == 0

In [10]: out.loc[mask, "indicator"] = out.loc[mask, "indicator_df2"]

In [11]: out
Out[11]: 
                 time  indicator_df1  indicator_df2  indicator
0 2018-01-01 00:00:00              0            1.0          1
1 2018-01-01 00:00:00              1            1.0          1
2 2018-01-01 00:00:00              2            1.0          2
3 2010-01-01 00:00:10              0            1.0          1
4 2010-01-01 00:00:10              1            1.0          1
5 2010-01-01 00:00:10              2            1.0          2

The code above merges the source data with the values you want to impute, then applies the correction using a boolean mask. This gives the correct result and is considerably faster than running a for loop.

Note that this can be simplified further by using groupby.transform, which avoids creating a second dataframe and merging.
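A minimal sketch of the groupby.transform variant, using the same example data as above. The names `group_mean` and `indicator_filled` are assumptions chosen for illustration. As in the merge version, the group mean here includes the zero rows.

```python
import pandas as pd

df1 = pd.DataFrame({
    "time": pd.to_datetime(
        ["2018-01-01 00:00:00"] * 3 + ["2010-01-01 00:00:10"] * 3
    ),
    "indicator": [0, 1, 2, 0, 1, 2],
})

# transform returns one value per original row and keeps df1's index,
# so the result is already aligned and no merge is needed.
group_mean = df1.groupby("time")["indicator"].transform("mean")

# Keep non-zero values; take the group mean where the value is zero.
df1["indicator_filled"] = df1["indicator"].where(
    df1["indicator"] != 0, group_mean
)
```

Because `group_mean` shares df1's index, the label alignment that caused the NaNs in the loop now works in your favor. With several indicator columns, the same two steps could be applied to each column in turn.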

Hi, thanks for your help. My data is very large: I have 8 indicators with approximately 35 lakh (3.5 million) rows. Your code works fine with the first 4 indicators, but from the 5th onwards it replaces all values of the indicators with the mean value, and in some places 0 values still remain. — user21193976, Feb 14 '23 at 10:10
