0

I have two data frames.

All the NaN of df1 have to fill by df2 by matching 'Group'. Can repeat matching and filling by 'Repeat' times.

df1 df2

df1 = pd.DataFrame({'Group': ['xx', 'yy', 'zz', 'x', 'x', 'x','z','y','y','y','y'], 'Name': ['A', 'B', 'C', None, None, None, None, None, None, None, None], 'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]})

df2 = pd.DataFrame({'Name': ['A', 'A', 'B', 'B'], 'Group': ['x', 'y', 'z', 'y'], 'Repeat': [3, 2, 1, 2]})

The final data frame df will look like-

df

Also looking for the fastest run time.

I did this

for index2, row2 in df2.iterrows():
    for i in range(0,row2[2]):
        for index1 in df1.index:
            if df1.iloc[index1, 1] == row2[1] and df1.iloc[ndex1, 1] == 'NaN':
                df1.iloc[ndex1, 1] = row2[0]
                break

Looking for faster simpler solution.

Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • [Do not include picture of data **only**](https://meta.stackoverflow.com/questions/285551/why-should-i-not-upload-images-of-code-data-errors) – Quang Hoang Aug 16 '23 at 00:04
  • Does this answer your question? [Filling DF's NaN/Missing data from another DF](https://stackoverflow.com/questions/76904330/filling-dfs-nan-missing-data-from-another-df) – Bracula Aug 31 '23 at 16:46

1 Answers1

1

You can enumerate the missing part and the new (repeated) data by the Group, then merge/fillna the data on Group and the enumeration:

# enumerate the replacement data
df2 = (df2.loc[df2.index.repeat(df2['Repeat'])]
          .assign(enum=lambda x: x.groupby(['Group']).cumcount()) # only groupby Group here
      )

merge = (df1.loc[df1['Name'].isna(), ['Group']]                   # filter only NaN part
            .assign(enum=lambda x: x.groupby('Group').cumcount()) # enumerate the missing data   
            .merge(df2, on=['Group','enum'], how='left')          # merge
        )

df1.loc[df1['Name'].isna(), 'Name'] = merge['Name'].to_numpy()

Output:

   Group Name  Value
0     xx    A      1
1     yy    B      2
2     zz    C      3
3      x    A      4
4      x    A      5
5      x    A      6
6      z    B      7
7      y    A      8
8      y    A      9
9      y    B     10
10     y    B     11
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • the cumcount() seems not working. All values showing 0. – parvez alam Aug 16 '23 at 06:12
  • If anyone need it: df1 = pd.DataFrame({'Group': ['xx', 'yy', 'zz', 'x', 'x', 'x','z','y','y','y','y'], 'Name': ['A', 'B', 'C', None, None, None, None, None, None, None, None], 'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]}) df2 = pd.DataFrame({'Name': ['A', 'A', 'B', 'B'], 'Group': ['x', 'y', 'z', 'y'], 'Repeat': [3, 2, 1, 2]}) – parvez alam Aug 16 '23 at 06:18
  • The cumcount() seems not working on python 3.7.9 . Sorry can't accept your code. – parvez alam Aug 18 '23 at 01:04
  • @parvezalam see updated answer. I correct the bug and it ran as expected on the sample data. – Quang Hoang Aug 18 '23 at 02:24
  • Can you update your code for filling multiple columns info. Like, if we add price column on df1 and df2. df1 = pd.DataFrame({'code': [1001, 1002, 1003, 1001, 1004, 1004,1006,1005,1005,1005,1006,1006], 'Name': ['A', 'B', 'C', None, None, None, None, None, None, None, None, None], 'Price': [1.1, 2.2, 3.3, None, None, None, None, None, None, None, None, None], 'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,12]}) df2 = pd.DataFrame({'Name': ['A', 'A', 'B', 'B', 'A'], 'code': [1004, 1005, 1006, 1005, 1001], 'Price': [1.1, 1.1, 2.2, 2.2, 1.1], 'Repeat': [2, 2, 1, 2,1]}) – parvez alam Aug 21 '23 at 03:37