I am trying to flatten a nested column of a dataframe with pd.json_normalize and join the result back onto the remaining columns with pd.concat.

import pandas as pd

df = pd.DataFrame({
    'ROW1': ['TC', 'OD', 'GN', 'OLT'],
    'D2': [1680880134, 4, 0,
           [{'ID': '5771841270',
             'SLX': [{'T1': '1', 'T2': '1729494797'},
                     {'T1': '1', 'T2': '1729445'}]}]]
})

print(df)


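# Turn the ROW1 values into columns, leaving a single row labelled 'D2'.
df_transposed = df.set_index('ROW1').transpose()

# Flatten the nested OLT records: one row per SLX entry, with ID carried along.
df_flattened = pd.json_normalize(df_transposed['OLT'].iloc[0], 'SLX', ['ID'])
final_df = pd.concat([df_transposed.drop('OLT', axis=1), df_flattened], axis=1)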

print(final_df)

Instead of combined rows, I get extra rows padded with NaN:

            TC   OD   GN   T1          T2          ID
D2  1680880134    4    0  NaN         NaN         NaN
0          NaN  NaN  NaN    1  1729494797  5771841270
1          NaN  NaN  NaN    1     1729445  5771841270

Expected output:

            TC   OD   GN   T1          T2          ID
D2  1680880134    4    0    1  1729494797  5771841270
D2  1680880134    4    0    1     1729445  5771841270
snn

1 Answer

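pd.concat with axis=1 aligns rows by index label, and your two dataframes do not share any: df_transposed has a single row labelled D2, while df_flattened has two rows labelled 0 and 1. That is why every row ends up padded with NaN. You can first repeat the row of df_transposed once per row of df_flattened and reset its index, then concatenate them as follows: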

import numpy as np

# Repeat the 'D2' row once per flattened row; reset to labels 0..n-1.
df_transposed = df_transposed.loc[
    np.repeat(df_transposed.index, len(df_flattened))
].reset_index(drop=True)
final_df = pd.concat([df_transposed.drop('OLT', axis=1), df_flattened], axis=1)

print(final_df)
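
To see the alignment behaviour in isolation, here is a minimal sketch on two small made-up frames. The names `left` and `right` are only for illustration, and `Index.repeat` is used in place of `np.repeat`, which should behave the same way here.

```python
import pandas as pd

# One row labelled "D2", like the transposed frame in the question.
left = pd.DataFrame({"TC": [1680880134], "OD": [4]}, index=["D2"])
# Two rows with the default 0..1 labels, like the json_normalize result.
right = pd.DataFrame({"T1": ["1", "1"], "T2": ["1729494797", "1729445"]})

# axis=1 aligns on index labels. "D2", 0 and 1 never match, so each row
# is filled with NaN in the columns that came from the other frame.
misaligned = pd.concat([left, right], axis=1)
print(misaligned)

# Repeating the single row and resetting its index gives both frames
# the same 0..1 labels, so the rows pair up.
repeated = left.loc[left.index.repeat(len(right))].reset_index(drop=True)
aligned = pd.concat([repeated, right], axis=1)
print(aligned)
```

The first frame printed has three rows with NaN padding, the second has two complete rows.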

The row-duplication code is taken from "How can I replicate rows of a Pandas DataFrame?"
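
Note that reset_index(drop=True) leaves the result labelled 0 and 1, whereas the expected output in the question keeps the D2 label on both rows. If that matters, one possible variation (a sketch, not the only way to do it) is to concatenate on the positional index and then put the repeated labels back. The names `scalars` and `repeated` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "ROW1": ["TC", "OD", "GN", "OLT"],
    "D2": [
        1680880134, 4, 0,
        [{"ID": "5771841270",
          "SLX": [{"T1": "1", "T2": "1729494797"},
                  {"T1": "1", "T2": "1729445"}]}],
    ],
})

df_transposed = df.set_index("ROW1").transpose()

# One row per SLX entry, with ID carried along as a meta column.
df_flattened = pd.json_normalize(df_transposed["OLT"].iloc[0], "SLX", ["ID"])

# Repeat the scalar columns once per flattened row, keeping the "D2" label.
scalars = df_transposed.drop(columns="OLT")
repeated = scalars.loc[scalars.index.repeat(len(df_flattened))]

# Concatenate on matching 0..n-1 labels, then restore the "D2" labels.
final_df = pd.concat([repeated.reset_index(drop=True), df_flattened], axis=1)
final_df.index = repeated.index

print(final_df)
```

The labels are restored after the concat so that alignment happens on a plain unique index and does not rely on how concat treats duplicate labels.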

TanjiroLL