I'm dealing with deeply nested JSON data, and my goal is to flatten it. I know I can select a nested field with dot notation. For example, when `id` is nested inside the `attributes` column, I can select it as `attributes.id`:
df = df.select('attributes.id')
The problem is that `df` already has a column called `id`, and since Spark keeps only the last part after the `.` as the column name, I now have duplicate column names. What is the best way of dealing with this? Ideally the new column would be called `attributes_id`, to differentiate it from the existing `id` column.
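To make the desired result concrete, here is a minimal sketch of the kind of thing I'm after. It assumes `attributes` is a struct column and that replacing dots with underscores is an acceptable naming scheme. The helper names `alias_for` and `select_flattened` are hypothetical, made up for illustration.

```python
def alias_for(path: str) -> str:
    """Turn a dotted field path into a flat column name, e.g. 'attributes.id' -> 'attributes_id'."""
    return path.replace(".", "_")


def select_flattened(df, paths):
    """Select each dotted path from df, aliased to its underscore-joined name.

    Assumes df is a pyspark.sql.DataFrame and every path resolves to an
    existing top-level column or nested struct field.
    """
    # Imported lazily so the naming helper above works without Spark installed.
    from pyspark.sql import functions as F

    return df.select([F.col(p).alias(alias_for(p)) for p in paths])
```

With this, something like `select_flattened(df, ["id", "attributes.id"])` should keep the top-level `id` and add the nested one as `attributes_id`, so the two names no longer collide.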