I have a DataFrame `df` with the following schema:
```
root
 |-- batch_key: string (nullable = true)
 |-- company_id: integer (nullable = true)
 |-- users_info: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- first_name: string (nullable = true)
 |    |    |-- last_name: long (nullable = true)
 |    |    |-- total_amount: double (nullable = true)
```
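To reproduce this nesting without the original data, the schema can be written out as a Spark `StructType`. This is a sketch assuming the Spark SQL types API (`org.apache.spark.sql.types`); the object `UsersSchema` and the helper `nestedUserFields` are hypothetical names. It also shows where the nested names live: not in the top-level field list, but inside the element type of the `users_info` array.

```scala
import org.apache.spark.sql.types._

object UsersSchema {
  // Element type of the users_info array: one struct per user
  val userStruct: StructType = StructType(Seq(
    StructField("first_name", StringType, nullable = true),
    StructField("last_name", LongType, nullable = true),
    StructField("total_amount", DoubleType, nullable = true)
  ))

  // Top-level schema; containsNull = true matches the printSchema output
  val schema: StructType = StructType(Seq(
    StructField("batch_key", StringType, nullable = true),
    StructField("company_id", IntegerType, nullable = true),
    StructField("users_info", ArrayType(userStruct, containsNull = true), nullable = true)
  ))

  // Hypothetical helper: the struct field names are only reachable by
  // unwrapping the ArrayType's element type, not from the top-level names
  def nestedUserFields(s: StructType): Array[String] = s("users_info").dataType match {
    case ArrayType(st: StructType, _) => st.fieldNames
    case _                            => Array.empty[String]
  }
}
```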
The column `users_info` is an array of structs.
I would like to convert every column name, including the fields nested inside those structs, from snake_case to camelCase: `batch_key` becomes `batchKey`, `users_info` becomes `usersInfo`, `first_name` becomes `firstName`, and so on.
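The name conversion itself can be isolated as a plain Scala function, independent of Spark. This sketch reuses the `_(.)` pattern from the loop in the question; the object name `CamelCase` is hypothetical. `Regex.quoteReplacement` is used so that the replacement text is taken literally rather than parsed for `$` group references.

```scala
import scala.util.matching.Regex

object CamelCase {
  // An underscore followed by any single character, as in the question's loop
  private val underscoreChar: Regex = "_(.)".r

  // Converts snake_case to camelCase by upper-casing the character after each "_"
  def toCamelCase(name: String): String =
    underscoreChar.replaceAllIn(name, m => Regex.quoteReplacement(m.group(1).toUpperCase))
}
```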
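I started with this code:
```scala
import scala.util.matching.Regex

var df2 = df
val regex = new Regex("_(.)")
// Rename each top-level column from snake_case to camelCase
for (col <- df.columns) {
  df2 = df2.withColumnRenamed(col, regex.replaceAllIn(col, m => m.group(1).toUpperCase))
}
```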
But this only renames the top-level columns `batch_key`, `company_id` and `users_info`, because `df.columns` returns just `[batch_key, company_id, users_info]`.
The fields inside the `users_info` structs keep their original names. How can I modify the code above so that it also reaches the nested fields and renames them?
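One possible approach is to rebuild each column's `DataType` recursively with renamed fields and then cast the column to that new type. This is a sketch, not a definitive implementation: it assumes Spark SQL's `Column.cast(DataType)` and that a struct-to-struct cast matches fields by position, so only the names change. The object `NestedRename` and its methods are hypothetical names, not part of Spark.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._
import scala.util.matching.Regex

object NestedRename {
  // snake_case -> camelCase, same pattern as in the question
  def toCamelCase(name: String): String =
    "_(.)".r.replaceAllIn(name, m => Regex.quoteReplacement(m.group(1).toUpperCase))

  // Rebuild a DataType with every struct field renamed, descending into
  // structs, arrays and maps; leaf types are returned unchanged
  def renameType(dt: DataType): DataType = dt match {
    case StructType(fields) =>
      StructType(fields.map(f => f.copy(name = toCamelCase(f.name), dataType = renameType(f.dataType))))
    case ArrayType(elementType, containsNull) =>
      ArrayType(renameType(elementType), containsNull)
    case MapType(keyType, valueType, valueContainsNull) =>
      MapType(renameType(keyType), renameType(valueType), valueContainsNull)
    case other => other
  }

  // Cast each top-level column to its renamed type and alias it with the
  // camelCase name; nullability is kept, so the casts remain valid
  def camelCaseColumns(df: DataFrame): DataFrame =
    df.select(df.schema.fields.toSeq.map { f =>
      col(s"`${f.name}`").cast(renameType(f.dataType)).as(toCamelCase(f.name))
    }: _*)
}
```

Usage would then be `val df2 = NestedRename.camelCaseColumns(df)`, after which `df2.printSchema()` should list `usersInfo` with `firstName`, `lastName` and `totalAmount`. An alternative using the same renamer is `spark.createDataFrame(df.rdd, NestedRename.renameType(df.schema).asInstanceOf[StructType])`, which avoids the casts but round-trips the data through an RDD.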