I have a DataFrame with the following schema:
root
 |-- AddressBook: struct (nullable = true)
 |    |-- ContactInformationsList: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- ContactId: string (nullable = true)
 |    |    |    |-- ContactMeansDesc: string (nullable = true)
 |    |    |    |-- IsPrimaryMeans: boolean (nullable = true)
 |    |    |    |-- TypeMeansContactId: string (nullable = true)
 |    |    |    |-- Value: string (nullable = true)
 |    |-- PersonData: struct (nullable = true)
 |    |    |-- BirthDate: string (nullable = true)
 |    |    |-- CSP: string (nullable = true)
 |    |    |-- Civility: string (nullable = true)
 |    |    |-- FirstName: string (nullable = true)
 |    |    |-- Gender: string (nullable = true)
 |    |    |-- LastName: string (nullable = true)
 |    |    |-- MaritalStatus: string (nullable = true)
 |    |    |-- SBirthDate: string (nullable = true)
 |    |    |-- Title: string (nullable = true)
 |-- PublicId: string (nullable = true)
 |-- Version: long (nullable = true)
This DataFrame contains production data, so I would like to mask some of the personal information. Specifically, I want to replace the value of the nested field AddressBook.PersonData.LastName with a hash of that value.
I tried:
    from pyspark.sql import functions as F

    df.withColumn(
        'AddressBook.Persondata.Lastname',
        F.hash(F.col('AddressBook.Persondata.Lastname'))
    )
but that just added a new top-level column instead of changing the nested field:
|-- AddressBook.Persondata.Lastname: int (nullable = true)
Is there a simple way to modify the nested field in place?