Modify a column inside a StructField in PySpark

Question

I have a DataFrame with the following schema:

root
 |-- AddressBook: struct (nullable = true)
 |    |-- ContactInformationsList: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- ContactId: string (nullable = true)
 |    |    |    |-- ContactMeansDesc: string (nullable = true)
 |    |    |    |-- IsPrimaryMeans: boolean (nullable = true)
 |    |    |    |-- TypeMeansContactId: string (nullable = true)
 |    |    |    |-- Value: string (nullable = true)
 |    |-- PersonData: struct (nullable = true)
 |    |    |-- BirthDate: string (nullable = true)
 |    |    |-- CSP: string (nullable = true)
 |    |    |-- Civility: string (nullable = true)
 |    |    |-- FirstName: string (nullable = true)
 |    |    |-- Gender: string (nullable = true)
 |    |    |-- LastName: string (nullable = true)
 |    |    |-- MaritalStatus: string (nullable = true)
 |    |    |-- SBirthDate: string (nullable = true)
 |    |    |-- Title: string (nullable = true)
 |-- PublicId: string (nullable = true)
 |-- Version: long (nullable = true)

This DataFrame contains production data, so I would like to anonymize some of the personal information. Specifically, I want to replace the column AddressBook.PersonData.LastName with a hash of its value.

I tried:

from pyspark.sql import functions as F

df.withColumn(
    'AddressBook.Persondata.Lastname',
    F.hash(F.col('AddressBook.Persondata.Lastname'))
)

but it just added a new top-level column instead:

|-- AddressBook.Persondata.Lastname: int (nullable = true)

Is there a simple way to modify the nested field itself?

@philantrovert if you can translate the content to Python, that'd be helpful. Not everybody speaks Scala fluently. — Steven, Apr 05 '18 at 10:11
Maybe this post will help you: [How do I add a column to a nested struct in a pyspark dataframe](https://stackoverflow.com/questions/48777993/how-do-i-add-a-column-to-a-nested-struct-in-a-pyspark-dataframe/). — pault, Apr 05 '18 at 14:21
