I have a DataFrame with the following schema:

root
 |-- AddressBook: struct (nullable = true)
 |    |-- ContactInformationsList: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- ContactId: string (nullable = true)
 |    |    |    |-- ContactMeansDesc: string (nullable = true)
 |    |    |    |-- IsPrimaryMeans: boolean (nullable = true)
 |    |    |    |-- TypeMeansContactId: string (nullable = true)
 |    |    |    |-- Value: string (nullable = true)
 |    |-- PersonData: struct (nullable = true)
 |    |    |-- BirthDate: string (nullable = true)
 |    |    |-- CSP: string (nullable = true)
 |    |    |-- Civility: string (nullable = true)
 |    |    |-- FirstName: string (nullable = true)
 |    |    |-- Gender: string (nullable = true)
 |    |    |-- LastName: string (nullable = true)
 |    |    |-- MaritalStatus: string (nullable = true)
 |    |    |-- SBirthDate: string (nullable = true)
 |    |    |-- Title: string (nullable = true)
 |-- PublicId: string (nullable = true)
 |-- Version: long (nullable = true)

This DataFrame contains production data, so I would like to anonymize some of the personal information. Specifically, I want to replace the nested field AddressBook.PersonData.LastName with a hash of its value.

I tried:

from pyspark.sql import functions as F

# Attempt: address the nested field by its dotted path
df.withColumn(
    'AddressBook.Persondata.Lastname',
    F.hash(F.col('AddressBook.Persondata.Lastname'))
)

but instead of updating the nested field, it just added a new top-level column whose name contains the dots:

|-- AddressBook.Persondata.Lastname: int (nullable = true)

Is there a simple way to modify the nested field in place?

Steven
  • @philantrovert if you can translate the content to python, that'd be helpful. Not everybody speaks scala fluently. – Steven Apr 05 '18 at 10:11
  • Maybe this post will help you: [How do I add a column to a nested struct in a pyspark dataframe](https://stackoverflow.com/questions/48777993/how-do-i-add-a-column-to-a-nested-struct-in-a-pyspark-dataframe/). – pault Apr 05 '18 at 14:21
  • Does no one know the answer? – bicepjai May 20 '21 at 03:13
