I have a dataframe (df) with the new variable names (col2) and the old variable names (col1).

I have another dataframe (tf) whose columns are named as in col1.

The desired result (tf_new) is tf with its column names converted to the corresponding names in col2 (abs->fc_abc).

So far I have tried to avoid a UDF by doing the following:

# convert df into an RDD
newrdd = df.rdd
# map each row to a (key, value) pair
keypair_rdd = newrdd.map(lambda x : (x[1],x[0]))
# collect the pairs into a Python dict on the driver
dict = keypair_rdd.collectAsMap()

I need help using the dict to transform tf into tf_new.

A similar solution in Python would also be of great help.

Abhi

1 Answer

  1. Collect the first dataframe into a Python dictionary
from pyspark.sql import functions as F

dict = df.agg(F.map_from_arrays(F.collect_list("col1"), 
  F.collect_list("col2"))).first()[0]
  2. Create a list of all columns of tf and rename those columns that are contained in dict
renamed_cols = [F.col(c).alias(dict[c]) if c in dict 
  else F.col(c) for c in tf.columns]
  3. Use the renamed columns to select the data
tf.select(renamed_cols).show()
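
The renaming itself is plain Python once the mapping has been collected, so it can be tried out without a Spark session. The following is only a minimal sketch: the helper name new_column_names and the sample values are hypothetical, and the mapping is assumed to go from old name (col1) to new name (col2).

```python
def new_column_names(columns, mapping):
    """Return the column names, replacing each one that appears as a key in mapping."""
    # Names without an entry in mapping are kept unchanged.
    return [mapping.get(c, c) for c in columns]


# Hypothetical sample values standing in for the collected dict and tf.columns
mapping = {"abc": "fc_abc", "def": "fc_def"}
columns = ["id", "abc", "def"]

new_names = new_column_names(columns, mapping)
print(new_names)

# With a live Spark session the result could be applied in one call, e.g.:
# tf_new = tf.toDF(*new_column_names(tf.columns, mapping))
```

DataFrame.toDF takes the full list of new column names in order, which is why unmapped names are passed through unchanged rather than dropped.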
werner