5

I have written code in Python using Pandas that adds "VEN_" to the beginning of the column names:

Tablon.columns = "VEN_" + Tablon.columns

And it works fine, but now I'm working with PySpark and it doesn't work. I've tried:

Vaa_total.columns = ['Vaa_' + col for col in Vaa_total.columns]

or

for elemento in Vaa_total.columns:
    elemento = "Vaa_" + elemento

And other things like that, but nothing works.

I don't want to replace the column names; I just want to keep them but add a string to the beginning.

ITo
  • Possible duplicate of [How to change dataframe column names in pyspark?](https://stackoverflow.com/questions/34077353/how-to-change-dataframe-column-names-in-pyspark) – vvg Jul 17 '18 at 08:40
  • I don't think so; that explains how to replace the names, but I don't know how to add a string to my column names. I get: AttributeError: can't set attribute. – ITo Jul 17 '18 at 08:46
  • look into option 2 or 3. It's exactly what you need. – vvg Jul 17 '18 at 08:51
  • yes, you are right! – ITo Jul 17 '18 at 08:58

3 Answers

4

Try something like this:

for elemento in Vaa_total.columns:
    # withColumnRenamed returns a new DataFrame, so reassign it on each iteration
    Vaa_total = Vaa_total.withColumnRenamed(elemento, "Vaa_" + elemento)
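For reference, a minimal self-contained sketch of this approach; the SparkSession setup and the example data standing in for Vaa_total are illustrative assumptions only:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# hypothetical stand-in for Vaa_total
Vaa_total = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

for elemento in Vaa_total.columns:
    Vaa_total = Vaa_total.withColumnRenamed(elemento, "Vaa_" + elemento)

print(Vaa_total.columns)  # ['Vaa_id', 'Vaa_value']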
ags29
0

I linked a similar topic in the comments. Here's an example adapted from that topic to your task (note that you iterate over dataframe.columns, not over the DataFrame itself):

from pyspark.sql.functions import col

dataframe.select([col(col_name).alias('VAA_' + col_name) for col_name in dataframe.columns])
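As a quick usage sketch, assuming dataframe is an existing PySpark DataFrame whose columns are ['id', 'value'] (a purely hypothetical example); select returns a new DataFrame rather than renaming in place, so keep the result:

prefixed = dataframe.select([col(c).alias('VAA_' + c) for c in dataframe.columns])
print(prefixed.columns)  # ['VAA_id', 'VAA_value']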
vvg
0

One way to write it as a single statement is to chain the renames with functools.reduce:

from functools import reduce

renamed_df = reduce(lambda acc, c: acc.withColumnRenamed(c, "insert_text" + c), df.columns, df)
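As a quick check, assuming df has columns ['id', 'value'] (again purely hypothetical), the result would be:

print(renamed_df.columns)  # ['insert_textid', 'insert_textvalue']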

sargupta