
I am getting an error while renaming a column. Is there any way I can rename it? The column name contains a space.

df = df.withColumnRenamed("std deviation", "stdDeviation")

Error: AnalysisException: Attribute name "std deviation" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.

I tried another way using alias, but with no success:

from pyspark.sql.functions import col

df = df.select(col("std deviation").alias("stdDeviation"))

Error: AnalysisException: Attribute name "std deviation" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.

Is there a way I can rename columns that contain spaces?

  • You should use backticks (`): https://stackoverflow.com/questions/33053095/how-to-express-a-column-which-name-contains-spaces-in-spark-sql – Cronenberg May 20 '21 at 13:24
  • Hi @Cronenberg, I am reading a parquet file and I just want to rename the column. With the logic in the link, I am getting an error that df is not found: df = spark.read.option("header", "true").parquet(source_file_path) followed by sqlContext.sql("""SELECT `Standard deviation` FROM df""").show() gives the error: df table not found – pankaj sharma May 20 '21 at 14:14
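For context, a minimal sketch of what the linked backtick approach looks like end to end, assuming the DataFrame is first registered as a temp view (the step the comment above skips, hence "df table not found"). Whether Spark will read a parquet column containing spaces at all depends on the Spark version, so this is not guaranteed to get past the read itself:

df = spark.read.parquet(source_file_path)
df.createOrReplaceTempView("df")  # SQL can only see the DataFrame after it is registered as a view
spark.sql("SELECT `Standard deviation` AS stdDeviation FROM df").show()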

3 Answers


It is weird that you are using

df = spark.read.option("header", "true").parquet(source_file_path)

You don't need the header option when reading parquet files.

Besides that, you should use

df = df.withColumnRenamed("`old name`", "new_name")
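For completeness, here is the same backtick idea applied with select and alias instead of withColumnRenamed, using the column name from the question. This is a sketch only: in some Spark versions the parquet reader rejects column names with spaces before any rename can run, so it may not get past the read itself.

from pyspark.sql.functions import col

# backticks make the parser treat the whole quoted string as a single column name
df = df.select(col("`std deviation`").alias("stdDeviation"))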
– Cronenberg

Yes, it's possible.

>>> input_df.show()
+-----+
|value|
+-----+
|    1|
|    2|
|    3|
+-----+

>>> input_df = input_df.withColumnRenamed("value", "test value")
>>> input_df.show()
+----------+
|test value|
+----------+
|         1|
|         2|
|         3|
+----------+

# The other way round #
>>> input_df = input_df.withColumnRenamed("test value", "value")
>>> input_df.show()
+-----+
|value|
+-----+
|    1|
|    2|
|    3|
+-----+
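(For reference, the input_df used above can be built as follows; the answer does not show its setup, so this is an assumption:)

>>> input_df = spark.createDataFrame([(1,), (2,), (3,)], ["value"])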
– Hegde
  • Thanks, but it is not working for parquet files. Have you tried renaming columns with white space in a parquet file? – pankaj sharma May 21 '21 at 08:31

No, according to the source code, there is no way to do that with Spark. One alternative is to use pandas.read_parquet, which should work. However, I'm not sure how big your file is and whether your local computer (or cluster driver) can handle it.
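A minimal sketch of that pandas route, assuming the file fits in driver memory (source_file_path and the column names are taken from the question):

import pandas as pd

# pandas does not enforce Spark's parquet column-name restrictions
pdf = pd.read_parquet(source_file_path)
pdf = pdf.rename(columns={"std deviation": "stdDeviation"})

# hand the cleaned frame back to Spark once the names are valid
df = spark.createDataFrame(pdf)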

– pltc