
I am getting an error while renaming a column. Is there any way I can rename it? The column name contains a space.

df = df.withColumnRenamed("std deviation", "stdDeviation")

Error: AnalysisException: Attribute name "std deviation" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.

I tried another way using alias, but with no success:

from pyspark.sql.functions import col

df = df.select(col("std deviation").alias("stdDeviation"))

Error: AnalysisException: Attribute name "std deviation" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.

Is there a way I can rename columns that contain spaces?

  • You should use backticks (`): https://stackoverflow.com/questions/33053095/how-to-express-a-column-which-name-contains-spaces-in-spark-sql – Cronenberg May 20 '21 at 13:24
  • Hi @Cronenberg, I am reading a parquet file and I just want to rename the column. With the logic in the link, I am getting an error that df is not found: df = spark.read.option("header", "true").parquet(source_file_path) followed by sqlContext.sql("""SELECT `Standard deviation` FROM df""").show() gives the error: df table not found – pankaj sharma May 20 '21 at 14:14
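For context, a minimal sketch of what the linked backtick approach looks like end to end, assuming the DataFrame is first registered as a temp view (the step the comment above skips, hence "df table not found"). Whether Spark will read a parquet column containing spaces at all depends on the Spark version, so this is not guaranteed to get past the read itself:

df = spark.read.parquet(source_file_path)
df.createOrReplaceTempView("df")  # SQL can only see the DataFrame after it is registered as a view
spark.sql("SELECT `Standard deviation` AS stdDeviation FROM df").show()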

3 Answers


It is weird that you are using

df = spark.read.option("header", "true").parquet(source_file_path)

You don't need the header option when reading parquet files.

Besides that, you should use

df = df.withColumnRenamed("`old name`", "new_name")
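For completeness, here is the same backtick idea applied with select and alias instead of withColumnRenamed, using the column name from the question. This is a sketch only: in some Spark versions the parquet reader rejects column names with spaces before any rename can run, so it may not get past the read itself.

from pyspark.sql.functions import col

# backticks make the parser treat the whole quoted string as a single column name
df = df.select(col("`std deviation`").alias("stdDeviation"))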
– Cronenberg

Yes, it's possible.

>>> input_df.show()
+-----+
|value|
+-----+
|    1|
|    2|
|    3|
+-----+

>>> input_df = input_df.withColumnRenamed("value", "test value")
>>> input_df.show()
+----------+
|test value|
+----------+
|         1|
|         2|
|         3|
+----------+

# The other way round #
>>> input_df = input_df.withColumnRenamed("test value", "value")
>>> input_df.show()
+-----+
|value|
+-----+
|    1|
|    2|
|    3|
+-----+
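(For reference, the input_df used above can be built as follows; the answer does not show its setup, so this is an assumption:)

>>> input_df = spark.createDataFrame([(1,), (2,), (3,)], ["value"])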
– Hegde
  • Thanks, but it is not working for parquet files. Have you tried renaming columns with white space in a parquet file? – pankaj sharma May 21 '21 at 08:31

No, according to the source code, there is no way to do that with Spark. One alternative is to use pandas.read_parquet, which should work. However, I'm not sure how big your file is and whether your local computer (or cluster driver) can handle it.
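A minimal sketch of that pandas route, assuming the file fits in driver memory (source_file_path and the column names are taken from the question):

import pandas as pd

# pandas does not enforce Spark's parquet column-name restrictions
pdf = pd.read_parquet(source_file_path)
pdf = pdf.rename(columns={"std deviation": "stdDeviation"})

# hand the cleaned frame back to Spark once the names are valid
df = spark.createDataFrame(pdf)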

– pltc