I'm reading a sql table in a notebook on Synapse and loading it in a pyspark dataframe:
df = spark.read.synapsesql("dbtablename")
Unfortunately some columns have a space in their name e.g.: Job Title.
I tried different methods to change the name of the columns and remove the space. I can see the new dataframes with the fixed column name but when I try to convert to Polars or Pandas dataframe the operation fails and the error message refers back to the original name of the column with the space in the first dataframe.
I tried with:
- Select Col Alias
- Select Expr
- WithColumnRenamed
- I tried
sqlContext.registerDataFrameAsTable(df, "tablename")
and thendf_new = sqlContext.sql("select JobTitle from table")
Thank you for your kind help.
df = spark.read.synapsesql("db.tablename") df_nospace = df.withColumnRenamed("Job Title", "JobTitle") pandas_df = df_nospace.toPandas() This last command results in an error referring the "Job Title" column, with the space. When I print the schema of df_nospace, "JobTitle" is the column name. To note is that if I simply use: df.withColumnRenamed("Job Title", "JobTitle"), the schema will still show "Job Title". I tried to delete and unpersist the first dataframe too but with no success. Thank you. – datatalian Mar 27 '23 at 14:28