I have a PySpark DataFrame with 11 million records. I created the DataFrame on a cluster; it is not saved to DBFS or a storage account.
import pyspark.sql.functions as F
from pyspark.sql.functions import col, when, floor, expr, hour, minute, to_timestamp, explode, sequence
# Define start and end datetime for intervals
start_datetime = "1990-01-01 00:00:00"
end_datetime = "2099-12-31 23:55:00"
interval_minutes = 5 # minutes
# Create a DataFrame with sequence of intervals
df = spark.range(0, 1).select(
    expr(f"sequence(to_timestamp('{start_datetime}'), to_timestamp('{end_datetime}'), interval {interval_minutes} minutes)").alias("time")
)
# Explode the sequence column to create individual intervals
df = df.select(explode(col("time")).alias("time"))
# Add a new column with the interval period
# (interval_period is a column expression defined earlier in my notebook; not shown here)
df = df.withColumn('Interval_Period', interval_period)
df = df.withColumn('Interval_Date', F.to_date(F.col("time")))
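For reference, this is how I sanity-check the resulting column names against the characters mentioned in the error further down (a quick sketch using only standard PySpark; df is the DataFrame built above):
# Characters the Delta error complains about
invalid_chars = set(" ,;{}()\n\t=")
# Report any column whose name contains one of those characters
for name in df.columns:
    bad = [c for c in name if c in invalid_chars]
    if bad:
        print(f"Column {name!r} contains invalid character(s): {bad}")
df.printSchema()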
I want to write it to a Storage Account as an external Delta table with:
df.write \
    .format("delta") \
    .mode("overwrite") \
    .option("path", "table_path") \
    .saveAsTable("my_table")  # External table
And I get the error:
AnalysisException: Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema.
Please upgrade your Delta table to reader version 2 and writer version 5
and change the column mapping mode to 'name' mapping. You can use the following command:
ALTER TABLE SET TBLPROPERTIES (
'delta.columnMapping.mode' = 'name',
'delta.minReaderVersion' = '2',
'delta.minWriterVersion' = '5')
Refer to table versioning at https://docs.microsoft.com/azure/databricks/delta/versioning
My question is how I can set those properties. Note that I don't have a Delta table yet; what I have is a DataFrame created on a cluster.
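Is something like the following the intended way to set them when the table is first created from the DataFrame? This is only my guess from the error message and the Delta table-properties docs; the delta.-prefixed writer options and the spark.databricks.delta.properties.defaults.* settings below are my assumptions, not something I have confirmed:
# Guess 1: pass the suggested properties as delta.-prefixed options on the write
# that creates the table (assumption: such options are applied as table
# properties at creation time).
df.write \
    .format("delta") \
    .mode("overwrite") \
    .option("delta.columnMapping.mode", "name") \
    .option("delta.minReaderVersion", "2") \
    .option("delta.minWriterVersion", "5") \
    .option("path", "table_path") \
    .saveAsTable("my_table")

# Guess 2: set session-level defaults so that tables created afterwards in the
# same session pick these properties up (assumption: the defaults config covers
# these three properties).
spark.conf.set("spark.databricks.delta.properties.defaults.columnMapping.mode", "name")
spark.conf.set("spark.databricks.delta.properties.defaults.minReaderVersion", "2")
spark.conf.set("spark.databricks.delta.properties.defaults.minWriterVersion", "5")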