
I have a PySpark DataFrame with 11 million records, created on a cluster. It is not yet saved to DBFS or a storage account.

import pyspark.sql.functions as F
from pyspark.sql.functions import col, when, floor, expr, hour, minute, to_timestamp, explode, sequence  
 
 
# Define start and end datetime for intervals
start_datetime = "1990-01-01 00:00:00"
end_datetime = "2099-12-31 23:55:00"
interval_minutes = 5   # minutes
 
 
# Create a DataFrame with sequence of intervals
df = spark.range(0, 1).select(
    expr(f"sequence(to_timestamp('{start_datetime}'), to_timestamp('{end_datetime}'), interval {interval_minutes} minutes)").alias("time")
)
 
 
# Explode the sequence column to create individual intervals
df = df.select(explode(col("time")).alias("time"))
 
 
# Add derived columns; `interval_period` is a column expression
# defined elsewhere in my code (not shown here)
df = df.withColumn('Interval_Period', interval_period)
df = df.withColumn('Interval_Date', F.to_date(F.col("time")))
 
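For reference, this grid spans 40,177 days (including 27 leap days) at 288 five-minute slots per day, i.e. 11,570,976 rows, which lines up with the 11 million records mentioned above. A quick sanity check:

# Expected: 11,570,976 rows (1990-01-01 00:00 through 2099-12-31 23:55, 5-minute steps)
print(df.count())
df.printSchema()  # time, Interval_Period, Interval_Date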

I want to write it to a Storage Account as an external Delta table with:

df.write \
    .format("delta") \
    .mode("overwrite") \
    .option("path", "table_path") \
    .saveAsTable("my_table")  # External table

And I get the error:

AnalysisException: Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema. 
Please upgrade your Delta table to reader version 2 and writer version 5
and change the column mapping mode to 'name' mapping. You can use the following command:
 
ALTER TABLE  SET TBLPROPERTIES (
   'delta.columnMapping.mode' = 'name',
   'delta.minReaderVersion' = '2',
   'delta.minWriterVersion' = '5')
Refer to table versioning at https://docs.microsoft.com/azure/databricks/delta/versioning

My question is: how can I set those properties? Note that I don't have a Delta table yet; what I have is a DataFrame created on a cluster.
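One route I'm considering (untested on my cluster): the Delta documentation says that delta.-prefixed table properties can be passed as DataFrameWriter options on the write that first creates the table, so the properties would take effect from the first commit:

# Sketch based on the documented DataFrameWriter options for Delta table
# properties; "table_path" and "my_table" are the same placeholders as above
df.write \
    .format("delta") \
    .mode("overwrite") \
    .option("delta.columnMapping.mode", "name") \
    .option("delta.minReaderVersion", "2") \
    .option("delta.minWriterVersion", "5") \
    .option("path", "table_path") \
    .saveAsTable("my_table")

Would this work, or do the properties have to be set with ALTER TABLE after the table already exists?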

Mohammad
Comment: Does this [SO](https://stackoverflow.com/questions/72643210/databricks-ingesting-csv-data-to-a-delta-live-table-in-python-triggers-invalid) thread answer your question? – Dipanjan Mallick, Mar 16 '23 at 12:56

0 Answers