I am trying to save a new table from a CSV file. Unfortunately, the way the CSV is read and saved, every column ends up with type string. The dataset contains other types, and I want to specify them when creating the table.
I have found a solution that alters the column types after the table is created (sketched below, after the schema output), but it doesn't seem practical.
This is how I create the table:
# Read the raw CSV; without a schema (or inferSchema), every column defaults to string
import_path = f"{st_raw}/data.csv"
sparkDF = spark.read.csv(import_path, header=True)

# Create the target schema if needed and save the DataFrame as a table
spark.sql(f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}")
tablename = f"{catalog}.{schema}.{table}"
sparkDF.write.saveAsTable(tablename)
assert spark.table(tablename).count() > 0
display(spark.table(tablename))
Printing the schema shows that all columns are of type string:
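spark.table(tablename).printSchema()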
|-- Date: string (nullable = true)
|-- Location: string (nullable = true)
|-- Country: string (nullable = true)
|-- Temperature: string (nullable = true)
|-- CO2 Emissions: string (nullable = true)
|-- Sea Level Rise: string (nullable = true)
|-- Precipitation: string (nullable = true)
|-- Humidity: string (nullable = true)
|-- Wind Speed: string (nullable = true)
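The after-the-fact fix I found looks roughly like this. It is only a sketch: I am assuming the table is Delta (the Databricks default), which is why the overwriteSchema option is there, and the cast list has to name every column explicitly.

from pyspark.sql import functions as F

# Re-read the freshly created table and cast the columns one by one
castedDF = (
    spark.table(tablename)
    .withColumn("Date", F.col("Date").cast("date"))
    .withColumn("Temperature", F.col("Temperature").cast("double"))
    .withColumn("CO2 Emissions", F.col("CO2 Emissions").cast("double"))
    # ... and so on for the remaining columns
)

# Overwrite the table; overwriteSchema lets Delta accept the changed column types
castedDF.write.mode("overwrite").option("overwriteSchema", "true").saveAsTable(tablename)

Repeating this for every column on every import is what feels impractical.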
I need to specify the correct types. How can I accomplish that?
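What I imagine is declaring the types up front, something like the sketch below, where a DDL schema string is passed to the CSV reader. The types here are my guesses, and a dateFormat option may be needed depending on how the dates are written:

# Guessed types, declared as a DDL string; backticks handle the spaces in column names
schema_ddl = """
    Date DATE,
    Location STRING,
    Country STRING,
    Temperature DOUBLE,
    `CO2 Emissions` DOUBLE,
    `Sea Level Rise` DOUBLE,
    Precipitation DOUBLE,
    Humidity DOUBLE,
    `Wind Speed` DOUBLE
"""

# Hand the schema to the reader instead of letting everything default to string
sparkDF = spark.read.csv(import_path, header=True, schema=schema_ddl)
sparkDF.write.saveAsTable(tablename)

Is this the recommended approach, or is there a better way to set the types when creating the table?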