I am trying to store spaces in a varchar column in Redshift.

My data comes in CSV format and looks like this:

"id","first_name","last_name","doj","address"
"A1111","B1111","C1111","D111","E111"
"A2222","B22222",""," ","E22"
"A3333","  ","C333","D333","E3333"

I store it in S3, later read it via Spark in a Glue job, and write it to Redshift as is.

My code:

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
import pyspark.sql.functions as f
import sys

sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read the CSV from S3; without inferSchema every column is read as a string
df = spark.read\
    .format("csv")\
    .option("header", True)\
    .load("s3://bucket/file.csv")

df.show()

# Pull the JDBC details from the Glue connection
redshift_conn = glueContext.extract_jdbc_conf('...')
redshift_connection_options = {
    "url": redshift_conn['url'],
    "user": redshift_conn['user'],
    "password": redshift_conn['password'],
    "database": "...",
    "dbtable": "...",
    "redshiftTmpDir": "..."
}

# Convert to a DynamicFrame and write to Redshift as is
finalDF = DynamicFrame.fromDF(df, glueContext, "finalDF")

glueContext.write_dynamic_frame_from_jdbc_conf(
    frame=finalDF,
    catalog_connection="...",
    connection_options=redshift_connection_options)

job.commit()
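
To narrow down where the trimming happens, a quick length check right after the read (a minimal sketch, reusing the df and the f alias from the script above) shows whether the spaces survive the Spark read itself:

# Sketch: check whether whitespace survives the Spark read
df.select(
    f.col("first_name"),
    f.length(f.col("first_name")).alias("first_name_len"),
    f.col("doj"),
    f.length(f.col("doj")).alias("doj_len"),
).show(truncate=False)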

When I run the above CSV through the script, the whitespace-only values (the " " in doj and the "  " in first_name) end up trimmed in Redshift.

I have tried running the script on Glue 3.0 and 4.0; the result is the same.
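
For completeness: Spark's CSV reader has ignoreLeadingWhiteSpace and ignoreTrailingWhiteSpace options (which, as far as I know, default to false on read), so pinning them explicitly would look like the sketch below; I am not sure whether the trimming happens at this stage or during the Redshift write.

# Sketch: set the CSV reader's whitespace handling explicitly
df = spark.read\
    .format("csv")\
    .option("header", True)\
    .option("ignoreLeadingWhiteSpace", False)\
    .option("ignoreTrailingWhiteSpace", False)\
    .load("s3://bucket/file.csv")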

Is there a way to preserve these spaces end to end?
