
I am new to Spark. I have a requirement to parse an input file from a mainframe and store it in HDFS.
The mainframe file was converted to ASCII format; during the conversion, the EBCDIC signed fields were converted into characters.

Input file data (ASCII format): TR000000282811111}

Code to read the file:

# Read the file as a single string column named "value"
df = sqlContext.read.text("<file location>")

# Slice the fixed-width record into columns; for the sample row
# TR000000282811111} this yields RECORD_TYPE = 'TR', RECORD_COUNT = 2828,
# and TOTAL_NET_AMOUNT_DUE = '11111}'
parexp1 = df.select(
    df.value.substr(1, 2).alias('RECORD_TYPE'),
    df.value.substr(3, 10).cast('integer').alias('RECORD_COUNT'),
    df.value.substr(13, 6).alias('TOTAL_NET_AMOUNT_DUE')
)

The amount field value 11111} actually represents +111110: the trailing character is a signed-overpunch character that encodes both the last digit and the sign. So, for every amount field, I need to read the last character of that field, in this case }, and do a conversion based on it.
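
For reference, this is the overpunch table I am working from (a plain-Python sketch; note that the conventional EBCDIC mapping treats } as a negative zero, whereas my example above decodes to a positive value, so the sign sets may need adjusting for a particular conversion):

# Hypothetical reference table: overpunch character -> (digit, sign).
# Conventional EBCDIC zoned-decimal: '{','A'..'I' = +0..+9 and '}','J'..'R' = -0..-9.
OVERPUNCH = {}
for i, c in enumerate('{ABCDEFGHI'):
    OVERPUNCH[c] = (str(i), 1)
for i, c in enumerate('}JKLMNOPQR'):
    OVERPUNCH[c] = (str(i), -1)

def decode_overpunch(s):
    # '11111}' -> -111110 under the conventional mapping
    digit, sign = OVERPUNCH[s[-1]]
    return sign * int(s[:-1] + digit)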

I was able to get the last character with the following:

from pyspark.sql.functions import substring, col

lent = parexp1.select(substring(col('TOTAL_NET_AMOUNT_DUE'), -1, 1))

But I am not sure how to do the conversion and add the result back to the DataFrame: the value 11111} has to be converted into 111110 and stored back in the data frame.
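
In other words, I would like something along these lines, built from DataFrame column expressions (a sketch using the standard pyspark.sql functions substring, translate, and when; the character sets assume the mapping sketched above and may need swapping to match my file):

from pyspark.sql import functions as F

last = F.substring(F.col('TOTAL_NET_AMOUNT_DUE'), -1, 1)

# Swap the overpunch character for its plain digit: '11111}' -> '111110'
body = F.expr("substring(TOTAL_NET_AMOUNT_DUE, 1, length(TOTAL_NET_AMOUNT_DUE) - 1)")
digits = F.concat(body, F.translate(last, '{ABCDEFGHI}JKLMNOPQR', '01234567890123456789'))

# Apply the sign based on which set the last character falls in
amount = F.when(last.isin(list('}JKLMNOPQR')), -digits.cast('long')) \
          .otherwise(digits.cast('long'))

parexp1 = parexp1.withColumn('TOTAL_NET_AMOUNT_DUE_NUM', amount)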

As a first step, I tried to store the last character, i.e. }, in the DataFrame using the syntax below.

Test_udf = udf(lambda TOTAL_NET_AMOUNT_DUE: parexp1.select(substring(col('TOTAL_NET_AMOUNT_DUE'), -1, 1)), StringType())

parexp1.withColumn("Test1", Test_udf(parexp1.TOTAL_NET_AMOUNT_DUE))

The above raised a Method __getnewargs__([]) does not exist exception.
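
From what I can tell, the problem is that the lambda captures the parexp1 DataFrame itself, and a DataFrame cannot be serialized into a UDF, which is what triggers the __getnewargs__ error. My understanding is that a UDF should only receive the per-row value of the column, something like this (a sketch):

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# The lambda receives each row's string value, not a Column or DataFrame
last_char_udf = udf(lambda s: s[-1] if s else None, StringType())

parexp1 = parexp1.withColumn('Test1', last_char_udf(parexp1.TOTAL_NET_AMOUNT_DUE))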

I tried other methods, but ended up with one error or another. My goal is to convert the data and add it back to the DataFrame.

Can you please help me with how to do this in PySpark? I really appreciate your help.

  • It's unclear to me what you're asking. Please read [ask] and try to provide an [mcve]. Also, take a look at this post on [how to create good reproducible apache spark dataframe examples](https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-dataframe-examples). – pault Feb 26 '18 at 14:45
  • @Pault thanks, I have added more details to the issue. Please have a look. – Ron Feb 28 '18 at 14:42
