0

I have a simple use case of performing join on two dataframes, I am using spark 1.6.3 version. The issue is that while trying to cast the string type to integer type using cast method the resulting column is all null values.

I have already tried all solutions mentioned here How to cast a column in dataframe? but all questions have answers for scala api and I could not find anyone that works with java api.

DataFrame dataFromDB = getDataFromDB("(select * from schema.table where 
col1 is not null)"); //This method uses spark sql 
                    //to connect to a db2 data base and get the data

//I perform the cast operation as
dataFromDB.withColumn("INCOME_DATA", dataFromDB.col("INCOME_DATA")
                                    .cast(DataTypes.IntegerType));
//but the above results in null values
//other things I tried based on the link above is below
dataFromDB.selectExpr(cast("INCOME_DATA" as integer")) //this too produces null values

//I tried to remove the whitespaces from income data column with no success
dataFromDB.select(dataFromDB.col("INCOME_DATA").toString().replaceAll("\\s+", ""); //this does not remove any whitespace

I am unable to find the solution for it, also the column I am trying to convert is of String type and may contain trailing whitespaces, could this be an issue? if yes then how can i remove them, I tried to remove them as below but doesnot seem to work. This is my first ever working with spark dataframe so any help with this is deeply appreciated. Thanks!

Jacek Laskowski
  • 72,696
  • 27
  • 242
  • 420
humblecoder
  • 137
  • 1
  • 7

1 Answers1

0

Can you try something like this for your last line?

import org.apache.spark.sql.functions._
dataFromDB.withColumn("INCOME_DATA", regexp_replace($"INCOME_DATA", "\\s+", "")).select("INCOME_DATA")

In Java:

dataFromDB.withColumn("INCOME_DATA", functions.regexp_replace(functions.col("INCOME_DATA"), "\\s+", "")).select("INCOME_DATA");
mayurc
  • 267
  • 4
  • 13
  • I don't think the solution you provided works for Java. Is it scala solution? I want the java way. Thank you for your reply! – humblecoder Dec 20 '18 at 18:50