
I'm working with a string column that is 38 characters long and actually holds numeric values.

For example, id = '678868938393937838947477478778877.....' (38 characters long).

How do I cast it to a long integer? I have tried the cast function with IntegerType, LongType and DoubleType, and when I try to show the column it yields nulls.

The reason I want to do this is that I need to do some inner joins on this column, and joining on it as a string gives me Java heap space errors.

Any suggestions on how to cast it to a long integer? (This question is specifically about casting a string to a long integer.)

ML_Passion

1 Answer


Long story short: you simply don't. A Spark DataFrame is a JVM object, and it uses the following type mapping:

  • IntegerType -> Integer, with MAX_VALUE equal to 2 ** 31 - 1
  • LongType -> Long, with MAX_VALUE equal to 2 ** 63 - 1
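To make the limits concrete, here is a small plain-Python sketch (no Spark involved) that computes the two maximum values above and compares them with a 38-digit string like the one in the question. The all-nines value is only an illustrative stand-in for the real id.

```python
# Largest values for Spark's IntegerType and LongType, which map to the
# JVM's 32-bit int and 64-bit long.
INT_MAX = 2 ** 31 - 1
LONG_MAX = 2 ** 63 - 1

# Illustrative 38-digit numeric string standing in for the id column.
value = "9" * 38

print(len(str(INT_MAX)))      # 10 digits
print(len(str(LONG_MAX)))     # 19 digits
print(int(value) > LONG_MAX)  # True: a 38-digit number cannot fit in a long
```

A long tops out at 19 digits, so any 38-digit value overflows it, which is why the cast produces nulls.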

You could try to use DecimalType with the maximum allowed precision (38).
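As a quick sanity check on what decimal(38, 0) can hold, the plain-Python sketch below uses a hypothetical helper, `fits_decimal`, that is not part of Spark. With scale 0 all 38 digits go to the integer part, so the largest magnitude is 10 ** 38 - 1.

```python
def fits_decimal(digits: str, precision: int = 38, scale: int = 0) -> bool:
    """Hypothetical helper (not a Spark API): does an integer string fit
    into decimal(precision, scale)?

    The integer part may use precision - scale digits, so the largest
    allowed magnitude is 10 ** (precision - scale) - 1.
    """
    return abs(int(digits)) <= 10 ** (precision - scale) - 1


print(fits_decimal("9" * 38))  # True: 38 digits fit in decimal(38, 0)
print(fits_decimal("9" * 39))  # False: one digit too many
```

A 39-digit value exceeds that bound, which matches the null produced when the 39-character column y is cast to decimal(38, 0).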

from pyspark.sql.functions import col
df = sc.parallelize([("9" * 38, "9" * 39)]).toDF(["x", "y"])
df.select(col("x").cast("decimal(38, 0)")).show(1, False)

## +--------------------------------------+
## |x                                     |
## +--------------------------------------+
## |99999999999999999999999999999999999999|
## +--------------------------------------+

With larger numbers you can cast to double, but not without a loss of precision:

df.select(
    col("y").cast("decimal(38, 0)"), col("y").cast("double")).show(1, False)

## +----+------+
## |y   |y     |
## +----+------+
## |null|1.0E39|
## +----+------+
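The precision loss can be reproduced outside Spark. Python floats are IEEE 754 doubles, the same representation as Spark's DoubleType (a JVM double), so this plain-Python sketch rounds the 39-digit value the same way.

```python
# Python floats are IEEE 754 doubles, like Spark's DoubleType.
exact = int("9" * 39)
as_double = float("9" * 39)

print(as_double)                # 1e+39
print(int(as_double) == exact)  # False: the double cannot represent every digit
```

Near 1e39 adjacent doubles are far more than 1 apart, so distinct 39-digit ids can collapse to the same double. That makes a double column unsafe as a join key.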

That being said, casting to numeric types won't help you with memory errors.

zero323
  • The method works for me. @zero323, I'll look into your last point that casting to numeric types won't help with memory errors. – ML_Passion Aug 16 '16 at 17:52