
I am trying to use Spark SQL from Spark 2 on a Cloudera environment and am getting the following error:

pyspark.sql.utils.AnalysisException: u'Cannot up cast other_column_from_table from decimal(32,22) to decimal(30,22) as it may truncate\n;'
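For context on why the analyzer refuses this cast: a decimal(32,22) keeps 10 digits before the decimal point, while decimal(30,22) keeps only 8, so the narrower type could silently truncate large values. A minimal illustration of that precision/scale check in plain Python (the `fits` helper is hypothetical, not a Spark API, just to mimic the rule):

```python
from decimal import Decimal

# decimal(p, s) stores p total digits, s of them after the point,
# leaving p - s digits for the integer part:
#   decimal(32,22) -> 10 integer digits; decimal(30,22) -> only 8.
def fits(value: Decimal, precision: int, scale: int) -> bool:
    """Return True if value fits decimal(precision, scale) without truncation."""
    sign, digits, exponent = value.as_tuple()
    if -exponent > scale:          # too many fractional digits
        return False
    integer_digits = len(digits) + exponent  # digits before the point
    return integer_digits <= precision - scale

v = Decimal("123456789.25")        # 9 integer digits
print(fits(v, 32, 22))             # True: up to 10 integer digits allowed
print(fits(v, 30, 22))             # False: only 8 integer digits allowed
```

Spark cannot prove at analysis time that every stored value fits the narrower type, so it refuses the implicit down-cast outright.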

We do not use the column other_column_from_table anywhere in the SELECT statement, yet Spark SQL tries to cast it and that is the cause of the error. Below is the code:

spark2-submit --deploy-mode cluster --driver-cores 2 --driver-memory 4G --executor-cores 2 --executor-memory 6G --name --master --conf "spark.sql.parquet.writeLegacyFormat=true" /home/adonnert/teste_alexandre.py


import sys
import time
import traceback
from datetime import datetime, timedelta, date

from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession, DataFrame, SQLContext
from pyspark.sql.functions import coalesce, from_unixtime
from pyspark.sql.types import (StructType, StructField, StringType,
                               IntegerType, ArrayType, MapType)

spark = SparkSession.builder.appName("PySparkSQL_VRJ_EC_GDC_ALE") \
    .enableHiveSupport() \
    .config("hive.exec.dynamic.partition", "true") \
    .config("hive.exec.dynamic.partition.mode", "nonstrict") \
    .config("spark.debug.maxToStringFields", "200") \
    .config("spark.sql.shuffle.partitions", "200") \
    .config("spark.sql.inMemoryColumnarStorage.compressed", True) \
    .config("spark.sql.inMemoryColumnarStorage.batchSize", 10000) \
    .config("spark.sql.codegen", True) \
    .getOrCreate()

# show() returns None, so keep the DataFrame in its own variable
df_lead = spark.sql("""
    SELECT
        my_id,
        value_number
    FROM owner.table
    WHERE date >= CAST(DATE_FORMAT(ADD_MONTHS(current_timestamp(), -13), 'yyyyMM') AS BIGINT)
""")
df_lead.show(10)

Is there a way to stop Spark SQL from casting a column that the query never references? The failure happens during analysis, so the DataFrame is never even created and I cannot apply a schema to it.
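I cannot verify this on your cluster, but errors like this usually mean the Hive metastore declares the column as decimal(30,22) while the data files carry decimal(32,22), and the analyzer validates the whole table schema before it prunes unused columns. Two hedged workarounds under that assumption (owner.table and other_column_from_table come from the question; the DECIMAL(32,22) target is inferred from the error message, and the Parquet path is hypothetical):

```python
# Sketch 1: widen the metastore column so it matches the files.
# Requires ALTER privileges on owner.table; run once, outside the job.
spark.sql("""
    ALTER TABLE owner.table
    CHANGE other_column_from_table other_column_from_table DECIMAL(32,22)
""")

# Sketch 2: bypass the metastore schema entirely by reading the data
# files directly and projecting only the columns the job needs.
df_lead = (spark.read.parquet("/path/to/owner.table")
           .select("my_id", "value_number"))
```

Either way, the goal is the same: make sure the schema the analyzer sees never requires a narrowing decimal cast, whether or not the column is selected.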
