
I am trying to analyse a CSV file using Spark with Scala, but my CSV file contains a column with null values, so while reading the data I get the error java.lang.ArrayIndexOutOfBoundsException: 12.

My CSV file has 13 columns in total, but one column contains null values. Please see my code snippets below (attached as screenshots). Thanks in advance.
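For context, since the code is only in the screenshots: a likely minimal reproduction, assuming the lines are read with sc.textFile and split on commas (the variable names here are illustrative). Java's split drops trailing empty fields, so a row whose last column is empty yields only 12 tokens, and indexing token 12 throws exactly this exception:

val row = "a,b,c,d,e,f,g,h,i,j,k,l,"  // 13 columns, the last one null/empty
val tokens = row.split(",")           // length is 12: the trailing empty field is dropped
// tokens(12) throws java.lang.ArrayIndexOutOfBoundsException: 12
val all = row.split(",", -1)          // a negative limit keeps trailing empty fields: length 13
println(all(12))                      // "" (the empty 13th column)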

arya
    Welcome to StackOverflow. Screen images of code and error messages are not as helpful as pasting the code and error text into your question. Text in an image cannot be copied and pasted into a development environment for validation and testing. – jwvh Jul 01 '17 at 07:36

1 Answer


I suggest using the Databricks spark-csv library for this. Use the Maven dependency below for Scala 2.11:

<!-- https://mvnrepository.com/artifact/com.databricks/spark-csv_2.11 -->
<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-csv_2.11</artifactId>
    <version>1.0.3</version>
</dependency>
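If the project is built with sbt rather than Maven, the same artifact can be pulled in with:

libraryDependencies += "com.databricks" % "spark-csv_2.11" % "1.0.3"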

Sample code:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc) // sc is the existing SparkContext (available in spark-shell)
val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true") // use the first line of the file as the header
    .option("inferSchema", "true") // automatically infer column types
    .load("cars.csv")

Reference: https://github.com/databricks/spark-csv

Sandeep Singh