Questions tagged [spark-csv]

A library for handling CSV files in Apache Spark.

139 questions
316 votes · 17 answers

How to show full column content in a Spark Dataframe?

I am using spark-csv to load data into a DataFrame. I want to do a simple query and display the content: val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("my.csv") df.registerTempTable("tasks") results =…
tracer
170 votes · 16 answers

Write single CSV file using spark-csv

I am using https://github.com/databricks/spark-csv and trying to write a single CSV file, but I am not able to: Spark creates a folder instead. I need a Scala function that takes a path and a file name as parameters and writes a single CSV file.
user1735076
84 votes · 13 answers

Provide schema while reading csv file as a dataframe in Scala Spark

I am trying to read a csv file into a dataframe. I know what the schema of my dataframe should be since I know my csv file. Also I am using the spark-csv package to read the file. I am trying to specify the schema like below. val pagecount =…
Pa1
25 votes · 2 answers

How to estimate dataframe real size in pyspark?

How to determine a dataframe size? Right now I estimate the real size of a dataframe as follows: headers_size = key for key in df.first().asDict() rows_size = df.map(lambda row: len(value for key, value in row.asDict()).sum() total_size =…
TheSilence
20 votes · 7 answers

How to read only n rows of large CSV file on HDFS using spark-csv package?

I have a big distributed file on HDFS and each time I use sqlContext with spark-csv package, it first loads the entire file which takes quite some time. df = sqlContext.read.format('com.databricks.spark.csv').options(header='true',…
Abhishek
15 votes · 2 answers

How to parse a csv that uses ^A (i.e. \001) as the delimiter with spark-csv?

Terribly new to spark and hive and big data and scala and all. I'm trying to write a simple function that takes an sqlContext, loads a csv file from s3 and returns a DataFrame. The problem is that this particular csv uses the ^A (i.e. \001)…
user2535982
14 votes · 3 answers

Can I read a CSV represented as a string into Apache Spark using spark-csv

I know how to read a csv file into spark using spark-csv (https://github.com/databricks/spark-csv), but I already have the csv file represented as a string and would like to convert this string directly to dataframe. Is this possible?
Gary Sharpe
13 votes · 1 answer

inferSchema in spark-csv package

When a CSV is read as a dataframe in Spark, all the columns are read as strings. Is there any way to get the actual type of each column? I have the following csv file Name,Department,years_of_experience,DOB Sam,Software,5,1990-10-10 Alex,Data…
sag
9 votes · 2 answers

Spark fails to read CSV when last column name contains spaces

I have a CSV that looks like this: +-----------------+-----------------+-----------------+ | Column One | Column Two | Column Three | +-----------------+-----------------+-----------------+ | This is a value | This is a value | This is…
Sam Malayek
8 votes · 2 answers

How to force inferSchema for CSV to consider integers as dates (with "dateFormat" option)?

I use Spark 2.2.0 I am reading a csv file as follows: val dataFrame = spark.read.option("inferSchema", "true") .option("header", true) .option("dateFormat", "yyyyMMdd") …
Rami
8 votes · 1 answer

Spark schema from case class with correct nullability

For a custom Estimator's transformSchema method I need to be able to compare the schema of an input data frame to the schema defined in a case class. Usually this could be performed like Generate a Spark StructType / Schema from a case class as…
8 votes · 2 answers

Getting NullPointerException using spark-csv with DataFrames

Running through the spark-csv README there's sample Java code like this import org.apache.spark.sql.SQLContext; import org.apache.spark.sql.types.*; SQLContext sqlContext = new SQLContext(sc); StructType customSchema = new StructType( new…
Dennis Huo
7 votes · 1 answer

Add UUID to spark dataset

I am trying to add a UUID column to my dataset. getDataset(Transaction.class)).withColumn("uniqueId", functions.lit(UUID.randomUUID().toString())).show(false); But the result is that all the rows have the same UUID. How can I make it…
Adiant
7 votes · 3 answers

Spark DataFrame handing empty String in OneHotEncoder

I am importing a CSV file (using spark-csv) into a DataFrame which has empty String values. When I apply the OneHotEncoder, the application crashes with the error requirement failed: Cannot have an empty string for name. Is there a way I can get around…
6 votes · 3 answers

Is there an explanation when spark-csv won't save a DataFrame to file?

dataFrame.coalesce(1).write().save("path") sometimes writes only _SUCCESS and ._SUCCESS.crc files without the expected *.csv.gz, even for a non-empty input DataFrame. File save code: private static void writeCsvToDirectory(Dataset dataFrame, Path…
Makrushin Evgenii