I am using spark-csv to load data into a DataFrame. I want to do a simple query and display the content:
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("my.csv")
df.registerTempTable("tasks")
results =…
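A minimal sketch of how the query might be completed (the SQL itself is an assumption, since the snippet above is truncated):

val results = sqlContext.sql("SELECT * FROM tasks")  // hypothetical query
results.show()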
I am using https://github.com/databricks/spark-csv and trying to write a single CSV file, but I am not able to: it creates a folder instead.
I need a Scala function that takes a path and a file name as parameters and writes a single CSV file there.
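A common workaround is to coalesce to a single partition, write into a temporary directory, and then rename the lone part file with the Hadoop FileSystem API. A sketch, assuming a DataFrame written via spark-csv (the function name and temp path are illustrative):

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.DataFrame

def writeSingleCsv(df: DataFrame, path: String, fileName: String): Unit = {
  val tmpDir = path + "/_tmp"
  // One partition => one part file inside the temporary folder
  df.coalesce(1)
    .write
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .save(tmpDir)
  val fs = FileSystem.get(df.sqlContext.sparkContext.hadoopConfiguration)
  // Rename the single part-* file to the requested name, then clean up
  val partFile = fs.globStatus(new Path(tmpDir + "/part-*"))(0).getPath
  fs.rename(partFile, new Path(path + "/" + fileName))
  fs.delete(new Path(tmpDir), true)
}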
I am trying to read a csv file into a dataframe. I know what the schema of my dataframe should be since I know my csv file. Also, I am using the spark-csv package to read the file. I am trying to specify the schema like below.
val pagecount =…
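A sketch of passing the schema explicitly via .schema(...) instead of letting spark-csv infer it (the field names, types, and path below are placeholders):

import org.apache.spark.sql.types._

val customSchema = StructType(Array(
  StructField("project", StringType, true),    // placeholder columns
  StructField("article", StringType, true),
  StructField("requests", IntegerType, true)))

val pagecount = sqlContext.read
  .format("com.databricks.spark.csv")
  .schema(customSchema)
  .load("pagecounts.csv")                      // illustrative path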
How do I determine a DataFrame's size?
Right now I estimate the real size of a dataframe as follows:
headers_size = sum(len(key) for key in df.first().asDict())
rows_size = df.map(lambda row: sum(len(str(value)) for key, value in row.asDict().items())).sum()
total_size =…
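An alternative to counting characters by hand is Spark's own SizeEstimator, which reports the in-memory footprint of an object. A sketch in Scala (it runs on the driver, so only feed it a sample, never the fully collected DataFrame):

import org.apache.spark.util.SizeEstimator

// Approximate bytes used by a 1000-row sample; scale up by rowCount / 1000
val sampleBytes = SizeEstimator.estimate(df.take(1000))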
I have a big distributed file on HDFS, and each time I use sqlContext with the spark-csv package, it first loads the entire file, which takes quite some time.
df = sqlContext.read.format('com.databricks.spark.csv').options(header='true',…
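One way around re-reading is to pay the parsing cost once and keep the result, either in memory or as a Parquet copy. A sketch in Scala (paths are illustrative):

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("hdfs:///data/my.csv")

df.cache()  // subsequent queries reuse the parsed data

// Or: convert once to Parquet and read the columnar copy from then on
df.write.parquet("hdfs:///data/my.parquet")
val fast = sqlContext.read.parquet("hdfs:///data/my.parquet")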
I'm terribly new to Spark, Hive, big data, Scala and all that. I'm trying to write a simple function that takes an sqlContext, loads a csv file from s3 and returns a DataFrame. The problem is that this particular csv uses the ^A (i.e. \001)…
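spark-csv takes a custom separator through its delimiter option, so a sketch of such a function might look like this (the function name is illustrative):

import org.apache.spark.sql.{DataFrame, SQLContext}

def loadCtrlASeparated(sqlContext: SQLContext, s3Path: String): DataFrame = {
  sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .option("delimiter", "\u0001") // ^A, i.e. \001
    .load(s3Path)
}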
I know how to read a csv file into Spark using spark-csv (https://github.com/databricks/spark-csv), but I already have the csv file represented as a string and would like to convert this string directly to a DataFrame. Is this possible?
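One approach is to split the string into lines yourself, parallelize them, and build the DataFrame manually. A sketch, assuming a simple comma-separated string with a header row (naive splitting; it does not handle quoted commas):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val csvString = "name,age\nalice,30\nbob,25"   // example input
val lines = csvString.split("\n")
val header = lines.head.split(",")
val rows = sc.parallelize(lines.tail).map(_.split(","))

// All columns come out as strings; cast afterwards if needed
val schema = StructType(header.map(StructField(_, StringType, true)))
val df = sqlContext.createDataFrame(rows.map(Row.fromSeq(_)), schema)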
When a CSV is read as a DataFrame in Spark, all the columns are read as strings. Is there any way to get the actual type of each column?
I have the following csv file
Name,Department,years_of_experience,DOB
Sam,Software,5,1990-10-10
Alex,Data…
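With inferSchema enabled, spark-csv assigns real types, which you can then inspect. A sketch (the path is illustrative):

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("people.csv")             // the file shown above

df.printSchema()
df.dtypes.foreach { case (name, tpe) => println(s"$name: $tpe") }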
I have a CSV that looks like this:
+-----------------+-----------------+-----------------+
| Column One | Column Two | Column Three |
+-----------------+-----------------+-----------------+
| This is a value | This is a value | This is…
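A sketch of one way to handle this layout in Scala, assuming the file really is pipe-delimited with +---+ border rows: drop the borders, split on |, and trim the padding (names below are illustrative):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val lines = sc.textFile("table.csv").filter(!_.trim.startsWith("+")) // drop borders
val parsed = lines.map(_.split("\\|").map(_.trim).filter(_.nonEmpty))
val header = parsed.first()
val data = parsed.filter(!_.sameElements(header)) // drop the header row

val schema = StructType(header.map(StructField(_, StringType, true)))
val df = sqlContext.createDataFrame(data.map(Row.fromSeq(_)), schema)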
I use Spark 2.2.0
I am reading a csv file as follows:
val dataFrame = spark.read.option("inferSchema", "true")
.option("header", true)
.option("dateFormat", "yyyyMMdd")
…
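As far as I know, Spark's CSV type inference does not produce DateType columns, so dateFormat only takes effect when the schema declares one. A sketch with an explicit schema (the column names are placeholders):

import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("id", IntegerType, true),
  StructField("eventDate", DateType, true)))  // parsed with "yyyyMMdd"

val dataFrame = spark.read
  .option("header", true)
  .option("dateFormat", "yyyyMMdd")
  .schema(schema)
  .csv("data.csv")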
For a custom Estimator's transformSchema method I need to be able to compare the schema of an input data frame to the schema defined in a case class. Usually this could be performed like Generate a Spark StructType / Schema from a case class as…
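A sketch of the comparison itself, deriving the expected StructType from the case class via Spark's encoders (MyRecord and validate are placeholders):

import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.StructType

case class MyRecord(name: String, count: Int)   // placeholder

val expected: StructType = Encoders.product[MyRecord].schema

def validate(actual: StructType): Unit = {
  // Note: nullability may legitimately differ; relax this check if so
  require(actual.fields.sameElements(expected.fields),
    s"Schema mismatch: expected $expected, got $actual")
}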
Running through the spark-csv README, there's sample Java code like this:
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.*;
SQLContext sqlContext = new SQLContext(sc);
StructType customSchema = new StructType(
new…
I am trying to add a UUID column to my dataset.
getDataset(Transaction.class).withColumn("uniqueId", functions.lit(UUID.randomUUID().toString())).show(false);
But the result is that all the rows have the same UUID. How can I make it…
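lit(...) is evaluated once on the driver, so every row gets that single value. A sketch of generating one UUID per row with a UDF instead (shown in Scala; df stands for the loaded dataset):

import org.apache.spark.sql.functions.udf

// The function runs per row, unlike lit(UUID.randomUUID().toString)
val uuid = udf(() => java.util.UUID.randomUUID().toString)

val withId = df.withColumn("uniqueId", uuid())
// Caveat: the UDF is non-deterministic, so values can change if the
// plan is recomputed; persist the result if stable IDs are required
withId.show(false)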
I am importing a CSV file (using spark-csv) into a DataFrame which has empty String values. When the OneHotEncoder is applied, the application crashes with the error requirement failed: Cannot have an empty string for name. Is there a way I can get around…
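One workaround is to substitute a sentinel for the empty strings before indexing/encoding. A sketch (the column name and sentinel are placeholders):

import org.apache.spark.sql.functions.{col, when}

val cleaned = df.withColumn("category",
  when(col("category") === "", "EMPTY").otherwise(col("category")))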
dataFrame.coalesce(1).write().save("path") sometimes writes only _SUCCESS and ._SUCCESS.crc files, without the expected *.csv.gz, even for a non-empty input DataFrame.
File save code:
private static void writeCsvToDirectory(Dataset dataFrame, Path…
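For comparison, a sketch of a write that pins the format and codec down explicitly, in Scala (the options and mode are assumptions, not the question's original code):

dataFrame
  .coalesce(1)
  .write
  .format("csv")                  // save(path) alone uses the session's default format
  .option("header", "true")
  .option("compression", "gzip")  // yields part-*.csv.gz
  .mode("overwrite")
  .save("path")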