How can I create a single DataFrame from two CSV files that have the same schema but live in different folder paths?
Viewed 445 times
2 Answers
3
In Spark 2.x:
A single DataFrame from CSV files stored in different directories:
val df = spark.read.option("header", "true").option("inferSchema", "true").csv(path1, path2)
A single DataFrame from CSV files stored under one directory, matched recursively using wildcard characters:
val df = spark.read.option("header", "true").option("inferSchema", "true").csv("parent-directory/*/*")

koiralo

PoojanKothari
Thanks for providing this. I'm not sure if you noticed, but the question here is tagged with pyspark and the OP's question uses Python syntax. Your Scala code is pretty similar to what we might use with Python, but it could still confuse readers seeking Python guidance. – E. Ducateme Mar 01 '18 at 00:22
0
You can pass a list of string paths when you read the CSV files using sqlContext:
sqlContext.read.format("com.databricks.spark.csv").csv(["path1", "path2"]).show(truncate=False)
or using load:
sqlContext.read.format("com.databricks.spark.csv").load(["path1", "path2"]).show(truncate=False)
You can also play with other options such as header, inferSchema, etc.

Ramesh Maharjan