
I want to create a DataFrame from two CSV files that have the same schema but are stored in different folders.

zero323
Ahmad Senousi

2 Answers


In Spark 2.x:

  • Single DataFrame from CSV files stored in different directories

    val df = spark.read.option("header", "true").option("inferSchema", "true").csv("path1", "path2")

Dataframe from multiple file paths

  • Single Dataframe from CSV files stored in directory in a recursive way (using wildcard characters)

    val df = spark.read.option("header", "true").option("inferSchema", "true").csv("parent-directory/*/*")

Dataframe from recursive directories

koiralo
PoojanKothari
  • thanks for providing this. was not sure if you noted it, but the question in this case is tagged with pyspark and the OP's question uses Python syntax. Your scala code is pretty similar to what we might use with Python, but could still be confusing to readers seeking Python guidance. – E. Ducateme Mar 01 '18 at 00:22

You can pass a list of string paths when reading the CSV files with sqlContext:

sqlContext.read.format("com.databricks.spark.csv").csv(["path1", "path2"]).show(truncate=False)

or using load

sqlContext.read.format("com.databricks.spark.csv").load(["path1", "path2"]).show(truncate=False)

You can also set other options such as header, inferSchema, etc.

Ramesh Maharjan