I'm using a Databricks notebook to extract gzipped CSV files and load them into DataFrame objects. I'm having trouble with Part 2 below.
Part 1: Loading the zipped files into DataFrames runs fine...
%python
df1 = spark.read.option("header",True).option("delimiter", "|").csv("dbfs:/model/.../file_1.csv.gz")
df2 = spark.read.option("header",True).option("delimiter", "|").csv("dbfs:/model/.../file_2.csv.gz")
Part 2: Trying to merge the dataframes...
%python
import pandas as pd
df = pd.concat([df1, df2], ignore_index=True)
df.show(truncate=False)
... returns the following error:
TypeError: cannot concatenate object of type '<class 'pyspark.sql.dataframe.DataFrame'>'; only Series and DataFrame objs are valid
Any suggestions for how I should merge these DataFrames instead? I will have up to 20 files to merge, and all of them have identical columns.