I am trying to read files from an S3 bucket and write the dataframe to a PostgreSQL table using PySpark, but I am encountering the following error.
Code:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('sample_v2').getOrCreate()
path = ['s3a://path/sample_data.csv']
df = spark.read.csv(path, sep=',',inferSchema=True, header=True)
df.show()  # works until here, df has data
df.write.format("jdbc").option("driver","org.postgresql.Driver").option("url","jdbc:postgres://********************rds.amazonaws.com:5432;database=abc;user=abcde;password=abcdef").insertInto("test_result")
Error:
22/04/06 12:15:31 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
22/04/06 12:15:31 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
22/04/06 12:15:34 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
22/04/06 12:15:34 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore UNKNOWN@192.168.29.14
22/04/06 12:15:34 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Softwares\spark-3.2.1-bin-hadoop3.2\spark\python\pyspark\sql\readwriter.py", line 762, in insertInto
self._jwrite.insertInto(tableName)
File "C:\Softwares\spark-3.2.1-bin-hadoop3.2\spark\python\lib\py4j-0.10.9.3-src.zip\py4j\java_gateway.py", line 1321, in __call__
File "C:\Softwares\spark-3.2.1-bin-hadoop3.2\spark\python\pyspark\sql\utils.py", line 117, in deco
raise converted from None
pyspark.sql.utils.AnalysisException: Table not found: test_result;
'InsertIntoStatement 'UnresolvedRelation [test_result], [], false, false, false
How can I resolve this?
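From reading the PySpark docs, I suspect `insertInto()` resolves the table name against Spark's own catalog (hence the Hive metastore warnings and "Table not found"), and that the write instead needs to go through `df.write.jdbc` (or `.option("dbtable", ...).save()`). A sketch of what I think the correct call looks like — the hostname, database name, and credentials below are placeholders, not my real values:

```python
# Sketch only: placeholder host, database, user, and password.
# Note the scheme is jdbc:postgresql:// (not jdbc:postgres://) and the
# database name goes in the URL path, not in a ";database=..." suffix.
jdbc_url = "jdbc:postgresql://myinstance.rds.amazonaws.com:5432/abc"

connection_properties = {
    "user": "abcde",
    "password": "abcdef",
    "driver": "org.postgresql.Driver",
}

# With a live SparkSession and the Postgres JDBC jar on the classpath,
# the write itself would then be:
# df.write.jdbc(url=jdbc_url, table="test_result",
#               mode="append", properties=connection_properties)
```

Is this the right approach, or is there a way to make `insertInto` work against a JDBC target?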