I am trying to transfer a table from Postgres to SQL Server. My database is located on a remote server, so I have used the following approach:
import pandas as pd
from pyspark.sql import SparkSession
from sqlalchemy import create_engine

appName = "PySpark PostgreSQL Example - via psycopg2"
# A Spark master URL needs a scheme and port, e.g. spark://host:7077 (or "local[*]" for local mode)
master = "spark://11.220.202.11:7077"
spark = SparkSession.builder.master(master).appName(appName).getOrCreate()

# SQLAlchemy engine for the remote Postgres database
engine = create_engine(
    "postgresql+psycopg2://postgres:password@11.220.202.11/testdb?client_encoding=utf8")

# Read the whole table into a pandas DataFrame (this pulls everything into memory)
pdf = pd.read_sql('select * from test_table', engine)

# Convert the pandas DataFrame to a Spark DataFrame
df = spark.createDataFrame(pdf)
After this step I was planning to dump the data into SQL Server, roughly as sketched below.
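This is only a rough sketch of the write step I had in mind, assuming the Microsoft JDBC driver (mssql-jdbc) is available on the Spark classpath; the SQL Server host, database, table and credentials are placeholders:

# Write the Spark DataFrame to SQL Server over JDBC (placeholders for host/db/credentials)
df.write \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://<sqlserver_host>:1433;databaseName=targetdb") \
    .option("dbtable", "dbo.test_table") \
    .option("user", "sa") \
    .option("password", "password") \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .mode("append") \
    .save()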
But the problem I am facing is probably a memory issue: my system hangs and it has shut down twice. I have 8 GB of RAM and the table is around 3 GB. Is there a way to load the data directly into SQL Server instead of loading it into a PySpark DataFrame first? Please suggest improvements and alternatives.
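For example, would a chunked pandas-only approach like the following work, so that the full 3 GB never has to sit in memory at once? This is just a sketch, assuming pyodbc and the "ODBC Driver 17 for SQL Server" are installed; the SQL Server host, database and credentials are placeholders:

import pandas as pd
from sqlalchemy import create_engine

pg_engine = create_engine(
    "postgresql+psycopg2://postgres:password@11.220.202.11/testdb?client_encoding=utf8")
# Placeholder SQL Server connection string; adjust host, database, user and ODBC driver name
mssql_engine = create_engine(
    "mssql+pyodbc://user:password@<sqlserver_host>/targetdb?driver=ODBC+Driver+17+for+SQL+Server",
    fast_executemany=True)

# Stream the table in chunks instead of reading it all at once
for chunk in pd.read_sql("select * from test_table", pg_engine, chunksize=50000):
    chunk.to_sql("test_table", mssql_engine, if_exists="append", index=False)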
TIA