I have several thousand CSVs, and each CSV has over 10,000 records. I'm looking for the most efficient way to load this data into tables in a Postgres DB with minimal time and effort.
- What methods have you found on your searches? How did they work out? – roganjosh Aug 23 '18 at 14:23
- Similar questions have been answered already. Please look at https://stackoverflow.com/questions/30050097/copy-data-from-csv-to-postgresql-using-python and also at https://stackoverflow.com/questions/12646305/efficient-way-to-import-a-lot-of-csv-files-into-postgresql-db – sulabh chaturvedi Aug 23 '18 at 14:32
- It's not the same as the previously opened questions, because the old questions give you a way to import a SINGLE csv file into Postgres. In my case, I want to automate the import of a very large number of files where there are 2 manual processes involved: 1. Create a new table 2. Import the csv into this new table. I want to accomplish these 2 steps in one procedure for several thousand files through automation. – Priya Sreetharan Aug 23 '18 at 15:12
- In other words, I want to be able to create the tables on the fly, assign table names from the source file names, and then import data from each source file into the table created for it, for a bulk of files. – Priya Sreetharan Aug 23 '18 at 15:16
- @sulabhchaturvedi the links you have provided have answers that tackle a single file import into a new table with manual table creation. But my question is different. – Priya Sreetharan Aug 23 '18 at 15:17
- @roganjosh most of the methods I found in my research only tell me how to import a single csv at a time into a manually created table. What I need is this: a script/method that would automate the following steps for 10,000 files: 1. Create a new table with the name of the data file 2. Import data from the data file into the new table created in step 1. – Priya Sreetharan Aug 23 '18 at 15:20
- @PriyaSreetharan Please go through the link carefully: https://stackoverflow.com/questions/12646305/efficient-way-to-import-a-lot-of-csv-files-into-postgresql-db. Here the user is talking about a multiple CSV import. This is an extract of the question: "I see plenty of examples of importing a CSV into a PostgreSQL db, but what I need is an efficient way to import 500,000 CSV's into a single PostgreSQL db. Each CSV is a bit over 500KB (so grand total of approx 272GB of data)" – sulabh chaturvedi Aug 23 '18 at 15:22
- @sulabhchaturvedi Thanks for sharing the link, firstly. I went through it, but it isn't the same as my use case, as the data structure of my tables is not identical the way it was in his case. – Priya Sreetharan Aug 23 '18 at 15:58
- @PriyaSreetharan Okay. Then I would suggest writing a new question and elaborating on it to the point where people can understand exactly what you are looking for. Also, share the research you have done, like you just commented. This will help you get an answer and help others answer more appropriately: https://stackoverflow.com/help/how-to-ask – sulabh chaturvedi Aug 23 '18 at 16:02
- @sulabhchaturvedi I have opened another ticket explaining the problem in depth. Thanks! – Priya Sreetharan Aug 24 '18 at 15:02
2 Answers
COPY is usually the best solution. It depends on your constraints.
COPY table_name FROM 'path_readable_by_postgres/file.csv';
And you can cat your files into one big file to import the data quickly.
Look at https://www.postgresql.org/docs/current/static/sql-copy.html for more details.
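If you also need to create the tables on the fly for many files, here is a rough sketch of driving COPY from Python with psycopg2, creating one text-typed table per CSV named after the file. The paths and connection string are placeholders, and it assumes every CSV has a header row; adapt the column types and sanitize the names for your data.
import csv
import glob
import os
import psycopg2

# Placeholder connection details
conn = psycopg2.connect("dbname=mydb user=me password=secret host=localhost")
cur = conn.cursor()

for path in glob.glob("/data/csvs/*.csv"):
    # Table name taken from the file name, e.g. /data/csvs/sales.csv -> sales
    # Note: table/column names come straight from the files and are not escaped here.
    table = os.path.splitext(os.path.basename(path))[0]
    with open(path, newline="") as f:
        header = next(csv.reader([f.readline()]))
        columns = ", ".join('"%s" text' % col.strip() for col in header)
        cur.execute('CREATE TABLE IF NOT EXISTS "%s" (%s)' % (table, columns))
        f.seek(0)  # COPY reads the whole file; HEADER makes it skip the first line
        cur.copy_expert('COPY "%s" FROM STDIN WITH (FORMAT csv, HEADER true)' % table, f)
    conn.commit()

cur.close()
conn.close()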

Hervé Piedvache
- Answered also here: https://stackoverflow.com/questions/2987433/how-to-import-csv-file-data-into-a-postgresql-table – zeevb Aug 23 '18 at 14:32
- As @zeevb pointed out, this is the same answer as provided by a different user in the link. – Inder Aug 23 '18 at 14:33
- I have several thousand files. I don't want to manually add all the file names. Also, for this command to work, I already need to have the table created, which I don't have. If I were to create them manually, I'd have to create several thousand tables and it would be extremely time-consuming. – Priya Sreetharan Aug 23 '18 at 14:50
You can use the pandas library to read and transform the data (if needed), sqlalchemy to create the Postgres engine, and psycopg2 to load the data into PostgreSQL. I assume that you've already created the tables in the Postgres DB. Try something like the code below:
import pandas as pd
from sqlalchemy import create_engine
import psycopg2

# Read the csv; drop "Unnamed: 0", as it often causes problems when writing to a table
pd_table = pd.read_csv('path/to/file.csv', index_col='index_column').drop(["Unnamed: 0"], axis=1)

# Now simply load your data into the database
engine = create_engine('postgresql://user:password@host:port/database')
try:
    pd_table.to_sql('name_of_table_in_postgres_db', engine, if_exists='append')
except (Exception, psycopg2.DatabaseError) as error:
    print(error)
finally:
    print('Closed connection to the database')
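For the many-files part of the question, the same idea can be looped over a directory; to_sql will create each table if it does not exist yet. This is only a sketch with placeholder paths and connection details, naming each table after its file:
import glob
import os

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://user:password@host:port/database')

# Placeholder directory; one table per file, named after the file
for path in glob.glob('/data/csvs/*.csv'):
    table_name = os.path.splitext(os.path.basename(path))[0]
    df = pd.read_csv(path)
    # to_sql creates the table if it does not exist yet
    df.to_sql(table_name, engine, if_exists='replace', index=False)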

enoted
- So, no, I haven't yet created the tables. The code above reads one csv and dumps it into a new table, and I can loop it for all the files? – Priya Sreetharan Aug 23 '18 at 14:46
- `engine = try:` --- sorry, I had lost some code, now it should be okay – enoted Aug 25 '18 at 12:35
- The code above adds one csv into a postgresql table created earlier. You can loop it to add all the csv files. – enoted Aug 25 '18 at 12:38