
I have a CSV file with 2550 columns and I want to import it into PostgreSQL.

To import a CSV into PostgreSQL, I first have to create the table and then use \copy to copy from the CSV into the table. But what if the table has a huge number of columns, as in my case, so that I cannot create it manually?

Any solution?

Update

The data structure is as follows; the values dZ(1) through dZ(2550) are basically between -50 and +50:

id | date       | time     | localtime | pid  | dZ(1) .. dZ(2550)
---|------------|----------|-----------|------|----------------------------
17 | 11-11-2014 | 16:33:21 | 1.45E+15  | 2375 | 0 0 0 0 0 -1 0 -1 0 -5 -10

CSV structure (I used the ';' delimiter):

17;13-11-2014;08:09:37;1.45E+15;4098;0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4 3 0 -2 3 -2 1 0 0 1 1 3 -2 3 4 2 -2 -2 ....

This is one line of data.

  • I'm afraid you are limited to 1600 columns in a table, so you will probably need to set up a custom delimiter and import the whole line as one column; later you can `split_part` or regex it into several columns (a sketch follows these comments). But either way you won't be able to import all 2550 into the same table. – Vao Tsun Mar 29 '17 at 09:33
  • How large is each individual line? Maybe you could import it into a table with only one `text` column and process it in the database. A table with that many columns doesn't make much sense in a relational database. – Laurenz Albe Mar 29 '17 at 09:36
  • Looks like repeating group. Maybe store it into an array? BTW: date+time should be one (timestamp) field. – wildplasser Mar 29 '17 at 12:49
  • You could use the [file_textarray_fdw](https://pgxn.org/dist/file_textarray_fdw/) foreign data wrapper which maps the line of a CSV file to a single array in Postgres. You can then use the Postgres array functions to split the table or get a subset of the "columns" –  Mar 30 '17 at 12:01
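
A minimal sketch of the single-text-column route suggested above (the raw_lines table name and the E'\x01' delimiter are assumptions; any single byte that never occurs in the data would do):

create table raw_lines (line text);

\copy raw_lines from '/home/cpn/codigo/file.csv' with (format text, delimiter E'\x01')

select split_part(line, ';', 1) as id,
       split_part(line, ';', 6) as dz
from raw_lines;

Here split_part(line, ';', 6) picks out the sixth ';'-separated field, which holds the space-separated dZ values.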

3 Answers


Import the dZ values into a single text column and later turn them into an array:

Create the temporary table:

create table s (
    id int,
    date date,
    time time,
    localt double precision,
    pid int,
    dz text
);

Set the date style so that day-first dates like 13-11-2014 parse correctly:

set datestyle = 'DMY';

Import into the temporary table:

\copy s from '/home/cpn/codigo/file.csv' with (format csv, delimiter ';')
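
If the array later turns out empty (as reported in a comment below), it is worth checking first that the dz text survived the import; a quick sanity query:

select id, left(dz, 40) as dz_start, length(dz) as dz_len
from s
limit 3;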

Create the definitive table merging date and time into a timestamp and turning dZ into an array:

create table t (
    id int,
    datetime timestamp,
    localt double precision,
    pid int,
    dz integer[]
);

Populate the definitive table from the temporary one:

insert into t (id, datetime, localt, pid, dz)
select
    id, date + time, localt, pid,
    regexp_split_to_array(dz, '\s')::int[]
from s;
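
Once loaded, individual readings can be reached by 1-based array subscript or unpivoted with unnest; a quick sketch of both:

select id, dz[1], dz[2550] from t;

select id, pos, val
from t, unnest(dz) with ordinality as u(val, pos);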
Clodoaldo Neto
  • Table with 2500 columns smells bad! I like your array solution. Perhaps the destination table structure should be normalized to have 1 value per line or something. Unless the array really does make sense. – bobflux Mar 29 '17 at 13:52
  • I tried this solution but the array is empty in both the s and t tables. I updated my question with the data structure, any idea? – SillyPerson Mar 30 '17 at 07:33
  • @Safariba I will check it later. – Clodoaldo Neto Mar 30 '17 at 09:19
  • @Safariba Changed the delimiter to `;` and fixed a syntax error in the `\copy` command. Changed the data type of the `localtime` column to `double precision`. Now it works for me with the provided data. – Clodoaldo Neto Mar 30 '17 at 11:43

A bit late to reply, but just in case for those still looking: this can be done using Python.

Considering your data is in a DataFrame df (reading the CSV is sketched here with an assumed file name and separator):

import pandas as pd
from sqlalchemy import create_engine

df = pd.read_csv('file.csv', sep=';')  # assumed file name; ';'-separated
df = df.rename(columns=str.lower)      # Postgres folds unquoted names to lower case

engine = create_engine('postgresql://username:password@localhost:5432/mydatabase')
df.to_sql('table_name', engine)        # creates the table and inserts the rows

Read more here

Mario Varchmin

Considering that the maximum is 1600 columns, I divided the CSV into two tables of 1000 columns each. I wrote Python code to create the dz columns, and it works fine:

import psycopg2

# build the CREATE TABLE statement with the columns dz0 .. dz1000 generated in a loop
sqlstring = "CREATE TABLE z(id bigint, date date, time time, "
sqlstring += ", ".join("dz" + str(i) + " int" for i in range(1001))
sqlstring += ");"

connection = psycopg2.connect("dbname='mydb' user='qfsa' host='localhost' password='123' port=5433")
cursor = connection.cursor()
cursor.execute(sqlstring)
connection.commit()

Now I could import the CSV into the table.
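
The import step can then use \copy as in the accepted answer; a sketch, assuming the CSV has already been split and reshaped so that its columns match the table (the file name is hypothetical):

\copy z from '/home/cpn/codigo/part1.csv' with (format csv, delimiter ';')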

SillyPerson