
I have thousands of .csv files that I'd like to import into a sqlite3 database. Each .csv file has labels on the first row.

My idea is to run a for loop over the file names and import them using the .import command:

    sqlite3 options.db ".mode csv data" ".import $f data"

The problem is that this will import the first row that contains the labels.

Question: How do I .import the .csv files without importing the first row of each file?

Thanks for helping! :D

Guilherme Salomé
  • I generally just import it all and then delete the header rows. You could also use `grep` and `cat`, and maybe some `sed`/`awk`, to clean the files up first. – Kevin Aug 18 '17 at 04:08
  • @Kevin The thing is that the headers are all text, while some of the data types that I'm using are integers or reals. Would .import still work properly? – Guilherme Salomé Aug 18 '17 at 04:09
  • @Kevin Any ideas on how to "pipe" (I'm a noob, sorry) `sed 1d $f` into the `.import`? – Guilherme Salomé Aug 18 '17 at 04:11
  • Are you against just quickly scripting it in Python? I can see that being reasonably easy as existing libraries allow you to ignore the first row – Jonathan Holland Aug 18 '17 at 04:12
  • @JonathanHolland Scripting in bash =/ – Guilherme Salomé Aug 18 '17 at 04:13
  • sqlite will work fine importing the text into an integer type column. It is dynamically typed on each `value`, not each `column`. – Kevin Aug 18 '17 at 04:15
  • I understand that from the tagging, just pointing out there are other options if the goal isn't to deliberately achieve the goal only in bash :) – Jonathan Holland Aug 18 '17 at 04:16
  • @JonathanHolland Indeed, that is a great option :D It's just that I don't remember anything of Python, and I haven't used it in a couple of years. But I should probably relearn the basics at least for my future sanity. – Guilherme Salomé Aug 18 '17 at 04:18
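Regarding the piping question in the comments: `.import` takes a file name rather than a stream, so from bash the usual trick is to hand it `/dev/stdin` fed by `sed 1d` (this works on Linux and macOS); sqlite3 shells from 3.32.0 onwards can also skip leading rows themselves. A rough sketch, assuming the `options.db`/`data` names from the question:

    # Drop each file's header line before handing it to .import.
    for f in *.csv; do
        sed 1d "$f" | sqlite3 options.db ".mode csv" ".import /dev/stdin data"
    done

    # On sqlite3 3.32.0 or newer, .import can skip the header itself:
    #   sqlite3 options.db ".import --csv --skip 1 $f data"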

3 Answers


You could install the SpatiaLite extension for SQLite and create a virtual table from each CSV file. It will automatically use the first row as the column names, and you can then create a new table, or import into an existing one, by selecting all the records from the virtual table. Obviously you'd want to script this, since you have thousands of CSV files to import.
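A rough sketch of how that could look for one file, assuming SpatiaLite's `VirtualText` module is available; the `VirtualText` argument list below is illustrative, so check your SpatiaLite version's documentation for the exact signature:

    # Expose the CSV as a virtual table, copy its rows, then drop it.
    sqlite3 options.db \
      ".load mod_spatialite" \
      "CREATE VIRTUAL TABLE vcsv USING VirtualText('file1.csv', 'UTF-8', 1);" \
      "CREATE TABLE IF NOT EXISTS data AS SELECT * FROM vcsv;" \
      "DROP TABLE vcsv;"

For the second and later files you'd use `INSERT INTO data SELECT * FROM vcsv;` instead of the `CREATE TABLE ... AS SELECT`.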

Alternatively, you could write a script that uses the Unix `cat` command but skips the first row of each file, if you're importing all the files into the same table.
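A minimal sketch of that second option, with `tail -n +2` playing the role of a `cat` that skips the first row, and assuming all files share the same layout:

    # Concatenate every file minus its header into one staging file,
    # then import the whole thing with a single .import.
    for f in *.csv; do tail -n +2 "$f"; done > combined.csv
    sqlite3 options.db ".mode csv" ".import combined.csv data"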

Or write a script that parses the first row to define the columns in a `CREATE TABLE` command and then iterates through the rest of the rows to populate the new table.
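A sketch of that third option, assuming plain headers (bare column names that need no quoting) and the `options.db`/`data` names from the question:

    # Create the table from the first file's header row, then load the bodies.
    set -- *.csv                              # $1 is now the first csv file
    header=$(head -n 1 "$1")                  # e.g. "col1,col2,col3"
    sqlite3 options.db "CREATE TABLE IF NOT EXISTS data ($header);"
    for f in *.csv; do
        sed 1d "$f" | sqlite3 options.db ".mode csv" ".import /dev/stdin data"
    done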

Do any of those options meet your needs?

Momus
  • All good suggestions. However if all the files are going in one table, they all need to have the same number of columns. Sqlite3 will choke on mismatched imports. – Kevin Aug 18 '17 at 04:27
  • I'm importing all files into the same table. It seems right now that the easiest solution might be to just import everything and afterwards remove the header rows. – Guilherme Salomé Aug 18 '17 at 04:32
  • If you're importing them all into the same table then use option 2 - write a script that 'cat's the csv files one after the other, skipping the first row, and then importing them. @Kevin is right though, the import will choke and die if you have different numbers of columns in the files. – Momus Aug 18 '17 at 04:52
  • @Kevin hmmm that is kind of a problem. Some of the files have 18 columns, while other have 23. Sqlite3 does seem to import them, but gives me a warning "expected 23 but found 18 - filling the rest with NULL". – Guilherme Salomé Aug 18 '17 at 04:58
  • I do need to do something about some files having 18 while others have 23 columns, otherwise sqlite3 just prints a bunch of warnings and it takes forever to go over one file. Any ideas? – Guilherme Salomé Aug 18 '17 at 05:04
  • Are they always the same 18 columns and the same 23 columns? You could write the import script to check the number of columns and use a different insert command depending on the column count. If they're not always the same columns then you're going to need to write something a bit more sophisticated. – Momus Aug 18 '17 at 05:23
  • Yes, some rows have 18 columns, others have 23 columns, but they are always the same. – Guilherme Salomé Aug 18 '17 at 05:26
  • Some rows in the same file have 18 while others have 23, or some files have 18 columns per row while other files have 23 columns per row? – Momus Aug 18 '17 at 05:32
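Picking up that last exchange: if whole files are either 18 or 23 columns wide, the import script could branch on the width of each file's first data row. A sketch, with `data18`/`data23` as hypothetical per-width tables:

    for f in *.csv; do
        # Count the fields on the first data row (line 2, after the header).
        ncols=$(sed -n 2p "$f" | awk -F, '{ print NF }')
        case $ncols in
            18) table=data18 ;;   # hypothetical table name
            23) table=data23 ;;   # hypothetical table name
            *)  echo "skipping $f: $ncols columns" >&2; continue ;;
        esac
        sed 1d "$f" | sqlite3 options.db ".mode csv" ".import /dev/stdin $table"
    done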

You can use GNU awk (I don't have access to Mac awk so I can't test) to skip the first row and add the missing columns:

$ cat test
1,1
2,2
$ awk '
BEGIN { FS=OFS="," }        # set separators
FNR==1 { next }             # skip the first record(s)
NF==2 { NF=4 }              # if field count is 2 set it to 4
1                           # output
' test # > newpath/newfile  # or glob several files, e.g. *.csv
2,2,,

Replace the NF values to suit your needs (`NF==18 { NF=23 }`), if I understood the comments to the other answer correctly. If the `NF=4` part doesn't work in Mac awk, you can replace it with `print $0 OFS OFS OFS OFS OFS; next` (one `OFS` per missing column, so five for the 18-to-23 case). Modern GNU awk also supports in-place editing.
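A sketch of that in-place variant, assuming GNU awk 4.1 or newer (it rewrites the files, so try it on copies first):

    # Delete each header and pad 18-column rows out to 23, editing the files in place.
    gawk -i inplace 'BEGIN { FS=OFS="," } FNR==1 { next } NF==18 { NF=23 } 1' *.csv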

James Brown

Assume file1.csv looks like this:

File1,Line1
File1,Line2
File1,Line3
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18
1,2,3,4,5,6

And file2.csv looks like this:

File2,Line1
File2,Line2
File2,Line3
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18
1,2,3,4,5,6

Then you can make awk print all your CSV files whilst skipping the first line of each (where FNR, i.e. the per-file line number, is 1) like this:

awk -F, 'FNR==1{next} 1' *csv

Output

File1,Line2
File1,Line3
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18
1,2,3,4,5,6
File2,Line2
File2,Line3
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18
1,2,3,4,5,6

Then you can check the number of fields present on each line, and if it is not 23, add as many commas as necessary to make it 23:

awk -F, 'FNR==1{next} NF!=23{$0=$0 substr(",,,,,,,,,,,,,,,,,,,,,,",1,23-NF)}1' *csv

Output

File1,Line2,,,,,,,,,,,,,,,,,,,,,
File1,Line3,,,,,,,,,,,,,,,,,,,,,
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,,,,,
1,2,3,4,5,6,,,,,,,,,,,,,,,,,
File2,Line2,,,,,,,,,,,,,,,,,,,,,
File2,Line3,,,,,,,,,,,,,,,,,,,,,
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,,,,,
1,2,3,4,5,6,,,,,,,,,,,,,,,,,

So, as an actual complete solution, you could do this:

awk -F, 'FNR==1{next} NF!=23{$0=$0 substr(",,,,,,,,,,,,,,,,,,,,,,",1,23-NF)}1' *csv > NewBigFile.csv

and then import NewBigFile.csv into sqlite.
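The import itself could then be as simple as this, reusing the `options.db`/`data` names from the question:

    sqlite3 options.db ".mode csv" ".import NewBigFile.csv data"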

Mark Setchell
  • I ended up importing everything, suppressing the errors (not a good thing), and deleting the rows via SQL afterwards. But this looks like a better solution to my problem, and one that doesn't require me to ignore errors. Thanks! – Guilherme Salomé Aug 25 '17 at 17:17