0

Issue

I'm creating a script that blends a load of tsv files with the aim of then importing the data into my sqlite3 table. However, I have noticed that one of my files doesn't seem to get imported with my script like all the others. It also doesn't appear to import when I run it manually.

As the data is able to get into the temp table without issue, I thought the issue may lie with the date column, but even with amending it in the data to be in Y-M-D format, for some reason it still doesn't import.

Can someone confirm what seems to be wrong with the sample data as all other tsv files appear to import without issue?

sample.tsv

company1    account1    platform1   05/05/2020  28823   client1 191361
company2    account2    platform2   05/05/2020  47188   cleint2 16597
company3    account3    platform3   05/06/2020  28823   client1 191903
company4    account4    platform4   05/06/2020  47188   cleint2 15407
company5    account5    platform5   05/07/2020  28823   client1 196450
company6    account6    platform6   05/07/2020  47188   cleint2 14678
company7    account7    platform7   05/08/2020  28823   client1 191809
company8    account8    platform8   05/08/2020  47188   cleint2 14489
company9    account9    platform9   05/09/2020  28823   client1 195405

script.sh

#!/bin/bash
set -eu

cd final/client/

for f in $(find -mindepth 1 -maxdepth 1 -type f -name "*.tsv" -printf "%f\n"); do

sqlite3 -batch ~/SQL/.databases/database.db <<EOF
CREATE TABLE IF NOT EXISTS companydata_client(
Company  TEXT NOT NULL,
Account TEXT NOT NULL,
Platform        TEXT NOT NULL,
Impression_Date_Delivered  DATE NOT NULL,
Client_ID   TEXT NOT NULL,
Client      TEXT NOT NULL,
Impressions     INTEGER,
UNIQUE(Company,Account,Platform,Impression_Date_Delivered,Client_ID, Client,Impressions)
);

CREATE TEMPORARY TABLE wand(
Company  TEXT NOT NULL,
Account TEXT NOT NULL,
Platform        TEXT NOT NULL,
Impression_Date_Delivered  DATE NOT NULL,
Client_ID   TEXT NOT NULL,
Client      TEXT NOT NULL,
Impressions     INTEGER
);

.mode tabs
.separator \t
.import $f wand

INSERT OR IGNORE INTO companydata_client (Company,Account,Platform,Impression_Date_Delivered,Client_ID,Client,Impressions)

SELECT
        CAST (Company AS TEXT),
        CAST (Account AS TEXT),
        CAST (Platform AS TEXT),
        strftime('%d-%m-%Y',Impression_Date_Delivered),
        CAST (Client_ID AS Text),
        CAST (Client AS TEXT),
        CAST (Impressions AS Integer)
FROM wand;
EOF

done

cd ../../
Dycey
  • 4,767
  • 5
  • 47
  • 86
El_Birdo
  • 315
  • 4
  • 19
  • Your date separators area forward slashes in your tsv file, so I guess using dashes in your code will not work. – 9769953 Jun 03 '20 at 13:10
  • @00 I thought the same but even when converting `/` to `-` and changing the date order it didn't solve the issue – El_Birdo Jun 03 '20 at 13:11
  • @00 yes sorry that was a mistake in the post. Can confirm this has now been amended – El_Birdo Jun 03 '20 at 13:25
  • I've put in a vote to close as a duplicate, even though I've answered this question separately. Sometimes, that is just clearer, but linking it to a duplicate connects the questions (and answers), which may be useful for others (since the suggested duplicate question has a variety of answers). – 9769953 Jun 03 '20 at 13:58

1 Answers1

1

Sqlite appears to be somewhat unfortunate in this case, as the strftime is to convert dates to strings, but there seems to be no equivalent strptime, that would convert strings to dates. (In fact, all dates are effectively strings in Sqlite. I think they are only parsed and interpreted as dates for the relevant functionality.)

Going by the answer to this question, you would end up with something like this instead:

INSERT OR IGNORE INTO companydata_client (Company,Account,Platform,Impression_Date_Delivered,Client\
_ID,Client,Impressions)
SELECT
        CAST (Company AS TEXT),
        CAST (Account AS TEXT),
        CAST (Platform AS TEXT),
        substr(Impression_Date_Delivered, 7) || '-' ||
            substr(Impression_Date_Delivered, 4, 2) || '-' ||
            substr(Impression_Date_Delivered, 1, 2),
        CAST (Client_ID AS Text),
        CAST (Client AS TEXT),
        CAST (Impressions AS Integer)
FROM wand;
EOF

Not the most readable line, but that seems to be the way to go (until someone else comes up with a better answer). The formatting should yield a string that Sqlite can interpret as a date.

Aside: I changed the type of wand.Impression_Date_Delivered to TEXT, since that is what it is at that point: it is only inserted as a "date" in companydata_client.

9769953
  • 10,344
  • 3
  • 26
  • 37