
EDIT: it turns out that the sqlite3 command was picking up the outdated version that ships with CentOS, which can't import a CSV into a new table. Explicitly invoking my local version fixed the problem.
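
For anyone hitting the same thing, something like the following should show which binary the job environment actually resolves and force the local build instead (the ~/bin path is just where my copy happens to live; adjust to wherever your own sqlite3 is installed):

#!/bin/bash

# Show which sqlite3 the job environment resolves to, and its version;
# the stock CentOS binary turned out to be too old to auto-create a
# table from a CSV header during .import.
which sqlite3
sqlite3 --version

# Call the locally installed build explicitly instead of relying on PATH.
"$HOME/bin/sqlite3" -csv ./iratweets.sql '.import ../ira_tweets_csv_unhashed.csv tweets'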

I'm trying to import a CSV into a SQLite database on a computing cluster. I want to submit this as a job because the dataset is quite large and I don't want to burden the login node. The cluster runs CentOS, and SQLite is installed in a bin folder in my home directory. A data sample is below.

I've tried doing this with the following command, submitted as a shell script:

#!/bin/bash

## ingests the raw twitter data csv into sqlite table

sqlite3 -csv ./iratweets.sql  '.import ../ira_tweets_csv_unhashed.csv tweets'

The database is currently empty. This returns the error: no such table: tweets. This is confusing to me because, as I understand SQLite, .import should be able to load a CSV into a brand-new table as long as the first row contains the headers, which is true for this dataset.
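
For reference, this is roughly the behavior I'm expecting, as a minimal sketch with made-up paths and table name (not my real data):

printf 'id,name\n1,alice\n2,bob\n' > /tmp/sample.csv
sqlite3 -csv /tmp/test.db '.import /tmp/sample.csv people'   # should auto-create the people table from the header row
sqlite3 /tmp/test.db '.schema people'                        # should print the auto-created table definition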

Is this something about how I'm trying to script it, or am I fundamentally misunderstanding how .import works?

Data sample:

"","user_id","status_id","created_at","screen_name","text","source","display_text_width","reply_to_status_id","reply_to_user_id","reply_to_screen_name","is_quote","is_retweet","favorite_count","retweet_count","quote_count","reply_count","hashtags","symbols","urls_url","urls_t.co","urls_expanded_url","media_url","media_t.co","media_expanded_url","media_type","ext_media_url","ext_media_t.co","ext_media_expanded_url","ext_media_type","mentions_user_id","mentions_screen_name","lang","quoted_status_id","quoted_text","quoted_created_at","quoted_source","quoted_favorite_count","quoted_retweet_count","quoted_user_id","quoted_screen_name","quoted_name","quoted_followers_count","quoted_friends_count","quoted_statuses_count","quoted_location","quoted_description","quoted_verified","retweet_status_id","retweet_text","retweet_created_at","retweet_source","retweet_favorite_count","retweet_retweet_count","retweet_user_id","retweet_screen_name","retweet_name","retweet_followers_count","retweet_friends_count","retweet_statuses_count","retweet_location","retweet_description","retweet_verified","place_url","place_name","place_full_name","place_type","country","country_code","geo_coords","coords_coords","bbox_coords","status_url","name","location","description","url","protected","followers_count","friends_count","listed_count","statuses_count","favourites_count","account_created_at","verified","profile_url","profile_expanded_url","account_lang","profile_banner_url","profile_background_url","profile_image_url"
"1","916778612568752128","1160176884522655744",2019-08-10 13:11:38,"FreeRussia2018","Погода и ветер не помешает нам поддержать Москву! #Допускай","Twitter Web App",NA,NA,NA,NA,FALSE,TRUE,0,0,0,0,
  • From https://sqlite.org/cli.html: "when the table does not previously exist, the table is automatically created and the content of the first row of the input CSV file is used to determine the name of all the columns in the table. In other words, if the table does not previously exist, the first row of the CSV file is interpreted to be column names and the actual data starts on the second row of the CSV file." Do you think that your first line is suitable? Please show sample input. – Yunnosch Sep 25 '19 at 20:18
  • You're literally trying to name the table `table-name`? – Shawn Sep 25 '19 at 20:21
  • Also you can use just `sqlite3 -csv database.db '.import raw.csv tablename'` or a heredoc in a script. No need for another file with the commands. – Shawn Sep 25 '19 at 20:24
  • Edited to reflect the real table name, was trying to be too general. The first row is the headers, data starts on the second row. Cannot share specific data due to data privacy agreement, but can dupe some when back at my computer. – Sean Norton Sep 25 '19 at 21:06
  • Unable to duplicate, btw. Works fine for me. – Shawn Sep 25 '19 at 21:16
  • Updated to include a data sample; tried the one-liner code, still getting the same error. – Sean Norton Sep 26 '19 at 14:51

0 Answers