
I am trying to import a dataset of 217,000 records (Jeopardy Dataset) into MonetDB through the MonetDB.R interface.

The file is a CSV; its header row and first two records are as follows:

show_nos, air_dt, rnd, category, prize, ques, ans,x1,x2,x3
4680,12/31/2004,Jeopardy!,THE COMPANY LINE,$200 ,"In 1963, live on ""The Art Linkletter Show"", this company served its billionth burger",McDonald's,,,

4680,12/31/2004,Jeopardy!,EPITAPHS & TRIBUTES,$200 ,"Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States",John Adams,,,

The problem is importing the ques column (the text enclosed in double quotes). Its values contain multiple commas and other punctuation, and monet.read.csv is unable to import that column.

I tried importing a few records without the ques column, and it works perfectly.

Can you please suggest how to import columns of free-form text like this into MonetDB? Once the data is imported, I intend to perform some text analysis on the column.

Anthony Damico
Jjs
  • hi, could you please re-write your question as a reproducible example? that means providing syntax up to the point that things break. if you can cause the breakage that you do not understand using only the first 100 rows of `jeopardy_csv.csv` then that would be preferable. thanks – Anthony Damico Dec 24 '15 at 20:47
  • Hi Anthony, there was an issue with the dataset and I was able to resolve the original error, but I still could not upload a column of free-form text from the dataset. So I have changed the entire question and would like help uploading such data to MonetDB. – Jjs Dec 28 '15 at 17:07

1 Answer


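monet.read.csv can import this: in the example below, the quoted ques column loads from your two sample records and looks ok to me.

i prefer MonetDBLite for easier setup, but monet.read.csv also works with just MonetDB.R. thanks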

mylines <-
    c("show_nos, air_dt, rnd, category, prize, ques, ans,x1,x2,x3", 
    "4680,12/31/2004,Jeopardy!,THE COMPANY LINE,$200 ,\"In 1963, live on \"\"The Art Linkletter Show\"\", this company served its billionth burger\",McDonald's,,,", 
    "4680,12/31/2004,Jeopardy!,EPITAPHS & TRIBUTES,$200 ,\"Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States\",John Adams,,,")

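# temporary csv file and a temporary folder for the embedded database
tf <- tempfile()
dbfolder <- tempdir()

# write the header and the two sample records to the csv file
writeLines( mylines , tf )

library(MonetDBLite)
library(MonetDB.R)

# start an embedded MonetDBLite database stored in `dbfolder`
db <- dbConnect( MonetDBLite() , dbfolder )

# import the csv file into a new table named `mytable`
monet.read.csv( db , tf , 'mytable' )

# looks ok to me
dbReadTable( db , 'mytable' )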
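
If the same import fails on the full file, one possible cause is a record where a comma-containing field is not quoted, so the line has more fields than the header. Here is a minimal sketch of how to look for such lines with base R's `count.fields` before importing. The three sample lines, including the deliberately malformed third one, and the names `sample_lines`, `csv_path`, `n_fields` and `bad_lines` are made up for illustration and are not part of the original dataset or the MonetDB.R API.

```r
# hypothetical sample: a header, one well-formed record, and one record
# whose free-text field is NOT quoted and therefore splits on its comma
sample_lines <- c(
  "show_nos,air_dt,rnd,category,prize,ques,ans,x1,x2,x3",
  "4680,12/31/2004,Jeopardy!,THE COMPANY LINE,$200 ,\"In 1963, live on \"\"The Art Linkletter Show\"\", this company served its billionth burger\",McDonald's,,,",
  "4680,12/31/2004,Jeopardy!,BAD ROW,$200 ,an unquoted clue, with a stray comma,Nobody,,,"
)

csv_path <- tempfile(fileext = ".csv")
writeLines(sample_lines, csv_path)

# count the fields on every line; only the double quote is treated as a
# quote character (so the apostrophe in McDonald's is left alone) and
# comment handling is switched off
n_fields <- count.fields(csv_path, sep = ",", quote = "\"", comment.char = "")

# lines whose field count differs from the header are candidates for repair
bad_lines <- which(n_fields != n_fields[1])
print(n_fields)
print(bad_lines)
```

Lines flagged this way can then be inspected with `readLines(csv_path)[bad_lines]`. Note that `count.fields` skips blank lines by default, so on a file containing blank lines the reported positions will not match raw line numbers.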
Anthony Damico
  • I tried using MonetDBLite, and it worked for a while, but now I am getting an error. Please see the error generated: Loading required package: MonetDBLite !ERROR: GDKcreatedir: cannot create directory E: !OS: The data is invalid. Error in MonetDBLite::monetdb_embedded_startup(embedded, !getOption("monetdb.debug.embedded", : Failed to initialize embedded MonetDB !FATAL: BBPinit: could not write bat\BBP.dir. Please check whether your disk is full or write-protected – Jjs Jan 02 '16 at 03:02
  • hi @Jjs, this is a completely separate issue and belongs in a different SO thread. if you post it, please provide a `reproducible` example detailing each of the steps you took _exactly_ until you hit this problem. it's a common error so you need to be explicit about what you did leading up to the error thanks – Anthony Damico Jan 02 '16 at 11:18