0

I have an SQL query result exported to csv format with ";" as a separator. it should contain 5 columns:

order_id (str), 
user_rating (int), 
user_feedback (str), 
created_by (str), 
created_on (date)

Problem is that users can have comments spanning multiple lines in the user_feedback field. While in SQL it is coerced to a nice single line view. But for the exported csv the dataset would not read in properly.

fread() from data.table and read.table() both failed. I finally got it to read in as 1 column using:

data1 <- read.table('myFile.csv', stringsAsFactors = F, header = F, 
sep = "\t", allowEscapes = T, quote = "")

The problem now is as follows: the query result contains 50000 rows, but the csv is read in as 50267 rows. From checking the data, it seems it took user_feedback that contained 'enters' and read those in as separate rows.

How do I import this data properly, so with the user_feedback data for every row into a single string field?

Jaap
  • 81,064
  • 34
  • 182
  • 193
DaReal
  • 597
  • 3
  • 10
  • 24
  • If you have `;` as a separater, you should use `sep = ";"`. Did you try that? Furthermore: it is alsways good to include a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). This makes it easier for others to help you. – Jaap Nov 05 '16 at 17:26
  • I tried using the actual separator first, but that makes it break: `Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 79 did not have 5 elements` Since the problem is actually getting the dataset into R, I'm not quite sure how to create a reproducible example of the data. – DaReal Nov 05 '16 at 17:53
  • What happens if you include the parameter `fill = TRUE`? – Jaap Nov 05 '16 at 17:56
  • Thanks, I tried it, but yielded the same result unfortunately. – DaReal Nov 05 '16 at 18:01
  • As I said in my first comment: if you don't include a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610), that will make it a lot harder for people to help you. – Jaap Nov 05 '16 at 18:05
  • I know, I'll see if I can create a sample and upload that to an S3 bucket. – DaReal Nov 05 '16 at 18:06
  • Why not just include it in the question? – Jaap Nov 05 '16 at 18:08
  • @DaReal can simply `dput(head(df))` and paste the sample here . – vagabond Nov 05 '16 at 18:26
  • Also have a close look at line 79 of your csv file, since (as the error tells you) that's where `read.table` is getting tripped up. – rosscova Nov 06 '16 at 01:18
  • Use `readLines`. The using `count.fields` determine line numbers where the number of fields is short. Append those to the preceding line without the "\n" and then read with `read.table`. Pretty sure there are worked examples in SO and Rhelp. – IRTFM Nov 06 '16 at 06:49

0 Answers0