
I'm trying to load a CSV file into a data frame, but `nrow` in R reports fewer rows than Excel shows for the same file.

Looking at the last row that R loads, I can see ASCII control characters in a varchar column when I view the file in Notepad++. For example: CAN, EM, STX, ETC

Could these characters be causing the incomplete load?

I've tried changing the encoding in the data import, but this didn't help.

Data import code: `claims <- read.csv(file = "C:\\Users\\User\\Desktop\\claims - barbican.csv", sep = "\t")`

I'm expecting the row count in R to equal the row count in Excel and Notepad++.
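
A row-count mismatch like this can be narrowed down by comparing the physical line count with what `read.csv` parses, and by listing which control bytes the file contains. The following is a minimal sketch on a small made-up file (the temporary path and the sample rows are hypothetical, not the claims data). It assumes the likely suspects are a stray quote character, which `read.csv` treats as the start of a quoted field by default, or control characters. On Windows the SUB byte (0x1A) in particular is often reported to end a text-mode read early, though nothing in the question confirms it is present here.

```r
# Hypothetical sample: a tab-separated file with a stray double quote
# and one control character (STX, 0x02) inside a text field.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "id\tnote",
  "1\tplain text",
  "2\tunbalanced \" quote",
  "3\tcontrol \x02 char",
  "4\tlast row"
), path)

# Physical line count, independent of any parsing rules.
n_lines <- length(readLines(path, warn = FALSE))

# Default quoting: the stray quote typically opens a quoted field that
# swallows the following lines (R usually warns about EOF within a
# quoted string; the warning is suppressed here to keep the output short).
default_read <- suppressWarnings(read.csv(path, sep = "\t"))

# Quoting and comment handling switched off: one row per physical line.
raw_read <- read.csv(path, sep = "\t", quote = "", comment.char = "",
                     stringsAsFactors = FALSE)

# List control bytes other than tab, line feed and carriage return.
bytes <- as.integer(readBin(path, what = "raw", n = file.size(path)))
ctrl <- bytes[bytes < 32L & !(bytes %in% c(9L, 10L, 13L))]
ctrl_table <- table(sprintf("0x%02X", ctrl))

print(c(lines = n_lines, default = nrow(default_read), raw = nrow(raw_read)))
print(ctrl_table)
```

If the `quote = ""` read matches the line count minus the header while the default read does not, quoting is the more likely culprit. If both fall short, the control bytes listed at the end are worth a closer look.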

Vitali Avagyan
Brian
  • If there are commas in your fields, that will cause problems when reading in the data. Also, is this a CSV or TSV? You have `sep = '\t'` which would be a TSV – svenhalvorson Nov 08 '19 at 16:24
  • Hi. It's a tab separated file saved as .csv. There are commas in the data which is why I created a tab delimited file in SSIS – Brian Nov 08 '19 at 16:29
  • See if this helps: https://stackoverflow.com/questions/57909150/why-cant-r-read-this-csv-file/57909913#57909913 – G. Grothendieck Nov 08 '19 at 16:35
  • Besides trying out `data.table::fread` (which has better error reporting) you could try out a tool like [CSVed](https://csved.sjfrancke.nl/) to load and examine the CSV data file for errors... – R Yoda Nov 08 '19 at 21:38
  • Hi. Using `fread` has worked much better than `read.csv`. I now have 99% of the rows in RStudio. Annoyingly, I'm getting an error saying "Stopped reading at empty line 18815 but text exists afterwards (discarded)". Line 18814 has a CRLF at the end, so I'm not sure why the last line is being ignored. Why has `fread` worked better than `read.csv`, and why is the last line being ignored? – Brian Nov 11 '19 at 09:26
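
One way to investigate the `fread` message in the last comment is to look at the raw lines around the reported line number. The sketch below uses a small hypothetical file with a blank line in the middle (the real file and its line 18815 are not reproduced here). It finds blank lines with base R and, if `data.table` is installed, retries the read with `blank.lines.skip = TRUE`, an `fread` argument for ignoring blank lines. Whether this recovers the missing rows in the original file is an assumption that depends on what actually sits at that line.

```r
# Hypothetical sample: a tab-separated file with a blank line mid-file.
path <- tempfile(fileext = ".csv")
writeLines(c("id\tnote", "1\tfirst", "", "2\tafter the blank line"), path)

# Find lines that are empty or contain only whitespace.
lines <- readLines(path, warn = FALSE)
blank_at <- which(!nzchar(trimws(lines)))
print(blank_at)

# Show each blank line with one line of context on either side.
for (i in blank_at) {
  idx <- max(1, i - 1):min(length(lines), i + 1)
  print(data.frame(line = idx, text = lines[idx]))
}

# Optional: retry with fread, skipping blank lines (needs data.table).
if (requireNamespace("data.table", quietly = TRUE)) {
  claims <- data.table::fread(path, sep = "\t", blank.lines.skip = TRUE)
  print(nrow(claims))
}
```

On the real file, replacing the `which()` result with the line number from the message (for example `idx <- 18813:18817`) would show what `fread` saw at the point where it stopped.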
