I have 2 large CSV files that contain the same data, yet their file sizes differ slightly. I'm guessing this is because a different `quote` argument was used when the files were generated with data.table's `fwrite()`.

How do I determine in R whether text entries in a CSV file are surrounded by quotes? I cannot open the files in Notepad++ because they are too large.

Ashrith Reddy
  • ``head -n 2 file.csv`` in a terminal would show the first two lines of the file, which might be helpful ([or, if you're on Windows without cygwin/subsystem](https://stackoverflow.com/questions/9682024/how-to-do-what-head-tail-more-less-sed-do-in-powershell)). – runr Dec 06 '18 at 09:33
  • Use `fread` with `sep=""` to read in the first couple of lines as-is, for example: `fread("./temp.csv", sep="", nrows = 2, header = FALSE)` – Wimpel Dec 06 '18 at 09:47

2 Answers

You don't have to parse the entire file! Read in the first couple of rows to learn about the structure:

library(data.table)

fread("pathtofile.csv",
      nrows = 10,     ## read only the first 10 rows
      header = TRUE,  ## if the csv contains a header
      sep = ",")      ## specify the separator; "," for comma separated
safex

``readLines('file.csv', n = 2)`` reads the first two lines of the file as raw text, so any quotes around the entries stay visible.
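As a rough illustration of that idea, here is a minimal base R sketch. The helper name `has_quotes` and the two small temporary files are made up for the example (they stand in for the two large CSVs), and the check only looks at the first `n` lines. Since `fwrite()` with `quote = "auto"` quotes a field only when it needs to, a file could still contain quoted fields further down, so treat the result as a quick hint rather than a proof.

```r
# Hypothetical helper: TRUE if any of the first n raw lines contains a double quote.
has_quotes <- function(path, n = 5) {
  raw_lines <- readLines(path, n = n)  # raw text, nothing is parsed or stripped
  any(grepl('"', raw_lines, fixed = TRUE))
}

# Two tiny demo files with the same (illustrative) data, written with and without quotes.
df <- data.frame(id = 1:2, name = c("alpha", "beta"))
quoted_file <- tempfile(fileext = ".csv")
plain_file  <- tempfile(fileext = ".csv")
write.csv(df, quoted_file, row.names = FALSE, quote = TRUE)
write.csv(df, plain_file,  row.names = FALSE, quote = FALSE)

print(readLines(quoted_file, n = 2))  # inspect the raw lines by eye
print(has_quotes(quoted_file))        # TRUE
print(has_quotes(plain_file))         # FALSE
```

Because `readLines()` stops after `n` lines, this stays fast even on files that are too large to open in an editor.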

runr
  • This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post. - [From Review](/review/low-quality-posts/21613710) – Rui Barradas Dec 06 '18 at 13:51
  • @RuiBarradas as far as I understand, the question was ``How do I determine in R if text entries in CSV files are surrounded by quotes``, and reading the first ``n`` lines in most cases allows for a quick determination of what the text values actually look like inside the file. I really doubt the OP is looking for a function ``getQuoteType()`` for a given file. I will delete this answer if you think it's necessary. – runr Dec 06 '18 at 14:12
  • As for requesting clarification from the author -- after I commented suggestions for a general solution, the question was edited to say that a solution ``in R`` is preferred, without any further communication, which doesn't feel engaging for any further working-out of the solution. But that's just my opinion, I don't have enough experience in SO, maybe this is common :) – runr Dec 06 '18 at 14:13
  • OK, I will explain better. This came up in a review queue, so someone flagged it as VLQ. I believe that you should add your first comment to the answer. As is, it's really terse, to the point of being an answer only for those who already know how to solve the problem. – Rui Barradas Dec 06 '18 at 16:07