I've been trying to read in a pipe-delimited CSV file containing 96 variables of volunteer water quality data. Scattered throughout the file are single and double quotation marks as well as semicolons, dashes, slashes, and likely other special characters, for example:
Name: Jonathan "Joe" Smith; Jerry; Emily; etc.
From the output of several variables (such as IsNewVolunteer), it seems that R is having issues reading in the data. IsNewVolunteer should always be Y or N, but numbers are appearing, and when I queried those lines it appears that the data is getting shifted. Values that are clearly not names are in the Firstname and lastname columns.
The original data format makes it a little difficult to see and troubleshoot, especially due to extra variables. I would find a way to remove them, but the goal of the work with R is to provide code that will be able to run on a dataset that is frequently updated.
I've tried:
read.table("dnrvisualstream.csv",sep="|",stringsAsFactors = FALSE,quote="")
But that produces the following error:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 132 did not have 94 elements
However, there's nothing out of the ordinary that I've noticed about line 132. I've had more success with
read.csv("dnrvisualstream.csv",sep="|",stringsAsFactors = FALSE,quote="")
but that still produces offsets and errors as discussed above. Is there something I'm doing incorrectly? Any information would be helpful.
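For anyone trying to reproduce this, here is a minimal sketch of how the rows with an unexpected number of fields might be located. It uses base R's count.fields() with quoting and comment handling switched off, since read.table() treats # as a comment character by default while read.csv() does not. The helper name find_odd_lines is hypothetical, and the sketch assumes the most common field count is the correct one.

```r
# Hypothetical helper (not part of the original code): list the lines whose
# number of separator-delimited fields differs from the most common count.
find_odd_lines <- function(path, sep = "|") {
  # quote = "" and comment.char = "" make ", ' and # count as ordinary text;
  # blank.lines.skip = FALSE keeps positions aligned with file line numbers.
  n_fields <- count.fields(path, sep = sep, quote = "", comment.char = "",
                           blank.lines.skip = FALSE)
  # Assumption: the most frequent field count is the "right" one.
  expected <- as.integer(names(which.max(table(n_fields))))
  odd <- which(n_fields != expected)
  data.frame(line = odd, n_fields = n_fields[odd])
}

# Usage with the file name from the question (skipped if the file is absent).
path <- "dnrvisualstream.csv"
if (file.exists(path)) {
  odd <- find_odd_lines(path)
  print(odd)
  # Show the raw text of the first few suspicious lines.
  print(readLines(path)[head(odd$line)])
}
```

If this reports no odd lines, the file itself is probably consistent and the shifting is more likely caused by how quotes or comment characters are interpreted during the read.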