Problems reading in Pipe delimited csv with special characters into R

Question

I've been trying to read in a pipe delimited csv file containing 96 variables about some volunteer water quality data. Randomly within the file, there's single and double quotation marks as well as semi-colons, dashes, slashes, and likely other special characters

Name: Jonathan "Joe" Smith; Jerry; Emily; etc.

From the output of several variables (such as IsNewVolunteer), it seems that r is having issues reading in the data. IsNewVolunteer should always be Y or N, but numbers are appearing and when I queried those lines it appears that the data is getting shifted. Variables that are clearly not names are in the Firstname and lastname column.

The original data format makes it a little difficult to see and troubleshoot, especially due to extra variables. I would find a way to remove them, but the goal of the work with R is to provide code that will be able to run on a dataset that is frequently updated.

I've tried

read.table("dnrvisualstream.csv",sep="|",stringsAsFactors = FALSE,quote="")

But that produces the following error:

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 132 did not have 94 elements

However, there's nothing out of the ordinary that I've noticed about line 132. I've had more success with

read.csv("dnrvisualstream.csv",sep="|",stringsAsFactors = FALSE,quote="")

but that still produces offsets and errors as discussed above. Is there something I'm doing incorrectly? Any information would be helpful.

Please take a look at [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), to modify your question, with a smaller sample taken from your data (check `?dput()`). Posting images of your data or no data makes it difficult to impossible for us to help you! — massisenergy, Mar 26 '20 at 12:51
Have you tried specifying the encoding? Something like: `read.csv("dnrvisualstream.csv",sep="|",stringsAsFactors = FALSE,quote="", encoding = "UTF-8")`. You also have multiple `||||` separators in rows 2,3,4, and 21,22 that are visible in your screenshot. You might need to edit your .csv either in R or manually to strip some of them out. — Matt, Mar 26 '20 at 13:29

score 0 · Answer 1 · answered Mar 26 '20 at 13:42

I think it's one of two issues:

Encoding is either UTF-8 or UTF-16:

Try this...

read.csv("dnrvisualstream.csv", sep = "|", stringsAsFactors = FALSE, quote = "", encoding = UTF-8)

or this...

read.csv("dnrvisualstream.csv", sep = "|", stringsAsFactors = FALSE, quote = "", encoding = UTF-16)

Too many separators:

If this doesn't work, right-click on your .csv file and open it in a text editor. You have multiple |||| separators in rows 2,3,4, and 21,22 that are visible in your screenshot. Press CTRL+H to find and replace:

Find: ||||
Replace: |

Save the new file and try to open in R again.

Problems reading in Pipe delimited csv with special characters into R

1 Answers1