Can fread from "data.table" be forced to successfully use "." as a sep value?
I'm trying to use fread to speed up my concat.split functions in "splitstackshape". See this Gist for the general approach I'm taking, and this question for why I want to make the switch.
The problem I'm running into is treating a dot (".") as a value for sep. Whenever I do so, I get an "unexpected character" error.
The following simplified example demonstrates the problem.
library(data.table)
y <- paste("192.168.1.", 1:10, sep = "")
x1 <- tempfile()
writeLines(y, x1)
fread(x1, sep = ".", header = FALSE)
# Error in fread(x1, sep = ".", header = FALSE) : Unexpected character (
# 192) ending field 2 of line 1
The workaround I have in my current function is to substitute "." with another character that is hopefully not present in the original data, say "|", but that seems risky to me since I can't predict what is in someone else's dataset. Here's the workaround in action.
x2 <- tempfile()
z <- gsub(".", "|", y, fixed=TRUE)
writeLines(z, x2)
fread(x2, sep = "|", header = FALSE)
#      V1  V2 V3 V4
#  1: 192 168  1  1
#  2: 192 168  1  2
#  3: 192 168  1  3
#  4: 192 168  1  4
#  5: 192 168  1  5
#  6: 192 168  1  6
#  7: 192 168  1  7
#  8: 192 168  1  8
#  9: 192 168  1  9
# 10: 192 168  1 10
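The risk in the substitution workaround is a collision between the replacement character and the data. A minimal sketch of one way to reduce that risk is to check a short list of candidate characters against the data and use the first one that does not occur. The helper name pick_safe_sep and its candidate list are hypothetical, made up for illustration; they are not part of "data.table" or "splitstackshape", and this does not answer whether fread itself can accept "." as sep.

```r
# Hypothetical helper (not from any package): return the first candidate
# separator that does not appear anywhere in the character vector x.
pick_safe_sep <- function(x, candidates = c("|", "\t", ";", "~", "^")) {
  for (s in candidates) {
    # fixed = TRUE treats s as a literal string, not a regular expression
    if (!any(grepl(s, x, fixed = TRUE))) return(s)
  }
  stop("every candidate separator already occurs in the data")
}

y <- paste("192.168.1.", 1:10, sep = "")
safe <- pick_safe_sep(y)

# Same substitution as in the workaround, but with a checked separator
z <- gsub(".", safe, y, fixed = TRUE)
x3 <- tempfile()
writeLines(z, x3)

# Read the file back with fread only if data.table is installed
if (requireNamespace("data.table", quietly = TRUE)) {
  print(data.table::fread(x3, sep = safe, header = FALSE))
}
```

This still scans the whole input once per candidate before reading, so it trades some of the speed gained from fread for safety, and it fails loudly rather than silently if no candidate is free.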
For the purposes of this question, assume that the data are balanced (each line will have the same number of "sep" characters). I'm aware that using a "." as a separator is not the best idea, but I'm just trying to account for what other users might have in their datasets, based on other questions I've answered here on SO.