Here are a few solutions. The first 5 do not use any packages. nc (number of columns) and cn (column names) defined in (1) are used in the others as well.
1) read.fwf Using the input DF, shown reproducibly in the Note at the end, count the maximum number of characters in a line and divide by 3 to get the number of columns, nc. Next compute the column names, cn. Finally use read.fwf to read the fixed-width fields in. No packages are used.
nc <- max(nchar(DF[[1]]))/3
cn <- paste0("col", head(LETTERS, nc))
read.fwf(textConnection(as.character(DF[[1]])), rep(3, length = nc),
col.names = cn)
giving:
  colA colB colC colD
1  123  456   NA   NA
2  123  456  789   NA
3  123  456  789  123
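To see where nc and cn come from, the intermediate values can be inspected. This is only a sketch, assuming DF is built as in the Note; the name widths is introduced here purely for illustration.

```r
# Rebuild the input from the Note so this block is self-contained
DF <- read.table(text = "123456\n123456789\n123456789123")

widths <- nchar(DF[[1]])                  # characters per row ('widths' is an illustrative name)
nc <- max(widths) / 3                     # number of 3-character columns
cn <- paste0("col", head(LETTERS, nc))    # one name per column

widths  # 6 9 12
nc      # 4
cn      # "colA" "colB" "colC" "colD"
```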
2) formatC A variation on the above would be to use formatC to insert commas between groups of 3 digits, giving the character vector ch, and then read that in using read.csv.
ch <- formatC(DF[[1]], format= "f", digits = 0, big.mark = ",")
read.csv(text = ch, header = FALSE, col.names = cn)
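The intermediate vector ch can be looked at on its own. The sketch below assumes the same three values as in the Note, typed in directly as x (an illustrative name). It also shows that big.mark groups digits from the right, so this approach matches a left-to-right split only when every value has a multiple of 3 digits, as is the case here.

```r
x <- c(123456, 123456789, 123456789123)   # same values as DF[[1]] in the Note

ch <- formatC(x, format = "f", digits = 0, big.mark = ",")
ch   # "123,456" "123,456,789" "123,456,789,123"

# Grouping runs from the right: a 4-digit value is split as 1 + 3, not 3 + 1
odd <- formatC(1234, format = "f", digits = 0, big.mark = ",")
odd  # "1,234"
```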
3) strsplit Another variation would be to split the column using strsplit with the indicated regular expression, and then use toString to put the split components into a vector of comma-separated strings, ch. Finally use read.csv as before.
ch <- sapply(strsplit(as.character(DF[[1]]), "(?<=...)", perl = TRUE), toString)
read.csv(text = ch, header = FALSE, col.names = cn)
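As a rough illustration of what the lookbehind split produces, the sketch below applies it to the values from the Note, entered directly as character strings. The names x and parts are introduced here for illustration only.

```r
x <- c("123456", "123456789", "123456789123")   # as.character(DF[[1]]) from the Note

# (?<=...) is a zero-width lookbehind: split wherever 3 characters precede the position
parts <- strsplit(x, "(?<=...)", perl = TRUE)
parts[[2]]   # "123" "456" "789"

# toString joins each row's pieces with ", "
ch <- sapply(parts, toString)
ch[2]        # "123, 456, 789"
```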
4) gsub Yet another variation is to use gsub to insert commas every 3 characters and then use read.csv as in (2) and (3).
ch <- gsub("(...)(?=.)", "\\1,", DF[[1]], perl = TRUE)
read.csv(text = ch, header = FALSE, col.names = cn)
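The role of the lookahead may be easier to see on the intermediate result. In this sketch, which assumes the Note's values entered as character strings in x (an illustrative name), the (?=.) part requires at least one more character after each 3-character group, so no comma is added after the last group.

```r
x <- c("123456", "123456789", "123456789123")   # as.character(DF[[1]]) from the Note

# Append a comma to each 3-character group that is followed by another character
ch <- gsub("(...)(?=.)", "\\1,", x, perl = TRUE)
ch   # "123,456" "123,456,789" "123,456,789,123"
```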
5) strcapture This one does not use any read.* routine. It also uses only base R.
strcapture(strrep("(...)?", nc), DF[[1]], setNames(double(nc), cn))
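The one-liner above can be unpacked into its parts. The following is a sketch under the assumption that the input is supplied as character strings and that nc is 4, as computed in (1); pat, proto and res are illustrative names. The pattern is nc optional 3-character capture groups, and the prototype tells strcapture to convert each group to a numeric column, so groups that do not match come back as NA.

```r
x <- c("123456", "123456789", "123456789123")   # as.character(DF[[1]]) from the Note
nc <- 4
cn <- paste0("col", head(LETTERS, nc))

pat <- strrep("(...)?", nc)          # "(...)?(...)?(...)?(...)?"
proto <- setNames(double(nc), cn)    # named numeric prototype, one entry per column

res <- strcapture(pat, x, proto)     # data frame with columns colA..colD
res
```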
6) strapplyc This is the only variation that uses a package. strapplyc can be used to pick off successive 3-character substrings. It uses a simpler regular expression than some of the other solutions. read.csv is used as in some of the other solutions.
library(gsubfn)
ch <- sapply(strapplyc(DF[[1]], "..."), toString)
read.csv(text = ch, header = FALSE, col.names = cn)
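For comparison, roughly the same extraction can be sketched in base R with gregexpr and regmatches. This is a swapped-in technique shown only to illustrate what picking off successive 3-character pieces means; it is not the strapplyc code above, and x and pieces are illustrative names.

```r
x <- c("123456", "123456789", "123456789123")   # as.character(DF[[1]]) from the Note

# gregexpr finds every non-overlapping 3-character match; regmatches extracts them
pieces <- regmatches(x, gregexpr("...", x))
pieces[[1]]   # "123" "456"

ch <- sapply(pieces, toString)
ch[3]         # "123, 456, 789, 123"
```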
Note
The input in reproducible form:
Lines <- "
123456
123456789
123456789123"
DF <- read.table(text = Lines)