Replace semicolon-separated values with tabs

Question

I am trying to convert the data I have in a .txt file:

4.0945725440979;4.07999897003174;4.0686674118042;4.05960083007813;4.05218315124512;...

into a column (table) in which the values are separated by tabs:

4.0945725440979
4.07999897003174
4.0686674118042...
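If the goal is only a text file with one value per line, one minimal option is plain text substitution: read the line with readLines(), replace each ";" with a newline, and write the result back out. This sketch assumes the separators are exactly ";" with no surrounding spaces; the temporary files are hypothetical stand-ins for 1.txt and an output file.

```r
# Temporary files standing in for "1.txt" and a hypothetical output file
in_file  <- tempfile(fileext = ".txt")
out_file <- tempfile(fileext = ".txt")
writeLines("4.0945725440979;4.07999897003174;4.0686674118042", in_file)

# Replace every ";" with a newline so each value lands on its own line.
# The values are never converted to numbers, so all digits are preserved.
raw_text <- readLines(in_file)
writeLines(gsub(";", "\n", raw_text, fixed = TRUE), out_file)

readLines(out_file)
```

Using "\t" as the replacement instead of "\n" would keep the values on one line, separated by tabs.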

So far I have tried:

mydata <- read.table("1.txt", header = FALSE)
separate_data<- strsplit(as.character(mydata), ";")

But it does not work. In this case, separate_data consists of only one element:

[[1]]
[1] "1"
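read.table() returns a data frame rather than a character string, so strsplit() has no raw text to split here. A minimal sketch of the same strsplit() idea that reads the raw line(s) with readLines() instead; the temporary file is a hypothetical stand-in for 1.txt.

```r
# Temporary file standing in for "1.txt" (hypothetical contents)
path <- tempfile(fileext = ".txt")
writeLines("4.0945725440979;4.07999897003174;4.0686674118042;4.05960083007813;4.05218315124512", path)

# readLines() returns character strings, which strsplit() can split on ";".
# unlist() flattens the result, so files with several lines also work.
pieces <- unlist(strsplit(readLines(path), ";", fixed = TRUE))

# Convert the pieces to numbers and store them as a single column
separate_data <- data.frame(value = as.numeric(pieces))
separate_data
```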

Check the documentation of `read.csv` (or `read.table`) by typing `?read.csv` (or `?read.table`). There you will find what @akrun mentioned. — Manuel Bickel, Dec 12 '17 at 11:34
It looks like you're not dealing with rectangular data here. Therefore I recommend `scan(file = "path/to/myfile.txt", sep = ";")`. Afterwards, just use `data.frame` to put the result into a column of a data.frame. — talat, Dec 12 '17 at 11:37
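Following the suggestion in the comment above, a minimal sketch of the scan() route: scan() with sep = ";" reads every value into one numeric vector, however many there are, and data.frame() turns that vector into a single column. The temporary file is a hypothetical stand-in for the real path.

```r
# Temporary file standing in for "path/to/myfile.txt"
path <- tempfile(fileext = ".txt")
writeLines("4.0945725440979;4.07999897003174;4.0686674118042;4.05960083007813;4.05218315124512", path)

# scan() splits on ";" and returns a numeric vector
# (quiet = TRUE suppresses the "Read n items" message)
values <- scan(file = path, sep = ";", quiet = TRUE)

# One value per row in a single-column data frame
result <- data.frame(value = values)
str(result)
```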
Possible duplicate https://stackoverflow.com/questions/13773770/split-comma-separated-column-into-separate-rows — zx8754, Dec 12 '17 at 11:38

Len Greski · Answer 1 · 2017-12-12T12:37:38.720

The OP does not state directly whether the raw data file contains multiple observations of a single variable or should be broken into n-tuples. Since the OP does state that read.table() produces a single row where s/he expects multiple rows, we can conclude that the correct technique is to use scan(), not read.table().

If the raw data file represents a single variable, the scan() solution posted in the comments by @docendo works without additional effort. Otherwise, additional work is required to tidy the data.

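Here is an approach that uses scan() to read the data into a vector and then breaks the vector into observations of 5 variables. To keep the example reproducible, the data is read from a string via textConnection(); with a file, pass the file name to scan() instead.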

rawData <- "4.0945725440979;4.07999897003174;4.0686674118042;4.05960083007813;4.05218315124512;4.0945725440979;4.07999897003174;4.0686674118042;4.05960083007813;4.05218315124512"

# read all semicolon-separated numbers into a single numeric vector
value <- scan(textConnection(rawData), sep = ";")
columns <- 5                              # set desired number of columns (variables)
observations <- length(value) / columns   # number of observations (rows)
# observation id for each value: 1,1,1,1,1,2,2,2,2,2
observation <- rep(1:observations, each = columns)
# variable id for each value: 1,2,3,4,5,1,2,3,4,5
variable <- rep(1:columns, times = observations)

data.frame(observation, variable, value)

...and the output:

> data.frame(observation,variable,value)
   observation variable    value
1            1        1 4.094573
2            1        2 4.079999
3            1        3 4.068667
4            1        4 4.059601
5            1        5 4.052183
6            2        1 4.094573
7            2        2 4.079999
8            2        3 4.068667
9            2        4 4.059601
10           2        5 4.052183
>

At this point, the data can be converted into a wide-format tidy data set (one row per observation, one column per variable) with reshape2::dcast().
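With reshape2, that step would be along the lines of dcast(long, observation ~ variable, value.var = "value"). As a package-free alternative (a swap-in, not what this answer uses), base R's reshape() performs the same long-to-wide conversion. The sketch rebuilds the long data frame from the example data above; the name long is illustrative.

```r
rawData <- "4.0945725440979;4.07999897003174;4.0686674118042;4.05960083007813;4.05218315124512;4.0945725440979;4.07999897003174;4.0686674118042;4.05960083007813;4.05218315124512"
value <- scan(text = rawData, sep = ";", quiet = TRUE)
columns <- 5
observations <- length(value) / columns

# Long format: one row per (observation, variable) pair
long <- data.frame(observation = rep(1:observations, each = columns),
                   variable    = rep(1:columns, times = observations),
                   value       = value)

# Long to wide: one row per observation, columns value.1 ... value.5
wide <- reshape(long, direction = "wide", idvar = "observation", timevar = "variable")
wide
```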

Note that this solution requires that the number of data values in the raw data file is evenly divisible by the number of variables.
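A small guard can make that requirement explicit before building the observation and variable ids. The helper name to_observations() is hypothetical; it checks divisibility with the %% operator and stops with an error when the values do not split evenly.

```r
# Hypothetical helper: split a numeric vector into observations of `columns` variables
to_observations <- function(value, columns) {
  columns <- as.integer(columns)
  # Refuse to proceed if the values do not divide evenly into observations
  if (length(value) %% columns != 0) {
    stop(sprintf("%d values cannot be split evenly into %d columns",
                 length(value), columns))
  }
  observations <- length(value) %/% columns
  data.frame(observation = rep(seq_len(observations), each = columns),
             variable    = rep(seq_len(columns), times = observations),
             value       = value)
}

ok  <- to_observations(c(1.5, 2.5, 3.5, 4.5), columns = 2)
bad <- tryCatch(to_observations(1:5, columns = 2),
                error = function(e) conditionMessage(e))
```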

