I'd like to split a dataset made of character strings into columns specified by start and end.
My dataset looks something like this:
>head(templines,3)
[1] "201801 1 78"
[2] "201801 2 67"
[3] "201801 1 13"
and i'd like to split it by specifying my columns using the data dictionary:
>dictionary
col_name col_start col_end
year 1 4
week 5 6
gender 8 8
age 11 12
so it becomes:
year week gender age
2018 01 1 78
2018 01 2 67
2018 01 1 13
In reality the data comes from a long running survey and the white spaces between some columns represent variables that are no longer collected. It has many variables so i need a solution that would scale.
In tidyr::separate
it looks like you can only split by specifying the position to split at, rather than the start and end positions. Is there a way to use start / end?
I thought of doing this with read_fwf
but I can't seem to be able to use it on my already loaded dataset. I only managed to get it to work by first exporting as a txt and then reading from this .txt:
write_lines(templines,"t1.txt")
read_fwf("t1.txt",
fwf_positions(start = dictionary$col_start,
end = dictionary$col_end,
col_names = dictionary$col_name)
is it possible to use read_fwf
on an already loaded dataset?