This is not an easy problem for me, to be honest. I have searched for quite a long time, but there seems to be no similar question.
Here is what a few rows and columns of my data look like:
V1 V2 V3
1 74c1c25f4b283fa74a5514307b0d0278 1#11:2241 1#10:249
2 08f5b445ec6b29deba62e6fd8b0325a6 20#7:249 20#5:83
3 4b7f6f4e2bf237b6cc58f57142bea5c0 4#16:249 24:913
So, the cells are in a format like "class(#subclass):value". I want to make a table like this:
V1 1#10 1#11 4#16 20#5 20#7 24
1 74c1c25f4b283fa74a5514307b0d0278 249 2241 0 0 0 0
2 08f5b445ec6b29deba62e6fd8b0325a6 0 0 0 83 249 0
3 4b7f6f4e2bf237b6cc58f57142bea5c0 0 0 249 0 0 913
I have not encountered this kind of data structure before, so I am not sure this is the best way to store it, but so far it is the only table format I could come up with. If you have any suggestions, please leave a comment.
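To make the cell format concrete, here is a minimal base R sketch that splits a few cells into their class and value parts. The vector name `cells` and the result name `parsed` are only for illustration, and it assumes every cell contains exactly one colon, as in the sample rows.

```r
# Sample cells copied from the rows shown above
cells <- c("1#11:2241", "20#5:83", "24:913")

# Everything before the colon is the class (with an optional "#subclass"),
# everything after the colon is the value
class_part <- sub(":.*$", "", cells)
value_part <- as.numeric(sub("^.*:", "", cells))

parsed <- data.frame(class = class_part, value = value_part,
                     stringsAsFactors = FALSE)
print(parsed)
```

Keeping "class#subclass" together as one string matches the column names in the table I want, so the subclass does not need to be split off at this stage.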
As a first step, I parsed it into the following:
V1 V2_1_1 V2_1_2 V2_2_1 V3_1_1 V3_1_2 V3_2_1
1 74c1c25f4b283fa74a5514307b0d0278 1 11 2241 1 10 249
2 08f5b445ec6b29deba62e6fd8b0325a6 20 7 249 20 5 83
3 4b7f6f4e2bf237b6cc58f57142bea5c0 4 16 249 24 NA 913
Now I don't know how to convert it to the table format I want. Is there a package in R that I can use to do this?
Two links are attached below:
original data: https://www.dropbox.com/s/aqay5dn4r3m3kdp/temp1TrainPoiFile.R?dl=0
parsed data: https://www.dropbox.com/s/0oj8ic1pd2rew0h/temp3TrainPoiFile.R?dl=0
Thank you very much for your help. Please leave a comment if you have any questions about it.
Thanks to Walt and Jack for their answers. I used tidyr to solve the problem. Below is how I did it.
Load tidyr and read the file
library(tidyr)
source("temp1TrainPoiFile.R")
gather the columns into key-value pairs (the key column needs a name other than V1, which already exists)
temp2TrainPoiFile <- temp1TrainPoiFile %>% gather(key=variable, value=data, -V1)
extract to two columns
temp3TrainPoiFile <- temp2TrainPoiFile %>% extract(col=data, into=c("class","value"), regex="(.*):(.*)")
adding row numbers
row <- 1:nrow(temp3TrainPoiFile)
temp3TrainPoiFile <- cbind(row, temp3TrainPoiFile)
spread the class-value pairs into one column per class
TrainPoiFile <- temp3TrainPoiFile %>% spread(key=class, value=value, fill=0)
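For anyone who wants to try the steps without downloading the files, here is a small self-contained sketch on the three sample rows from the question. It is a variation on the steps above, not the exact code: `sample_data`, `long` and `wide` are made-up names, it drops the helper column instead of adding row numbers so that each V1 ends up on one row, and it assumes each class appears at most once per V1 (otherwise `spread` stops with a duplicate-identifier error and the values would have to be aggregated first).

```r
library(tidyr)

# Stand-in for temp1TrainPoiFile: the three sample rows from the question
sample_data <- data.frame(
  V1 = c("74c1c25f4b283fa74a5514307b0d0278",
         "08f5b445ec6b29deba62e6fd8b0325a6",
         "4b7f6f4e2bf237b6cc58f57142bea5c0"),
  V2 = c("1#11:2241", "20#7:249", "4#16:249"),
  V3 = c("1#10:249", "20#5:83", "24:913"),
  stringsAsFactors = FALSE
)

# Long format: one row per (V1, original column) cell
long <- gather(sample_data, key = "variable", value = "data", -V1)

# Split "class(#subclass):value" at the colon; convert = TRUE turns value into a number
long <- extract(long, col = data, into = c("class", "value"),
                regex = "(.*):(.*)", convert = TRUE)

# The original column name is no longer needed; dropping it leaves V1 as the only id
long$variable <- NULL

# Wide format: one column per class, 0 where a V1 has no value for that class
wide <- spread(long, key = class, value = value, fill = 0)
print(wide)
```

The class columns come out in alphabetical order of their names, so they may need reordering if a numeric order such as 1#10, 1#11, 4#16, 20#5 is wanted.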