I'm relatively new to R, so I don't think as clearly in vector-space as more-experienced users do. I have a data frame that's formatted like so:
metric timestamp value tag1 tag2 tag3 tag4 tag5 tag6 tag7 tag8 tag9 tag10
1 dummy.random.unif 1367848802 0.9936670064926147 host=localhost blah=foo NA NA NA NA NA NA NA NA
2 dummy.random.unif 1367848822 0.19621700048446655 host=localhost blah=bar NA NA NA NA NA NA NA NA
3 dummy.linear 1367848842 97.6 shmoo=whatever NA NA NA NA NA NA NA NA NA
4 dummy.random.unif 1367848862 0.3171229958534241 host=localhost blah=foo NA NA NA NA NA NA NA NA
5 dummy.linear 1367848882 97.7 shmoo=whatever NA NA NA NA NA NA NA NA NA
6 dummy.random.unif 1367848902 0.2197140008211136 host=localhost blah=foo NA NA NA NA NA NA NA NA
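For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample rows above. The column types (character strings rather than factors, numeric timestamps) are my assumption; the printout alone doesn't confirm them.

```r
# Rebuild the six sample rows. Column types are assumed, not confirmed.
df <- data.frame(
  metric    = c("dummy.random.unif", "dummy.random.unif", "dummy.linear",
                "dummy.random.unif", "dummy.linear", "dummy.random.unif"),
  timestamp = c(1367848802, 1367848822, 1367848842,
                1367848862, 1367848882, 1367848902),
  value     = c(0.9936670064926147, 0.19621700048446655, 97.6,
                0.3171229958534241, 97.7, 0.2197140008211136),
  tag1      = c("host=localhost", "host=localhost", "shmoo=whatever",
                "host=localhost", "shmoo=whatever", "host=localhost"),
  tag2      = c("blah=foo", "blah=bar", NA, "blah=foo", NA, "blah=foo"),
  stringsAsFactors = FALSE
)

# tag3 through tag10 are entirely NA in this sample.
for (i in 3:10) df[[paste0("tag", i)]] <- NA_character_
```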
As you can see, the columns tag1:tag10 contain key-value pairs, but not always the same keys, and not always the same number of keys. I want to convert this data frame to something more like this, which is more convenient to work with:
metric timestamp value tag.host tag.blah tag.shmoo
1 dummy.random.unif 1367848802 0.9936670064926147 localhost foo NA
2 dummy.random.unif 1367848822 0.19621700048446655 localhost bar NA
3 dummy.linear 1367848842 97.6 NA NA whatever
4 dummy.random.unif 1367848862 0.3171229958534241 localhost foo NA
5 dummy.linear 1367848882 97.7 NA NA whatever
6 dummy.random.unif 1367848902 0.2197140008211136 localhost foo NA
Now I know I could do this procedurally, but it would be clunky, and I've heard that the correct way to use R is to think about operations on entire vectors (rather than looping over them). I've spent a few hours trying to figure out the right combination of do.call, daply, strsplit, and so on, but I'm not getting anywhere.
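For reference, the procedural version I'm trying to avoid would look roughly like the sketch below. It uses a small stand-in data frame with only two tag columns, and it assumes every non-NA tag has the form key=value with no "=" inside the value; both are simplifications for illustration.

```r
# Small stand-in for the real data (same shape, fewer tag columns).
df <- data.frame(
  metric = c("dummy.random.unif", "dummy.linear"),
  tag1   = c("host=localhost", "shmoo=whatever"),
  tag2   = c("blah=foo", NA),
  stringsAsFactors = FALSE
)

tag_cols <- grep("^tag[0-9]+$", names(df), value = TRUE)

# Start from the non-tag columns; tag.* columns are added as keys appear.
out <- df[, setdiff(names(df), tag_cols), drop = FALSE]

# Visit every cell of every tag column, one at a time.
for (i in seq_len(nrow(df))) {
  for (col in tag_cols) {
    cell <- df[i, col]
    if (is.na(cell)) next
    kv <- strsplit(cell, "=", fixed = TRUE)[[1]]  # c(key, value), assumed
    new_col <- paste0("tag.", kv[1])
    if (!new_col %in% names(out)) out[[new_col]] <- NA_character_
    out[i, new_col] <- kv[2]
  }
}

out
```

It works, but it touches the data frame cell by cell, which is exactly the style I've been told to avoid.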
What is a clean, R-ish way to solve this problem?