I have a messy file that I'm trying to parse into numeric data in R. The file is not XML, but its contents follow a specific format:
"{"metrics":{"skin_temp":{"min":81.5,"max":96.8,"sum":93480.6,
"summary":{"max_skin_temp_per_minute":null,"min_skin_temp_per_minute":null},
"values":[93.2,93.2,93.3,93.3]],"stdev":0.9,"avg":2.1},
"gsr":{"min":0.000149,"max":31.5,"sum":10300.0,
"summary":{"max_gsr_per_minute":null,"min_gsr_per_minute":null},
"values":[1.22,1.23,1.2,1.2],"stdev":9.630000000000001,"avg":10.1},
"steps":{"min":0,"max":104,"sum":4202,
"summary":{"max_steps_per_minute":null,"min_steps_per_minute":null},
"values":[0,0,0,0]],"stdev":13.8,"avg":4}}"
All I'm interested in is the numbers that come in chunks after the "values" labels (the other fields are summary information included by the website I'm pulling the data from, but I can easily compute summary statistics in R if I want them).
I'm sure there is an easier way, but the code I have so far looks like this:
raw_data <- gsub('\\"', '', raw_data)
analysis_data <- c()
positioner <- 0
for (x in 1:3) {
  # find where the data starts (and add 8 more for the 'values:[' text)
  data_start <- regexpr("values:[", substring(raw_data, positioner),
                        fixed=TRUE)[[1]] + 8 + positioner
  # find the closing bracket that ends this chunk of values
  data_end <- regexpr("]", substring(raw_data, data_start),
                      fixed=TRUE)[[1]] + data_start - 2
  # split the chunk into numbers and add it as a new column
  data_col <- as.numeric(strsplit(substring(raw_data, data_start,
                                            data_end), ", ")[[1]])
  analysis_data <- cbind(analysis_data, data_col)
  positioner <- positioner + data_end
}
Sometimes this works, but sometimes the positioner variable gets tricked and the search starts from the wrong place. Is there a simpler way to extract this data?
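
For illustration, one way to avoid tracking positions by hand might be to let a regular expression find every "values" chunk at once. The following is only a minimal sketch in base R, not a definitive solution. It assumes the quotes are left in place (no gsub step), that each "values" list is a flat list of numbers ending at the first closing bracket, and that all lists have the same length. The raw_data string here is a shortened, hypothetical stand-in for the real file.

```r
# Shortened, hypothetical sample in the same shape as the file above.
raw_data <- paste0(
  '{"metrics":{"skin_temp":{"min":81.5,"values":[93.2,93.2,93.3,93.3]],"avg":2.1},',
  '"gsr":{"min":0.000149,"values":[1.22,1.23,1.2,1.2],"avg":10.1},',
  '"steps":{"min":0,"values":[0,0,0,0]],"avg":4}}'
)

# Find every "values":[...] chunk; [^]]* stops at the first closing bracket.
chunks <- regmatches(raw_data, gregexpr('"values":\\[[^]]*\\]', raw_data))[[1]]

# Strip the label and the surrounding brackets, leaving only the numbers.
numbers <- gsub('^"values":\\[|\\]$', '', chunks)

# Split on commas (with or without a following space) and convert to numeric.
columns <- lapply(strsplit(numbers, ",\\s*"), as.numeric)

# One column per chunk; assumes every chunk has the same number of values.
analysis_data <- do.call(cbind, columns)
```

Because gregexpr returns all matches in one pass, there is no running offset to keep in sync. If the file turns out to be valid JSON, a JSON parser would probably be a more robust choice than a regular expression.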