Parse Regular expression in R

Question

I'm very new to Regular expressions and have some JSON data in payload.hours:

These represent business hours for each day of the week for different places. What's the cleanest way of extracting this data into new columns of the same data-frame? I want each day's business hours to be extracted into a separate column.

Attaching data here - :

structure(list(payload.name = c("Windsor Castle", "Penrhyn Castle", 
"Barcaldine Castle", "University of Birmingham", "Kensington Gardens", 
"Bristol Zoo Gardens", "Alton Towers Resort", "The Royal Shakespeare Company", 
"University of Surrey", "Hampton Court Palace", "BFI Imax", "Usher Gallery", 
"The Cambridge Chop House", "The Balmoral Hotel", "Scottish National Portrait Gallery", 
"Old Trafford Stadium", "Exeter Cathedral", "Excel London", "Cavern Club"
), payload.hours_display = c("Mon-Sat 11:30 AM-11:00 PM; Sun 12:00 PM-11:00 PM", 
"Open Daily 11:00 AM-5:00 PM", "Open Daily 9:00 AM-10:00 PM", 
"Mon-Fri 8:00 AM-7:00 PM; Sat-Sun 10:00 AM-3:00 PM", "Open Daily 5:00 AM-11:59 PM", 
"Open Daily 9:00 AM-5:00 PM", "Open Daily 10:30 AM-5:00 PM", 
"Mon-Sat 11:30 AM-11:00 PM; Sun 11:30 AM-4:30 PM", "Mon-Fri 8:30 AM-11:00 PM; Sat-Sun 10:00 AM-11:00 PM", 
"Mon-Sat 12:00 PM-11:00 PM; Sun 12:00 PM-10:30 PM", "Mon 1:30 PM-11:00 PM; Tue 10:30 AM-11:00 PM; Wed-Fri 1:30 PM-11:00 PM; Sat-Sun 11:00 AM-11:00 PM", 
"Open Daily 10:00 AM-4:00 PM", "Mon-Thu 12:00 PM-10:30 PM; Fri-Sat 12:00 PM-11:00 PM; Sun 12:00 PM-9:30 PM", 
"Open Daily 12:00 AM-12:00 AM", "Mon-Wed 10:00 AM-5:00 PM; Thu 10:00 AM-7:00 PM; Fri-Sun 10:00 AM-5:00 PM", 
"Open Daily 9:30 AM-5:00 PM", "Mon-Sat 9:30 AM-4:45 PM; Sun 11:30 AM-3:30 PM", 
"Open Daily 5:30 AM-10:00 PM", "Mon 12:00 AM-12:30 AM, 11:00 AM-8:00 PM; Tue-Wed 11:00 AM-8:00 PM; Thu 11:00 AM-11:59 PM; Fri 12:00 AM-2:00 AM, 11:00 AM-11:59 PM; Sat-Sun 12:00 AM-2:30 AM, 11:00 AM-11:59 PM"
), payload.hours = c("{\"monday\":[[\"11:30\",\"23:00\"]],\"tuesday\":[[\"11:30\",\"23:00\"]],\"wednesday\":[[\"11:30\",\"23:00\"]],\"thursday\":[[\"11:30\",\"23:00\"]],\"friday\":[[\"11:30\",\"23:00\"]],\"saturday\":[[\"11:30\",\"23:00\"]],\"sunday\":[[\"12:00\",\"23:00\"]]}", 
"{\"monday\":[[\"11:00\",\"17:00\"]],\"tuesday\":[[\"11:00\",\"17:00\"]],\"wednesday\":[[\"11:00\",\"17:00\"]],\"thursday\":[[\"11:00\",\"17:00\"]],\"friday\":[[\"11:00\",\"17:00\"]],\"saturday\":[[\"11:00\",\"17:00\"]],\"sunday\":[[\"11:00\",\"17:00\"]]}", 
"{\"monday\":[[\"9:00\",\"22:00\"]],\"tuesday\":[[\"9:00\",\"22:00\"]],\"wednesday\":[[\"9:00\",\"22:00\"]],\"thursday\":[[\"9:00\",\"22:00\"]],\"friday\":[[\"9:00\",\"22:00\"]],\"saturday\":[[\"9:00\",\"22:00\"]],\"sunday\":[[\"9:00\",\"22:00\"]]}", 
"{\"monday\":[[\"8:00\",\"19:00\"]],\"tuesday\":[[\"8:00\",\"19:00\"]],\"wednesday\":[[\"8:00\",\"19:00\"]],\"thursday\":[[\"8:00\",\"19:00\"]],\"friday\":[[\"8:00\",\"19:00\"]],\"saturday\":[[\"10:00\",\"15:00\"]],\"sunday\":[[\"10:00\",\"15:00\"]]}", 
"{\"monday\":[[\"5:00\",\"23:59\"]],\"tuesday\":[[\"5:00\",\"23:59\"]],\"wednesday\":[[\"5:00\",\"23:59\"]],\"thursday\":[[\"5:00\",\"23:59\"]],\"friday\":[[\"5:00\",\"23:59\"]],\"saturday\":[[\"5:00\",\"23:59\"]],\"sunday\":[[\"5:00\",\"23:59\"]]}", 
"{\"monday\":[[\"9:00\",\"17:00\"]],\"tuesday\":[[\"9:00\",\"17:00\"]],\"wednesday\":[[\"9:00\",\"17:00\"]],\"thursday\":[[\"9:00\",\"17:00\"]],\"friday\":[[\"9:00\",\"17:00\"]],\"saturday\":[[\"9:00\",\"17:00\"]],\"sunday\":[[\"9:00\",\"17:00\"]]}", 
"{\"monday\":[[\"10:30\",\"17:00\"]],\"tuesday\":[[\"10:30\",\"17:00\"]],\"wednesday\":[[\"10:30\",\"17:00\"]],\"thursday\":[[\"10:30\",\"17:00\"]],\"friday\":[[\"10:30\",\"17:00\"]],\"saturday\":[[\"10:30\",\"17:00\"]],\"sunday\":[[\"10:30\",\"17:00\"]]}", 
"{\"monday\":[[\"11:30\",\"23:00\"]],\"tuesday\":[[\"11:30\",\"23:00\"]],\"wednesday\":[[\"11:30\",\"23:00\"]],\"thursday\":[[\"11:30\",\"23:00\"]],\"friday\":[[\"11:30\",\"23:00\"]],\"saturday\":[[\"11:30\",\"23:00\"]],\"sunday\":[[\"11:30\",\"16:30\"]]}", 
"{\"monday\":[[\"8:30\",\"23:00\"]],\"tuesday\":[[\"8:30\",\"23:00\"]],\"wednesday\":[[\"8:30\",\"23:00\"]],\"thursday\":[[\"8:30\",\"23:00\"]],\"friday\":[[\"8:30\",\"23:00\"]],\"saturday\":[[\"10:00\",\"23:00\"]],\"sunday\":[[\"10:00\",\"23:00\"]]}", 
"{\"monday\":[[\"12:00\",\"23:00\"]],\"tuesday\":[[\"12:00\",\"23:00\"]],\"wednesday\":[[\"12:00\",\"23:00\"]],\"thursday\":[[\"12:00\",\"23:00\"]],\"friday\":[[\"12:00\",\"23:00\"]],\"saturday\":[[\"12:00\",\"23:00\"]],\"sunday\":[[\"12:00\",\"22:30\"]]}", 
"{\"monday\":[[\"13:30\",\"23:00\"]],\"tuesday\":[[\"10:30\",\"23:00\"]],\"wednesday\":[[\"13:30\",\"23:00\"]],\"thursday\":[[\"13:30\",\"23:00\"]],\"friday\":[[\"13:30\",\"23:00\"]],\"saturday\":[[\"11:00\",\"23:00\"]],\"sunday\":[[\"11:00\",\"23:00\"]]}", 
"{\"monday\":[[\"10:00\",\"16:00\"]],\"tuesday\":[[\"10:00\",\"16:00\"]],\"wednesday\":[[\"10:00\",\"16:00\"]],\"thursday\":[[\"10:00\",\"16:00\"]],\"friday\":[[\"10:00\",\"16:00\"]],\"saturday\":[[\"10:00\",\"16:00\"]],\"sunday\":[[\"10:00\",\"16:00\"]]}", 
"{\"monday\":[[\"12:00\",\"22:30\"]],\"tuesday\":[[\"12:00\",\"22:30\"]],\"wednesday\":[[\"12:00\",\"22:30\"]],\"thursday\":[[\"12:00\",\"22:30\"]],\"friday\":[[\"12:00\",\"23:00\"]],\"saturday\":[[\"12:00\",\"23:00\"]],\"sunday\":[[\"12:00\",\"21:30\"]]}", 
"{\"monday\":[[\"00:00\",\"24:00\"]],\"tuesday\":[[\"00:00\",\"24:00\"]],\"wednesday\":[[\"00:00\",\"24:00\"]],\"thursday\":[[\"00:00\",\"24:00\"]],\"friday\":[[\"00:00\",\"24:00\"]],\"saturday\":[[\"00:00\",\"24:00\"]],\"sunday\":[[\"00:00\",\"24:00\"]]}", 
"{\"monday\":[[\"10:00\",\"17:00\"]],\"tuesday\":[[\"10:00\",\"17:00\"]],\"wednesday\":[[\"10:00\",\"17:00\"]],\"thursday\":[[\"10:00\",\"19:00\"]],\"friday\":[[\"10:00\",\"17:00\"]],\"saturday\":[[\"10:00\",\"17:00\"]],\"sunday\":[[\"10:00\",\"17:00\"]]}", 
"{\"monday\":[[\"9:30\",\"17:00\"]],\"tuesday\":[[\"9:30\",\"17:00\"]],\"wednesday\":[[\"9:30\",\"17:00\"]],\"thursday\":[[\"9:30\",\"17:00\"]],\"friday\":[[\"9:30\",\"17:00\"]],\"saturday\":[[\"9:30\",\"17:00\"]],\"sunday\":[[\"9:30\",\"17:00\"]]}", 
"{\"monday\":[[\"9:30\",\"16:45\"]],\"tuesday\":[[\"9:30\",\"16:45\"]],\"wednesday\":[[\"9:30\",\"16:45\"]],\"thursday\":[[\"9:30\",\"16:45\"]],\"friday\":[[\"9:30\",\"16:45\"]],\"saturday\":[[\"9:30\",\"16:45\"]],\"sunday\":[[\"11:30\",\"15:30\"]]}", 
"{\"monday\":[[\"5:30\",\"22:00\"]],\"tuesday\":[[\"5:30\",\"22:00\"]],\"wednesday\":[[\"5:30\",\"22:00\"]],\"thursday\":[[\"5:30\",\"22:00\"]],\"friday\":[[\"5:30\",\"22:00\"]],\"saturday\":[[\"5:30\",\"22:00\"]],\"sunday\":[[\"5:30\",\"22:00\"]]}", 
"{\"monday\":[[\"00:00\",\"00:30\"],[\"11:00\",\"20:00\"]],\"tuesday\":[[\"11:00\",\"20:00\"]],\"wednesday\":[[\"11:00\",\"20:00\"]],\"thursday\":[[\"11:00\",\"23:59\"]],\"friday\":[[\"00:00\",\"2:00\"],[\"11:00\",\"23:59\"]],\"saturday\":[[\"00:00\",\"2:30\"],[\"11:00\",\"23:59\"]],\"sunday\":[[\"00:00\",\"2:30\"],[\"11:00\",\"23:59\"]]}"
)), .Names = c("payload.name", "payload.hours_display", "payload.hours"
), row.names = c(3L, 5L, 10L, 14L, 18L, 19L, 24L, 32L, 33L, 35L, 
36L, 37L, 38L, 42L, 43L, 44L, 45L, 47L, 54L), class = "data.frame")

I tried to use the plyr::ldply() function to flatten this JSON out based on the answer here - How to convert a list consisting of vector of different lengths to a usable data frame in R? and came pretty close but the times are not arranged in proper order (Some days have multiple operating hours)

Ultimately I'd like to manipulate the data-frame by extracting businesses that open before 12 pm on a Sunday for instance.

This works for this particular value of x, but when I try to apply it to the entire column (updated question), I get a parse error: trailing garbage — Ashwin Ramesh, Nov 28 '17 at 20:47
The content within `$payload.hours` is not immediately `data.frame`able. Can you include what your expected output would be for the first row of your initial data? (It probably won't be a very-wide-single-row for each JSON payload, you may do well to use the `tidyr` package, dealing with nested data initially until you can unnest it smartly.) — r2evans, Nov 28 '17 at 21:35
`dplyr::bind_cols(xdf, jsonlite::stream_in(textConnection(paste0(xdf$payload.hours, collapse="\n"))))` assuming `xdf` is the data frame name. — hrbrmstr, Nov 28 '17 at 21:58

score 1 · Answer 1 · answered Nov 28 '17 at 21:56

UPDATE

I've managed to extract what I need into separate columns by following the below steps.

mydf = cbind(mydf, do.call(plyr::rbind.fill, lapply(paste0("
[",mydf$payload.hours,"]"), function(x) jsonlite::fromJSON(x))))

This creates new columns by extracting values for each day as a list of strings.

Next I unlist the values and flatten the data-frame

x = stri_list2matrix(mydf$sunday, byrow=TRUE).
colnames(x) = c("From.1", "To.1", "From.2", "To.2")

Finally I bind this to the original data-frame to obtain the result.

mydf = data.frame(mydf,x)

After cleaning up, it looks like this (Shown for a particular day):

score 1 · Accepted Answer · answered Nov 28 '17 at 21:58

My feeling is that if you can use the JSON libraries cited above, that would be the way to go. I too, am not familiar with JSON data. Assuming you choose to do what I would in a pinch and just work with what you've got, here's a couple ideas:

So, using your example data (say, df1):

library(splitstackshape)
library(reshape2)
# form your payload.hours column into columns:
df2 <- cbind(df,cSplit(df,splitCols='payload.hours',sep='],"',direction='wide'))
# your example actually comes out neatly to 7 columns.  I'd put it into a long format for easier logical operations:
df3 <- melt(df2,id.vars=c('payload.name','payload.hours_display','payload.hours'))
# Now, you can do your regexp if desired more easily:
df3$start <- gsub(x=df3$value,pattern='^.*y.*"(.*)","(.*)"]$',replacement='\\1')
df3$end <- gsub(x=df3$value,pattern='^.*y.*"(.*)","(.*)"]$',replacement='\\2')

At which point you could use standard POSIXct, strftime, or strptime functions to transform the strings into times and manipulate the formatting as desired.

The code above seems to basically work for your example data with the exception of sunday, but I have to run now. I'm happy to edit later if this is helpful and you have any questions.

Parse Regular expression in R

2 Answers2