1

I have a column that holds values of arrays like this:

 [[["0.10", "35"], ["0.2", "36"]], [["5.1", "2"], ["90.2", "2"]]]

I need the last two in 4 separate columns (in this case: [["5.1", "2"], ["90.2", "2"]]) but only their values:

5.1 2 90.2 and 2 (in separate columns)

I understand that I can achieve this with tidyR like described here: split character data into numbers and letters

    df %>%
  separate(mycol, 
           into = c("text", "num"), 
           sep = "(?<=[A-Za-z])(?=[0-9])"
           )

but every trial and every attempt failed so far. I cannot manage to access only the last 2 (or 4) items.

I'd be grateful for any idea.. Thanks

IndiPsi
  • 37
  • 5

2 Answers2

0

We could group by rows (rowwise), then convert the 'mycol' element with fromJSON to a list of matrixes, unlist to vector, convert the vector to a data.frame with 4 columns using as.data.frame.list, wrap it in a list, then we ungroup and unnest the list column with unnest_wider (from tidyr) and finally, convert the column types based on its value with type.convert

library(dplyr)
library(jsonlite)
library(tidyr)
d %>%
  rowwise %>%
  mutate(newcol = list(setNames(as.data.frame.list(unlist(fromJSON(mycol, 
             simplifyVector  = FALSE)[[2]] )), paste0("X", 1:4)))) %>%
  ungroup %>%
  unnest_wider(c(newcol))   %>%
  type.convert(as.is = TRUE)

-output

# A tibble: 3 x 5
#  mycol                                                                                 X1    X2    X3    X4
#  <chr>                                                                              <dbl> <int> <dbl> <int>
#1 "[[[\"0.10\", \"35\"], [\"0.2\", \"36\"]], [[\"5.1\", \"2\"], [\"90.2\", \"2\"]]]"   5.1     2  90.2     2
#2 "[[[\"0.10\", \"35\"], [\"0.2\", \"36\"]], [[\"5.1\", \"2\"], [\"90.2\", \"2\"]]]"   5.1     2  90.2     2
#3 "[[[\"0.10\", \"35\"], [\"0.2\", \"36\"]], [[\"5.1\", \"2\"], [\"90.2\", \"2\"]]]"   5.1     2  90.2     2

data

d <- structure(list(mycol = c("[[[\"0.10\", \"35\"], [\"0.2\", \"36\"]], [[\"5.1\", \"2\"], [\"90.2\", \"2\"]]]", 
"[[[\"0.10\", \"35\"], [\"0.2\", \"36\"]], [[\"5.1\", \"2\"], [\"90.2\", \"2\"]]]", 
"[[[\"0.10\", \"35\"], [\"0.2\", \"36\"]], [[\"5.1\", \"2\"], [\"90.2\", \"2\"]]]"
)), class = "data.frame", row.names = c(NA, -3L))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thank you for this suggestion, but this solution does not work for whatever reason. Does not even through an error, just keeps computing... – IndiPsi Mar 09 '21 at 17:37
  • @IndiPsi sorry, a closing `)` was missing in the code. Can you test now. I added a reproducible example – akrun Mar 09 '21 at 17:42
  • Hi, this worked. Amazing! Can you explain me a little what the function exactly does? I understand that JSON somehow can deal with the arrays, I guess? – IndiPsi Mar 09 '21 at 17:55
  • @IndiPsi I added some explanations. Hope it works – akrun Mar 09 '21 at 17:58
  • 1
    Thank you soooooooo much! :) – IndiPsi Mar 09 '21 at 18:06
0

Here's a base R solution based on regex and @akrun's data:

d1 <- sapply(strsplit(d$mycol, ","), function(x) gsub("(?!\\.)\\D", "", x, perl = T))

We first split d at the comma and pass the result to a gsubfunction which removes anything that is not a digit (\\D) and not a .. We transpose the resulting dataframe d1 to turn columns into rows and select the data of interest:

d2 <- as.data.frame(t(d1[5:8,]))
d2
   V1 V2   V3 V4
1 5.1  2 90.2  2
2 5.1  2 90.2  2
3 5.1  2 90.2  2

If you want to have the result together with the original data, then cbind and change the column names as suits you:

d3 <- cbind(d, d2)
names(d3) <- c("mycol", "x1", "x2", "x3", "x4")

Result:

d3
                                                             mycol  x1 x2   x3 x4
1 [[["0.10", "35"], ["0.2", "36"]], [["5.1", "2"], ["90.2", "2"]]] 5.1  2 90.2  2
2 [[["0.10", "35"], ["0.2", "36"]], [["5.1", "2"], ["90.2", "2"]]] 5.1  2 90.2  2
3 [[["0.10", "35"], ["0.2", "36"]], [["5.1", "2"], ["90.2", "2"]]] 5.1  2 90.2  2
Chris Ruehlemann
  • 20,321
  • 4
  • 12
  • 34
  • It does and it is a little more economical. If it helps plz consider upvoting/accepting it. – Chris Ruehlemann Mar 09 '21 at 21:14
  • Hi, I upvoted it - may take some time until it is counted since I am new to Stackoverflow. I actually find the other solution a little practical in my case as I can concatenate it with more functions, for example I modified it to do the same thing with some more arrays. However, I am just grateful that I received these 2 perfect solutions for my problem that already cost my hours. So, thank you very much! – IndiPsi Mar 09 '21 at 21:30