R: filter value from series of values

Question

I have dataframe like this

situations <- c("{17: '', 80: '', 55: '693', 29: '', 103: '19', 233: '872', 6: '', 20: '', 230: '99.3', 215: '', 102: '47.7', 56: 'Center', 146: '85.1', 147: '40.6', 23: '', 231: '47.8'}", "{103: '1.9', 18: '', 154: '', 147: '48.6', 22: '', 233: '879', 76: '', 459: '', 55: '719', 29: '', 102: '54.2', 56: 'Center', 328: '', 146: '94.7', 20: ''}", "{215: '', 22: '', 56: 'Center', 233: '731', 103: '19', 78: '', 230: '97.7', 146: '78.2', 20: '', 102: '50.4', 29: '', 18: '', 55: '899', 147: '43.3', 82: '', 231: '48.7'}")

events <- c("A", "B", "C")

df <- data.frame(situations, events)

And I want to filter out rows containing 6 (or 6: ''). Could you help me please?

I've tried grepl to select rows, but output is not desired.

df$filter <- as.integer(grepl('6:', df$situations))

It leaves all values containing 6. Like 76, 146, 56 etc

In Python I use such simple code for this task, but cannot find something similar in R.

df['is_ok'] = df['situations'].apply(lambda x: True if 6 in x else False)

Solution

I've found the desired output thanks to @MrFlick

as.integer(grepl('\\b6:', df$situations))

Thanks everyone

Is that JSON? You need to parse it first and then filter. Have a look at [this](https://stackoverflow.com/questions/2061897/parse-json-with-r) — Sotos, Mar 20 '23 at 14:31
Do you want to filter the rows out or do you want to create a new column with an indicator value? There is no `df$qualifiers` in your example. Did you mean `df$situations `? It seem the data in python must be in a different format. Like you have actually parsed the string into a dict but in R you just have a string. You could use `as.integer(grepl('\\b6:', df$situations))` to make sure the "6:" is proceeded by a word boundary — MrFlick, Mar 20 '23 at 14:33
@MrFlick ahhh yes. That looks more like a python dictionary rather than JSON. In that case read it in a pandas and save it as csv? — Sotos, Mar 20 '23 at 14:36
@Sotos I don't think it is JSON, otherwise the keys would have to be strings? I think MrFlick is correct that it's a dict from Python that hasn't been converted to a native R data type like a list. — joran, Mar 20 '23 at 14:36
It's not legal JSON, but it can be coerced easily. What is your expected output? It's not really a frame-like object; at best it's a list of frames, each with 1 row and 15-16 columns. — r2evans, Mar 20 '23 at 14:37
@MrFlick yes, I mean df$situations (already edited). Sorry. Actually I want to filter the rows. I've tried to create new column with an indicator value as first step of filtering. Maybe there is more simple way. I've tried your suggestion but it returns with 0 rows with indicator 1. — Delopera, Mar 20 '23 at 14:44
@Sotos no, it's not JSON. I have large csv with thousand of rows and 15 columns. "situations" – just one of this columns. — Delopera, Mar 20 '23 at 14:45
@r2evans I want to exclude rows containing 6 (or 6: '' to be precise) — Delopera, Mar 20 '23 at 14:48
So something like this? `df %>% separate_rows(situations, sep = ', ') %>% separate(situations, into = c('v1', 'v2'), sep = ':') %>% filter(v1 != 6 & v2 != '') ` ... (Unpolished) — Sotos, Mar 20 '23 at 15:01
@Delopera When I use `as.integer(grepl('\\b6:', df$situations))` on the sample in your question I get `1 0 0` as output. Are you sure you don't get any 1's? Did you copy/paste the input above? — MrFlick, Mar 20 '23 at 15:09

r2evans · Answer 1 · 2023-03-20T14:55:08.667

It's not truly JSON, but we can coerce it with a little regex.

situations |>
  gsub("'", '"', x = _) |>
  gsub("([^ {,]+):", '"\\1":', x = _) |>
  lapply(jsonlite::fromJSON) |>
  Filter(function(z) !"6" %in% names(z), x = _) |>
  str()
# List of 2
#  $ :List of 15
#   ..$ 103: chr "1.9"
#   ..$ 18 : chr ""
#   ..$ 154: chr ""
#   ..$ 147: chr "48.6"
#   ..$ 22 : chr ""
#   ..$ 233: chr "879"
#   ..$ 76 : chr ""
#   ..$ 459: chr ""
#   ..$ 55 : chr "719"
#   ..$ 29 : chr ""
#   ..$ 102: chr "54.2"
#   ..$ 56 : chr "Center"
#   ..$ 328: chr ""
#   ..$ 146: chr "94.7"
#   ..$ 20 : chr ""
#  $ :List of 16
#   ..$ 215: chr ""
#   ..$ 22 : chr ""
#   ..$ 56 : chr "Center"
#   ..$ 233: chr "731"
#   ..$ 103: chr "19"
#   ..$ 78 : chr ""
#   ..$ 230: chr "97.7"
#   ..$ 146: chr "78.2"
#   ..$ 20 : chr ""
#   ..$ 102: chr "50.4"
#   ..$ 29 : chr ""
#   ..$ 18 : chr ""
#   ..$ 55 : chr "899"
#   ..$ 147: chr "43.3"
#   ..$ 82 : chr ""
#   ..$ 231: chr "48.7"

(Requires R >= 4.2 for the _ special argument.)

If you need numbers to be numbers, we can use type.convert:

situations |>
  gsub("'", '"', x = _) |>
  gsub("([^ {,]+):", '"\\1":', x = _) |>
  lapply(function(z) type.convert(jsonlite::fromJSON(z), as.is=TRUE)) |>
  Filter(function(z) !"6" %in% names(z), x = _) |>
  str()
# List of 2
#  $ :List of 15
#   ..$ 103: num 1.9
#   ..$ 18 : logi NA
#   ..$ 154: logi NA
#   ..$ 147: num 48.6
#   ..$ 22 : logi NA
#   ..$ 233: int 879
#   ..$ 76 : logi NA
#   ..$ 459: logi NA
#   ..$ 55 : int 719
#   ..$ 29 : logi NA
#   ..$ 102: num 54.2
#   ..$ 56 : chr "Center"
#   ..$ 328: logi NA
#   ..$ 146: num 94.7
#   ..$ 20 : logi NA
#  $ :List of 16
#   ..$ 215: logi NA
#   ..$ 22 : logi NA
#   ..$ 56 : chr "Center"
#   ..$ 233: int 731
#   ..$ 103: int 19
#   ..$ 78 : logi NA
#   ..$ 230: num 97.7
#   ..$ 146: num 78.2
#   ..$ 20 : logi NA
#   ..$ 102: num 50.4
#   ..$ 29 : logi NA
#   ..$ 18 : logi NA
#   ..$ 55 : int 899
#   ..$ 147: num 43.3
#   ..$ 82 : logi NA
#   ..$ 231: num 48.7

Sorry for not clarifying this in the question but I have another columns in my dataframe. And I would not like to add another 15-16. Because of 6 can be on different positions. My expected output is dataframe without rows with 6 in "situations". — Delopera, Mar 20 '23 at 15:33

score 0 · Answer 2 · answered Mar 20 '23 at 15:28

Probably you can try py_eval from reticulate to parse the string vector situations, e.g.,

library(reticulate)
lapply(
  situations,
  function(x) {
    d <- py_eval(x)
    d[endsWith(names(d), "6")]
  }
)

and you will obtain

[[1]]
[[1]]$`6`
[1] ""

[[1]]$`56`
[1] "Center"

[[1]]$`146`
[1] "85.1"


[[2]]
[[2]]$`76`
[1] ""

[[2]]$`56`
[1] "Center"

[[2]]$`146`
[1] "94.7"


[[3]]
[[3]]$`56`
[1] "Center"

[[3]]$`146`
[1] "78.2"

R: filter value from series of values

2 Answers2