Replacing a dataframe entry with the last value in a listed entry in R

Question

I have a dataframe that looks like the below:

BaseRating    contRating Participant
5,4,6,3,2,4       5        01       
4                 4        01

I would first like to run some code that looks to see whether there are any commas in the dataframe, and returns a column number of where that is. I have tried some of the solutions in the questions below, which don't seem to work when looking for a comma instead of a string/whole value? I'm probably missing something simple here but any help appreciated!

Selecting data frame rows based on partial string match in a column

Filter rows which contain a certain string

Check if value is in data frame

Having determined whether there are commas in my data, I then want to extract just the last number in the list separated by commas in that entry, and replace the entry with that value. For instance, I want the first row in the BaseRating column to become '4' because it is last in that list.

Is there a way to do this in R without manually changing the number?

PaulS · Accepted Answer · 2022-01-25T17:42:20.273

A possible solution is bellow.

EXPLANATION

In what follows, I will explain the regex expression used in str_extract function, as asked for by @milsandhills:

The symbol | in the middle means the logical OR operator.
We use that because BaseRating can have multiple numbers or only one number — hence the need to use |, to treat each case separately.
The left-hand side of | means a number formed by one or more digits (\\d+), which starts (^) and finishes the string ($).
The right-hand side of | means a number formed by one or more digits (\\d+), which finishes the string ($). And (?<=\\,) is used to guarantee that the number is preceded by a comma.

You can find more details at the stringr cheat sheet.

library(tidyverse)

df <- data.frame(
  BaseRating = c("5,4,6,3,2,4", "4"),
  contRating = c(5L, 4L),
  Participant = c(1L, 1L)
)

df %>% 
  mutate(BaseRating = sapply(BaseRating, 
         function(x) str_extract(x, "^\\d+$|(?<=\\,)\\d+$") %>% as.integer))

#>   BaseRating contRating Participant
#> 1          4          5           1
#> 2          4          4           1

Or:

library(tidyverse)

df %>% 
  separate_rows(BaseRating, sep = ",", convert = TRUE) %>% 
  group_by(contRating, Participant) %>% 
  summarise(BaseRating = last(BaseRating), .groups = "drop") %>% 
  relocate(BaseRating, .before = 1)

#> # A tibble: 2 × 3
#>   BaseRating contRating Participant
#>        <int>      <int>       <int>
#> 1          4          4           1
#> 2          4          5           1

Thanks, this is really helpful! My actual data frame has a lot more columns than in my user-friendly example, so I wondered if there is a quicker way than grouping by all my existing columns? — milsandhills, Jan 24 '22 at 16:21
Maybe the new solution I just added to my answer is what you are looking for, @milsandhills. — PaulS, Jan 24 '22 at 16:26
Your second option works wonderfully. Could you perhaps explain a bit more about what the code within the str_extract function is actually doing, as I'm a bit puzzled by all the symbols? — milsandhills, Jan 25 '22 at 15:40
I have just added an explanation to my answer, @milsandhills. Hope that helps you! If you have any further question, please do not hesitate and ask me. — PaulS, Jan 25 '22 at 17:16

score 0 · Answer 2 · answered Jan 24 '22 at 17:16

If we want a quick option, we can use trimws from base R

df$BaseRating <- as.numeric(trimws(df$BaseRating, whitespace = ".*,"))

-output

> df
  BaseRating contRating Participant
1          4          5           1
2          4          4           1

Or another option is stri_extract_last

library(stringi)
df$BaseRating <- as.numeric(stri_extract_last_regex(df$BaseRating, "\\d+"))

data

df <- structure(list(BaseRating = c("5,4,6,3,2,4", "4"), contRating = 5:4, 
    Participant = c(1L, 1L)), class = "data.frame", row.names = c(NA, 
-2L))

Replacing a dataframe entry with the last value in a listed entry in R

2 Answers2

data