
This issue is giving me a lot of trouble, even though it should be easy to fix. I have a dataset with the columns id and poster. I want to change the poster value if the id value contains a certain string. See the data below:

test_df

id                   poster
143537222999_2054    Kevin
143115551234_2049    Dave
14334_5334           Eric
1456322_4334         Mandy
143115551234_445633  Patrick
143115551234_4321    Lars
143537222999_56743   Iris

I would like to get

test_df

id                   poster
143537222999_2054    User
143115551234_2049    User
14334_5334           Eric
1456322_4334         Mandy
143115551234_445633  User
143115551234_4321    User
143537222999_56743   User

Both columns are of type character. I would like to change the poster value to "User" if the id value contains "143537222999" or "143115551234". I have tried the following approaches:

Match within/which

test_df <- within(test_df, poster[match('143115551234', test_df$id) | match('143537222999', test_df$id)] <- 'User')

This code gave me no errors, but it didn't change any of the values in the poster column. When I replace within with which, I get the error:

test_df <- which(test_df, poster[match('143115551234', test_df$id) | match('143537222999', test_df$id)] <- 'User')
Error in which(test_df, poster[match("143115551234", test_df$id) |  : 
  argument to 'which' is not logical

Match different variant

test_df <- test_df[match(id, test_df, "143115551234") | match(id, test_df, "143537222999"), test_df$poster] <- 'User'

This code gives me the error:

Error in `[<-.data.frame`(`*tmp*`, match(id, test_df, "143115551234") |  : 
  missing values are not allowed in subscripted assignments of data frames
In addition: Warning messages:
1: In match(id, test_df, "143115551234") :
  NAs introduced by coercion to integer range
2: In match(id, test_df, "143537222999") :
  NAs introduced by coercion to integer range

After looking up this error I found out that integers in R are 32-bit and that the maximum value of an integer is 2147483647. I'm not sure why I'm getting this error, because R states that my columns are character.

> lapply(test_df, class)

$poster
[1] "character"

$id
[1] "character"

Grepl

test_df[grepl("143115551234", id | "143537222999", id), poster := "User"]

This code raises the error:

Error in `:=`(poster, "User") : could not find function ":="

I'm not sure what the best way to fix this is. I have tried multiple variations and keep running into different errors.

I have also tried several answers to earlier questions on here, but I still can't get rid of the errors.

Dennis Loos

2 Answers


Use grepl with ifelse:

df$poster <- ifelse(grepl("143537222999|143115551234", df$id), "User", df$poster)
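To check this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same idea. The data.frame() call is an assumption, since the question only shows the printed table; hit is just an illustrative name.

```r
# Rebuild the sample data from the question (construction is assumed).
test_df <- data.frame(
  id = c("143537222999_2054", "143115551234_2049", "14334_5334",
         "1456322_4334", "143115551234_445633", "143115551234_4321",
         "143537222999_56743"),
  poster = c("Kevin", "Dave", "Eric", "Mandy", "Patrick", "Lars", "Iris"),
  stringsAsFactors = FALSE
)

# grepl() returns one TRUE/FALSE per row; "|" in the pattern means "or".
hit <- grepl("143537222999|143115551234", test_df$id)

# Replace matching rows with "User" and keep the original poster otherwise.
test_df$poster <- ifelse(hit, "User", test_df$poster)

print(test_df)
```

Note that ifelse() only behaves like this when poster is character; if the column were a factor, it would have to be converted with as.character() first.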


Tim Biegeleisen

You may try this using grepl.

df[grepl('143115551234|143537222999', df$id),"poster"] <- "User"

Every row for which grepl returns TRUE gets its value in the poster column replaced by "User":

> df[grepl('143115551234|143537222999', df$id),"poster"] <- "User"
> df
                   id poster
1   143537222999_2054   User
2   143115551234_2049   User
3          14334_5334   Eric
4        1456322_4334  Mandy
5 143115551234_445633   User
6   143115551234_4321   User
7  143537222999_56743   User
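
As for why the match() attempts in the question did not work: match() compares whole elements for exact equality, so a substring such as "143115551234" never equals an id like "143115551234_2049". In addition, the third argument of match() is nomatch, which gets coerced to an integer, and that is most likely where the "coercion to integer range" warning comes from. A small sketch of the difference (ids, m, g and n are hypothetical names for illustration only):

```r
ids <- c("143115551234_2049", "14334_5334")

# match() looks for an element that is exactly equal to the value,
# so a substring is never found and the result is NA.
m <- match("143115551234", ids)

# grepl() searches inside each string and returns a logical vector;
# fixed = TRUE treats the pattern as plain text rather than a regex.
g <- grepl("143115551234", ids, fixed = TRUE)

# The value is too large for a 32-bit integer, so the coercion yields NA
# (the warning is suppressed here only to keep the output clean).
n <- suppressWarnings(as.integer("143115551234"))

print(m)
print(g)
print(n)
```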
PKumar