I am doing a content analysis of French politicians' Twitter posts dealing with immigration. As I only recently started working with strings, I am having trouble excluding certain word combinations. Specifically, I defined the word "identité" (or words with the same stem) as an indicator that a tweet deals with immigration. However, the word combination "carte d'identité" (ID card) is never actually used in this context, so I would like to exclude it.
The original code looks like this:
mutate(identit = str_detect(full_text, "identit"))
So far, I have tried to exclude it using the caret (hat) operator:
mutate(identit = str_detect(full_text, "[^carte d']identité"))
However, this pattern still matches "carte d'identité", along with forms preceded by an article such as l'immigration and d'immigration, whereas forms without an article, such as identité or identitaire, are excluded.
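For context, a small sketch of what that pattern does. In a regular expression, `[^...]` is a negated character class, so `[^carte d']` matches exactly one character that is not `c`, `a`, `r`, `t`, `e`, a space, `d` or `'`; it does not mean "not preceded by the text carte d'". The sample strings below are made up for illustration. The fifth one uses a typographic apostrophe (U+2019), which is only an assumption about what the tweets might contain, but it would explain why "carte d'identité" still matches.

```r
library(stringr)

# Hypothetical sample strings, not taken from the real tweets.
x <- c(
  "carte d'identité",       # ASCII apostrophe before "identité"
  "Notre identité",         # space before "identité"
  "identité",               # nothing before "identité"
  "xidentité",              # "x" before "identité"
  "carte d\u2019identité"   # typographic apostrophe (U+2019) before "identité"
)

# The class consumes ONE character, which must not be c, a, r, t, e, space, d or '.
res <- str_detect(x, "[^carte d']identité")
res
# → FALSE FALSE FALSE TRUE TRUE
```

Any string in which "identité" starts the text or follows a space is rejected, because the class needs a preceding character and the space is in the excluded set.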
Edit: to make the problem reproducible:
library(dplyr)
library(stringr)
df <- data.frame(text = c("Ma carte d'identité", "Notre identité", "ce n'est pas l'identité du pays", "d'identité", "tasty buns"))
df_detect <- df %>% mutate(identit = str_detect(text, "*???*"))
(Basically, in this data frame I'd like str_detect to detect only "Notre identité", "ce n'est pas l'identité du pays", and "d'identité".)
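One possible way to get that result, offered as a sketch rather than a definitive solution, is a negative lookbehind: `(?<!...)` asserts that the match is not immediately preceded by the given text, without consuming any characters. The `pattern` variable, the case-insensitive matching, and the extra typographic apostrophe (U+2019) in the character class are assumptions added for illustration.

```r
library(dplyr)
library(stringr)

df <- data.frame(text = c("Ma carte d'identité", "Notre identité",
                          "ce n'est pas l'identité du pays", "d'identité",
                          "tasty buns"))

# Match the stem "identit" unless it is directly preceded by "carte d'".
# ['\u2019] accepts either the ASCII or the typographic apostrophe (assumption).
pattern <- regex("(?<!carte d['\u2019])identit", ignore_case = TRUE)

df_detect <- df %>% mutate(identit = str_detect(text, pattern))
df_detect$identit
# → FALSE TRUE TRUE TRUE FALSE
```

Because the pattern keeps only the stem "identit", forms such as identitaire are still detected. A bare "d'identité" is also detected, since the lookbehind only rejects the full sequence "carte d'".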