I apologise if the answer is simple; I'm new to R and couldn't find a solution.

I have a data frame, and my column of interest ('subject') holds a list of words in each row.

title  subject
-----  -----------------------------------------------
 A     c("health sciences", "life sciences")
 B     c("biochemistry", "medicine", "life sciences")
 C     c("physics and astronomy", "mathematics")

I want to replace (and thereby classify) all biology-related words for each title with "biology". In other words, if a title's list of subjects is biology-related, its subject should be replaced with the simpler 'biology'.

The data frame should then look like this:

title  subject
-----  -----------------------------------------------
 A     biology
 B     biology
 C     c("physics and astronomy", "mathematics")

How would I replace all words beginning with key prefixes (such as "bio", "health", "med", "life", etc.) with 'biology'?

zx8754
Ismail Jan
  • You could for example do: `df$subject <- gsub("bio", "biology", df$subject)` – Anonymous Mar 21 '18 at 15:30
  • For 'health', 'med', etc. you'd have to do the same. Does that work on your df? A piece of advice: add a reproducible example to your question. Then we can quickly test it ourselves before posting an answer :) – Anonymous Mar 21 '18 at 15:31
  • I asked a somewhat similar question, it requires data.table https://stackoverflow.com/questions/48629202/change-value-of-all-strings-in-column-based-on-condition – tshurtz Mar 21 '18 at 15:33
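
One caveat on the `gsub` suggestion in the first comment: `gsub` replaces only the matched substring, so "biochemistry" would become "biologychemistry" rather than "biology". The sketch below contrasts that with replacing the whole value when it starts with one of the prefixes. The vector `x` and the variable names are made up for illustration, and a plain character vector is assumed here rather than the list column from the question.

```r
# Hypothetical sample values; not taken from the asker's real data.
x <- c("biochemistry", "health sciences", "mathematics")

# gsub() swaps only the matched substring, which mangles longer words.
substring_only <- gsub("bio", "biology", x)
print(substring_only)

# Replace the whole value when it starts with one of the key prefixes.
whole_value <- ifelse(grepl("^(bio|health|med|life)", x), "biology", x)
print(whole_value)
```

The `^` anchor restricts the match to the start of the string, so a value such as "symbiosis" would be left alone.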

1 Answer

Try adapting this code:

A toy data.frame

df <- data.frame(title = c("A", "B"), subject = c("health sciences", "other stuffs"))

The solution:

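# Prefixes that mark a subject as biology-related (anchored at the start of the string)
toMatch <- c("^bio", "^health", "^med", "^life")
# Convert a factor column to character so that new values can be assigned
df$subject <- as.character(df$subject)
# Overwrite every matching subject with "biology"
df[grepl(paste(toMatch, collapse = "|"), df$subject), "subject"] <- "biology"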

Your output:

df
  title      subject
1     A      biology
2     B other stuffs
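
The toy example above uses one string per row, whereas the question shows a vector of subjects in each row. If `subject` really is a list column, the same pattern can be applied per row. This is only a sketch: the way `df` is rebuilt here is a guess at the asker's data, and treating a row as biology when any of its subjects matches is an assumption (swap `any` for `all` to require every subject to match).

```r
# Hypothetical reconstruction of the question's data with a list column.
df <- data.frame(title = c("A", "B", "C"), stringsAsFactors = FALSE)
df$subject <- list(
  c("health sciences", "life sciences"),
  c("biochemistry", "medicine", "life sciences"),
  c("physics and astronomy", "mathematics")
)

# One regular expression covering all key prefixes.
pattern <- paste(c("^bio", "^health", "^med", "^life"), collapse = "|")

# TRUE for rows where at least one subject starts with a key prefix.
is_bio <- vapply(df$subject, function(s) any(grepl(pattern, s)), logical(1))

# Replace the whole subject vector of those rows with the single label.
df$subject[is_bio] <- list("biology")
print(df)
```

Rows A and B collapse to "biology", while row C keeps its original vector of subjects.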
Terru_theTerror