
I want to replace certain values in an R data frame (data1) as part of data cleaning.

The data frame data1 has n columns. In one of them, Article_Description, I want to perform the following operation. How can this be done in R?

if data1$Article_Description in ('snova glide 4m','SNOVA Glide 4M','SNova Glide 4 M') then data1$Article_Description='SNOVA Glide 4M';
if data1$Article_Description in ('aSTAR Ride 4M','astar ride 4m') then data1$Article_Description='astar ride 4m';
if data1$Article_Description in ('CC Fresh M','cc fresh m') then data1$Article_Description='CC Fresh M';
if data1$Article_Description in ('cc ride m','CC Ride M') then data1$Article_Description='CC Ride M';
if data1$Article_Description in ('astar solution 2m','aSTAR Solution 2M') then data1$Article_Description='astar solution 2m';
if data1$Article_Description in ('astar salvation 3m','aSTAR Salvation 3M') then data1$Article_Description='astar salvation 3m';
if data1$Article_Description in ('cc chill m','CC Chill M') then data1$Article_Description='CC Chill M';

  • Please do provide some reproducible example http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – akrun Dec 29 '14 at 07:04
  • Have you looked at my solution? It just updates the whole column in one step rather than in different steps. I created a data based on what you showed. – akrun Dec 29 '14 at 08:54

2 Answers


Two problems: 1) you need to use %in% rather than in, and 2) the if function is not vectorized, so you cannot get useful results from passing a full vector to it. Use either ifelse or logical indexing with [<-

I'll just do the first couple, since the pattern should be clear (and I get bored easily):

data1[ data1$Article_Description %in% c('snova glide 4m','SNOVA Glide 4M','SNova Glide 4 M'), 
      "Article_Description"] <- 'SNOVA Glide 4M'

data1[ data1$Article_Description %in% c('aSTAR Ride 4M','astar ride 4m'), 
      "Article_Description"] <- 'astar ride 4m'
IRTFM
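
To extend the logical-indexing pattern from this answer to more groups without repeating the assignment by hand, one option is to loop over a named list of spellings. This is only a minimal sketch: `toy` is a small stand-in for data1 and `variants` is a name made up here for illustration, neither comes from the question, and only four of the seven groups are shown.

```r
# Hypothetical stand-in for data1 (not the asker's real data)
toy <- data.frame(
  Article_Description = c('snova glide 4m', 'SNova Glide 4 M', 'aSTAR Ride 4M',
                          'cc fresh m', 'CC Chill M'),
  stringsAsFactors = FALSE
)

# Each name is the value to keep; each element lists the spellings to replace
variants <- list(
  'SNOVA Glide 4M' = c('snova glide 4m', 'SNOVA Glide 4M', 'SNova Glide 4 M'),
  'astar ride 4m'  = c('aSTAR Ride 4M', 'astar ride 4m'),
  'CC Fresh M'     = c('CC Fresh M', 'cc fresh m'),
  'CC Chill M'     = c('cc chill m', 'CC Chill M')
)

for (canonical in names(variants)) {
  # %in% returns one TRUE/FALSE per row, so it can index the column directly
  hit <- toy$Article_Description %in% variants[[canonical]]
  toy$Article_Description[hit] <- canonical
}

toy$Article_Description
```

Values that appear in none of the groups are left unchanged, which matches the behaviour of the two assignments written out above.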

You could try this:

v1 <- sub('(?<=\\d) (?=[a-z])', '', tolower(data1[,1]), perl=TRUE)
lvls <- levels(factor(v1))

data1$NewArticle_Description <- setNames(c(lvls[1:3], 'CC Chill M', 
   'CC Fresh M', 'CC Ride M', 'SNOVA Glide 4M') ,lvls)[v1]

 head(data1)
 #  Article_Description        Val NewArticle_Description
 #1          cc fresh m  0.1528656              CC Fresh M
 #2   aSTAR Solution 2M  0.4666355       astar solution 2m
 #3     SNova Glide 4 M -1.3486217          SNOVA Glide 4M
 #4          cc chill m -0.3713309              CC Chill M
 #5      SNOVA Glide 4M  2.0481950          SNOVA Glide 4M
 #6          CC Chill M -1.0303537              CC Chill M
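
The approach above rests on two steps: normalising each string (lower case, then dropping the space between a digit and a letter), and indexing a named vector by those normalised strings. A small sketch of both steps in isolation, using made-up inputs; `raw`, `keys` and `lookup` are illustrative names that do not appear in the answer:

```r
raw <- c('SNova Glide 4 M', 'cc chill m', 'SNOVA Glide 4M')

# Step 1: lower-case, then remove a space that sits between a digit and a letter
keys <- sub('(?<=\\d) (?=[a-z])', '', tolower(raw), perl = TRUE)

# Step 2: a named vector acts as a lookup table (names = keys, values = labels)
lookup <- setNames(c('CC Chill M', 'SNOVA Glide 4M'),
                   c('cc chill m', 'snova glide 4m'))

# Indexing by name returns one label per key; unname() drops the names
cleaned <- unname(lookup[keys])
cleaned
```

A key that is missing from the lookup table would come back as NA, so it is worth checking for NA values after this kind of replacement.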

data

set.seed(25)
data1 <- data.frame(Article_Description= sample(c('snova glide 4m',
'SNOVA Glide 4M','SNova Glide 4 M', 'aSTAR Ride 4M','astar ride 4m', 
'CC Fresh M','cc fresh m','cc ride m','CC Ride M',  'astar solution 2m',
'aSTAR Solution 2M', 'astar salvation 3m','aSTAR Salvation 3M', 'cc chill m',
'CC Chill M'), 100, replace=TRUE), Val=rnorm(100), stringsAsFactors=FALSE) 
akrun