I have a 13 GB plain-text file, data_table_complete, with over 100 columns, one of which is a color column.

When I used the command levels(data_table_complete$color), there were 544 levels.

On a first look I found one level named "OTHERS", containing some 4000-odd items, and another named "OTHETRS", containing some 600-odd items, which is probably a misspelling of the former.

So I tried to merge them as "OTHERS", but it looked like I might be losing data in the process.

Can anyone guide me on how to accomplish this task?

hrbrmstr
heybhai
  • Please see [how to post a minimal reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – hrbrmstr Apr 08 '14 at 18:26
  • I suspect that you are not using the best tool here: for such a big file, I would go with `awk` rather than `R`. See in particular whether you are able to adapt this to your problem: http://stackoverflow.com/questions/9705940/awk-replace-and-write-a-column-value-in-the-input-file – Jealie Apr 08 '14 at 18:28
  • Hi Jealie, I could have used awk or some better tool, but since the client asked for the task to be done this way, I am bound to do it his way. Thanks @hrbrmstr for the suggestions. – heybhai Apr 10 '14 at 16:57

1 Answer

So let's say this is your data frame:

df <- data.frame(color = factor(c(rep("red",4), rep("OTHERS", 4),rep("blue", 5), rep("OTHETRS",5))))
table(df$color)
#blue  OTHERS OTHETRS     red 
#   5       4       5       4 

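You can simply do the following, which maps both spellings to "OTHERS" and rebuilds the factor:

# Convert to character, collapse the two spellings into "OTHERS", then re-create the factor
df$color <- factor(ifelse(df$color %in% c("OTHERS", "OTHETRS"), "OTHERS", as.character(df$color)))
table(df$color)
#blue OTHERS    red 
#   5      9      4 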
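
As a possible alternative, here is a minimal sketch that renames the misspelled level in place instead of converting the whole column to character, which may matter on a table of this size. It relies on base R merging levels when two of them are given the same name. The toy `df` mirrors the example above and the `n_before` variable is only an illustrative name; the row-count and NA checks are one way to confirm that nothing was dropped, assuming the real column is already a factor.

```r
# Toy data mirroring the example above (not the real 13 GB table)
df <- data.frame(color = factor(c(rep("red", 4), rep("OTHERS", 4),
                                  rep("blue", 5), rep("OTHETRS", 5))))
n_before <- nrow(df)  # illustrative name, used only for the check below

# Rename the misspelled level; levels that end up with the same name are merged
levels(df$color)[levels(df$color) == "OTHETRS"] <- "OTHERS"

# Sanity checks against data loss: same number of rows, no new NAs
stopifnot(nrow(df) == n_before, !anyNA(df$color))
table(df$color)
```

On the real data the same line would be applied to data_table_complete$color, and comparing nlevels() before and after should show exactly one level fewer.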
David Arenburg