Merging 2 levels of a factor in R when the factor has a large number of levels

Question

I have a plain-text file, data_table_complete, about 13 GB in size with over 100 columns, one of which is related to color.

When I used the command levels(data_table_complete$color), there were 544 levels.

On an initial search I found one level named "OTHERS", containing some 4000-odd items, and another named "OTHETRS", containing some 600-odd items, which is possibly a misspelling of the former.

So I tried to merge them into "OTHERS", but it looked like some data might have been lost.

Can anyone guide me on how to accomplish this task?

Please see [how to post a minimal reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – hrbrmstr, Apr 08 '14 at 18:26
I suspect that you are not using the best tool here: for such a big file, I would go with `awk` rather than with `R`. In particular, see if you are able to adapt this to your problem: http://stackoverflow.com/questions/9705940/awk-replace-and-write-a-column-value-in-the-input-file – Jealie, Apr 08 '14 at 18:28
Hi Jealie, I could have used awk or some better tool, but as the client asked for this task to be done in R, I am bound to do it his way. Thanks @hrbrmstr for the suggestion. – heybhai, Apr 10 '14 at 16:57

David Arenburg · Accepted Answer · 2014-04-08T19:52:41.740

Let's say this is your data frame:

df <- data.frame(color = factor(c(rep("red",4), rep("OTHERS", 4),rep("blue", 5), rep("OTHETRS",5))))
table(df$color)
#blue  OTHERS OTHETRS     red 
#   5       4       5       4

You can simply do

# Map both spellings to the correct label "OTHERS"; every other value is kept as-is
df$color <- factor(ifelse(df$color == "OTHERS" | df$color == "OTHETRS", "OTHERS", as.character(df$color)))
table(df$color)
#blue  OTHERS     red 
#   5       9       4

edited Apr 08 '14 at 19:52

answered Apr 08 '14 at 19:24

David Arenburg


Later I accomplished the task in **Java** using a replace function – heybhai Apr 11 '14 at 11:36
