R replace identical column character items with increasing number

Question

I have a data frame with 60000 obs. of 4 variables in the following format:

I need to replace each distinct character value in the first column with an increasing number, so that identical values get the same number: "101-startups" becomes 1, "10i10-aps" becomes 2, "10x" becomes 3, all "10x-fund-lp" entries become 4, and so on. The same applies to the second column.

How do I achieve this?

Please read [how to make a reproducible example in r](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — M--, May 02 '17 at 21:59

score 1 · Accepted Answer · edited May 02 '17 at 22:11

If I'm understanding your question correctly, all you need to do is something like:

my_data$col1 <- as.integer(factor(my_data$col1, levels = unique(my_data$col1)))
my_data$col2 <- as.integer(factor(my_data$col2, levels = unique(my_data$col2)))

It's probably a good idea to read up on factors.
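As a minimal sketch of how this behaves, assuming a small made-up data frame: the names `my_data`, `col1`, and `col2` follow the answer, the `col1` values come from the question, and the `col2` values are invented for illustration.

```r
# Hypothetical sample data: col1 values are taken from the question,
# col2 values are invented for illustration.
my_data <- data.frame(
  col1 = c("101-startups", "10i10-aps", "10x", "10x-fund-lp", "10x-fund-lp"),
  col2 = c("zeta", "alpha", "zeta", "mid", "alpha"),
  stringsAsFactors = FALSE
)

# levels = unique(...) makes the level order follow first appearance,
# so the integer codes increase in the order the values are first seen.
my_data$col1 <- as.integer(factor(my_data$col1, levels = unique(my_data$col1)))
my_data$col2 <- as.integer(factor(my_data$col2, levels = unique(my_data$col2)))

print(my_data)
```

Identical strings end up with the same integer, and each column is numbered independently of the other.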

answered May 02 '17 at 21:35 by nwknoblauch · edited May 02 '17 at 22:11 by Gregor Thomas

Made some edits - as the second column clearly isn't in alpha order already, need to make sure that `factor` doesn't reorder them. – Gregor Thomas May 02 '17 at 22:13
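To illustrate the ordering point in this comment, here is a small sketch with a hypothetical vector `v` that is not in alphabetical order. It compares the default behaviour of `factor` with `levels = unique(v)`, and also shows `match(v, unique(v))` as a base R alternative that should give the same codes.

```r
# Hypothetical values, deliberately not in alphabetical order.
v <- c("b", "a", "b", "c")

# Default: levels are the sorted unique values, so "a" gets code 1.
sorted_codes <- as.integer(factor(v))

# levels = unique(v): codes follow order of first appearance, so "b" gets code 1.
appearance_codes <- as.integer(factor(v, levels = unique(v)))

# match() looks up each value's position among the unique values directly.
match_codes <- match(v, unique(v))

print(sorted_codes)
print(appearance_codes)
print(match_codes)
```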

score 0 · Answer 2 · answered May 02 '17 at 21:35

Try building a separate dataframe from the unique entries of that column, then use the row names (which will be consecutive integers). If your dataframe is df and that first column is v1, something like

x = data.frame(v1 = unique(df$v1))
x$numbers = as.integer(row.names(x))

Then you can do some kind of merge

final.df = merge(x, df, by = "v1")

and then use something like dplyr to select, drop, or rearrange columns.
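Put together, the lookup-table idea might look like the sketch below. The data frame `df`, its `value` column, and the sample rows are assumptions made up for illustration. `seq_len(nrow(x))` is used here instead of the row names so that the numbers are integers rather than character strings, and base R subsetting stands in for dplyr to keep the example dependency-free.

```r
# Hypothetical data: v1 values come from the question, "value" is invented.
df <- data.frame(
  v1 = c("10x", "101-startups", "10x", "10i10-aps"),
  value = c(5, 7, 9, 11),
  stringsAsFactors = FALSE
)

# Lookup table: one row per distinct v1, numbered in order of first appearance.
x <- data.frame(v1 = unique(df$v1), stringsAsFactors = FALSE)
x$numbers <- seq_len(nrow(x))

# Attach the number to every row of df by joining on v1.
final.df <- merge(x, df, by = "v1")

# Keep only the numeric id and the remaining data columns.
result <- final.df[, c("numbers", "value")]
print(final.df)
```

Note that `merge` sorts the result by the `by` column by default, so the rows of `final.df` will generally not be in the same order as in `df`.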
