2

I have a data frame with 60000 obs. of 4 variables in the following format:

[image of the data frame not reproduced]

I need to replace each distinct character value in the first column with a number, so that identical values get the same number. So "101-startups" becomes 1, "10i10-aps" becomes 2, "10x" becomes 3, every "10x-fund-lp" becomes 4, and so on. The same applies to the second column.

How do I achieve this?

  • Please read [how to make a reproducible example in r](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – M-- May 02 '17 at 21:59

2 Answers

1

If I'm understanding your question correctly, all you need to do is something like:

my_data$col1 <- as.integer(factor(my_data$col1, levels = unique(my_data$col1)))
my_data$col2 <- as.integer(factor(my_data$col2, levels = unique(my_data$col2)))

It's probably a good idea to read up on factors.
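As a small illustration of why `levels = unique(...)` matters, here is a sketch on a made-up data frame. The name `my_data`, the column `col1`, and the sample values are assumptions for the example, not the asker's real data. It compares the default `factor()` coding with the first-appearance coding, and shows that `match()` against the unique values should give the same codes.

```r
# Toy data (hypothetical values), deliberately not in alphabetical order
my_data <- data.frame(
  col1 = c("10x", "101-startups", "10x", "10i10-aps", "101-startups"),
  stringsAsFactors = FALSE
)

# Default factor(): levels are sorted, so the codes follow sort order
sorted_ids <- as.integer(factor(my_data$col1))

# levels = unique(...): the codes follow the order of first appearance
appearance_ids <- as.integer(factor(my_data$col1, levels = unique(my_data$col1)))

# match() against the unique values is an equivalent way to get
# first-appearance codes without going through a factor
match_ids <- match(my_data$col1, unique(my_data$col1))

print(data.frame(col1 = my_data$col1, sorted_ids, appearance_ids, match_ids))
```

If the data is already sorted, as the first column in the question appears to be, the two codings coincide; the difference only shows up for unsorted columns.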

Gregor Thomas
nwknoblauch
  • Made some edits: as the second column clearly isn't in alphabetical order already, we need to make sure that `factor` doesn't reorder them. – Gregor Thomas May 02 '17 at 22:13
0

Try building a separate data frame from the unique entries of that column, then use its row names (which will be consecutive integers) as the numbers. If your data frame is df and that first column is v1, something like:

x <- data.frame(v1 = unique(df$v1))
x$numbers <- as.integer(row.names(x))  # row names are character, so convert to integer

Then you can merge that lookup table back onto the original data frame:

final.df = merge(x, df, by = "v1")

and then use something like dplyr to select, drop, or rearrange columns.
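Put together, the lookup-and-merge approach might look like the sketch below. The toy `df`, its `value` column, and the sample strings are assumptions made up for the example, and the final column selection uses base R indexing in place of dplyr so the snippet has no dependencies. Note that `merge()` sorts the result by the key column by default, so the original row order is not preserved.

```r
# Toy data (hypothetical values) standing in for the real data frame
df <- data.frame(
  v1 = c("10x", "101-startups", "10x", "10i10-aps"),
  value = c(5, 7, 9, 11),
  stringsAsFactors = FALSE
)

# Lookup table: one row per distinct v1, numbered in order of first appearance
x <- data.frame(v1 = unique(df$v1), stringsAsFactors = FALSE)
x$numbers <- as.integer(row.names(x))

# Attach the numeric code to every row of df
final.df <- merge(x, df, by = "v1")

# Keep the numeric code in place of the original character column
final.df <- final.df[, c("numbers", "value")]
print(final.df)
```

If the original row order matters, one option is to add a row index column before merging and sort by it afterwards.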

lebelinoz