
I have one column in a dataset that contains long strings. I want to convert that column to simple integer values (by hashing or indexing) in R so that I can easily join other tables on it. Can anyone suggest how to do this?

library(tidyverse)
mpg  # preview the built-in dataset

# Build the join key by concatenating model, displ, and year
mpg <- mpg %>%
  mutate(displ = as.character(displ), year = as.character(year)) %>%
  mutate(matcher = paste0(model, displ, year))

View(mpg)

The matcher column holds long string values as a character vector. I want to map those values to simple integers such as 1, 2, 3, and so on. How can I do that?

  • Hi Jemima, welcome. Can you provide a minimal reproducible example, so we can cut and paste a small sample of your data into our own R sessions, along with the code you are using for this task, so we can get the same errors? It is much quicker to get answers here when you do that. Thanks and good luck :) – mysteRious Nov 08 '18 at 03:59
  • Hi mysteRious, thanks. I have now edited the question with the example code. – Jemima Jeyakumar Nov 08 '18 at 04:46
  • Thanks very much. Do you have any rules for the mapping? Looks like there are 141 unique strings in 234 rows of `matcher`. – mysteRious Nov 08 '18 at 05:20
  • Thanks so much, it worked. Now I want to join another table on that new column, called matchnum. Thanks again – Jemima Jeyakumar Nov 08 '18 at 05:43
  • I think that next step may be something you can do with `purrr` library and the `reduce` command. Check out https://stackoverflow.com/questions/8091303/simultaneously-merge-multiple-data-frames-in-a-list for ideas :) – mysteRious Nov 08 '18 at 05:50
  • I have to do the same thing for another table, but when I do, I get different matchnum values: both tables have the same matcher values, but they map to different matchnum values. For example, [table 1] has matcher = a41.81999 and [table 2] has matcher = a41.81999, but the matchnum differs between the two tables. I want to generate the same matchnum values so that I can easily join the two tables on the matchnum column. Can you please help me with this? – Jemima Jeyakumar Nov 08 '18 at 06:09
  • That's right, it only picks up the unique values within each individual table. You should post a new question with example data from the two tables you are trying to combine, so that the merging problem is minimally reproducible. Someone should be able to help with that part then :) – mysteRious Nov 08 '18 at 16:22

1 Answer


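If you just want to assign a number to each unique string, you can convert the column to a factor whose levels are the unique strings in order of first appearance, then take the integer codes, like this. I will edit the answer if you describe a more complex mapping: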

> z <- transform(mpg, matchnum=as.integer(factor(matcher, unique(matcher))))
> head(z)
  manufacturer model displ year cyl      trans drv cty hwy fl   class   matcher matchnum
1         audi    a4   1.8 1999   4   auto(l5)   f  18  29  p compact a41.81999        1
2         audi    a4   1.8 1999   4 manual(m5)   f  21  29  p compact a41.81999        1
3         audi    a4     2 2008   4 manual(m6)   f  20  31  p compact   a422008        2
4         audi    a4     2 2008   4   auto(av)   f  21  30  p compact   a422008        2
5         audi    a4   2.8 1999   6   auto(l5)   f  16  26  p compact a42.81999        3
6         audi    a4   2.8 1999   6 manual(m5)   f  18  26  p compact a42.81999        3
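
As the comments point out, running this separately on two tables numbers each table's strings independently, so the same matcher can end up with different matchnum values. Here is a minimal sketch of one way around that, assuming two hypothetical tables named table1 and table2 with made-up values (the names and data are for illustration only, not from the question): build one lookup of every unique string across both tables and index each table against it with match().

```r
# Two toy tables that share some matcher strings (made-up data, for illustration)
table1 <- data.frame(matcher = c("a41.81999", "a422008", "a41.81999"),
                     cty = c(18, 20, 21),
                     stringsAsFactors = FALSE)
table2 <- data.frame(matcher = c("a422008", "a41.81999", "a42.81999"),
                     price = c(300, 250, 400),
                     stringsAsFactors = FALSE)

# One shared lookup: every distinct string from both tables, in order of first appearance
all_keys <- unique(c(table1$matcher, table2$matcher))

# match() returns each string's position in the shared lookup,
# so identical strings get identical integers in both tables
table1$matchnum <- match(table1$matcher, all_keys)
table2$matchnum <- match(table2$matcher, all_keys)

# Inner join on the shared integer key
joined <- merge(table1, table2[, c("matchnum", "price")], by = "matchnum")
```

Note that merge(table1, table2, by = "matcher") would join on the strings directly, so the integer key is a convenience rather than a requirement for the join.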
mysteRious