0

Let's say I have the following data frames below. How would one use "Partial_Match" of df2 to replace "CS" of df1 with "Name" from df2? What I am looking for is an output that looks like df3. Thanks.

old <- c("CS1.1.length", "CS1.7.length", "CS1.10.length", "CS1.12.length", "CS2.4.length", "CS2.6.length", "CS2.9.length", "CS2.11.length")
new <- c("Bob", "Alex", "Gary", "Taylor", "Tom", "John", "Pat", "Mary")
partial_match <- c("CS1.1", "CS1.7", "CS1.10", "CS1.12", "CS2.4", "CS2.6", "CS2.9", "CS2.ll")
df1 <- data.frame(CS = old, Num = sample(50,8))
df2 <- data.frame(Partial_Match = partial_match, Name = new)
df3 <- data.frame(df2[2],df1[2])
nniloc
  • 4,128
  • 2
  • 11
  • 22
user13589693
  • 369
  • 1
  • 8

1 Answers1

0

If old is a good representation of your full dataset, you can drop the .length part of the string and then perform the join.

Mary does not get joined here because partial_match[8] is CS2.ll rather than CS2.11.

Edit: this solution requires tidyverse

library(tidyverse)

df1 %>%
  mutate(Partial_Match = tools::file_path_sans_ext(CS)) %>%
  left_join(df2) %>%
  select(Name, Num)

Edit 2: same thing but with base R

df1$Partial_Match <- tools::file_path_sans_ext(df1$CS)
df3 <- merge(df1, df2)
df3 <- df3[,c('Name', 'Num')]
nniloc
  • 4,128
  • 2
  • 11
  • 22
  • Sorry, I should have prefaced that "old" is a small subset of data. My entire dataframe will contain much more than .length (i.e. CS1.1.width, CS1.1.height, etc.), but the CS#.#. will remain the same throughout. I am looking for a solution that will replace the CS#.#. with the "new" label. – user13589693 May 27 '20 at 18:41
  • If it is the same pattern, instead of matching `.length` you can pretend everything after the last period is the 'file extension' and remove it using this base R function `tools::file_path_sans_ext(old)`. – nniloc May 27 '20 at 19:35
  • For the sake of completeness, [here](https://stackoverflow.com/a/52471077/12400385) is another way to remove everything after the last period with regex. E.g. `mutate(Partial_Match = str_remove(CS, "\\.[^.]*$"))` or `gsub("\\.[^.]*$","", df1$CS)`. – nniloc May 27 '20 at 20:28