-1

I a dataframe (xlsx file) in the form: (this dataframe has 152069 rows).

Source Target
DORTMUND ANTWERP
MUMBAI SPIJKENISSE
XIOALAN BEILUN
ETTRINGEN BREMERHAVEN
HILTER BREMERHAVEN

and I also have another dataframe with the Ids and Names: (this dataframe has 10200 rows with unique id's for each name).

ID Name
2678 DORTMUND
6049 MUMBAI
9873 XIOALAN
3014 ETTRINGEN
4055 HILTER
338 ANTWERP
8323 SPIJKENISSE
824 BEILUN
1272 BREMERHAVEN

I would like to replace the data of the first dataframe with their appropriate id (in the second dataframe). Do you have any suggestions on how to do this? Thank you in advance.

mariag
  • 1
  • 1
  • This sounds like a common merge/join operation. You can see [the FAQ here](https://stackoverflow.com/q/1299871/903061). If you need more help, please provide more details: Where are you stuck? Have you read the data into R yet, or is that part of the problem? If you have the data in R, please share a small sample of each data frame as copy/pasteable code that we can work with, not as a picture. The `dput()` function makes a copy/pasteable version of an R object, e.g., you can share `dput(your_data[1:5, ])` for the first 5 rows of `your_data`. – Gregor Thomas Apr 23 '22 at 03:10
  • 1
    The reason low rep users are not given the authority to post images is to prevent this sort of questin. Learn to make a small example that illustrates the data and doesn't require is re-entering the data. (I suspect this is a duplicate of the huge number of questions for which the answer turns out to depend on `merge`. You also need to explain whether this operation needs to result in another Excel file or can it all be done in R after data import. (My closevote is based on my sense that this has no [MCVE] and even if it did is nor really a discrete question. Needs more focus. – IRTFM Apr 23 '22 at 03:10
  • @GregorThomas I do not need to merge them, but replace the values with the appropriate Ids. I already have the files as dataframes in R. I tried doing what you said with dput(), I was not able to post it because it says that the code is nor properly formatted. – mariag Apr 23 '22 at 03:25

2 Answers2

0

Use match of first dfrm data agains second dfrm Name to create an index to apply to the ID column of the second dataframe. Obviously this should be done on R object for which you have adequate backups.

txt1 <-"| Source   | Target         |
+ | DORTMUND | ANTWERP        |
+ | MUMBAI   | SPIJKENISSE    |
+ | XIOALAN  | BEILUN         |
+ |ETTRINGEN |BREMERHAVEN     |
+ |HILTER    |BREMERHAVEN     |"

 txt2 <- "| ID       | Name           |
+ | 2678     | DORTMUND       |
+ | 6049     | MUMBAI         |
+ | 9873     | XIOALAN        |
+ | 3014     | ETTRINGEN      |
+ | 4055     | HILTER         |
+ | 338      | ANTWERP        |
+ | 8323     | SPIJKENISSE    |
+ | 824      | BEILUN         |
+ | 1272     | BREMERHAVEN    |"
  inp1 <-read.delim(text=txt1, sep="|")[,2:3]
  inp2 <-read.delim(text=txt2, sep="|")[,2:3]

> inp1[] <- lapply(inp1,trimws) 
> inp2[] <- lapply(inp2,trimws)

> inp1[] <- lapply(inp1, function(col){inp2$ID[match(col,inp2$Name)]})
> inp1
  Source Target
1   2678    338
2   6049   8323
3   9873    824
4   3014   1272
5   4055   1272
IRTFM
  • 258,963
  • 21
  • 364
  • 487
0

With base R, you can use match to pair the names together then we can get the ID for those to replace for each column:

df1$Source <- df2$ID[match(df1$Source, df2$Name)]
df1$Target <- df2$ID[match(df1$Target, df2$Name)]

Output

  Source Target
1   2678    338
2   6049   8323
3   9873    824
4   3014   1272
5   4055   1272

Another option would be to mutate across and use match again:

library(tidyverse)

df1 %>% 
  mutate(across(everything(), ~ df2$ID[match(.x, df2$Name)]))

Another option would be to pivot to long form, then join the data together then pivot back wide (but not very efficient).

df1 %>% 
  pivot_longer(everything()) %>% 
  left_join(., df2, by = c("value" = "Name")) %>% 
  select(-value) %>% 
  group_by(grp = ceiling(row_number()/2)) %>% 
  pivot_wider(names_from = "name", values_from = "ID") %>% 
  select(-grp)

Data

df1 <- structure(list(Source = c("DORTMUND", "MUMBAI", "XIOALAN", "ETTRINGEN", 
"HILTER"), Target = c("ANTWERP", "SPIJKENISSE", "BEILUN", "BREMERHAVEN", 
"BREMERHAVEN")), class = "data.frame", row.names = c(NA, -5L))

df2 <- structure(list(ID = c(2678L, 6049L, 9873L, 3014L, 4055L, 338L, 
8323L, 824L, 1272L), Name = c("DORTMUND", "MUMBAI", "XIOALAN", 
"ETTRINGEN", "HILTER", "ANTWERP", "SPIJKENISSE", "BEILUN", "BREMERHAVEN"
)), class = "data.frame", row.names = c(NA, -9L))
AndrewGB
  • 16,126
  • 5
  • 18
  • 49