I am working with big data frames that hold a lot of information, and I need to pull specific columns from one of them and merge them into my main data frame. Here is a small example:

ID <- c(12,44,56,16,18,29)
people <- data.frame(ID)

  ID
1 12
2 44
3 56
4 16
5 18
6 29

sex <- c("f","f","m","f","m","m")
age <- c(12,34,55,23,43,32)
ID2 <- c(44,12,16,18,56,29)
info <- data.frame(ID2,age,sex)

  ID2 age sex
1  44  12   f
2  12  34   f
3  16  55   m
4  18  23   f
5  56  43   m
6  29  32   m

My goal is to merge the columns of `info` into `people`, matching rows by ID. For this, I used a nested for loop:

for (i in 1:nrow(people)) {
  for (j in 1:nrow(info)){
    if(people$ID[i] == info$ID2[j]){
      people$age[i] <- info$age[j]
      people$sex[i] <- info$sex[j]
    }
  }
}

My code works fine, but when I apply it to a bigger sample the run time is very high. Is there an alternative to this loop?

Pom
  • Have you heard of `merge()`? [How to join (merge) data frames (inner, outer, left, right)](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right/1300618#1300618) – user2974951 Aug 19 '22 at 11:11
  • Try: `dplyr::left_join(people, info, by = c("ID" = "ID2"))` – harre Aug 19 '22 at 11:12
  • Is there a reason you want/need to use a loop instead of a variant of `merge()` or `join()`? – Omniswitcher Aug 19 '22 at 11:12
  • You neither need a `for` loop nor a package for that. Just do `merge(people, info, by.x='ID', by.y='ID2')`. – jay.sf Aug 19 '22 at 11:15
  • That seems a lot more efficient! May I ask, though: my real "info" data has some extra columns which I do not need. How do you specify that only certain columns should be merged into "people"? – Pom Aug 19 '22 at 11:21
  • @Pom Try `merge(people, subset(info, select=-c(X1, X2)), by.x='ID', by.y='ID2')` to drop columns. – jay.sf Aug 19 '22 at 11:22

1 Answer

You could use base R's `merge()`, which should do what you need.

ID <- c(12,44,56,16,18,29)
people <- data.frame(ID)

sex <- c("f","f","m","f","m","m")
age <- c(12,34,55,23,43,32)
ID2 <- c(44,12,16,18,56,29)
info <- data.frame(ID2,age,sex)

merge(people, 
      info, 
      by.x = "ID", 
      by.y = "ID2")
#>   ID age sex
#> 1 12  34   f
#> 2 16  55   m
#> 3 18  23   f
#> 4 29  32   m
#> 5 44  12   f
#> 6 56  43   m
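
Note that `merge()` sorts the result by the key by default and, with its default arguments, drops rows of `people` that have no match in `info`. If the original row order of `people` matters, one possible alternative is a vectorized lookup with base R's `match()`. This is only a sketch on the example data above, not the only way to do it; `idx` is a helper name introduced here for illustration.

```r
people <- data.frame(ID = c(12, 44, 56, 16, 18, 29))
info <- data.frame(
  ID2 = c(44, 12, 16, 18, 56, 29),
  age = c(12, 34, 55, 23, 43, 32),
  sex = c("f", "f", "m", "f", "m", "m")
)

# For each people$ID, find the row position of the same ID in info$ID2
# (NA where an ID has no match in info).
idx <- match(people$ID, info$ID2)

# Copy the looked-up values over; the row order of people is unchanged.
people$age <- info$age[idx]
people$sex <- info$sex[idx]

people
```

This replaces the nested loop with a single lookup per column, and unmatched IDs end up as `NA` instead of being dropped.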
VvdL
  • Oh, that's great! May I ask how you specify that only certain columns should be merged? For instance, my real "info" data frame has some columns which I do not need. – Pom Aug 19 '22 at 11:16
  • You can select certain columns in many different ways. Two ways of doing it: `info[c("col1", "col2", "col8", "col9")]` or using dplyr: `dplyr::select(info, col1, col2, col8, col9)` – VvdL Aug 19 '22 at 11:21
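
To make the column selection concrete, here is a small sketch that combines it with `merge()`. The `height` column and the `wanted` and `result` names are hypothetical and only stand in for whatever extra columns the real data has. The key column (`ID2`) has to stay in the selection, otherwise there is nothing left to merge on.

```r
people <- data.frame(ID = c(12, 44, 56, 16, 18, 29))
info <- data.frame(
  ID2 = c(44, 12, 16, 18, 56, 29),
  age = c(12, 34, 55, 23, 43, 32),
  sex = c("f", "f", "m", "f", "m", "m"),
  height = c(150, 165, 180, 170, 175, 182)  # hypothetical unwanted column
)

# Keep the key column plus the columns to bring over, then merge.
wanted <- c("ID2", "age")
result <- merge(people, info[wanted], by.x = "ID", by.y = "ID2")

result
```

Here `result` contains only `ID` and `age`; `sex` and `height` never enter the merge.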