I have two datasets: one has an ID column and variables a, b and c; the other also has IDs but with variables c, d and e. The two datasets have some individuals in common, but not all of them. I tried a dplyr::left_join, but it drops the rows from the second dataset that have no match in the first. So I created and ordered the columns in both datasets so that I could simply do an rbind. The problem is that I now have something like this (example with made-up data):

  index  b  c  d
1     A  B  A  A
2     B NA  C  D
3     B  B NA NA
4     C  D  E  C
5     D  D  D  D
6     E NA  E NA
7     E NA NA  F
8     E  G NA NA
9     F  F  F  F

and I would like to have:

  index  b  c  d
1     A  B  A  A
2     B  B  C  D
3     C  D  E  C
4     D  D  D  D
5     E  G  E  F
6     F  F  F  F 
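
For reference, here is a minimal base R sketch of the example above. It assumes that each index has at most one non-missing value per column, which holds for the made-up data but may not hold for the real dataset. The helper name `first_non_na` is made up for illustration.

```r
# Rebuild the made-up example from the question.
df <- data.frame(
  index = c("A", "B", "B", "C", "D", "E", "E", "E", "F"),
  b     = c("B", NA, "B", "D", "D", NA, NA, "G", "F"),
  c     = c("A", "C", NA, "E", "D", "E", NA, NA, "F"),
  d     = c("A", "D", NA, "C", "D", NA, "F", NA, "F"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the first non-missing value of a vector,
# or NA if every value is missing.
first_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_character_ else x[1]
}

# Collapse the rows that share an index, one column at a time.
collapsed <- aggregate(df[c("b", "c", "d")],
                       by = list(index = df$index),
                       FUN = first_non_na)
print(collapsed)
```

If an index has two different non-missing values in the same column, this sketch silently keeps only the first one, so it is worth checking for such cases beforehand.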

Apparently, a similar question was asked in "Push up and tighten Dataframe. General solution", but it dealt with numbers, and the sum function does not work in my case. I tried adapting it by replacing sum with paste and paste0, but that does not work either. I also tried the second solution in "Merge rows in one data.frame" (setDT + lapply); it works for the example, but on the true dataset I get "Error in eval(bysub, x, parent.frame()) : object 'A' not found".

The solution df1 %>% group_by(index) %>% summarise_all(na.omit) with dplyr, proposed in the first comment, also works on the example but not on the true data, where it fails with "Error in summarise_impl(.data, dots) : Column Nom must be length 1 (a summary value), not 2". "Nom" is the first column after the index and contains strings of several words with spaces and punctuation. Running length(Nom) returns the number of rows.
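
The "must be length 1 ... not 2" message suggests that, for at least one index, na.omit leaves two values in the same column. A rough way to check this in base R is sketched below. The data frame is hypothetical (only the column name `Nom` comes from the error message), and `value_cols`, `n_non_na` and `conflicts` are illustrative names.

```r
# Hypothetical data: index "B" has two non-missing values in column Nom.
df <- data.frame(
  index = c("A", "B", "B", "C", "C"),
  Nom   = c("x", "y one", "y two", NA, "z"),
  d     = c("p", NA, "q", "r", NA),
  stringsAsFactors = FALSE
)
value_cols <- setdiff(names(df), "index")

# Count the non-missing values per index and per column.
n_non_na <- aggregate(df[value_cols],
                      by = list(index = df$index),
                      FUN = function(x) sum(!is.na(x)))

# Keep the indexes where at least one column holds more than one value;
# these are the groups for which na.omit cannot return a single value.
conflicts <- n_non_na[rowSums(n_non_na[value_cols] > 1) > 0, ]
print(conflicts)
```

Any index listed in `conflicts` needs an explicit rule for picking or combining its values before the rows can be merged into one.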

Thanks in advance

  • Try `library(dplyr);df1 %>% group_by(index) %>% summarise_all(na.omit)` – akrun Jun 22 '18 at 07:16
  • Thanks for this answer. It works on the example, but when I try to apply it to my true data I get "Error in summarise_impl(.data, dots) : Column `Nom` must be length 1 (a summary value), not 2", with `Nom` being the name of my first column after the index column. – user9973985 Jun 22 '18 at 07:29
  • Although this question does not appear to be a duplicate, it is still an incomplete question. Please make sure you provide enough information/data/code to fully reproduce the (incorrect) output that you included in the question. – BenBarnes Jun 22 '18 at 12:15
  • I can't really share the data as it is private. I'm investigating further what is happening with the length of my columns and will edit the question with any update. – user9973985 Jun 22 '18 at 12:37
  • `df1<-aggregate.data.frame(df,list(index), function(x) paste(x[!is.na(x)])); df1$index<-NULL; df1`. With this I got something that works on both datasets. I think the problem is that for some indexes the same column is filled in more than one row, which creates vectors in the merged column; that would explain why the dplyr solution doesn't work. It's progress: now I just have to find a way to turn a vector with 2 values into a single one. – user9973985 Jun 22 '18 at 13:27
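
One possible way to turn a vector with 2 values into a single one is to drop the missing values and duplicates, then paste whatever remains into one string. The sketch below uses hypothetical data, and `collapse_values` and the "; " separator are illustrative choices, not something taken from the thread.

```r
# Hypothetical data: index "B" has two different values in Nom,
# index "C" has the same value twice and nothing in d.
df <- data.frame(
  index = c("A", "B", "B", "C", "C"),
  Nom   = c("x", "y one", "y two", "z", "z"),
  d     = c("p", NA, "q", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop NAs and duplicates, then join the remaining
# values into one string. Returns NA if nothing is left.
collapse_values <- function(x, sep = "; ") {
  x <- unique(x[!is.na(x)])
  if (length(x) == 0) NA_character_ else paste(x, collapse = sep)
}

# One row per index, with conflicting values concatenated.
merged <- aggregate(df[c("Nom", "d")],
                    by = list(index = df$index),
                    FUN = collapse_values)
print(merged)
```

Concatenating keeps every distinct value visible. If only one value should survive, the helper could return `x[1]` instead, at the cost of silently discarding the others.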

0 Answers