0

Ok folks,

I want to join two data frames via one identifier (key as same column name, is different can be renamed, doesn't make much of a difference) in R and I have tried the normal merge function, dplyr left_join, data.table and get the same error each and every time:

all the values for the rows that DO match and get returned have NA as value for the metric (see second screenshot). To just be sure it is no problem with the data I also did a simple VLOOKUP in excel and there it works flawless, I get the proper values. Now wanting to replicate and outmating this via an R script, it seemingly works but just either returns NA values or somehow turns the values into NA's. Note: I know that the second dataframe (some 6000 observations, see also screenshot) does have a lot of non-matching rows as well, but that shouldn't be a problem, right? I also tried changing the orientation, so instead of a left join with table x with table y, I tried table y with table x.

It shouldn't make a difference, since I just want the additional rows where the identifier (column name) fits. I even tried with a full join (also without pre-selecting columns) but it results in the same problem. No error or even warning log I could rely on as well. I for the life of me can not figure out why that would happen or what I am doing wrong. I checked the columns and names, they are not factors, they have class chr, so this should not be an issue for joining (I found similar problems online where this seemed to be an issue).

Thankful for any pointers.

Have a nice evening everyone!

#import and load reports
skraart2019<-read_xlsx("skraart2019.xlsx", sheet="ArticleDownloads256", skip=11)
savedrecs<-read_xls("savedrecs.xls", skip=28)
# join reports by DOI, filter by columns you need
skrafin<- savedrecs %>% 
   full_join(skraart2019,by="DOI") %>%
    filter(`Publication Year`=="2018") %>%
    select(-(22:57),-(79:90)) 
# whitespace removal just in case, mutate and data.table joins
skraart2019 <-skraart2019%>% mutate(DOI= trimws(as.character(DOI))); savedrecs <-  savedrecs %>% mutate(DOI = trimws(as.character(DOI))); skrafin3 <- left_join(skraart2019,savedrecs)
joined<- merge(savedrecs,skraart2019,by.x = "DOI",all.x = TRUE)
library(data.table)
skraartdt<- setDT(skraart2019)
saveddt<- setDT(savedrecs)
complete<-skraartdt[saveddt,on="DOI"]
 
[RStudio][1][NA values][2]


  [1]: https://i.stack.imgur.com/S2EX7.png
  [2]: https://i.stack.imgur.com/P23zp.png

project files

https://drive.google.com/file/d/1At9lSqs5YzbbOKeNCZ1d3K8xLngGMC98/view?usp=sharing
lexboreas
  • 1
  • 1
  • screenshots? Could you share your files. I would be easier to debug. It might be an issue with data classes. Also check `sum(savedrecs$DOI %in% skraart2019$DOI)` to find out how many index matches. – Bruno Tremblay Oct 17 '20 at 18:48
  • I have saved the screens and a link to my files, but for some reason it does not show the edits. Is it because I am a newbie and these features aren't unlocked for me? I'm still new to stack overflow and how it works. Alternatively, here are the files: https://drive.google.com/file/d/1At9lSqs5YzbbOKeNCZ1d3K8xLngGMC98/view?usp=sharing – lexboreas Oct 19 '20 at 18:51
  • one table DOI use lowercase S, the other one uppercase. Simple fix. Add `skraart2019$Item_DOI <- toupper(skraart2019$Item_DOI); savedrecs$DOI <- toupper(savedrecs$DOI)` after you imported the tables. – Bruno Tremblay Oct 20 '20 at 00:59
  • Sample : 10.1007/S00256-006-0158-5 vs 10.1007/s00256-006-0158-5, they are not the same – Bruno Tremblay Oct 20 '20 at 01:04
  • Thanks, I really overlooked that. I managed to merge by Title instead of DOI already and wanted to close the topic, but wondered what was the reason the DOI was not working. The reason I didn't want to that initially is that Titles can be faulty (for the same reasons, case sensitivity, typos, etc.), the DOI should always be the same in principle. Case sensitivity makes sense and I haven't thought about that before. At first I thought the special characters would be problematic, but since it is a a string, they should be allowed. – lexboreas Oct 21 '20 at 17:27

0 Answers0