
I have a data table, `first_medians`, which includes a `location` column, and another data table, `LOI`, which holds each location together with the name of its city.

I want to merge them so that the first table, `first_medians`, gets the city names.

The problem is that the merge produces NAs for some rows. For example, the coordinate 44.03125_-123.09375 has the name Eugene. After merging, the first two occurrences of 44.03125_-123.09375 are mapped to Eugene, but the rest are mapped to NA.

The next weird part is that if I convert `first_medians` to a data frame with `as.data.frame(first_medians)`, then back to a data table with `data.table(first_medians)`, and then do the merge, it works!

Please take a look at the image.

Any idea what is going on?


To make this clearer, I also changed the code to:

# Merge as-is: some rows get NA for city
first_medians_merged_before <- merge(first_medians, LOI, by = "location", all.x = TRUE)
dput(head(first_medians_merged_before, 5))

# Round-trip through a data frame, then merge again: every row gets a city
first_medians <- as.data.frame(first_medians)
first_medians <- data.table(first_medians)
first_medians_merged_after <- merge(first_medians, LOI, by = "location", all.x = TRUE)
dput(head(first_medians_merged_after, 5))

The outputs of the two `dput` calls are below:

> dput(head(first_medians_merged_before, 5))
structure(list(location = c("44.03125_-123.09375", "44.03125_-123.09375", 
"44.03125_-123.09375", "44.03125_-123.09375", "44.03125_-123.09375"
), time_period = c("1950-2005", "1950-2005", "1979-2015", "1979-2015", 
"2006-2025"), emission = c("RCP 4.5", "RCP 8.5", "RCP 4.5", "RCP 8.5", 
"RCP 4.5"), median = c(72, 72, 68, 68, 78), city = c("Eugene", 
"Eugene", NA, NA, NA)), sorted = "location", class = c("data.table", 
"data.frame"), row.names = c(NA, -5L), .internal.selfref = <pointer: 0x1028114e0>)

> dput(head(first_medians_merged_after, 5))
structure(list(location = c("44.03125_-123.09375", "44.03125_-123.09375", 
"44.03125_-123.09375", "44.03125_-123.09375", "44.03125_-123.09375"
), time_period = c("1950-2005", "1950-2005", "1979-2015", "1979-2015", 
"2006-2025"), emission = c("RCP 4.5", "RCP 8.5", "RCP 4.5", "RCP 8.5", 
"RCP 4.5"), median = c(72, 72, 68, 68, 78), city = c("Eugene", 
"Eugene", "Eugene", "Eugene", "Eugene")), sorted = "location", class = c("data.table", 
"data.frame"), row.names = c(NA, -5L), .internal.selfref = <pointer: 0x1028114e0>)
> 
OverFlow Police
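
One attribute that the data frame round trip is known to reset is the data.table key, so comparing `key(first_medians)` before and after the conversion may be worth a look. The following is a minimal sketch only: the toy tables reuse the values from the `dput` output, and the key set on `first_medians` is an assumption, since the question does not say whether the real table was keyed. It does not reproduce the NAs; it only shows what the round trip changes.

```r
library(data.table)

# Toy stand-ins for the real tables (values copied from the dput output).
first_medians <- data.table(
  location    = rep("44.03125_-123.09375", 5),
  time_period = c("1950-2005", "1950-2005", "1979-2015", "1979-2015", "2006-2025"),
  emission    = c("RCP 4.5", "RCP 8.5", "RCP 4.5", "RCP 8.5", "RCP 4.5"),
  median      = c(72, 72, 68, 68, 78)
)
LOI <- data.table(location = "44.03125_-123.09375", city = "Eugene")

# Assumption: the real table carried a key. Set one here so that the
# round trip has something to drop.
setkey(first_medians, location)
key_before <- key(first_medians)

# The round trip from the question: data.table -> data.frame -> data.table.
round_trip <- data.table(as.data.frame(first_medians))
key_after <- key(round_trip)

# On clean toy data both merges fill in every city.
merged_before <- merge(first_medians, LOI, by = "location", all.x = TRUE)
merged_after  <- merge(round_trip, LOI, by = "location", all.x = TRUE)
```

If the real `first_medians` reports a key before the round trip and none after it, that would point at the key (rather than the strings themselves) as the thing to investigate next.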
  • Please check if the values are the same or if there are leading/trailing spaces – akrun May 24 '19 at 15:34
  • [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. We can't recreate a problem with your code from a picture of it. – camille May 24 '19 at 15:57
  • That is not it; I have checked. Also, converting back and forth between data frame and data table would not remove spaces from the strings, if there were any. I just hoped, with low probability, that someone had run into this problem before! – OverFlow Police May 24 '19 at 17:24
  • Might location be a factor in the original `first_median`? It would help if you could include in your question the output of running `dput(head(first_medians_merged, 5))` so that we could inspect the object, vs. how it prints. Some objects print the same but are different under the hood. – Jon Spring May 24 '19 at 18:19
  • @JonSpring The answer to your first question is no, it is not a factor; I had checked that. I have edited the question to include the `dput` output you asked for. I will get back to this tonight and see if I can produce a "reproducible" form of this problem and investigate more! – OverFlow Police May 24 '19 at 18:33
  • I got `TRUE` from `identical(first_medians_merged_before[1,1], first_medians_merged_before[3,1])`, so the problem seems to be upstream of or during the merge. Possibly warned about in ?merge: "This is intended to work with data frames with vector-like columns: some aspects work with data frames containing matrices, but not all." – Jon Spring May 24 '19 at 18:55
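
The checks suggested in the comments above (stray whitespace, factor vs. character, values that have no match) can be bundled into one small helper. This is a sketch under assumptions: `check_join_column` is a hypothetical name, not a data.table function, and the toy table deliberately contains a trailing space so that the report shows what a positive result looks like.

```r
library(data.table)

# Hypothetical helper: summarizes properties of a join column that the
# comments ask about. Not part of data.table.
check_join_column <- function(x, y, col = "location") {
  xv <- x[[col]]
  yv <- y[[col]]
  list(
    class_x        = class(xv),   # "factor" here would answer the factor question
    class_y        = class(yv),
    # TRUE if any value changes when leading/trailing whitespace is trimmed
    has_whitespace = any(as.character(xv) != trimws(as.character(xv))),
    # values in x that have no exact match in y
    unmatched      = setdiff(unique(as.character(xv)), as.character(yv)),
    key_x          = key(x)       # NULL when x has no key
  )
}

# Toy data: the second location has a trailing space on purpose.
first_medians <- data.table(
  location = c("44.03125_-123.09375", "44.03125_-123.09375 "),
  median   = c(72, 68)
)
LOI <- data.table(location = "44.03125_-123.09375", city = "Eugene")

report <- check_join_column(first_medians, LOI)
```

Running the same helper on the real `first_medians` and `LOI` before and after the data frame round trip would show whether any of these properties differ between the two cases.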
