I have read the question "How to find common rows between two dataframe in R?".

I have two data tables:

df1 <- structure(list(V1 = structure(c(1L, 3L, 2L, 4L), .Label = c("AMH5", 
"BBHD", "DHE3", "NF1"), class = "factor")), .Names = "V1", class = c("data.table", 
"data.frame"), row.names = c(NA, -4L))

and

df2 <- structure(list(V1 = structure(c(4L, 2L, 3L, 1L), .Label = c("AMH5 ", 
"BBDQ ", "DHE3", "TBB5 "), class = "factor")), .Names = "V1", class = c("data.table", 
"data.frame"), row.names = c(NA, -4L))

Unfortunately, I cannot figure out why not all of the matching strings are detected, even though several of them look the same in both tables. For example, when I do this:

library(data.table)
fintersect(setDT(df1), setDT(df2))

it returns only one row:

V1
1: DHE3
wibeasley
nik

1 Answer

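Your data needs a little cleaning. Several values in df2 carry a trailing space (for example "AMH5 "), so they do not compare equal to the corresponding values in df1.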
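To see why only one row matches before any cleaning, a small check like the following may help. It is only a sketch: the vectors `v1` and `v2` are illustrative names, rebuilt by hand from the values in the question's dput output, and it uses base R only.

```r
# Values copied from the question; v1 and v2 are illustrative names.
v1 <- factor(c("AMH5", "DHE3", "BBHD", "NF1"))
v2 <- factor(c("TBB5 ", "BBDQ ", "DHE3", "AMH5 "))

# Quoted printing makes trailing spaces visible.
print(levels(v2), quote = TRUE)

# Levels whose length changes after trimming carry extra whitespace.
padded <- levels(v2)[nchar(levels(v2)) != nchar(trimws(levels(v2)))]
padded

# Exact matching treats "AMH5" and "AMH5 " as different strings.
common_raw <- intersect(as.character(v1), as.character(v2))
common_raw
```

Here `padded` lists the three values with a trailing space, and `common_raw` contains only "DHE3", which matches the single row the question reports.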

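library(data.table)
library(stringr)

# make sure both objects are data.tables; the [, lapply(.SD, ...)] syntax fails on a plain data.frame
setDT(df1)
setDT(df2)

# convert factor columns to character
df1 <- df1[, lapply(.SD, as.character)]
df2 <- df2[, lapply(.SD, as.character)]

# trim leading and trailing whitespace
df1 <- df1[, lapply(.SD, str_trim)]
df2 <- df2[, lapply(.SD, str_trim)]

# get the common rows
fintersect(df1, df2)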

     V1
1: DHE3
2: AMH5
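If you would rather not load stringr, base R's `trimws()` should do the same job here. The following is a minimal sketch, assuming the data from the question; `clean_strings` is a hypothetical helper name introduced only for this example, and the tables are rebuilt with `data.table()` rather than from the dput output.

```r
library(data.table)

# Same values as in the question, rebuilt directly as data.tables.
df1 <- data.table(V1 = factor(c("AMH5", "DHE3", "BBHD", "NF1")))
df2 <- data.table(V1 = factor(c("TBB5 ", "BBDQ ", "DHE3", "AMH5 ")))

# Hypothetical helper: convert every column to character and trim whitespace.
clean_strings <- function(dt) {
  dt[, lapply(.SD, function(col) trimws(as.character(col)))]
}

# Rows present in both tables after cleaning.
common <- fintersect(clean_strings(df1), clean_strings(df2))
common
```

Doing the conversion and the trimming in one pass keeps the cleaning step in a single place, which is convenient if more tables need the same treatment.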
YOLO
  • I get an error: `df1 <- df1[, lapply(.SD, as.character)]` fails with `Error in .subset(x, j) : invalid subscript type 'list'` – nik Mar 16 '18 at 20:51
  • That's weird. I just ran it. Could you reload your df1 and df2 and try again? – YOLO Mar 16 '18 at 20:53
  • Do you know what the problem was? I should have converted them to data.table before converting to character. That solved the issue, and I accepted your answer. Thanks – nik Mar 16 '18 at 20:59
  • Convert to data.table? But the format you gave in the question is already a data.table. Strange! Anyway, glad your problem is solved. – YOLO Mar 16 '18 at 21:09
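
The error discussed in the comments can plausibly be reproduced as follows. This is a sketch under an assumption: that the objects in the asker's session were plain data.frames rather than data.tables. The name `plain` is illustrative.

```r
library(data.table)

# Assumed stand-in for the asker's object: a plain data.frame, not a data.table.
plain <- data.frame(V1 = factor(c("AMH5", "DHE3", "BBHD", "NF1")))

# data.frame indexing does not understand .SD, so this call is expected to fail.
res <- tryCatch(plain[, lapply(.SD, as.character)], error = function(e) e)
is_error <- inherits(res, "error")
if (is_error) message(conditionMessage(res))

# setDT() converts the object to a data.table by reference;
# afterwards the same expression works.
setDT(plain)
converted <- plain[, lapply(.SD, as.character)]
class(converted$V1)
```

Calling `setDT()` up front, as the question itself does inside `fintersect(setDT(df1), setDT(df2))`, avoids the problem whichever class the objects start with.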