Finding common strings across two data frames

Question

I have read "How to find common rows between two dataframe in R?", but it did not solve my problem.

I have two data tables:

df1 <- structure(list(V1 = structure(c(1L, 3L, 2L, 4L), .Label = c("AMH5", 
"BBHD", "DHE3", "NF1"), class = "factor")), .Names = "V1", class = c("data.table", 
"data.frame"), row.names = c(NA, -4L))

and

df2 <- structure(list(V1 = structure(c(4L, 2L, 3L, 1L), .Label = c("AMH5 ", 
"BBDQ ", "DHE3", "TBB5 "), class = "factor")), .Names = "V1", class = c("data.table", 
"data.frame"), row.names = c(NA, -4L))

Unfortunately, I cannot work out why only some of the matching strings are detected when the two tables share several values. For example, when I run this:

library(data.table)
fintersect(setDT(df1), setDT(df2))

It returns only one row:

     V1
1: DHE3

YOLO · Accepted Answer · 2018-03-16T20:54:12.587

0

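Your data needs a little cleaning first: several values in df2 carry trailing whitespace (for example "AMH5 "), so they do not match their counterparts in df1.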
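To see why only one value matches, here is a minimal sketch using plain character vectors. The vectors `x1` and `x2` are illustrative names, with values copied from the `dput` output in the question. `trimws()` is the base R function for stripping leading and trailing whitespace.

```r
# Values as they appear in the question's dput() output
x1 <- c("AMH5", "DHE3", "BBHD", "NF1")      # df1$V1
x2 <- c("TBB5 ", "BBDQ ", "DHE3", "AMH5 ")  # df2$V1, note the trailing spaces

# Exact matching treats "AMH5" and "AMH5 " as different strings
common_raw <- intersect(x1, x2)

# Trimming whitespace before comparing lets the values match
common_clean <- intersect(x1, trimws(x2))

print(common_raw)
print(common_clean)
```

Only "DHE3" has no trailing space in df2, which is why it is the only value found before cleaning.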

# convert to character (if needed)
df1 <- df1[, lapply(.SD, as.character)]
df2 <- df2[, lapply(.SD, as.character)]

# trim whitespace
library(stringr)
df1 <- df1[, lapply(.SD, str_trim)]
df2 <- df2[, lapply(.SD, str_trim)]

# get output
fintersect(df1, df2)

     V1
1: DHE3
2: AMH5
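If you would rather not depend on stringr, the same cleaning can probably be done with base R's `trimws()`. The sketch below is one possible variant, not the only way to do it. It rebuilds the question's data as plain data frames, and `clean()` is a hypothetical helper introduced here for illustration. It also calls `setDT()` first, since the `[, lapply(.SD, ...)]` syntax only works on a data.table.

```r
library(data.table)

# Rebuild the question's data as plain data frames (values from the dput output)
df1 <- data.frame(V1 = factor(c("AMH5", "DHE3", "BBHD", "NF1")))
df2 <- data.frame(V1 = factor(c("TBB5 ", "BBDQ ", "DHE3", "AMH5 ")))

# Convert by reference so that [, j] below is evaluated by data.table
setDT(df1)
setDT(df2)

# Hypothetical helper: convert every column to character and trim whitespace
clean <- function(dt) {
  dt[, lapply(.SD, function(col) trimws(as.character(col)))]
}

# Rows present in both cleaned tables
common <- fintersect(clean(df1), clean(df2))
print(common)
```

Doing the conversion and the trimming in one pass avoids reassigning df1 and df2 twice, and the original tables are left untouched apart from the `setDT()` conversion.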

edited Mar 16 '18 at 20:54

answered Mar 16 '18 at 20:48


I get some error `df1 <- df1[, lapply(.SD, as.character)] Error in .subset(x, j) : invalid subscript type 'list' ` – nik Mar 16 '18 at 20:51
That's weird. I just ran it. Could you reload your df1 and df2 and try again. – YOLO Mar 16 '18 at 20:53
Do you know what the problem was? I should have converted them to data.table first, before converting to character. That solved the issue and I accepted your answer. Thanks – nik Mar 16 '18 at 20:59
Convert to data.table? But the format you gave in the question is already data.table. Strange! Anyway, glad your problem is solved. – YOLO Mar 16 '18 at 21:09
