I have two data frames. The first one (dt) consists entirely of chr columns, and the second one (TargetWord) is a dictionary that also contains chr values. I use pmatch to find which words in dt are present in TargetWord and to return their positions in TargetWord. This works fine when the data frames are small. The problem starts when the data frames are large: positions are returned only for the first column, and all the other columns come back as NA.
## Data Table
word_1 <- c("conflict","", "resolved", "", "", "")
word_2 <- c("", "one", "tricky", "one", "", "one")
word_3 <- c("thanks","", "", "comments", "par","")
word_4 <- c("thanks","", "", "comments", "par","")
word_5 <- c("", "one", "tricky", "one", "", "one")
dt <- data.frame(word_1, word_2, word_3, word_4, word_5, stringsAsFactors = FALSE)
## Targeted Words
TargetWord <- data.frame(cbind(c("conflict", "thanks", "tricky", "one", "two", "three")))
## convert into matrix (needed)
dt <- as.matrix(dt)
TargetWord <- as.matrix(TargetWord)
result <- `dim<-`(pmatch(dt, TargetWord, duplicates.ok=TRUE), dim(dt))
print(result)
This returns:
     [,1] [,2] [,3] [,4] [,5]
[1,]    1   NA    2    2   NA
[2,]   NA    4   NA   NA    4
[3,]   NA    3   NA   NA    3
[4,]   NA    4   NA   NA    4
[5,]   NA   NA   NA   NA   NA
[6,]   NA    4   NA   NA    4
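For reference, here is a minimal sketch of how pmatch treats exact, partial, ambiguous, and empty inputs, as I understand it from its documentation. The vectors target and probe are made up for illustration and are not part of my data. I include match alongside it only to show the contrast with exact matching.

```r
# Hypothetical inputs, chosen only to illustrate pmatch's matching rules
target <- c("conflict", "thanks", "tricky", "one", "two", "three")
probe  <- c("one", "tr", "t", "", "resolved")

# pmatch: exact matches are considered first, then unique partial (prefix) matches
partial_pos <- pmatch(probe, target, duplicates.ok = TRUE)
# "one"      -> exact match at position 4
# "tr"       -> unique prefix of "tricky", position 3
# "t"        -> prefix of several entries, so ambiguous: NA
# ""         -> an empty string never matches: NA
# "resolved" -> not in target: NA

# match: exact matching only, so the prefix "tr" is no longer found
exact_pos <- match(probe, target)

print(partial_pos)
print(exact_pos)
```

With a dictionary of thousands of words, a short word is more likely to be a prefix of several entries, in which case pmatch reports NA for it unless an exact match exists.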
Now, after reading two .csv files as below, I get results only for the first column, whereas I want them for all columns, as in the result above. Here dt1 is a 79 x 50 data frame and word_dict is a 13901 x 1 data frame.
#################### on big data #####################################
dt1 <- read.csv("C:/Users/Wonderland/Downloads/string_feature.csv", stringsAsFactors = FALSE)
word_dict <- read.csv("C:/Users/Wonderland/Downloads/word_dict.csv", stringsAsFactors = FALSE)
dt1 <- as.matrix(dt1)
word_dict <- as.matrix(word_dict)
result <- `dim<-`(pmatch(dt1, word_dict, duplicates.ok=TRUE), dim(dt1))
print(result)