I hope you are doing well. I'm processing unstructured data for a CHR master dataset:
originaldata <- read.csv('./csv/Info ECE 2014 - 2021.csv', header = TRUE, na.strings = "")
After cleaning and structuring the data, I do the text processing, which produces two separate data frames built from this file:
dataset1 <- data.frame(id = originaldata$id)
# Text processing is done here and the results are added to dataset1
dataset2 <- data.frame(id = originaldata$id)
# Text processing is done here and the results are added to dataset2
newdata <- merge(dataset1, dataset2, by = "id")
The problem is that when I merge dataset1 and dataset2, newdata has 11,392 obs. (700 additional rows), even though both data frames have the same number of rows (10,692 obs., the same as the original data). I cannot figure out why, considering that both id columns come from the same source. Any help will be truly appreciated.
I'm using merge() from base R.
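One possible cause worth checking is repeated id values. This has not been confirmed for this file. merge() pairs every row of the first data frame with every matching row of the second, so an id that appears twice on both sides yields four rows instead of two. Below is a minimal sketch on made-up toy data (the columns a and b and all values are hypothetical) showing the effect and how to count repeated ids:

```r
# Toy data (hypothetical values): id 2 appears twice in both data frames
dataset1 <- data.frame(id = c(1, 2, 2, 3), a = c("w", "x", "y", "z"))
dataset2 <- data.frame(id = c(1, 2, 2, 3), b = c("p", "q", "r", "s"))

# merge() returns every combination of matching rows:
# id 1 -> 1 row, id 2 -> 2 * 2 = 4 rows, id 3 -> 1 row
newdata <- merge(dataset1, dataset2, by = "id")
nrow(newdata)  # 6, although each input has only 4 rows

# How many rows carry an id that already appeared earlier?
n_repeated <- sum(duplicated(dataset1$id))

# Which ids are repeated, and how often?
id_counts <- table(dataset1$id)
id_counts[id_counts > 1]

# Missing ids are also worth counting, since the file is read with na.strings = ""
n_missing <- sum(is.na(dataset1$id))

# When both data frames hold the same ids, the merged row count
# is the sum of the squared counts per id
expected_rows <- sum(id_counts^2)
```

Running the same checks on originaldata$id would show whether repeated ids account for the 700 extra rows. If they do, and both data frames keep the rows in the original order, combining them by position with cbind(dataset1, dataset2[-1]) instead of matching on id may be an option.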