I have a dataframe with two columns (genome) and a dataframe with one column (list_SSNP).
What I am trying to do is to add a third and fourth columns in my Genome dataframe and add the value "1" for those positions in Genome that appear in list_SSNP and, separately, in list_SCPG.
I am trying to get an output dataframe that looks like this:
Gene_Symbol CHR SNP
A1BG 19q13.43
PDE1C 12p13.31 1
This is part of the content of Genome and I have included a reproducible example:
Genome <- c()
Genome$Gene_Symbol <- c("A1BG", "A1BG-AS1", "A1CF", "A2M", "PDE1C")
Genome$CHR <- c("19q13.43", "19q13.43", "10q11.23", "12p13.31", "12p13.31")
Gene_Symbol CHR
1 A1BG 19q13.43
2 A1BG-AS1 19q13.43
3 A1CF 10q11.23
4 A2M 12p13.31
5 PDE1C 12p13.31
And this is part of the content of list_SSNP:
list_SSNP <- c("PDE1C", "IMMP2L", "ZCCHC14", "NOS1AP", "HARBI1")
Gene_Symbol
1 PDE1C
2 IMMP2L
3 ZCCHC14
4 NOS1AP
5 HARBI1
Using only 1 of the dataframes (list_SSNP), which is what I am attempting to do first, what I have tried to do is a loop through the genome dataframe and for element i (row) in my Genome if the element i of my list_SSNP dataframe is like element i in my Genome dataframe, then add the number 1 to a third column, but when I execute this code, nothing happens.
Full_genome <- read.table("FULL_GENOME.txt", header=TRUE, sep = "\t", dec = ',', na.strings=c("","NA"), fill=TRUE)
Genome <- Full_genome[,c(2,3)]
names(Genome) <- c("Gene_Symbol", "CHR")
list_SSNP <- as.data.frame(Gene_SSNP$Gene_Symbol)
for (i in 1: dim (Genome) [1]) {
if(list_SSNP[i] %in% Genome[i,1]){
Genome[i,3] <- 1
}
}
Just to further clarify, I have checked that all the elements from list_SSNP appear in Genome, so it is absolutely certain that it is not a matter of not finding any coincidences.
EDIT:
I have come to realize that my example does not specify that the entries in list_SSNP and Genome are unique and have no duplicates and that Genome has about 30k lines of entries, while list_SSNP has 49. I just want to add a column in Genome and a number 1 in those rows where the entry exists in both Genome and list_SSNP.