I have 2 txt files I'd like to read into python: 1) A map file, 2) A data file. I'd like to have a lookup table or dictionary read the values from TWO COLUMNS of one, and determine which value to put in the 3rd column using something like the pandas.map function. The real map file is ~700,000 lines, and the real data file is ~10 million lines.
Toy Dataframe (or I could recreate as a dictionary) - Map
Chr Position Name
1 1000 SNPA
1 2000 SNPB
2 1000 SNPC
2 2000 SNPD
Toy Dataframe - Data File
Chr Position
1 1000
1 2000
2 1000
2 2001
Resulting final table:
Chr Position Name
1 1000 SNPA
1 2000 SNPB
2 1000 SNPC
2 2001 NaN
I found several questions about this with only one column lookup: Adding a new pandas column with mapped value from a dictionary. But can't seem to find a way to use 2 columns. I'm also open to other packages that may handle genomic data.
As a bonus second question, it'd also be nice if there was a way to map the 3rd column if it was with a certain amount of the mapped value. In other words, row 4 of the resulting table above would map to SNPD, as it's only 1 away. But I'd be happy to just get the solution for above.