I have a dataframe in which each column contains ID numbers; it looks like this:
LC3B.low   LC3B.hi   P62.low   P62.hi
PT 65      PT 172    PT 86     PT 135
PT 86      PT 65     PT 38     PT 56
PT 251     PT 251    PT 217    PT 261
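For reference, that toy table can be rebuilt like this (values copied from above):

import pandas as pd

# Small copy of the data shown above, one list per column
df = pd.DataFrame({
    "LC3B.low": ["PT 65", "PT 86", "PT 251"],
    "LC3B.hi": ["PT 172", "PT 65", "PT 251"],
    "P62.low": ["PT 86", "PT 38", "PT 217"],
    "P62.hi": ["PT 135", "PT 56", "PT 261"],
})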
I want to list IDs that are seen in two or more columns. So for the values supplied, pandas would indicate that:
- LC3B.low and LC3B.hi share "PT 65" and "PT 251"
- LC3B.low and P62.low share "PT 86"
I'm new to Pandas and more used to Perl. In Perl I would solve this by building an array and a hash for each column, then checking every element of each array against the other columns' hashes, printing a header for each comparison and a line for each match, so my output would look like this:
LC3B.low vs LC3B.hi
PT 65
PT 251
LC3B.low vs P62.low
PT 86
LC3B.low vs P62.hi
LC3B.hi vs P62.low
LC3B.hi vs P62.hi
P62.low vs P62.hi
But that produces messy output, and it doesn't feel like the most efficient way to solve the problem. Surely Pandas has a built-in way of doing this kind of thing?
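For what it's worth, here is my rough sketch of a direct Python translation of that Perl approach, reusing the df built above; I'm guessing that itertools.combinations plus set intersection are the right tools here:

from itertools import combinations

# Compare every pair of columns and print the IDs they share,
# mirroring the Perl-style output above
for col_a, col_b in combinations(df.columns, 2):
    print(col_a + " vs " + col_b)
    for pt in sorted(set(df[col_a]) & set(df[col_b])):
        print(pt)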
UPDATE: I have been trying to use SQL commands to accomplish this task, but pandasql does not recognize my column names. So basically:
print pysqldf("SELECT * FROM df;")
prints the table, but
print pysqldf("SELECT ATG12.low FROM df;")
prints "None"
I am definitely not committed to using SQL in Pandas to solve this problem. I will appreciate any and all suggestions or advice on how to list all the ID values that show up in more than one column.
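In case it helps to see the direction I've been poking at on the pandas side, here is an untested sketch: reshape the frame to long format and count how many distinct columns each ID appears in (pd.melt and groupby/nunique are my guesses at the right pieces):

import pandas as pd

# One row per (source column, ID) pair
melted = pd.melt(df, var_name="column", value_name="ID")

# Count how many distinct columns each ID appears in
per_id = melted.groupby("ID")["column"].nunique()

print(per_id[per_id >= 2])  # IDs that show up in two or more columns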