I am a beginner studying bioinformatics with scanpy. I am trying to improve, so any help is very welcome. Thanks a lot!

import pandas as pd

# These lists contain gene names.
Angio=['ADAM17','AXIN1','AXIN2','CCND2','DKK1','DKK4'] 
Hypoxia=['ADAM17','AXIN1','DLL1','FZD8','FZD1'] 
Infla=['DLL1','FZD8','CCND2','DKK1','ADAM17','JAG2','JAG1'] 
Glycolysis=['MYC','NKD1','PPARD','JAG2','JAG1'] 
Oxophos=['SKP2','TCF7','NUMB']
P53=['NUMB','FZD8','CCND2','AXIN2','KAT2A'] 


df = pd.DataFrame(columns=['Angio', 'Hypoxia', 'Infla', 
                           'Glycolysis', 'Oxophos', 'P53'],
                  index=['Angio', 'Hypoxia', 'Infla', 
                           'Glycolysis', 'Oxophos', 'P53'])


print(df)
            Angio  Hypoxia  Infla  Glycolysis  Oxophos  P53
Angio         NaN      NaN    NaN         NaN      NaN  NaN
Hypoxia       NaN      NaN    NaN         NaN      NaN  NaN
Infla         NaN      NaN    NaN         NaN      NaN  NaN
Glycolysis    NaN      NaN    NaN         NaN      NaN  NaN
Oxophos       NaN      NaN    NaN         NaN      NaN  NaN
P53           NaN      NaN    NaN         NaN      NaN  NaN


# The function below computes the Jaccard similarity score.
# Its inputs are two of the six lists above.
def jaccard(list1, list2):
    intersection = len(list(set(list1).intersection(list2)))
    union = (len(list1) + len(list2)) - intersection
    return float(intersection) / union

The six lists contain gene names.

The row and column names of 'df' are the names of these lists.

Each cell of 'df' should hold the value returned by the jaccard function when it is called with the two lists named by that cell's row and column.

I want to use a 'for' loop to replace each NaN in 'df' with the corresponding value from jaccard.

I keep trying to solve this problem, but it doesn't work out, and I don't know what to do next. I am kind of lost here. Please help me. Thank you.

supigen
  • What do you expect the data frame df to look like? What do you get when you print df.head()? What does each of the gene lists look like? We don't need the entire set of lists, only a representative sample. Please edit your question to provide a minimal reproducible example consisting of sample input, expected output, actual output, and only the relevant code necessary to reproduce the problem. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for best practices related to Pandas questions. – itprorh66 Mar 14 '23 at 18:46
  • Thank you. I revised the things you told me. – supigen Mar 14 '23 at 19:14

1 Answer

If you can convert your lists to a dictionary, I suggest the following solution:

import pandas as pd


# This dict maps each gene-set name to its list of gene names.
genes_dict = {
    'Angio':['ADAM17','AXIN1','AXIN2','CCND2','DKK1','DKK4'],
    'Hypoxia':['ADAM17','AXIN1','DLL1','FZD8','FZD1'],
    'Infla':['DLL1','FZD8','CCND2','DKK1','ADAM17','JAG2','JAG1'],
    'Glycolysis':['MYC','NKD1','PPARD','JAG2','JAG1'],
    'Oxophos':['SKP2','TCF7','NUMB'],
    "P53":['NUMB','FZD8','CCND2','AXIN2','KAT2A'],
}


# The function below computes the Jaccard similarity score.
# Its inputs are two of the six lists above.
def jaccard(list1, list2):
    # Work on sets so that a gene repeated within a list is counted only once.
    set1, set2 = set(list1), set(list2)
    union = set1 | set2
    if not union:
        # Two empty lists: return 0.0 rather than divide by zero.
        return 0.0
    return len(set1 & set2) / len(union)


names_list = list(genes_dict.keys())


# Build a nested dict: res[column_name][row_name] = Jaccard score.
res = {}
for col_name in names_list:
    res[col_name] = {}
    for row_name in names_list:
        res[col_name][row_name] = jaccard(genes_dict[col_name], genes_dict[row_name])

# The outer keys become the columns and the inner keys become the index.
df = pd.DataFrame(res)
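
If you would rather keep a pre-built, NaN-filled DataFrame and overwrite its cells one at a time, as the question describes, the nested loop above can be adapted to write through `df.loc`. This is only a minimal sketch: it assumes the same dictionary of gene lists, and it uses a set-based `jaccard`, which gives the same scores as the list-based version whenever no list contains a repeated gene.

```python
import pandas as pd

# Same gene lists as above, keyed by gene-set name.
genes_dict = {
    'Angio': ['ADAM17', 'AXIN1', 'AXIN2', 'CCND2', 'DKK1', 'DKK4'],
    'Hypoxia': ['ADAM17', 'AXIN1', 'DLL1', 'FZD8', 'FZD1'],
    'Infla': ['DLL1', 'FZD8', 'CCND2', 'DKK1', 'ADAM17', 'JAG2', 'JAG1'],
    'Glycolysis': ['MYC', 'NKD1', 'PPARD', 'JAG2', 'JAG1'],
    'Oxophos': ['SKP2', 'TCF7', 'NUMB'],
    'P53': ['NUMB', 'FZD8', 'CCND2', 'AXIN2', 'KAT2A'],
}


def jaccard(list1, list2):
    """Jaccard similarity: size of the intersection over size of the union."""
    set1, set2 = set(list1), set(list2)
    union = set1 | set2
    if not union:
        return 0.0  # assumption: two empty lists score 0.0
    return len(set1 & set2) / len(union)


names = list(genes_dict)

# Start from a frame full of NaN, with the list names as both index and columns.
df = pd.DataFrame(index=names, columns=names, dtype=float)

# Overwrite each NaN with the score for that (row, column) pair.
for row in names:
    for col in names:
        df.loc[row, col] = jaccard(genes_dict[row], genes_dict[col])

print(df.round(3))
```

Looking the lists up by name in a dictionary is what makes this work: the row and column labels of `df` are plain strings, so they cannot be passed to `jaccard` directly, but they can be used as dictionary keys.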
Artyom Akselrod
  • Thank you very much! I will continue to study while checking the code you wrote. Thank you again. – supigen Mar 14 '23 at 19:30