
I have working R code that I want to redo in Python. In R, I use the apply function to calculate the minor allele frequency. What would the equivalent code look like in Python? I am using pandas to read the data.

##R-code
###Reading file
var_freq <- read_delim("./cichlid_subset.frq", delim = "\t",
                       col_names = c("chr", "pos", "nalleles", "nchr", "a1", "a2"), skip = 1)

# find minor allele frequency
var_freq$maf <- var_freq %>% select(a1, a2) %>% apply(1, function(z) min(z))

I have read the file using pandas but I am struggling with the second part.

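###Python code
###Reading file
import pandas as pd

# skiprows=1 drops the original header line, matching skip = 1 in the R code
var_freq = pd.read_csv("./cichlid_subset.frq", sep='\t', header=None, skiprows=1)
column_indices = [0, 1, 2, 3, 4, 5]
new_names = ["chr", "pos", "nalleles", "nchr", "a1", "a2"]
old_names = var_freq.columns[column_indices]
var_freq = var_freq.rename(columns=dict(zip(old_names, new_names)))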

###Finding minor allele frequency

Insights will be appreciated.

John
  • Does this answer your question? [Running R script from python](https://stackoverflow.com/questions/19894365/running-r-script-from-python) – Chris Nov 25 '20 at 13:28
  • No I want to see how I can do the same thing in python. – John Nov 25 '20 at 13:32

1 Answer

Use:

# Read file
import pandas as pd

colnames = ["chr", "pos", "nalleles", "nchr", "a1", "a2"]
var_freq = pd.read_csv('./cichlid_subset.frq', sep='\t', header=None, skiprows=1, names=colnames)

# Get MAF
var_freq['maf'] = var_freq[['a1','a2']].min(axis=1)
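
To check the approach without the real file, here is a minimal sketch on made-up data. The three sample rows and the header line are assumptions for illustration only, not values from cichlid_subset.frq. It also shows that a row-wise `apply`, the closest literal translation of the R code, gives the same result as the vectorised `min(axis=1)`.

```python
import io

import pandas as pd

# Hypothetical sample standing in for cichlid_subset.frq (values are made up).
sample = io.StringIO(
    "CHROM\tPOS\tN_ALLELES\tN_CHR\t{FREQ}\n"
    "chr1\t100\t2\t32\t0.90\t0.10\n"
    "chr1\t250\t2\t32\t0.25\t0.75\n"
    "chr2\t400\t2\t30\t0.50\t0.50\n"
)

colnames = ["chr", "pos", "nalleles", "nchr", "a1", "a2"]
var_freq = pd.read_csv(sample, sep="\t", header=None, skiprows=1, names=colnames)

# Vectorised row-wise minimum of the two allele frequencies.
var_freq["maf"] = var_freq[["a1", "a2"]].min(axis=1)

# Literal equivalent of R's apply(1, function(z) min(z)); slower on large files.
maf_apply = var_freq[["a1", "a2"]].apply(min, axis=1)

print(var_freq)
print((var_freq["maf"] == maf_apply).all())
```

For large files, `min(axis=1)` is usually the better choice because it avoids calling a Python function once per row.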
Cainã Max Couto-Silva