I want to populate 10 columns with the numbers 1-16 depending on the values in 2 other columns. I can start by providing the column header or create new columns (does not matter to me).
I tried to create a function that iterates over the numbers 1-10 and then assigns a value to the z variable depending on the values of b and y. Then I want to apply this function to each row in my dataframe.
import pandas as pd
import numpy as np
data = pd.read_csv('Nuc.csv')
def write_Pcolumns(df):
"""populates a column in the given dataframe, df, based on the values in two other columns in the same dataframe"""
#create string of numbers for each nucleotide position
positions = ('1','2','3','4','5','6','7','8','9','10')
a = "Po "
x = "O.Po "
#for each position create a variable for the nucleotide in the sequence (Po) and opposite to the sequence(o. Po)
for each in positions:
b = a + each
y = x + each
z = 'P' + each
#assign a value to z based on the nucleotide identities in the sequence and opposite position
if df[b] == 'A' and df[y]=='A':
df[z]==1
elif df[b] == 'A' and df[y]=='C':
df[z]==2
elif df[b] == 'A' and df[y]=='G':
df[z]==3
elif df[b] == 'A' and df[y]=='T':
df[z]==4
...
elif df[b] == 'T' and df[y]=='G':
df[z]==15
else:
df[z]==16
return(df)
data.apply(write_Pcolumns(data), axis=1)
I get the following error message: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().