
I want to populate 10 columns with a number from 1 to 16, chosen according to the values in two other columns. The target columns can either exist already as headers or be created as new columns; either is fine.

I wrote a function that iterates over positions 1-10 and, for each position, assigns a value to column z depending on the values in columns b and y. I then want to apply this function to every row of my dataframe.

import pandas as pd

import numpy as np

data = pd.read_csv('Nuc.csv')

def write_Pcolumns(df):

    """populates a column in the given dataframe, df, based on the values in two other columns in the same dataframe"""

    #create string of numbers for each nucleotide position 
    positions = ('1','2','3','4','5','6','7','8','9','10')
    a = "Po "
    x = "O.Po "
    #for each position create a variable for the nucleotide in the sequence (Po) and opposite to the sequence(o. Po)
    for each in positions:
        b = a + each
        y = x + each
        z = 'P' + each
        #assign a value to z based on the nucleotide identities in the sequence and opposite position
        if df[b] == 'A' and df[y]=='A':
            df[z]==1
        elif df[b] == 'A' and df[y]=='C':
            df[z]==2
        elif df[b] == 'A' and df[y]=='G':
            df[z]==3
        elif df[b] == 'A' and df[y]=='T':
            df[z]==4
        ...
        elif df[b] == 'T' and df[y]=='G':
            df[z]==15
        else:
            df[z]==16
    return(df)

data.apply(write_Pcolumns(data), axis=1)

I get the following error message: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

hko

1 Answer


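This happens because df[index]=='value' compares the whole column at once and returns a Series of booleans (one per row), not a single boolean. An if statement needs exactly one True or False, so pandas raises this error. The function receives whole columns because data.apply(write_Pcolumns(data), axis=1) calls write_Pcolumns on the entire dataframe before apply ever runs. Note too that df[z]==1 is a comparison, not an assignment; assigning needs a single =.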
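
To see the problem in isolation, here is a minimal sketch. The two-row dataframe is made up for illustration; only the column names follow the question's "Po 1" / "O.Po 1" pattern.

```python
import pandas as pd

# Hypothetical two-row frame; column names follow the question's pattern
df = pd.DataFrame({"Po 1": ["A", "T"], "O.Po 1": ["A", "G"]})

# Comparing a column to a scalar is elementwise: one boolean per row
mask = df["Po 1"] == "A"
print(mask.tolist())

try:
    if mask:  # `if` needs a single True/False, but mask holds two values
        pass
except ValueError as err:
    message = str(err)
    print(message)
```

The `if` line is where pandas refuses to guess whether "some rows match" should count as true.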

Check out the related question "Pandas error when using if-else to create new column: The truth value of a Series is ambiguous".
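
One possible way around the if/elif chain is to look each pair up in a dictionary, one column at a time. This is only a sketch under assumptions: the question shows five of the codes (A/A=1, A/C=2, A/G=3, A/T=4, T/G=15) and elides the rest, so the table below assumes the pairs are numbered in A, C, G, T order, which is consistent with those five but should be checked against the real scheme. The helper name add_p_columns and the sample data are hypothetical.

```python
from itertools import product

import pandas as pd

# ASSUMPTION: pairs are numbered 1..16 in A, C, G, T order. The question only
# shows A/A=1, A/C=2, A/G=3, A/T=4 and T/G=15; the rest is inferred.
BASES = "ACGT"
PAIR_CODES = {
    pair: code for code, pair in enumerate(product(BASES, repeat=2), start=1)
}
DEFAULT_CODE = 16  # mirrors the question's final `else` branch


def add_p_columns(df, n_positions=10):
    """Return a copy of df with P1..Pn coded from 'Po i' and 'O.Po i'."""
    out = df.copy()
    for i in range(1, n_positions + 1):
        # Pair up the two source columns row by row
        pairs = zip(out[f"Po {i}"], out[f"O.Po {i}"])
        # Unknown pairs (e.g. an 'N' base) fall back to the default code
        out[f"P{i}"] = [PAIR_CODES.get(pair, DEFAULT_CODE) for pair in pairs]
    return out


# Hypothetical sample standing in for Nuc.csv, with a single position
sample = pd.DataFrame(
    {"Po 1": ["A", "A", "T", "N"], "O.Po 1": ["A", "T", "G", "A"]}
)
result = add_p_columns(sample, n_positions=1)
print(result["P1"].tolist())
```

With the real file this would be called as data = add_p_columns(data), with no apply needed, since each new column is built in one pass over the rows.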

blueharen