-2

I have a large number of corresponding columns: colA1, colA2... colA60 and colB1, colB2... colB60. Based on the values of colA1 and colB1, I would like to create colC1. How can I create a function which iterates through i in range 1 to 3 for colAi, colBi and colCi? In reality the range is much larger.

I've managed to create a list of column names:

range_list= list(range(1,4))

for i in range(0, len(range_list)): 
    range_list[i] = str(range_list[i])

A_list= ['colA' + s for s in range_list]
B_list= ['colB' + s for s in range_list]
C_list= ['colC' + s for s in range_list]

Some sample data:

my_dict = {'colA1':[2,6,8,28,5],
    'colA2': [38,6,14,63,3], 
    'colA3':[90,40,80,98,3],
    'colB1':[1,46,23,4,42],
    'colB2': [24,3,9,10,24], 
    'colB3':[35,12,19,3,23]} 

df = pd.DataFrame(my_dict)

How can I iterate through these corresponding columns in a function? Here is my attempt which doesn't work at all.

def test_fx(x):  
    for a in range(len(A_list)) and b in range(len(B_list)):
        c == df[a]
        if df[b] >= 10:
            c = df[a]*2
        elif [b] >= 20:
            c = df[a]*3
        elif [b] >= 30:
            c = df[a]*4
        return c

for c in range(len(C_list)):
    df[c] = 'empty'
    df[c].apply(test_fx)

The expected output is three columns added to df:
colC1: 2,24,24,28,20
colC2: 114,6,14,126,9
colC3: 360,80,160,98,9

Marla
  • 340
  • 3
  • 16

3 Answers3

2

The issue I can see in the lines you posted is with your apply. You never specify the columns of your df you wish to apply the function test_fx on.

The basic syntax if you want to create a new column c from an existing column a in a dataframe is :

df[c] = df[a].apply(func)

In your case, for fault of a desired output, I understand that you want to apply your 'text_fx' function to each pairs of columns colAx, colBx to output a colCx in the same data frame.

In this case, the following code should work

import numpy as np

# get your column names
range_list= list(range(1,4))
list_A = ['colA' + str(s) for s in range_list]
list_B = ['colB' + str(s) for s in range_list]
list_C = ['colC' + str(s) for s in range_list]

#your self-defined function
#note that it takes a tuple as input
def test_fx(l):
    a,b=l
    q= a
    if b >= 30:
        q=a*4
    elif b >= 20:
        q = a*3
    elif b >= 40:
        q = a*2
    else: 
        q=np.nan
    return q


#iterate over each couple of colAi, colBi and put the result in colCi
for i in range(0, len(range_list)):
    a = list_A[i] #next colA name
    b = list_B[i] #next colB name
    c = list_C[i] 
    df[c] = df[[a, b]].apply(test_fx, axis =1)

print(df)

Note that like .iterrows from the answer from CenturionNOR, the syntax df[[a,b]] also returns a tuple. More details in this answer

  • In this output, colC2 is missing and colC3 is populated with "None". – Marla Jun 18 '19 at 10:17
  • You are right. I edited it to correct the None to a real Nan. When I run it though, i don't have colC2 missing. Also, I have created the function test_fx on the first function you posted. You can easily modify the values inside the function, I made it so it closely resembles your use case. I will modify it. – Cyriel Mallart Jun 18 '19 at 12:11
  • oops bad copy-paste of an first try, the column C2 magiaccly appears now – Cyriel Mallart Jun 18 '19 at 12:28
2

If you want to iterate through your columns, you can try this:

range_start=1
range_stop=3
for i in range(range_start,range_stop+1):
    b_col="colB"+str(i)
    a_col="colA"+str(i)

    case_1=df.loc[(df[b_col]>= 10) & (df[b_col]< 20),a_col]*2
    case_2=df.loc[(df[b_col]>= 20) & (df[b_col]< 30),a_col]*3
    case_3 = df.loc[(df[b_col]>= 30) ,a_col]*4

    df["C"+str(i)]=pd.concat([case_1,case_2,case_3])

With this method, I create the Ci columns one at a time: I subset the Ai column by the values in Bi columns (here there is 3 subsets for the 3 cases of your example), then I concatenate the result into one Series that will be my Ci column.

The output of df will then be :

   colA1  colA2  colA3  colB1  colB2  colB3    C1     C2     C3
0      2     38     90      1     24     35   NaN  114.0  360.0
1      6      6     40     46      3     12  24.0    NaN   80.0
2      8     14     80     23      9     19  24.0    NaN  160.0
3     28     63     98      4     10      3   NaN  126.0    NaN
4      5      3      3     42     24     23  20.0    9.0    9.0

The NaN values are for values of B columns that don't fall within the ranges.

Lawis
  • 125
  • 7
1

As far as I can read, you want to iterate through all columns at the same time, then you could use iterrows:

for index, row df.iterrows():
    print(index)
    print(row)

If you want to print (or do other things) to only some of the columns:

for index, row in df.iterrows():
    print(row['colA1'], row['colA2'])

Read more on .iterrows() here