2

I have a dataframe consisting of 6 columns . What shall be the fastest way to generate a matrix which does the following:

Step 1) col1*col1a , col2*col2a, col3*col3a, col4*col4a

Step 2 ) col_new = (col1*col1a)-col2*col2a)/(col1a-col2a)

Using a for loop is one of the options - but what could be a quicker way to go about this.

import pandas as pd
df=pd.DataFrame()
df['col1']=[100,200,300,400,500]
df['col1a']=[6,71,8,90,10]
df['col2']=[600,700,800,1900,100]
df['col2a']=[6,17,8,9,10]
df['col3']=[100,220,300,440,500]
df['col3a']=[1,22,3,44,5]

df[1x2]=(df['col1']*df['col1a']-df['col2']*df['col2a'])/(df['col1a']-df['col2a'])

I need to have column combinations of 1x3,1x4,1x5,2x3,2x4 and so on...

Viraj Desai
  • 63
  • 1
  • 1
  • 5
  • Your operation is not symmetric. Do you need 1x3 and 3x1? What should happen when you divide by 0? – Quickbeam2k1 Sep 28 '19 at 10:43
  • colxa`s represent days such that - number of days in a month- thereby making it 1a (30days)>2a(60days)>3a(90 days) - so the solution could rule out a division by 0. However, since we are calculating business days - the could for 1m could vary and so with other months as well - and hence not assuming a constant 30,60,90 here – Viraj Desai Sep 28 '19 at 13:17
  • So I don't think your answer really helps me. Anyway, check my updated answer – Quickbeam2k1 Sep 28 '19 at 18:13

2 Answers2

1

Here is how I will approach it:

def new_col(df, col1, col2):
    """
    Add a new column, modifying the dataframe inplace.

    col1: int
        column counter in the first column name
    col2: int
        column counter in the second column name
    """
    nr = (
        df.loc[:, f"col{col1}"] * df.loc[:, f"col{col1}a"]
        - df.loc[:, f"col{col2}"] * df.loc[:, f"col{col2}a"]
    )
    dr = df.loc[:, f"col{col1}a"] - df.loc[:, f"col{col2}a"]

    df.loc[:, f"col{col1}X{col2}"] = nr / dr

I will call this function with desired column combinations. For ex.

new_col(df, 1, 2)

Output:

enter image description here

The call be issued from a loop.

quest
  • 3,576
  • 2
  • 16
  • 26
1

So apparently, my first answer only matched the original question: Here is an answer for the updated question:

from itertools import combinations
from functools import partial

primary_columns = df.columns[~df.columns.str.endswith("a")]

combs = combinations(primary_columns, 2)

def column_comparison(first, second, df):
    return  (df[first]*df[first+"a"]-df[second]*df[second+"a"])/(df[first+"a"] - df[second+"a"])

dct = {'{first}X{second}'.format(first=comb[0].lstrip("col"), second=comb[1].lstrip("col")): 
       partial(column_comparison, comb[0], comb[1]) for comb in combs}

So we created a dictionary that contains the name of the desired columns and the right function.

Now we can leverage assign

df.assign(**dct)

to obtain

   col1  col1a  col2  col2a  col3  col3a         1X2         1X3          2X3
0   100      6   600      6   100      1        -inf  100.000000   700.000000
1   200     71   700     17   220     22   42.592593  191.020408 -1412.000000
2   300      8   800      8   300      3        -inf  300.000000  1100.000000
3   400     90  1900      9   440     44  233.333333  361.739130    64.571429
4   500     10   100     10   500      5         inf  500.000000  -300.000000 

In a previous version I was using a lambda here, but this was not working - check here for an explanation. I only realized this after finding the solution using partial.

Quickbeam2k1
  • 5,287
  • 2
  • 26
  • 42