0

Data frame,

Output

In this case columns satisfying the formula:(1,2,5,7)

condition : col1 + col2 - col5 = col7

I want to find columns from the data frame which satisfy these conditions :

  1. Condition to be satisfied : coln + colm - colo = colp .
  2. Condition works 90% of the time.

Edit : This is a specific case where I know what columns satisfy the condition. I was looking for a more dynamic approach where the columns are iterated through all combinations ,and maybe the combination with the least deviation/best match is chosen.

Is this a possibility?

  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Aug 08 '22 at 05:07
  • 1
    I would also suggest providing a dummy database with expected output. – le_camerone Aug 08 '22 at 05:26
  • Do you need `df[df.col1+df.col2-df.col5 == df.col7]` ? – jezrael Aug 08 '22 at 05:27
  • I assume that you want to select columns for which can be found 3 others satisfying the equality `all(df[first] + df[second] == df[third] + df[forth])`. To do this, we must iterate over combinations of 4 columns, i.g. `selected = {*''}; for i, j, k, l in itertools.combinations(df.columns, 4): if all(df[i]+df[j] == df[k]+df[l]): selected.update((i,j,k,l))` – Vitalizzare Aug 08 '22 at 06:16
  • if you search not for `all(...) == True` but for 90% to be True, replace `all(...)` condition to `(expresion==True)/recordsnumber >= 0.9` – Vitalizzare Aug 08 '22 at 06:20
  • Initially there are 9 columns, out of which, i want 4 columns which satisfy the condition : df[coln] + df[colm] - df[colo] = df[colp] , 90% of the time. –  Aug 08 '22 at 06:24

3 Answers3

0

Please check the below example:

import pandas as pd
technologies = {
    'col1' :[5, 6, 7, 8],
    'col2' :[1, 2, 3, 4],
    'col3' :[6, 3, 7, 8],
    'col4' :[0, 2, 3, 2500]
              }
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

df2=df.loc[(df['col1'] + df['col2'] - df['col3']) == df['col4']]
print(df2)

Only row1 and row3 satisfy this condition.

Massoud
  • 361
  • 1
  • 2
  • 16
  • Thanks for this answer, but i don't know what columns satisfy this condition beforehand.The formula has to be used on all column combinations and if the condition is satisfied 90% of the time on a particular combination , return this combination. –  Aug 08 '22 at 06:10
0

Use boolean indexing:

df1 = df[df.col1+df.col2-df.col5 == df.col7]
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

Under the assumption that you know which columns should be a linear combination of each other then you can use query.

df = df.query("col1 + col2 - col3 == col4")

Another answer suggested boolean indexing which essentially does the same thing, but I prefer query as it tends to be more readable and flexible

Immot
  • 241
  • 2
  • 7
  • Thanks for this answer, but i don't know what columns satisfy this condition beforehand. –  Aug 08 '22 at 06:07