0

So I have a very large data set I'm working with, but I'm using a much smaller one as a template for what I need to do. Suppose I have the following 2 dataframes:

import numpy as np
import pandas as pd
df = pd.DataFrame({
   'cond': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B','B', 'B', 'B', 'B', 'B','B','B'],
   'Array':  ['S', 'S', 'TT', 'TT','S', 'S', 'TT', 'TT','S', 'S', 'TT', 'TT','S', 'S', 'TT', 'TT','SS','TT'],
   'X':  [1, 2, 3, 1, 2 , 3, 4, 7.3, 5.1, 3.2, 1.4, 5.5, 9.9, 3.2, 1.1, 3.3, 1.2, 5.4],
   'Y':  [3.1, 2.2, 2.1, 1.2,  2.4, 1.2, 1.5, 1.33, 1.5, 1.6, 1.4, 1.3, 0.9, 0.78, 1.2, 4.0, 5.0, 6.0],
   'Marker':  [2.0, 1.2, 1.2, 2.01, 2.55, 2.05, 1.66, 3.2, 3.21, 3.04, 8.01, 9.1, 7.06, 8.1, 7.9, 5.12, 5.23, 5.15],
   'Area': [3.0, 2.0, 2.88, 1.33,  2.44, 1.25, 1.53, 1.0, 0.156, 2.0, 2.4, 6.3, 6.9, 9.78, 10.2, 15.0, 16.0, 19.0]
})
print(df)

df2 = pd.DataFrame({
   'cond': ['A', 'A', 'B', 'B', 'B'],
   'Array':  ['S', 'TT', 'S', 'SS','TT'],
   'cutoff1':  [2.55, 2.01, 7.06, 1, 8.01],
   'cutoff2':  [1.60, 2.2, 2.1, 1.2,  2.4]
})
print(df2)

This produces the following two sets:

   cond Array    X     Y  Marker    Area
0     A     S  1.0  3.10    2.00   3.000
1     A     S  2.0  2.20    1.20   2.000
2     A    TT  3.0  2.10    1.20   2.880
3     A    TT  1.0  1.20    2.01   1.330
4     A     S  2.0  2.40    2.55   2.440
5     A     S  3.0  1.20    2.05   1.250
6     A    TT  4.0  1.50    1.66   1.530
7     A    TT  7.3  1.33    3.20   1.000
8     A     S  5.1  1.50    3.21   0.156
9     B     S  3.2  1.60    3.04   2.000
10    B    TT  1.4  1.40    8.01   2.400
11    B    TT  5.5  1.30    9.10   6.300
12    B     S  9.9  0.90    7.06   6.900
13    B     S  3.2  0.78    8.10   9.780
14    B    TT  1.1  1.20    7.90  10.200
15    B    TT  3.3  4.00    5.12  15.000
16    B    SS  1.2  5.00    5.23  16.000
17    B    TT  5.4  6.00    5.15  19.000
  cond Array  cutoff1  cutoff2
0    A     S     2.55      1.6
1    A    TT     2.01      2.2
2    B     S     7.06      2.1
3    B    SS     1.00      1.2
4    B    TT     8.01      2.4

What I would like to do is use the cutoff values in df2 to modify my original data set (df). What I'm trying to do is convert all the "Marker" values in df to either 0 or 1. The goal is to create two more dataframes, one which uses cutoff1 as a threshold and one which uses cutoff2 as a threshold. So for example, for cutoff1, since the A-S paring has a cutoff of 2.55, I'd like to make a new dataframe that where all the A-S pairings with a Marker value <=2.55 get set to 0, the B-S pairings with a value <=7.06 get set to 0, etc. and everything else in df remains unchanged. Similarly, I'd like to make a second dataframe where the same thing is done but for the cutoff2 values instead.

I tried to search stack overflow for a model that's already been done that I could adapt to mine, but I only seem to found ones where you change all the values in a single column on one threshold (for example here: Most efficient way to convert values of column in Pandas DataFrame), whereas here there's multiple cutoffs for one column, which is based on the indexing of two other columns.

Brenton
  • 435
  • 2
  • 5
  • 14

1 Answers1

1

You could do, for each cutoff separately:

df = df.set_index(['cond', 'Array'])
result = df.merge(df2, on=['cond', 'Array'])
result.loc[result.Marker < result.cutoff1, 'Marker'] = 0
result = result.drop(['cutoff1', 'cutoff2'], axis=1)

print(result)

Output

   cond Array    X     Y  Marker    Area
0     A     S  1.0  3.10    0.00   3.000
1     A     S  2.0  2.20    0.00   2.000
2     A     S  2.0  2.40    2.55   2.440
3     A     S  3.0  1.20    0.00   1.250
4     A     S  5.1  1.50    3.21   0.156
5     A    TT  3.0  2.10    0.00   2.880
6     A    TT  1.0  1.20    2.01   1.330
7     A    TT  4.0  1.50    0.00   1.530
8     A    TT  7.3  1.33    3.20   1.000
9     B     S  3.2  1.60    0.00   2.000
10    B     S  9.9  0.90    7.06   6.900
11    B     S  3.2  0.78    8.10   9.780
12    B    TT  1.4  1.40    8.01   2.400
13    B    TT  5.5  1.30    9.10   6.300
14    B    TT  1.1  1.20    0.00  10.200
15    B    TT  3.3  4.00    0.00  15.000
16    B    TT  5.4  6.00    0.00  19.000
17    B    SS  1.2  5.00    5.23  16.000

Note that this example is only for cutoff1.

Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76