So I have a very large data set I'm working with, but I'm using a much smaller one as a template for what I need to do. Suppose I have the following 2 dataframes:
import numpy as np
import pandas as pd
df = pd.DataFrame({
'cond': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B','B', 'B', 'B', 'B', 'B','B','B'],
'Array': ['S', 'S', 'TT', 'TT','S', 'S', 'TT', 'TT','S', 'S', 'TT', 'TT','S', 'S', 'TT', 'TT','SS','TT'],
'X': [1, 2, 3, 1, 2 , 3, 4, 7.3, 5.1, 3.2, 1.4, 5.5, 9.9, 3.2, 1.1, 3.3, 1.2, 5.4],
'Y': [3.1, 2.2, 2.1, 1.2, 2.4, 1.2, 1.5, 1.33, 1.5, 1.6, 1.4, 1.3, 0.9, 0.78, 1.2, 4.0, 5.0, 6.0],
'Marker': [2.0, 1.2, 1.2, 2.01, 2.55, 2.05, 1.66, 3.2, 3.21, 3.04, 8.01, 9.1, 7.06, 8.1, 7.9, 5.12, 5.23, 5.15],
'Area': [3.0, 2.0, 2.88, 1.33, 2.44, 1.25, 1.53, 1.0, 0.156, 2.0, 2.4, 6.3, 6.9, 9.78, 10.2, 15.0, 16.0, 19.0]
})
print(df)
df2 = pd.DataFrame({
'cond': ['A', 'A', 'B', 'B', 'B'],
'Array': ['S', 'TT', 'S', 'SS','TT'],
'cutoff1': [2.55, 2.01, 7.06, 1, 8.01],
'cutoff2': [1.60, 2.2, 2.1, 1.2, 2.4]
})
print(df2)
This produces the following two sets:
cond Array X Y Marker Area
0 A S 1.0 3.10 2.00 3.000
1 A S 2.0 2.20 1.20 2.000
2 A TT 3.0 2.10 1.20 2.880
3 A TT 1.0 1.20 2.01 1.330
4 A S 2.0 2.40 2.55 2.440
5 A S 3.0 1.20 2.05 1.250
6 A TT 4.0 1.50 1.66 1.530
7 A TT 7.3 1.33 3.20 1.000
8 A S 5.1 1.50 3.21 0.156
9 B S 3.2 1.60 3.04 2.000
10 B TT 1.4 1.40 8.01 2.400
11 B TT 5.5 1.30 9.10 6.300
12 B S 9.9 0.90 7.06 6.900
13 B S 3.2 0.78 8.10 9.780
14 B TT 1.1 1.20 7.90 10.200
15 B TT 3.3 4.00 5.12 15.000
16 B SS 1.2 5.00 5.23 16.000
17 B TT 5.4 6.00 5.15 19.000
cond Array cutoff1 cutoff2
0 A S 2.55 1.6
1 A TT 2.01 2.2
2 B S 7.06 2.1
3 B SS 1.00 1.2
4 B TT 8.01 2.4
What I would like to do is use the cutoff values in df2 to modify my original data set (df). What I'm trying to do is convert all the "Marker" values in df to either 0 or 1. The goal is to create two more dataframes, one which uses cutoff1 as a threshold and one which uses cutoff2 as a threshold. So for example, for cutoff1, since the A-S paring has a cutoff of 2.55, I'd like to make a new dataframe that where all the A-S pairings with a Marker value <=2.55 get set to 0, the B-S pairings with a value <=7.06 get set to 0, etc. and everything else in df remains unchanged. Similarly, I'd like to make a second dataframe where the same thing is done but for the cutoff2 values instead.
I tried to search stack overflow for a model that's already been done that I could adapt to mine, but I only seem to found ones where you change all the values in a single column on one threshold (for example here: Most efficient way to convert values of column in Pandas DataFrame), whereas here there's multiple cutoffs for one column, which is based on the indexing of two other columns.