Let's say I have a list that contains all pandas df column names. How can I check if the difference between any column pair is bigger than 3?

Pseudo-code

IF difference between df['T01'] and df['T02'] > 3 or difference between df['T03'] and df['T04'] > 3 or difference between df['T05'] and df['T06'] > 3 and so on... THEN
DO SOMETHING

Code

df_column_names = ['T01', 'T02', 'T03', 'T04', 'T05', 'T06', 'T07', 'T08', 'T09', 'T10', 'T11', 'T12', 'T13', 'T14', 'T15', 'T16', 'T17', 'T18', 'T19', 'T20', 'T21', 'T22', 'T23', 'T24', 'T25', 'T26', 'T27', 'T28', 'T29', 'T30', 'T31', 'T32']

df
| T01 | T02    | T03 | T04   | ... |
|-----|--------|-----|-------|-----|
| 0.1 | 0.5685 | 1.4 | 0.333 | ... |
konichiwa

1 Answer

If the number of columns is even, select every other column by position and subtract (`.values` drops the column labels so the subtraction is positional):

df1 = df.iloc[:, ::2] - df.iloc[:, 1::2].values
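
To illustrate why the `.values` call matters, here is a minimal sketch on a small made-up frame (the data is hypothetical, not from the question). Without it, pandas aligns the two operands on column labels, and since the labels do not overlap every result is NaN. `to_numpy()` is used here as the equivalent of `.values`.

```python
import pandas as pd

# Hypothetical two-row frame, just to show the alignment behaviour
df = pd.DataFrame({'T01': [5, 3], 'T02': [1, 3], 'T03': [7, 8], 'T04': [4, 5]})

left = df.iloc[:, ::2]    # columns T01, T03
right = df.iloc[:, 1::2]  # columns T02, T04

# Label-aligned subtraction: no shared column names, so everything is NaN
aligned = left - right

# Positional subtraction: T01 - T02 and T03 - T04, labelled with the left names
positional = left - right.to_numpy()

print(aligned)
print(positional)
```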

General solution with DataFrameGroupBy.diff:

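import numpy as np
# Label consecutive column pairs 0, 0, 1, 1, ... and take the difference within each pair
c = np.arange(len(df.columns)) // 2
df1 = df.groupby(c, axis=1).diff(axis=1).dropna(axis=1, how='all')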

EDIT:

If you need to select the columns by the names in the list:

df1 = df[df_column_names].iloc[:, ::2] - df[df_column_names].iloc[:, 1::2].values


df = df[df_column_names]
c = np.arange(len(df.columns)) // 2
df1 = df.groupby(c, axis=1).diff(axis=1).dropna(axis=1, how='all')
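
As far as I know, `axis=1` in `groupby` is deprecated in recent pandas releases (2.1 and later), so the groupby variant may warn or fail there. One possible workaround, sketched below on hypothetical sample data, is to transpose, group the rows, and transpose back. Note that `diff` computes second minus first, so the signs are the opposite of the `iloc` version and the result carries the second column's name in each pair.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including one irrelevant column
df = pd.DataFrame({
    'A': list('abcdef'),
    'T01': [5, 3, 6, 9, 2, 4],
    'T02': [1, 3, 5, 7, 1, 0],
    'T03': [7, 8, 9, 4, 2, 3],
    'T04': [4, 5, 4, 5, 5, 4],
})
df_column_names = ['T01', 'T02', 'T03', 'T04']

sub = df[df_column_names]
c = np.arange(len(sub.columns)) // 2  # pair labels: 0, 0, 1, 1

# Group the transposed rows (the original columns) in pairs and diff within each pair
df1 = sub.T.groupby(c).diff().T.dropna(axis=1, how='all')
print(df1)
```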

Sample:

import pandas as pd

df = pd.DataFrame({
         'A':list('abcdef'),
         'T04':[4,5,4,5,5,4],
         'T03':[7,8,9,4,2,3],
         'T02':[1,3,5,7,1,0],
         'T01':[5,3,6,9,2,4],
         'F':list('aaabbb')
})

df_column_names = ['T01', 'T02', 'T03', 'T04']

df1 = df[df_column_names].iloc[:, ::2] - df[df_column_names].iloc[:, 1::2].values
print (df1)
   T01  T03
0    4    3
1    0    3
2    1    5
3    2   -1
4    1   -3
5    4   -1

mask = df1 > 3
print (mask)
     T01    T03
0   True  False
1  False  False
2  False   True
3  False  False
4  False  False
5   True  False
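
To get from the boolean mask to a single `if` condition, one option is to reduce it with `any`. The sketch below reuses the sample values; the variable names `any_signed`, `any_abs`, and `rows_over` are made up for illustration. It also assumes that "difference" may mean either the signed or the absolute difference, so both are shown.

```python
import pandas as pd

df = pd.DataFrame({
    'T01': [5, 3, 6, 9, 2, 4],
    'T02': [1, 3, 5, 7, 1, 0],
    'T03': [7, 8, 9, 4, 2, 3],
    'T04': [4, 5, 4, 5, 5, 4],
})
df_column_names = ['T01', 'T02', 'T03', 'T04']

sub = df[df_column_names]
df1 = sub.iloc[:, ::2] - sub.iloc[:, 1::2].to_numpy()

# True if at least one signed difference is greater than 3
any_signed = bool((df1 > 3).to_numpy().any())

# Same check on the magnitude, in case negative differences count too
any_abs = bool((df1.abs() > 3).to_numpy().any())

# Index labels of the rows where some pair exceeds the threshold
rows_over = df1.index[(df1 > 3).any(axis=1)].tolist()

if any_signed:
    print("at least one difference exceeds 3, rows:", rows_over)
```

Reducing to a plain Python `bool` first avoids the ambiguous-truth-value error that pandas raises when a DataFrame is used directly in an `if` statement.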
jezrael
  • How can I do this with the column names in the list? There are 100+ other columns that are not relevant. Also, the columns in the df are not in sequential order. That's why I am using column names as references instead of indexes. – konichiwa Aug 06 '19 at 12:03
  • Thanks a lot! How can I check if any delta value is larger than 3? – konichiwa Aug 06 '19 at 12:44
  • @dunkubok - `DO SOMETHING` - can you explain more? If you use `mask = df1 > 3`, you get a boolean DataFrame that you can then process. – jezrael Aug 06 '19 at 13:01
  • I just need an if-statement that is satisfied when a single value from `df1` is bigger than 3 – konichiwa Aug 06 '19 at 13:05
  • @dunkubok - According to [this](https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html#using-if-truth-statements-with-pandas), a DataFrame cannot be used directly in an `if` statement. So what should happen when the conditions contain True? Do you need to test whether at least one value is True, or whether all are True? – jezrael Aug 06 '19 at 13:06