Pandas include columns where column names include specific string in IF statement?

Question

The following dataframe called df_merged, is a snippet of a larger dataframe with 30+ commoditites like oil, gold, silver, etc.

    Date        CRUDE_OIL   CRUDE_OIL_SMA200    GOLD        GOLD_SMA200 SILVER    SILVER_SMA200
0   2021-04-26  61.91       48.415              1779.199951 1853.216498 26.211    25.269005
1   2021-04-27  62.939999   48.5252             1778        1853.028998 26.412001 25.30566
2   2021-04-28  63.860001   48.6464             1773.199951 1852.898998 26.080999 25.341655
3   2021-04-29  65.010002   48.7687             1768.099976 1852.748498 26.052999 25.377005
4   2021-04-30  63.580002   48.8861             1767.300049 1852.529998 25.853001 25.407725

How can I implement a way to compare the regular commodity price with the SMA200 equivalent in an IF statement?

My current setup includes the below if statement for 30+ columns but I believe this can be done in a function.

comm_name = []
comm_averages = []

if (df_merged.CRUDE_OIL.tail(1) > df_merged.CRUDE_OIL_SMA200.tail(1)).any():
    print("CRUDE_OIL above 200SMA")
    comm_name.append("CRUDE_OIL")
    comm_averages.append(1)
else:
    print("CRUDE_OIL under 200SMA")
    comm_name.append("CRUDE_OIL")
    comm_averages.append(0)

if (df_merged.GOLD.tail(1) > df_merged.GOLD_SMA200.tail(1)).any():
    print("GOLD above 200SMA")
    comm_name.append("GOLD")
    comm_averages.append(1)
else:
    print("GOLD under 200SMA")
    comm_name.append("GOLD")
    comm_averages.append(0)
    
if (df_merged.SILVER.tail(1) > df_merged.SILVER_SMA200.tail(1)).any():
    print("SILVER above 200SMA")
    comm_name.append("SILVER")
    comm_averages.append(1)
else:
    print("SILVER under 200SMA")
    comm_name.append("SILVER")
    comm_averages.append(0)

comm_signals = pd.DataFrame(
{'Name': comm_name,
 'Signal': comm_averages
})

comm_signals

output of comm_signals:

    Name      Signal
0   CRUDE_OIL   1
1   GOLD        0
2   SILVER      1

I looked through this SO thread but couldn't figure out how to implement: Find column whose name contains a specific string

I guess the goal is a function like this:

comm_name = []
comm_averages = []

def comm_compare(df):
    if (df_merged["X"].tail(1) > df_merged["X" + "_SMA200"].tail(1)).any():
        print(X + "above 200SMA")
        comm_name.append(X)
        comm_averages.append(1)
    else:
        print(X + "under 200SMA")
        comm_name.append(X)
        comm_averages.append(0)

comm_signals = pd.DataFrame(
{'Name': comm_name,
 'Signal': comm_averages
})

print(comm_signals)

    Name      Signal
0   CRUDE_OIL   1
1   GOLD        0
2   SILVER      1

Please edit your question and include the expected output for the given sample dataframe. — Shubham Sharma, May 22 '21 at 12:51

score 1 · Accepted Answer · answered May 22 '21 at 14:22

Try stack + groupby diff

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': ['2021-04-26', '2021-04-27', '2021-04-28', '2021-04-29',
             '2021-04-30'],
    'CRUDE_OIL': [61.91, 62.939999, 63.860001, 65.010002, 63.580002],
    'CRUDE_OIL_SMA200': [48.415, 48.5252, 48.6464, 48.7687, 48.8861],
    'GOLD': [1779.199951, 1778.0, 1773.199951, 1768.099976, 1767.300049],
    'GOLD_SMA200': [1853.216498, 1853.028998, 1852.898998, 1852.748498,
                    1852.529998],
    'SILVER': [26.211, 26.412001, 26.080999, 26.052999, 25.853001],
    'SILVER_SMA200': [25.269005, 25.30566, 25.341655, 25.377005, 25.407725]
})

# Grab tail(1) and only numeric columns
# Replace this with more specific select or use filter if not all
# number columns are needed
s = df.tail(1).select_dtypes('number')

# Remove Suffix from all columns
s.columns = s.columns.str.rstrip('_SMA200')

s = (
    # Subtract and subtract each group and compare to 0
    (s.stack().groupby(level=1).diff().dropna() < 0)
        .astype(int)  # Convert to Int
        .droplevel(0)  # Cleanup levels index and column names
        .reset_index()
        .rename(columns={'index': 'Name', 0: 'Signal'})
)

print(s)

s:

        Name  Signal
0  CRUDE_OIL       1
1       GOLD       0
2     SILVER       1

Grzegorz Skibinski · Answer 2 · 2021-05-22T15:05:46.263

To get the output per day you can do:

# to make connection between 'equity' and 'equity_sma200':
df.columns = df.columns.str.split("_", expand=True)
# we split dates to index - so we have only prices in the table:
df.set_index([("Date",)], append=False, inplace=True)
# you might not need casting 
df = df.T.astype(float)

# since we have only 2 lines per day, per equity - we can just take negative sma, and cross-check aggregate sum:
mask = df.index.get_level_values(1) == "SMA200"
df.loc[mask] = -df.loc[mask]
df = df.groupby(level=0)[df.columns].sum().gt(0)
# moving columns back to human format:
df.columns = map(lambda x: x[0], df.columns)

Output for the sample data:

        2021-04-26  2021-04-27  ...  2021-04-29  2021-04-30
CRUDE         True        True  ...        True        True
GOLD         False       False  ...       False       False
SILVER        True        True  ...        True        True

Pandas include columns where column names include specific string in IF statement?

2 Answers2