I want to write a function that scans a number of user-specified columns in a dataframe and creates a new variable that is 1 if all of the specified columns equal 1, and 0 otherwise.

The code below only handles the case where the user passes exactly two columns to scan.

import numpy as np

class Tagger:
    def __init__(self):
        pass

    def summing_all_tagger(self, df, tag_var_list, tag_value=1):
        # Creates a tag of 1 if every column in tag_var_list equals tag_value; otherwise 0.

        self.df = df
        self.tag_var_list = tag_var_list
        self.tag_value = tag_value

        self.df['temp'] = np.where((self.df[self.tag_var_list[0]]==self.tag_value) & 
            (self.df[self.tag_var_list[1]]==self.tag_value), 1, 0)

        return self.df['temp']

Then I can call it in the main.py file:

import pandas as pd
import datetime

import feature_tagger.feature_tagger as ft

tagger_obj = ft.Tagger()

df_pin['PIN_RX&TIME_TAG'] = tagger_obj.summing_all_tagger(df_pin, tag_var_list=['PIN_RX_TAG', 'PIN_TIME_TAG'], tag_value=1)

How can I modify it so users can enter as many column names for tag_var_list as they want?

For example:

df_pin['PIN_RX&TIME_TAG'] = tagger_obj.summing_all_tagger(df_pin, tag_var_list=['PIN_RX_TAG', 'PIN_TIME_TAG', 'PIN_NAME_TAG'], tag_value=1)

# or

df_pin['PIN_RX&TIME_TAG'] = tagger_obj.summing_all_tagger(df_pin, tag_var_list=['PIN_RX_TAG'], tag_value=1)
KubiK888

2 Answers

You can build a list of boolean masks with a list comprehension, reduce them to a single mask with np.logical_and.reduce, and cast the result to integers to get a 0/1 column:

L = [self.df[x]==self.tag_value for x in tag_var_list]
self.df['temp'] = np.logical_and.reduce(L).astype(int)

Or compare all the columns at once and use DataFrame.all, casting the boolean mask to integers:

self.df['temp'] = (self.df[self.tag_var_list] == self.tag_value).all(axis=1).astype(int)
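As a rough sketch of how this could look as a standalone function, assuming the caller only needs the 0/1 column back: the function name all_equal_tag and the sample values are hypothetical, while the column names are borrowed from the question.

```python
import pandas as pd

def all_equal_tag(df, tag_var_list, tag_value=1):
    """Return a 0/1 Series: 1 where every listed column equals tag_value."""
    # Selecting with a list always yields a DataFrame, even for one column,
    # so the same expression handles any number of column names.
    return (df[tag_var_list] == tag_value).all(axis=1).astype(int)

# Made-up sample values for illustration.
df_pin = pd.DataFrame({
    "PIN_RX_TAG": [1, 1, 0],
    "PIN_TIME_TAG": [1, 0, 0],
    "PIN_NAME_TAG": [1, 1, 1],
})

three = all_equal_tag(df_pin, ["PIN_RX_TAG", "PIN_TIME_TAG", "PIN_NAME_TAG"])
one = all_equal_tag(df_pin, ["PIN_RX_TAG"])
```

Unlike the method in the question, this version does not write a 'temp' column into the caller's dataframe; whether that matters depends on how the result is used.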
jezrael

np.all() is your friend.

self.df['temp'] = np.where(np.all(self.df[self.tag_var_list] == self.tag_value, axis=1), 1, 0)
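A small sketch on made-up data (the columns x and y are assumptions) suggesting that np.where(mask, 1, 0) and a plain integer cast of the mask give the same values. Here the comparison is converted to a NumPy array first so that np.all operates on a plain ndarray.

```python
import numpy as np
import pandas as pd

# Made-up data; the column names are illustrative only.
df = pd.DataFrame({"x": [1, 1, 0], "y": [1, 0, 0]})
cols = ["x", "y"]

# Compare, drop to a plain boolean ndarray, and test each row.
row_all = np.all((df[cols] == 1).to_numpy(), axis=1)

via_where = np.where(row_all, 1, 0)  # the np.where form
via_astype = row_all.astype(int)     # equivalent cast without np.where
```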
chrisaycock