Pandas: add column name to a list if the column contains any of a specific set of values

Question

I wish to create a new list containing the names of the columns that have at least one of the following values.

Most of the time
Quite Often
Less than often
Never

Sample:

import pandas as pd

df = pd.DataFrame({
    'A': ['name1', 'name2', 'name3', 'name4'],
    'B': ['Most of the Time', 'Never', 'Quite Often', 'Less than Often'],
    'C': ['abc', 'DEF', 'ghi', 'Jkl'],
    'D': ['1', '2', '3', '4'],
    'E': ['Most of the Time', 'Never', 'Quite Often', 'Less than Often'],
    'F': ['Most of the Time', 'Never', 'Quite Often', 'Less than Often'],
})

Expected output:

h_ls = ['B','E','F']

I tried the following code:

h_ls = []

for x in df.columns:
    xi = str(x)
    for i in df[x]:
        if (i.startswith("Most of the Time") or i.startswith("Quite Often")
                or i.startswith("Less than Often") or i.startswith("Never")):
            h_ls.append(xi)
            break

I get an error that says 'Timestamp' object has no attribute 'startswith'

It gets stuck on the first column where the condition is false.

Can anyone tell me where I am going wrong, or whether there is a better solution? I have dropped the timestamp column from the data frame, but the error still appears.

please provide a sample of the input data and the matching expected output — mozway, Mar 31 '22 at 08:59
possible duplicate of https://stackoverflow.com/questions/35956712/check-if-certain-value-is-contained-in-a-dataframe-column-in-pandas. It is at least very similar. — SpaceBurger, Mar 31 '22 at 09:02

score 1 · Answer 1 · answered Mar 31 '22 at 09:16

Your error is coming up because not all your data are strings, so the .startswith method fails. You can use .astype(str) to force a series to be strings. In your example that would be for i in df[x].astype(str):
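As a rough sketch of that fix, the loop below converts each column with .astype(str) before calling .startswith. The frame here is a made-up stand-in for the real data (the column "T" and its dates are assumptions, added only to reproduce a Timestamp column), and passing a tuple to str.startswith is just a shorter way to write the chain of or conditions.

```python
import pandas as pd

# Hypothetical frame: "T" holds Timestamps, like the column that caused the error.
df = pd.DataFrame({
    "A": ["name1", "name2"],
    "B": ["Most of the Time", "Never"],
    "T": pd.to_datetime(["2022-03-31", "2022-04-01"]),
})

# str.startswith accepts a tuple and returns True if any prefix matches.
prefixes = ("Most of the Time", "Quite Often", "Less than Often", "Never")

h_ls = []
for col in df.columns:
    # astype(str) turns every value into a string, so .startswith always exists.
    for value in df[col].astype(str):
        if value.startswith(prefixes):
            h_ls.append(col)
            break  # one match is enough for this column

print(h_ls)  # ['B']
```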

It is, however, bad practice to loop over the series. Instead you could apply your check to the whole column at once.

# isin compares whole values exactly (case-sensitive), so the spelling must match the data
accepted_strings = ["Most of the Time", "Quite Often", "Less than Often", "Never"]
h_ls = [col for col in df.columns if df[col].isin(accepted_strings).any()]

Here df[col].isin(accepted_strings) returns a boolean series such as [True, False, False, ...], with one entry per row indicating whether that value of df[col] is in your accepted_strings list. .any() then returns True if at least one value in this boolean series is True.
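To make the two steps visible, here is a small sketch on a single made-up series (the values are illustrative, not taken from the real data). It also shows that isin only matches whole values with identical casing, which matters because the list in the question is capitalised differently from the sample data.

```python
import pandas as pd

accepted = ["Most of the Time", "Quite Often", "Less than Often", "Never"]

col = pd.Series(["Most of the Time", "abc", "Never"])
mask = col.isin(accepted)      # element-wise membership test
print(mask.tolist())           # [True, False, True]
print(bool(mask.any()))        # True: at least one value matched

# A column with no accepted value gives an all-False mask.
no_match = pd.Series(["abc", "DEF"]).isin(accepted)
print(bool(no_match.any()))    # False

# Matching is exact: different casing does not count as a match.
wrong_case = pd.Series(["never", "Most of the time"]).isin(accepted)
print(bool(wrong_case.any()))  # False
```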

score 1 · Accepted Answer · answered Apr 01 '22 at 19:33

Here is another way:

df.columns[df.isin(accepted_strings).any()].tolist()

Output:

['B', 'E', 'F']
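For readers who want to see what the one-liner does internally, the sketch below splits it into steps. The frame is a shortened version of the sample with one assumed extra column, "T", holding Timestamps; it is included only to illustrate that this membership test never calls string methods, so a non-string column should simply yield False instead of raising.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["name1", "name2", "name3", "name4"],
    "B": ["Most of the Time", "Never", "Quite Often", "Less than Often"],
    "D": ["1", "2", "3", "4"],
    "E": ["Most of the Time", "Never", "Quite Often", "Less than Often"],
    # Hypothetical Timestamp column, not part of the original sample.
    "T": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04"]),
})

accepted_strings = ["Most of the Time", "Quite Often", "Less than Often", "Never"]

cell_mask = df.isin(accepted_strings)  # boolean DataFrame, same shape as df
col_mask = cell_mask.any()             # boolean Series indexed by column name
result = df.columns[col_mask].tolist() # keep only the column names flagged True

print(col_mask.to_dict())
print(result)  # ['B', 'E']
```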

rhug123
