My question is an extension of Delete Column in Pandas based on Condition, but I have headers and the information isn't binary. Instead of removing a column containing all zeros, I'd like to be able to pass a variable "search_var" (containing a string) to filter out columns containing only that string.
I initially thought I should read in the df and iterate across each column, read each column in as a list, and print columns where len(col_list) > 2 and search_var not in col_list. The solution provided to the previous post involving a boolean dataframe (df != search_var) intrigued me there might be a simpler way, but how could I go around the issue that the header will not match and therefore cannot purely filter on True/False?
What I have (non-working):
import pandas as pd
df = pd.read_table('input.tsv', dtype=str)
with open('output.tsv', 'aw') as ofh:
df['col_list'] = list(df.values)
if len(col_list) < 3 and search_var not in col_list:
df.to_csv(ofh, sep='\t', encoding='utf-8', header=False)
Example input, search_var = 'red'
Name Header1 Header2 Header3
name1 red red red
name2 red orange red
name3 red yellow red
name4 red green red
name5 red blue blue
Expected Output
Name Header2 Header3
name1 red red
name2 orange red
name3 yellow red
name4 green red
name5 blue blue