This question is a near-duplicate of this one, with some tweaks.
Take the following data frame and get the positions of the columns whose names contain "sch" or "oa". Simple enough in R:
df <- data.frame(cheese   = rnorm(10),
                 goats    = rnorm(10),
                 boats    = rnorm(10),
                 schmoats = rnorm(10),
                 schlomo  = rnorm(10),
                 cows     = rnorm(10))
grep("oa|sch", colnames(df))
[1] 2 3 4 5
write.csv(df, file = "df.csv")
Now, over in Python, I could use a somewhat verbose list comprehension:
import pandas as pd
df = pd.read_csv("df.csv", index_col = 0)
matches = [i for i in range(len(df.columns)) if "oa" in df.columns[i] or "sch" in df.columns[i]]
matches
Out[10]: [1, 2, 3, 4]
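For comparison, the same two-substring check reads a little more naturally with enumerate, which yields each position alongside its column name. This sketch rebuilds a frame with the same column names from random data instead of reading df.csv; the values are irrelevant since only the names are tested.

```python
import numpy as np
import pandas as pd

# Same column names as the R example; cell values are random and unused.
df = pd.DataFrame(np.random.randn(10, 6),
                  columns=["cheese", "goats", "boats", "schmoats", "schlomo", "cows"])

# enumerate yields (position, name) pairs, so no range(len(...)) indexing is needed.
matches = [i for i, col in enumerate(df.columns) if "oa" in col or "sch" in col]
print(matches)  # [1, 2, 3, 4]
```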
I'd like to know if there is a better way to do this in Python than the list comprehension above. Specifically, what if I have dozens of strings to match? In R, I could do something like
regex <- paste(vector_of_strings, collapse = "|")
grep(regex, colnames(df))
But it isn't obvious how to do this with a list comprehension in Python. Maybe I could use string manipulation to programmatically build the expression evaluated inside the list comprehension, to avoid all of the repetitive "or" clauses?
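A minimal sketch of a few possible approaches, assuming a hypothetical patterns list standing in for vector_of_strings (it rebuilds the frame from random data rather than reading df.csv). any() replaces the chain of "or" tests; "|".join builds the same alternation regex that paste(..., collapse = "|") builds in R; and pandas' vectorized str.contains on the column Index avoids the explicit loop entirely.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 6),
                  columns=["cheese", "goats", "boats", "schmoats", "schlomo", "cows"])

# Hypothetical list of substrings; in practice this could hold dozens of entries.
patterns = ["oa", "sch"]

# Option 1: keep the comprehension and let any() replace the chained `or` tests.
matches_any = [i for i, col in enumerate(df.columns)
               if any(p in col for p in patterns)]

# Option 2: mirror R's paste(..., collapse = "|") + grep().
# re.escape guards against substrings that contain regex metacharacters.
regex = "|".join(map(re.escape, patterns))
matches_re = [i for i, col in enumerate(df.columns) if re.search(regex, col)]

# Option 3: vectorized pandas version; flatnonzero turns the boolean mask into positions.
matches_pd = np.flatnonzero(df.columns.str.contains(regex)).tolist()

print(matches_any, matches_re, matches_pd)  # [1, 2, 3, 4] three times
```

One caveat on the regex options: an empty patterns list produces an empty regex, which matches every column, whereas any() over an empty list matches none.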