python: loop through list of dataframes and list of lists

Question

I have a list of dataframes (lst_dfs) of which i want to subset columns using a list of partial column names (lst). The list of the columns needs to be applied with startswith as sometimes the end of the identifier is different. the index of the list of dataframes matches the index of the list of names. Its easy to apply with one dataframe, but not with this list/in a loop. The expected output would be a dictionary containing a list of two dataframes with the subsetted columns but its returning empty. I think my level of iteration is incorrect (amongst other things?). any help is very appreciated. thanks so much!

two data framees that i put in to a list

df1 = pd.DataFrame(data={'x':[1,2,3,4,5], 
                         'am.1': [1,1,1,1,1],
                         'abn.1': [1,1,1,1,1],
                         'b1c': [1,1,1,1,1],
                         'b1d': [1,1,1,1,1]})

df2 = pd.DataFrame(data={'x':[1,2,3,4,5], 
                         'am.1': [1,1,1,1,1],
                         'am.1': [1,1,1,1,1],
                         'al.2': [1,1,1,1,1],
                         'b1d': [1,1,1,1,1],
                         'b2d': [1,1,1,1,1]})

lst_dfs = [df1, df1]

lst = (['a','b'],['am','b1'])

dat={}
for i, df in enumerate(lst_dfs):
    for elem in lst:
        print(elem)
        dat[i] = df.loc[(df.columns.str.startswith(str(elem)))]

Does this answer your question? [pandas: best way to select all columns whose names start with X](https://stackoverflow.com/questions/27275236/pandas-best-way-to-select-all-columns-whose-names-start-with-x) — Chris, Jun 15 '21 at 17:18
@Chris thanks for the comment but not really no as they start with different characters, so it needs to iterate through elements within the list. thanks though. — b101, Jun 15 '21 at 18:28

score 0 · Accepted Answer · answered Jun 15 '21 at 17:54

0

Use df.filter with regex param to filter out the columns that start with elements in your list:

from collections import defaultdict
dat = defaultdict(list)

for i, df in enumerate(lst_dfs):
    for elem in lst:
        dat[i].append(df.filter(regex='^('+'|'.join(elem)+')', axis=1))

Output:

>>> dat[0]
[   am.1  abn.1  b1c  b1d
0     1      1    1    1
1     1      1    1    1
2     1      1    1    1
3     1      1    1    1
4     1      1    1    1,    am.1  b1c  b1d
0     1    1    1
1     1    1    1
2     1    1    1
3     1    1    1
4     1    1    1]

answered Jun 15 '21 at 17:54

Ank

1,704
9
14

thanks for the response. no its appending all original columns rather than subsetting them – b101 Jun 16 '21 at 08:55
It should work. What are the column names in your original df? It works for the sample df you posted in your question (see output). – Ank Jun 16 '21 at 08:58

python: loop through list of dataframes and list of lists

1 Answers1