0

I have a list of dataframes (lst_dfs) of which i want to subset columns using a list of partial column names (lst). The list of the columns needs to be applied with startswith as sometimes the end of the identifier is different. the index of the list of dataframes matches the index of the list of names. Its easy to apply with one dataframe, but not with this list/in a loop. The expected output would be a dictionary containing a list of two dataframes with the subsetted columns but its returning empty. I think my level of iteration is incorrect (amongst other things?). any help is very appreciated. thanks so much!

two data framees that i put in to a list

df1 = pd.DataFrame(data={'x':[1,2,3,4,5], 
                         'am.1': [1,1,1,1,1],
                         'abn.1': [1,1,1,1,1],
                         'b1c': [1,1,1,1,1],
                         'b1d': [1,1,1,1,1]})

df2 = pd.DataFrame(data={'x':[1,2,3,4,5], 
                         'am.1': [1,1,1,1,1],
                         'am.1': [1,1,1,1,1],
                         'al.2': [1,1,1,1,1],
                         'b1d': [1,1,1,1,1],
                         'b2d': [1,1,1,1,1]})

lst_dfs = [df1, df1]

lst = (['a','b'],['am','b1'])
dat={}
for i, df in enumerate(lst_dfs):
    for elem in lst:
        print(elem)
        dat[i] = df.loc[(df.columns.str.startswith(str(elem)))]

Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
b101
  • 287
  • 1
  • 8
  • Does this answer your question? [pandas: best way to select all columns whose names start with X](https://stackoverflow.com/questions/27275236/pandas-best-way-to-select-all-columns-whose-names-start-with-x) – Chris Jun 15 '21 at 17:18
  • @Chris thanks for the comment but not really no as they start with different characters, so it needs to iterate through elements within the list. thanks though. – b101 Jun 15 '21 at 18:28

1 Answers1

0

Use df.filter with regex param to filter out the columns that start with elements in your list:

from collections import defaultdict
dat = defaultdict(list)

for i, df in enumerate(lst_dfs):
    for elem in lst:
        dat[i].append(df.filter(regex='^('+'|'.join(elem)+')', axis=1))

Output:

>>> dat[0]
[   am.1  abn.1  b1c  b1d
0     1      1    1    1
1     1      1    1    1
2     1      1    1    1
3     1      1    1    1
4     1      1    1    1,    am.1  b1c  b1d
0     1    1    1
1     1    1    1
2     1    1    1
3     1    1    1
4     1    1    1]
Ank
  • 1,704
  • 9
  • 14
  • thanks for the response. no its appending all original columns rather than subsetting them – b101 Jun 16 '21 at 08:55
  • It should work. What are the column names in your original df? It works for the sample df you posted in your question (see output). – Ank Jun 16 '21 at 08:58