How can I select discrete columns from data frame

Question

My data table looks like this. With data.columns of following 60 columns

['WeatherHR0', 'WeatherHR1', 'WeatherHR2', 'WeatherHR3', 'WeatherHR4',
       'WeatherHR5', 'WeatherHR6', 'WeatherHR7', 'WeatherHR8', 'WeatherHR9',
       'WeatherHR10', 'WeatherHR11', 'WeatherHR12', 'WeatherHR13',
       'WeatherHR14', 'WeatherHR15', 'WeatherHR16', 'WeatherHR17',
       'WeatherHR18', 'WeatherHR19', 'WeatherHR20', 'WeatherHR21',
       'WeatherHR22', 'WeatherHR23', 'AvgDB', 'HDD0', 'HDD5', 'HDD10', 'HDD13',
       'HDD18', 'CDD15', 'CDD18', 'Peak Average', 'Day of Week', 'Holiday',
       'HR1', 'HR2', 'HR3', 'HR4', 'HR5', 'HR6', 'HR7', 'HR8', 'HR9', 'HR10',
       'HR11', 'HR12', 'HR13', 'HR14', 'HR15', 'HR16', 'HR17', 'HR18', 'HR19',
       'HR20', 'HR21', 'HR22', 'HR23', 'HR24', 'Max']

At times, I wanted to select data from multiple columns that are separated by unwanted columns (unwanted as for that moment, might needed for later).

I wanted to do something like df.loc[['WeatherHR0':WeatherHR23'+ 'Peak Average'+ 'HR0':'HR24']].

For example, I may want to select columns from WeatherHR0~WeatherHR23 + Peak Average + HR0~24 while keep other columns undropped.

I know I can drop the unselected columns/create new dataframe, but is there a pythonic way to selected discreted columns in pandas?

Does this answer your question? [Selecting multiple columns in a pandas dataframe](https://stackoverflow.com/questions/11285613/selecting-multiple-columns-in-a-pandas-dataframe) — AMC, Feb 28 '20 at 20:45
@AMC No it doesn't. I need to select multiple discrete columns which I can't use slice selection in loc[] operation to combine columns do to like df.loc[HR0:24 + Peak + Weather0:24]. So no, it's different question and answer. — Meruemu, Feb 29 '20 at 15:40

score 2 · Accepted Answer · answered Feb 28 '20 at 17:27

2

you can create a list of strings (pythonically) and then use that to select the columns:

columns_of_interest = ([f'WeatherHR{i}' for i in range(24)] + 
                       ['Peak Average'] + 
                       [f'HR{i}' for i in range(25)])
data[columns_of_interest]

Of course, this strategy works best if there are common patterns in the column names.

answered Feb 28 '20 at 17:27

jfaccioni

7,099
1
9
25

Thank for the quick answer. I was expecting some .iloc[] or .loc[] tricks or lambda function that I can deal with data on the fly. But it works too. – Meruemu Feb 28 '20 at 17:34
If you want to use `loc` (for example to select all columns containing the string HR), you could do `df.loc[:, df.columns.str.contains('HR')]`, although you'd still need to manually add the Peak Average column. – jfaccioni Feb 28 '20 at 17:43

How can I select discrete columns from data frame

1 Answers1