21

I have a dataframe which looks like this:

    a1    b1    c1    a2    b2    c2    a3    ...
x   1.2   1.3   1.2   ...   ...   ...   ...
y   1.4   1.2   ...   ...   ...   ...   ...
z   ...

What I want is grouping by every nth column. In other words, I want a dataframe with all the as, one with bs and one with cs

    a1     a2     a4
x   1.2    ...    ...
y
z

In another SO question I saw that is possibile to do df.iloc[::5,:], for example, to get every 5th raw. I could do of course df.iloc[:,::3] to get the c cols but it doesn't work for getting a and b.

Any ideas?

Angelo
  • 767
  • 1
  • 6
  • 21

3 Answers3

34

slice the columns:

df[df.columns[::2]]

To get every nth column

Example:

In [2]:
cols = ['a1','b1','c1','a2','b2','c2','a3']
df = pd.DataFrame(columns=cols)
df

Out[2]:
Empty DataFrame
Columns: [a1, b1, c1, a2, b2, c2, a3]
Index: []

In [3]:
df[df.columns[::3]]
Out[3]:

Empty DataFrame
Columns: [a1, a2, a3]
Index: []

You can also filter using startswith:

In [5]:
a = df.columns[df.columns.str.startswith('a')]
df[a]

Out[5]:
Empty DataFrame
Columns: [a1, a2, a3]
Index: []

and do the same for b cols and c cols etc..

You can get a set of all the unique col prefixes using the following:

In [19]:
df.columns.str.extract(r'([a-zA-Z])').unique()

Out[19]:
array(['a', 'b', 'c'], dtype=object)

You can then use these values to filter the columns using startswith

EdChum
  • 376,765
  • 198
  • 813
  • 562
  • but I have n numbers of a,b,c. This only gets me c1,c2,c3...cn – Angelo Mar 10 '16 at 22:27
  • This is great. But what if I need to select the first 10 columns, and after that I want to select every other column. Is it possible to do it in your way without using `concat`? Thanks. – Bowen Liu Feb 08 '19 at 20:39
  • @BowenLiu the simplest thing would be to make a list of the first 10 cols, and then prepend/join this with a list of the remaining cols of interest and just pass this to your df to sub-select it. I don't think there is a nicer way to do this – EdChum Feb 08 '19 at 22:42
  • Thanks Ed. I have a question that is not related to this post. But I see that you are super erudite with Pandas so I will ask anyway: is there any way to add a total row calculating ONLY the columns that I specified, Something like `df.loc['Total'] = df.sum(select_list), select_list = [columnA, columnB ...]`. I made a post but didn't really get the answer that I want. Thanks again. – Bowen Liu Feb 12 '19 at 14:16
  • @BowenLiu https://stackoverflow.com/questions/25748683/pandas-sum-dataframe-rows-for-given-columns/25748826#25748826 – EdChum Feb 12 '19 at 14:22
  • I Googled for a while and didn't find it. Thanks! – Bowen Liu Feb 12 '19 at 14:29
  • Your answer gave a sum column calculating cells on each row. How to get a sum row calculating cells on each column? Transpose and then transpose back or simply use `axis = 0`? Thanks. – Bowen Liu Feb 12 '19 at 16:51
  • @BowenLiu if you don't pass `axis=1` then it sums the total for each column – EdChum Feb 12 '19 at 16:52
  • Perfect! I have been looking for this for so long. Thank you so much. I didn't know that I can do `list(dr)`. Dropping columns from that is much easier than listing all the columns that I do want. – Bowen Liu Feb 12 '19 at 17:02
6

In current version (0.24), this works:

Getting your 'a' columns:

df.iloc[:, ::3]

getting your 'b' columns:

df.iloc[:, 1::3]

getting your 'c' columns:

df.iloc[:, 2::3]
joctee
  • 2,429
  • 1
  • 23
  • 19
4

The following should work:

df.ix[:, ::2] - get every second column, beginning with first (here all a's)
df.ix[:, 1::2] - get every second column, beginning with second (b's)
....

I just searched for a solution to the same problem and that solved it.

divandc
  • 149
  • 5