My DataFrame has many (192) columns. How do I select two columns at a time?

Question

My DataFrame's columns look like `df.columns = ['Time1','Pmpp1','Time2',..........,'Pmpp96']`. I want to select two successive columns at a time, for example Time1 and Pmpp1 together. My code is:

for i,j in zip(df.columns,df.columns[1:]):
    print(i,j)

My present output is:

Time1 Pmpp1
Pmpp1 Time2
Time2 Pmpp2

Expected output is:

Time1 Pmpp1
Time2 Pmpp2
Time3 Pmpp3

You might consider removing the `python-3.x` tag and adding `pandas`. – rpanai, Aug 27 '18 at 11:45
@user32185, I did edit my tags. I am new to this platform and just curious: why does it matter? – Msquare, Aug 27 '18 at 11:48
The `dataframe` tag can relate to other languages, and it has about 15x fewer watchers than `pandas`. Using appropriate tags helps you get more answers and helps other users with similar problems find your question. – rpanai, Aug 27 '18 at 11:51

score 5 · Accepted Answer · answered Aug 27 '18 at 11:42


You're zipping the list with the same list shifted by one element, which is not what you want. You want to zip the labels at even positions (0, 2, 4, ...) with the labels at odd positions (1, 3, 5, ...). For example, you could replace your code with:

for i, j in zip(df.columns[::2], df.columns[1::2]):
    print(i, j)
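To see what the two slices produce, here is a minimal sketch on a small stand-in frame (an assumption: three Time/Pmpp pairs instead of 96, with the same naming pattern as the question). The pairs are collected in a list so they can be inspected:

```python
import pandas as pd

# Hypothetical small frame with the same Time/Pmpp naming pattern (3 pairs instead of 96)
df = pd.DataFrame(columns=['Time1', 'Pmpp1', 'Time2', 'Pmpp2', 'Time3', 'Pmpp3'])

# [::2]  -> start at position 0, take every 2nd label: Time1, Time2, Time3
# [1::2] -> start at position 1, take every 2nd label: Pmpp1, Pmpp2, Pmpp3
pairs = list(zip(df.columns[::2], df.columns[1::2]))

for time_col, pmpp_col in pairs:
    print(time_col, pmpp_col)
```

This relies only on the columns strictly alternating Time/Pmpp; it never looks at the names themselves.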

Yaniv Oliver

Thanks for the response, but it gives an `invalid syntax` error. Could you check it again? – Msquare Aug 27 '18 at 11:46
`File "", line 18 for ^ SyntaxError: invalid syntax` – Msquare Aug 27 '18 at 11:49
Post a complete code sample that reproduces this error – Yaniv Oliver Aug 27 '18 at 12:10
I tried it yesterday and it did not work, but today it worked. Excellent, thank you very much. I am just starting out with Python: could you please explain `.columns[::2]` and `.columns[1::2]`? What exactly are they doing? – Msquare Aug 28 '18 at 05:49
See [Understanding Python's slice notation](https://stackoverflow.com/questions/509211/understanding-pythons-slice-notation). It gives a nice overview of the topic – Yaniv Oliver Aug 28 '18 at 06:56
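A short illustration of that slice notation, `seq[start:stop:step]`, on a plain Python list (the list is a hypothetical stand-in for `df.columns`; a pandas Index slices the same way by position):

```python
# Hypothetical stand-in for df.columns
cols = ['Time1', 'Pmpp1', 'Time2', 'Pmpp2', 'Time3', 'Pmpp3']

# start omitted -> 0, stop omitted -> end of list, step 2 -> positions 0, 2, 4
evens = cols[::2]

# start at 1, stop omitted, step 2 -> positions 1, 3, 5
odds = cols[1::2]

print(evens)  # ['Time1', 'Time2', 'Time3']
print(odds)   # ['Pmpp1', 'Pmpp2', 'Pmpp3']
```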

score 1 · Answer 2 · answered Aug 27 '18 at 12:05

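As an alternative to integer positional slicing, you can use `str.startswith` to create two Index objects, one for the Time columns and one for the Pmpp columns. Then use `zip` to iterate over them pairwise (this assumes both groups appear in the same numeric order):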

import pandas as pd

df = pd.DataFrame(columns=['Time1', 'Pmpp1', 'Time2', 'Pmpp2', 'Time3', 'Pmpp3'])

times = df.columns[df.columns.str.startswith('Time')]
pmpps = df.columns[df.columns.str.startswith('Pmpp')]

for i, j in zip(times, pmpps):
    print(i, j)

Time1 Pmpp1
Time2 Pmpp2
Time3 Pmpp3
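To pull out the data for each pair rather than just printing the labels, you can index the DataFrame with a two-element list of column names. A minimal sketch, using made-up values (an assumption; the real frame has 96 pairs):

```python
import pandas as pd

# Hypothetical data: two rows, three Time/Pmpp pairs
df = pd.DataFrame({
    'Time1': [0, 1], 'Pmpp1': [10.0, 11.0],
    'Time2': [0, 1], 'Pmpp2': [20.0, 21.0],
    'Time3': [0, 1], 'Pmpp3': [30.0, 31.0],
})

times = df.columns[df.columns.str.startswith('Time')]
pmpps = df.columns[df.columns.str.startswith('Pmpp')]

subframes = []
for t, p in zip(times, pmpps):
    pair = df[[t, p]]        # two-column DataFrame holding one measurement
    subframes.append(pair)
    print(pair)
```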

score 0 · Answer 3 · answered Aug 27 '18 at 12:04

In this kind of scenario, it might make sense to reshape your DataFrame. Instead of selecting two columns at a time, you build one DataFrame with just the two columns that represent your measurements, plus a column recording which measurement each row came from.

First, build a list of DataFrames, each holding only one Time/Pmpp pair:

import pandas as pd

dfs = []
for i in range(1, 97):
    # .copy() so that adding the 'n' column below does not trigger SettingWithCopyWarning
    tmp = df[['Time{0}'.format(i), 'Pmpp{0}'.format(i)]].copy()
    tmp.columns = ['Time', 'Pmpp']  # Standardize column names
    tmp['n'] = i                    # Remember measurement number
    dfs.append(tmp)                 # Keep with our cleaned dataframes

Then concatenate them into a single new DataFrame with three columns:

new_df = pd.concat(dfs, ignore_index=True, sort=False)

This should be a much more manageable shape for your data.

>>> list(new_df.columns)
['Time', 'Pmpp', 'n']

Now you can iterate through the rows of this DataFrame and get the values for your expected output:

for i, row in new_df.iterrows():
    print(i, row.n, row.Time, row.Pmpp)

It will also make it easier to use the rest of pandas to analyze your data:

new_df.Pmpp.mean()
new_df.describe()
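If you'd rather not write the loop, pandas also provides `pd.wide_to_long`, which performs a similar wide-to-long reshape directly from the `Time<k>`/`Pmpp<k>` stub names. This is a swapped-in alternative, not the approach above. A sketch on a small made-up frame, assuming the suffixes are plain integers (matched by the default `suffix=r'\d+'`), followed by a per-measurement `groupby`:

```python
import pandas as pd

# Hypothetical wide frame: two rows, two Time/Pmpp pairs
df = pd.DataFrame({
    'Time1': [0, 1], 'Pmpp1': [10.0, 11.0],
    'Time2': [0, 1], 'Pmpp2': [20.0, 21.0],
})

# wide_to_long needs a unique row-id column; reset_index() adds one named 'index'
wide = df.reset_index()

# Split 'Time1'/'Pmpp1'/... into stubs ('Time', 'Pmpp') and a numeric suffix column 'n'
long_df = pd.wide_to_long(wide, stubnames=['Time', 'Pmpp'], i='index', j='n').reset_index()
print(long_df)

# Per-measurement statistics become a one-line groupby
per_measurement = long_df.groupby('n')['Pmpp'].mean()
print(per_measurement)
```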

score 0 · Answer 4 · answered Aug 27 '18 at 12:19

After a series of trials, I got it. My code is given below:

for a in range(0,len(df.columns),2):
    print(df.columns[a],df.columns[a+1])

My output is:

DateTime   A016.Pmp_ref
DateTime.1 A024.Pmp_ref
DateTime.2 A040.Pmp_ref
DateTime.3 A048.Pmp_ref
DateTime.4 A056.Pmp_ref
DateTime.5 A064.Pmp_ref
DateTime.6 A072.Pmp_ref
DateTime.7 A080.Pmp_ref
DateTime.8 A096.Pmp_ref
DateTime.9 A120.Pmp_ref
DateTime.10 A124.Pmp_ref
DateTime.11 A128.Pmp_ref
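The same step-by-2 loop can also fetch the data for each pair by position with `iloc`, instead of only printing the labels. A minimal sketch on a hypothetical four-column frame (the real one has 192 columns):

```python
import pandas as pd

# Hypothetical frame standing in for the real 192-column one
df = pd.DataFrame([[0, 10.0, 0, 20.0], [1, 11.0, 1, 21.0]],
                  columns=['Time1', 'Pmpp1', 'Time2', 'Pmpp2'])

pairs = []
for a in range(0, len(df.columns), 2):
    pair = df.iloc[:, a:a + 2]   # positions a and a+1 as a two-column DataFrame
    pairs.append(pair)
    print(pair.columns[0], pair.columns[1])
```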
