3

My DataFrame has columns like df.columns = ['Time1', 'Pmpp1', 'Time2', ..., 'Pmpp96']. I want to select two successive columns at a time, for example Time1 and Pmpp1 together. My code is:

for i,j in zip(df.columns,df.columns[1:]):
    print(i,j)

My present output is:

 Time1 Pmpp1
 Pmpp1 Time2
 Time2 Pmpp2

Expected output is:

 Time1 Pmpp1
 Time2 Pmpp2
 Time3 Pmpp3
jpp
Msquare

4 Answers

5

You're zipping the list with the same list offset by one element, which yields overlapping pairs and is not what you want. Instead, zip the elements at even positions (0, 2, 4, ...) with those at odd positions (1, 3, 5, ...). For example, you could replace your code with:

for i, j in zip(df.columns[::2], df.columns[1::2]):
    print(i, j)

Yaniv Oliver
  • Thanks for the response, but it gives an `invalid syntax` error. Could you check it again? – Msquare Aug 27 '18 at 11:46
  • `File "", line 18 for ^ SyntaxError: invalid syntax` – Msquare Aug 27 '18 at 11:49
  • Post a complete code sample that reproduces this error – Yaniv Oliver Aug 27 '18 at 12:10
  • I tried yesterday and it did not work, but today it worked. Excellent, thank you very much. I am just starting with Python; could you please explain what `.columns[::2]` and `.columns[1::2]` are doing exactly? – Msquare Aug 28 '18 at 05:49
  • See [Understanding Python's slice notation](https://stackoverflow.com/questions/509211/understanding-pythons-slice-notation). It gives a nice overview of the topic – Yaniv Oliver Aug 28 '18 at 06:56
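As a rough illustration of the slices asked about in the comments, here is a minimal sketch using a plain Python list. The shortened column list is a made-up stand-in for the real 192 columns; df.columns accepts the same `[start:stop:step]` slices.

```python
# Hypothetical column names, shortened to three Time/Pmpp pairs.
columns = ['Time1', 'Pmpp1', 'Time2', 'Pmpp2', 'Time3', 'Pmpp3']

# seq[start:stop:step]: an omitted start means "from the beginning",
# an omitted stop means "to the end", and step=2 takes every second item.
evens = columns[::2]   # positions 0, 2, 4
odds = columns[1::2]   # positions 1, 3, 5

# zip pairs the first item of each list, then the second, and so on.
pairs = list(zip(evens, odds))
for time_col, pmpp_col in pairs:
    print(time_col, pmpp_col)
```

Because zip stops at the shorter input, a trailing unpaired column would simply be left out.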
1

As an alternative to integer positional slicing, you can use str.startswith to create 2 index objects. Then use zip to iterate over them pairwise:

import pandas as pd

df = pd.DataFrame(columns=['Time1', 'Pmpp1', 'Time2', 'Pmpp2', 'Time3', 'Pmpp3'])

times = df.columns[df.columns.str.startswith('Time')]
pmpps = df.columns[df.columns.str.startswith('Pmpp')]

for i, j in zip(times, pmpps):
    print(i, j)

Time1 Pmpp1
Time2 Pmpp2
Time3 Pmpp3
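Note that the prefix approach above pairs the two groups by position, so it relies on the Time and Pmpp columns appearing in the same order. If that cannot be assumed, one possible variant is to match on the numeric suffix instead. The sketch below uses a plain list and a hypothetical helper named `suffix`; it is an illustration, not part of the answer's method.

```python
import re

# Hypothetical column names, deliberately out of order.
columns = ['Time2', 'Pmpp1', 'Time1', 'Pmpp2']

def suffix(name):
    """Return the trailing integer of a column name, e.g. 'Time12' -> 12."""
    return int(re.search(r'(\d+)$', name).group(1))

# Map each numeric suffix to its column name, one dict per prefix.
times = {suffix(c): c for c in columns if c.startswith('Time')}
pmpps = {suffix(c): c for c in columns if c.startswith('Pmpp')}

# Pair columns that share a suffix, in ascending suffix order.
pairs = [(times[n], pmpps[n]) for n in sorted(times) if n in pmpps]
for time_col, pmpp_col in pairs:
    print(time_col, pmpp_col)
```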
jpp
0

In this kind of scenario, it might make sense to reshape your DataFrame. So instead of selecting two columns at a time, you have a DataFrame with the two columns that ultimately represent your measurements.

First, you make a list of DataFrames, where each one only has a Time and Pmpp column:

dfs = []
for i in range(1, 97):
    # .copy() avoids a SettingWithCopyWarning when adding the 'n' column below
    tmp = df[['Time{0}'.format(i), 'Pmpp{0}'.format(i)]].copy()
    tmp.columns = ['Time', 'Pmpp']  # Standardize column names
    tmp['n'] = i                    # Remember measurement number
    dfs.append(tmp)                 # Keep with our cleaned dataframes

Then you can concatenate them into a new DataFrame with three columns.

new_df = pd.concat(dfs, ignore_index=True, sort=False)

This should be a much more manageable shape for your data.

>>> new_df.columns
[Time, Pmpp, n]

Now you can iterate through the rows in this DataFrame and get the values for your expected output:

for i, row in new_df.iterrows():
    print(i, row.n, row.Time, row.Pmpp)

It will also make it easier to use the rest of pandas to analyze your data.

new_df.Pmpp.mean()
new_df.describe()
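To make the reshaping concrete, here is a small self-contained sketch of the same steps on toy data. The values and the two-pair frame are invented for illustration (the real data has 96 pairs), and the groupby at the end is just one example of an analysis the long shape allows.

```python
import pandas as pd

# Hypothetical toy data: two Time/Pmpp pairs instead of 96.
df = pd.DataFrame({
    'Time1': [0, 1], 'Pmpp1': [10.0, 11.0],
    'Time2': [0, 1], 'Pmpp2': [20.0, 21.0],
})

dfs = []
for i in range(1, 3):
    # Copy the pair so that adding a column does not touch the original frame.
    tmp = df[['Time{0}'.format(i), 'Pmpp{0}'.format(i)]].copy()
    tmp.columns = ['Time', 'Pmpp']  # same names for every pair
    tmp['n'] = i                    # which pair the rows came from
    dfs.append(tmp)

# Stack the per-pair frames on top of each other.
new_df = pd.concat(dfs, ignore_index=True)

# Mean of Pmpp per measurement number, one value per original column pair.
mean_by_n = new_df.groupby('n')['Pmpp'].mean()
print(new_df)
print(mean_by_n)
```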
jfbeltran
0

After a series of trials, I got it. My code is given below:

for a in range(0,len(df.columns),2):
    print(df.columns[a],df.columns[a+1]) 

My output is:

DateTime   A016.Pmp_ref
DateTime.1 A024.Pmp_ref
DateTime.2 A040.Pmp_ref
DateTime.3 A048.Pmp_ref
DateTime.4 A056.Pmp_ref
DateTime.5 A064.Pmp_ref
DateTime.6 A072.Pmp_ref
DateTime.7 A080.Pmp_ref
DateTime.8 A096.Pmp_ref
DateTime.9 A120.Pmp_ref
DateTime.10 A124.Pmp_ref
DateTime.11 A128.Pmp_ref
Msquare