Subsetting data frame into subsamples made of two columns each

Question

I have a big data frame with N columns. Columns are presented in pairs as follows:

column 1: ISIN 1, sequence of daily dates (issuance to maturity of bond 1)
column 2: historical data on prices wrt ISIN1
column 3: ISIN 2, sequence of daily dates (issuance to maturity of bond 2)
column 4: historical data on prices wrt ISIN2 and so on.

Columns are paired like this: the first two go together, as do the next two, and so on until the end of the data frame:

  XS0552790049  Unnamed: 5583 XS0628646480  Unnamed: 5585
0   2010-10-22          100.0   2011-05-24         99.711
1   2010-10-25          100.0   2011-05-25         99.685
2   2010-10-26          100.0   2011-05-26        100.125
3   2010-10-27          100.0   2011-05-27         99.893
4   2010-10-28          100.0   2011-05-30         99.792

I want to subset this big data frame into N/2 subsamples, each containing a pair of columns "ISIN dates + prices", as shown above. I thought about using a for loop, but I am definitely missing something as it does not generate the subsamples. Perhaps I am indexing wrong.

Here's my attempt: I tried to create a dictionary containing a subsample for every key.

sub = {}
for i in range(0,len(df.columns)+1):
    sub[i] = df.iloc[:,i:i+3]

I am pretty new with Python, so any suggestion is welcome.

Welcome to Stack Overflow! Please take the [tour](https://stackoverflow.com/tour). Input data is better shared as text, see [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples), to help us help you. Dataframes are best shared as print(df) or df.to_dict() — OCa, Sep 01 '23 at 13:41
Based on your example data, what do you expect the output to be? — Ian Thompson, Sep 01 '23 at 14:36
Good question, actually. I've been screening for duplicates, but I'm not finding it asked and answered before. — OCa, Sep 01 '23 at 15:40
You could omit the financial wording (ISIN, wrt...) as it does not bring value to the coding part, and only obscures your introduction. — OCa, Sep 01 '23 at 15:43
Thank you @IanThompson for providing the input dataframe as text. I've edited my answer to use it instead of a dummy — OCa, Sep 01 '23 at 15:58

OCa · Answer 1 · 2023-09-02T08:50:45.653

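Mostly, you just omitted the step argument in your range(start, stop, step) iterator: use step=2, so that i lands on the first column of each pair. Note also that the slice i:i+3 selects three columns rather than two.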
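As a minimal sketch, here is that fix applied to the loop from the question. The small data frame is rebuilt from the sample shown in the question, and the name `pairs` is chosen here only for illustration:

```python
import pandas as pd

# Sample rebuilt from the question: two (dates, prices) column pairs.
df = pd.DataFrame({
    "XS0552790049": ["2010-10-22", "2010-10-25", "2010-10-26", "2010-10-27", "2010-10-28"],
    "Unnamed: 5583": [100.0, 100.0, 100.0, 100.0, 100.0],
    "XS0628646480": ["2011-05-24", "2011-05-25", "2011-05-26", "2011-05-27", "2011-05-30"],
    "Unnamed: 5585": [99.711, 99.685, 100.125, 99.893, 99.792],
})

pairs = {}  # illustrative name, used instead of `sub`
# step=2 makes i take the values 0, 2, 4, ... (the first column of each pair)
for i in range(0, len(df.columns), 2):
    # i:i+2 selects exactly two columns; the original i:i+3 selected three
    pairs[i] = df.iloc[:, i:i + 2]
```

With this sample, `pairs` has the keys 0 and 2, each mapping to a two-column data frame.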

A list comprehension expresses this kind of for loop more concisely:

dfs = [ df.iloc[:,[i,i+1]] for i in range(0, len(df.columns), 2) ]

This will return your requested list of pairwise subsets:

dfs
[  XS0552790049  Unnamed: 5583
 0   2010-10-22          100.0
 1   2010-10-25          100.0
 2   2010-10-26          100.0
 3   2010-10-27          100.0
 4   2010-10-28          100.0,
   XS0628646480  Unnamed: 5585
 0   2011-05-24         99.711
 1   2011-05-25         99.685
 2   2011-05-26        100.125
 3   2011-05-27         99.893
 4   2011-05-30         99.792]

dfs[0]
  XS0552790049  Unnamed: 5583
0   2010-10-22          100.0
1   2010-10-25          100.0
2   2010-10-26          100.0
3   2010-10-27          100.0
4   2010-10-28          100.0

Side notes:

Avoid sub as a variable name: it is also the name of a function in the re module (re.sub), so reusing it can cause confusion.
{} instantiates a dictionary, while you seem to need a list.
df.shape[1] may replace len(df.columns), since dataframe dimensions are also given by df.shape as a tuple.
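If a dictionary is still preferred over a list, one possible variation (not part of the answer above) is to key each subset by the name of its first column, which in the question's data is the ISIN. The sketch below assumes an even number of columns and rebuilds the sample from the question; `by_isin` is an illustrative name:

```python
import pandas as pd

# Sample rebuilt from the question: two (dates, prices) column pairs.
df = pd.DataFrame({
    "XS0552790049": ["2010-10-22", "2010-10-25", "2010-10-26", "2010-10-27", "2010-10-28"],
    "Unnamed: 5583": [100.0, 100.0, 100.0, 100.0, 100.0],
    "XS0628646480": ["2011-05-24", "2011-05-25", "2011-05-26", "2011-05-27", "2011-05-30"],
    "Unnamed: 5585": [99.711, 99.685, 100.125, 99.893, 99.792],
})

# Key each two-column subset by the name of its first column (the ISIN).
# df.shape[1] is the number of columns, equivalent to len(df.columns).
by_isin = {
    df.columns[i]: df.iloc[:, i:i + 2]
    for i in range(0, df.shape[1], 2)
}

print(by_isin["XS0628646480"])
```

Looking up a pair by ISIN can be easier to read than a positional index such as dfs[1], at the cost of assuming that the first column name of every pair is unique.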
