1

Say I have just 2 columns in pandas. Column 1 has all numerical values and column 2 has values only at the every 16th position (so column 2 has value at index 0 followed by 15 NaN and value at index 16 followed by 15 NaNs).

How to create a new row, that contains itself and next 15 values of column 1 (as list [value, value2,....value16]) when column 2 is not null.

Can someone let me know a time efficient solution for the below:

Here is the pandas code to reproduce the sample data

df=pd.DataFrame(zip([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32],
              ['xyz',None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,
                'abc',None,None,None,None,None,None,None,None,None,None,None,None,None,None,None],
              [[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16],None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,
               [17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32],None,None,None,None,None,None,None,None,None,None,None,None,None,None,None]), columns= ['A','B','C']) 

enter image description here

tjt
  • 620
  • 2
  • 7
  • 17
  • 1
    could you include your data as text so that we could copy-paste it in our environment? –  May 07 '22 at 05:21
  • @enke thanks. when I am trying to paste from excel, it is giving picture as the only option. Can you please tell me how to paste as text. The above table is made in excel and I am doing a copy paste form it – tjt May 07 '22 at 05:23
  • 1
    @tjt see [how to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – mozway May 07 '22 at 05:24

1 Answers1

1

Use a boolean mask:

m = df['column 2'].notna()
df.loc[m, 'column 3'] = df.groupby(m.cumsum())['column 1'].agg(list).values
print(df)

# Output
    column 1 column 2                                           column 3
0          1      xyz  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...
1          2      NaN                                                NaN
2          3      NaN                                                NaN
3          4      NaN                                                NaN
4          5      NaN                                                NaN
5          6      NaN                                                NaN
6          7      NaN                                                NaN
7          8      NaN                                                NaN
8          9      NaN                                                NaN
9         10      NaN                                                NaN
10        11      NaN                                                NaN
11        12      NaN                                                NaN
12        13      NaN                                                NaN
13        14      NaN                                                NaN
14        15      NaN                                                NaN
15        16      NaN                                                NaN
16        17      abc  [17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 2...
17        18      NaN                                                NaN
18        19      NaN                                                NaN
19        20      NaN                                                NaN
20        21      NaN                                                NaN
21        22      NaN                                                NaN
22        23      NaN                                                NaN
23        24      NaN                                                NaN
24        25      NaN                                                NaN
25        26      NaN                                                NaN
26        27      NaN                                                NaN
27        28      NaN                                                NaN
28        29      NaN                                                NaN
29        30      NaN                                                NaN
30        31      NaN                                                NaN
31        32      NaN                                                NaN
Corralien
  • 109,409
  • 8
  • 28
  • 52
  • 1
    `m` is a boolean series. You can interpret True as 1 and False as 0. `cumsum` computes the cumulative sum. From index 0 to 15, the cumsum is 1 and from index 16 to 31 the cumsum is 2. You have 2 distinct groups to use with `groupby`. – Corralien May 07 '22 at 05:41
  • got it. Thanks, I assume the LHS ((df.loc....) has only the rows where m is true and the RHS has the exact the same number of groupby values and hence it works. right ? so it works not by matching the index of LHS and RHS but by having the same number of rows on both sizes that aligns with our logic. Also why is that it doesn't work, without the '.values' at the end – tjt May 07 '22 at 05:45
  • 1
    It doesn't work without `.values` because index are not aligned. – Corralien May 07 '22 at 13:03