1

I already asked my question but it was not enough accurate in its description. Smart people in this forum already proposed solutions, but I forgot(sorry) to precise that if there were zeros in the relevant columns, they should be kept.

Hello I have a dataframe like below

              2014  2015  2016  2017  2018  2019  

         2014   10    20    30    40    0      5
         2015   0     0    200    0    100     0       
         2016   0     0    200   140    35    10       
         2017   0     0     0     20     0    12       

I need to have a result like this:

    yearStart  yearStart+1  yearStart+2  yearStart+3  yearStart+4  
0      10          20            30          40          0
1      0          200             0          100         0       
2     200         140            35          10          0
3      20          0             12           0          0

The idea is to select in each row, the columns between two dates:

index and index +delta,with delta a parameter (in this example 4) to put them in a dataframe.

With iterrows(), it takes too much time.

I tried with

 df1 = df.apply(lambda x: pd.Series(x[x.keys()>=x.index],1)).fillna(0).astype(int)

but it doesn't work:

TypeError: ('Index(...) must be called with a collection of some kind,
1 was passed', 'occurred at index 2014')

Thank you

phil
  • 57
  • 5
  • Possible duplicate of [panda dataframe: how to copy some columns in others according to a value in the row](https://stackoverflow.com/questions/46283564/panda-dataframe-how-to-copy-some-columns-in-others-according-to-a-value-in-the) – GPhilo Sep 19 '17 at 13:58
  • You should edit your first question clarifying your requirements, not open a new question essentially identical to the first one. – GPhilo Sep 19 '17 at 13:59
  • 1
    sorry but I am a beginner in this forum. I apologize for my clumsyness. Anyway thank you. The fact that some zeros have to be kept gave me reasons to think that it was a different case. – phil Sep 19 '17 at 14:07

1 Answers1

0

One of the ways would be

In [1010]: def yearmove(x):
      ...:     idx = x.index.astype(int)
      ...:     idx = idx - x.name
      ...:     mask = idx >= 0
      ...:     idx = 'yearStart' + idx.astype(str)
      ...:     return pd.Series(x.values[mask], index=idx[mask])
      ...:

In [1011]: df.apply(yearmove, 1).fillna(0).astype(int)
Out[1011]:
      yearStart0  yearStart1  yearStart2  yearStart3  yearStart4  yearStart5
2014          10          20          30          40           0           5
2015           0         200           0         100           0           0
2016         200         140          35          10           0           0
2017          20           0          12           0           0           0
Jarad
  • 17,409
  • 19
  • 95
  • 154
Zero
  • 74,117
  • 18
  • 147
  • 154
  • Thak you. it works perfectly. If I need to have a condition on the time windows duration, i write mask=(idx>=0 and idx<=delta) but it answers ValueError: ('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()', 'occurred at index 2015') – phil Sep 19 '17 at 14:25
  • I realize my mistake :'and' instead of '&' but there is a type error between 'int' and 'int64' with the '&' operator...I am tracking it – phil Sep 19 '17 at 14:42