1

I have a DataFrame indexed by time, and one of its columns takes two distinct values, like [x,x,y,y,x,x,x,y,y,y,y,x]. I want to slice this DataFrame so that the column has no identical consecutive values. In this example the result would be [x,y,x,y,x], where each kept row is the first of its run.

Still trying to figure it out...

Thanks!!

Benus13

2 Answers

2

Assuming you have a DataFrame like the one below:

import pandas as pd
df=pd.DataFrame(['x','x','y','y','x','x','x','y','y','y','y','x'])

Use shift to compare each value with the previous one, and keep only the rows where they differ:

df[df[0].shift()!=df[0]]
Out[142]: 
    0
0   x
2   y
4   x
7   y
11  x
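
Since the question mentions a time index and a named column, here is a minimal sketch of the same shift comparison in that setting. The column names "state" and "value" and the dates are made up for illustration; they are not from the question.

```python
import pandas as pd

# Hypothetical time-indexed frame; "state" and "value" are illustrative names.
idx = pd.date_range("2018-05-01", periods=12, freq="D")
df = pd.DataFrame(
    {"state": list("xxyyxxxyyyyx"), "value": range(12)},
    index=idx,
)

# True where a row differs from the previous row. The first row is compared
# against NaN, so it always counts as the start of a run.
is_run_start = df["state"].ne(df["state"].shift())

# Boolean indexing keeps the whole row (and its timestamp) for each run start.
result = df[is_run_start]
print(result)
```

The mask is built from one column only, so every other column and the time index come along unchanged for the rows that are kept.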
BENY
0

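You can just loop through the column and save the last element used:

import pandas as pd

df=pd.DataFrame(['x','x','y','y','x','x','x','y','y','y','y','x'])

old = df[0].iloc[0]   # the first element always starts a run
keep = [df.index[0]]  # index labels of the rows to keep
for idx, value in df[0].iloc[1:].items():
    if value != old:  # the value changed, so a new run starts here
        keep.append(idx)
        old = value

df2 = df.loc[keep]    # rows 0, 2, 4, 7, 11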

EDIT:

Or, for a plain vector, use a list with itertools.groupby:

>>> L=[1,1,1,1,1,1,2,3,4,4,5,1,2]
>>> from itertools import groupby
>>> [x[0] for x in groupby(L)]
[1, 2, 3, 4, 5, 1, 2]
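
If the position of each first element is also needed, as in the question, the same groupby idea can carry it along. This is only a sketch on a plain list; pairing values with enumerate is my own addition, not part of the snippet above.

```python
from itertools import groupby

values = list("xxyyxxxyyyyx")

# Group (position, value) pairs by value, so consecutive equal values form
# one run, then take the first pair of each run.
firsts = [
    next(run)
    for _, run in groupby(enumerate(values), key=lambda pair: pair[1])
]
print(firsts)
# [(0, 'x'), (2, 'y'), (4, 'x'), (7, 'y'), (11, 'x')]
```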
Cornelis
  • Please don't use loops with DataFrames, that's like mixing vegetable oil in gasoline. – cs95 May 14 '18 at 00:12
  • @COLDSPEED I mean if you want to use dataframes for what can be done by a list then that is on you. Added a simple solution using lists. – Cornelis May 14 '18 at 00:49