
I'm working with some large arrays where values are usually repeated. Something similar to this:

data[0] = 10
data[1] = 10
data[2] = 12
data[3] = 12
data[4] = 13
data[5] = 9

Is there any way to get the positions where the values change? I mean, get something similar to this:

data[0] = 10
data[2] = 12
data[4] = 13
data[5] = 9

The goal is to somehow compress the array so I can work with smaller arrays. I have been looking at pandas too, but without any success so far.

Thank you,

1 Answer

You can use pandas `shift` and `loc` to filter out consecutive duplicates.

In [11]:
# construct a numpy array of data
import pandas as pd
import numpy as np
# I've added some more values at the end here
data = np.array([10,10,12,12,13,9,13,12])
data
Out[11]:
array([10, 10, 12, 12, 13,  9, 13, 12])
In [12]:
# construct a pandas dataframe from this
df = pd.DataFrame({'a':data})
df
Out[12]:
    a
0  10
1  10
2  12
3  12
4  13
5   9
6  13
7  12

In [80]:

df.loc[df.a != df.a.shift()]
Out[80]:
    a
0  10
2  12
4  13
5   9
6  13
7  12
In [81]:

data[np.roll(data,1)!=data]
Out[81]:
array([10, 12, 13,  9, 13, 12])
In [82]:

np.where(np.roll(data,1)!=data)
Out[82]:
(array([0, 2, 4, 5, 6, 7], dtype=int64),)
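One caveat with the `np.roll` approach above: rolling by 1 compares the first element to the *last* element, so if `data[0]` happened to equal `data[-1]`, position 0 would be silently dropped. A minimal sketch of the same idea using `np.diff` instead, which always keeps the first element (the variable names here are just illustrative):

```python
import numpy as np

data = np.array([10, 10, 12, 12, 13, 9, 13, 12])

# np.diff compares each element to its predecessor; prepend True so the
# first element is always kept (np.roll would instead compare it to the
# last element, which can silently drop position 0 when they match).
change_mask = np.insert(np.diff(data) != 0, 0, True)

positions = np.flatnonzero(change_mask)  # indices where the value changes
values = data[change_mask]               # the compressed values themselves

print(positions)  # [0 2 4 5 6 7]
print(values)     # [10 12 13  9 13 12]
```

Keeping `positions` and `values` together gives you a run-length-style compressed view of the original array.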
EdChum
  • +1, I think you have to use `shift(1)` in place of `shift(-1)` to get OP's expected result. – furas Jul 10 '14 at 13:40