
I'm trying to make some regular time series in Pandas using ffill, but I'm getting a non-unique index error.

Here's my code:

for d in data_types:
    series = df[df['datatype'] == d]['measurementvalue'].values
    times = df[df['datatype'] == d]['displaydate'].values
    data_series = pd.Series(series, index = times)
    data_series.drop_duplicates(inplace = True)
    data_series.asfreq('30Min', method = 'ffill')
    all_series.append(data_series)

I'm getting the following error as a result of the asfreq call for one particular data_type:

ValueError: cannot reindex a non-unique index with a method or limit

This is for a set of data where drop_duplicates reduces the length from 2119 to 1299, suggesting it's the densest (time-wise) series.

==========

EDIT

I did some poking around and have narrowed down the issue. By taking time lags to the nearest second in the indices, I can see the 'duplicate' indices that are created when two rows fall within the same second. My guess is that these are the offending rows...

2016-03-02 04:03:29.693    8.250347
2016-03-02 04:03:29.693    7.478983
2016-03-06 00:19:30.183    45.97248
2016-03-06 00:19:30.183    24.06088
2016-03-14 02:44:58.783    9.169300
2016-03-14 02:44:58.783    4.221998
2016-03-18 21:54:20.097    73.80586
2016-03-24 16:41:19.825    3.608202
2016-03-24 16:41:19.825    3.887996
2016-03-25 03:35:57.197    4.974968
2016-03-25 03:35:57.197    5.638140
2016-04-02 11:18:27.290    7.923712
2016-04-02 11:18:27.290    6.143240
2016-04-10 19:59:54.677    3.143636
2016-04-10 19:59:54.686    14.222390
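
For reference, here's roughly how I'm surfacing these (just a sketch; data_series is the per-datatype Series built in the loop above, and I'm rounding the index to the nearest second):

# Round the index to the nearest second and flag timestamps that collide.
rounded = data_series.index.round('1s')
collisions = data_series[rounded.duplicated(keep=False)]
print(collisions)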

What's the best way to drop a value? Let's say I want to write a custom method that receives all the duplicate values for a given index value and returns the single value that should be used for that index. How can I do that?
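
Something along these lines is what I have in mind, though I don't know if it's the right approach (a sketch only; resolve is a hypothetical reducer):

def resolve(values):
    # Given all the measurement values sharing one timestamp, return the
    # single value to keep (max is used here purely as a placeholder).
    return values.max()

# Collapse duplicate timestamps, then impose the regular 30-minute frequency.
deduped = data_series.groupby(level=0).agg(resolve)
regular = deduped.asfreq('30Min', method='ffill')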

helloB
  • It means you have duplicated indexes in your data-frame. To see it: `df[df.index.duplicated()]` also take a look at http://stackoverflow.com/questions/27236275/what-does-valueerror-cannot-reindex-from-a-duplicate-axis-mean and http://stackoverflow.com/questions/27711623/valueerror-cannot-reindex-from-a-duplicate-axis – michael_j_ward Jun 07 '16 at 22:15
  • also, whenever posting, it's best to include sample data that makes your question "[Minimum, Complete, and Reproducible](http://stackoverflow.com/help/mcve)" – michael_j_ward Jun 07 '16 at 22:16
  • @michael_j_ward thanks for your suggestion. I did not know about that method call. It's handy to know for the future. Unfortunately it does not help me because the original indices are not duplicated... they only wind up being duplicated when I am imposing a frequency. I'm adding more data now. – helloB Jun 08 '16 at 15:07
  • http://stackoverflow.com/questions/13035764/remove-rows-with-duplicate-indices-pandas-dataframe-and-timeseries – michael_j_ward Jun 08 '16 at 15:52

2 Answers


Try something like this. Since you haven't included any data, this is just a starter.

for d in data_types:
    rawDf = df[df['datatype'] == d]
    data_series = rawDf[['measurementvalue', 'displaydate']]
    data_series.set_index('displaydate', drop=False, inplace=True)
    data_series.drop_duplicates(inplace=True)
    data_series.asfreq('30Min', method='ffill')
    all_series.append(data_series)
Merlin
    thanks for your suggestion, but that still produces the error. I think I know the source of the error now although I don't know the fix. I am editing my question now to include more information about the data. – helloB Jun 08 '16 at 15:06
  • correct, so how do I write the logic to remove values and what if I want to combine their values? – helloB Jun 08 '16 at 15:36
  • let's say I want to keep the max. – helloB Jun 08 '16 at 15:45
  • df.groupby('displaydate')['measurementvalue'].max() – Merlin Jun 08 '16 at 15:55

If you want to keep the maximum for each date-time, first make date_time a column and use

df.groupby('date_time').max()

If you want to always keep the first or last entry, look at this answer:
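
For instance, keeping only the first entry per timestamp could look roughly like this (a sketch using Index.duplicated, applied to the data_series from the question):

# Keep the first row for each duplicated timestamp; the 30-minute reindex
# then no longer hits non-unique labels.
deduped = data_series[~data_series.index.duplicated(keep='first')]
regular = deduped.asfreq('30Min', method='ffill')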

michael_j_ward