I have a DataFrame named Mj_rank, indexed by a Datetime column called date, which looks like this:

                A      B     C ...
date
2016-01-29     False  False  True
2016-01-30     False  False  True
2016-02-01     True   True   True
  ....
2017-12-29     False  True   True

Currently, the data is daily, but I would like to resample it into a new df that keeps only one row every 6 months.

Therefore I did:

Mj_rank_s = Mj_rank.resample('6M').asfreq().tail()

which raises this error:

ValueError: cannot reindex from a duplicate axis

Strangely enough, other methods like max() or min() work fine, but asfreq() does not.

I tried different fixes based on existing Stack Overflow suggestions, such as adding the following before the resample call, but it didn't work:

Mj_rank = Mj_rank.reset_index()
Mj_rank['date'] = pd.to_datetime(Mj_rank['date'])
Mj_rank = Mj_rank.set_index('date')

Thanks a lot!

Edit: Thanks to @jezrael, who pointed out that I had duplicate dates in the index. They show up with Mj_rank[Mj_rank.index.duplicated(keep=False)]
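
For anyone hitting the same error, the duplicate check from the edit above can be sketched as follows. The frame `sample` and its values are made up for illustration, and keeping the first row per date is only an assumption; whether that is the right way to resolve duplicates depends on the data.

```python
import pandas as pd

# Hypothetical sample data: 2016-01-30 appears twice in the index.
idx = pd.to_datetime(["2016-01-29", "2016-01-30", "2016-01-30", "2016-02-01"])
sample = pd.DataFrame(
    {"A": [False, False, True, True], "B": [False, False, False, True]},
    index=idx,
)
sample.index.name = "date"

# False here means at least one date occurs more than once.
print(sample.index.is_unique)

# keep=False flags every occurrence of a repeated date,
# so all offending rows are shown, not just the later ones.
dupes = sample[sample.index.duplicated(keep=False)]
print(dupes)

# Assumption: keeping the first row for each date is acceptable.
deduped = sample[~sample.index.duplicated(keep="first")]
print(deduped.index.is_unique)
```

Once the index is unique, the reindexing step that asfreq() performs should no longer have duplicate labels to trip over.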

Alexander Thomsen
  • It looks like a bug / not implemented for duplicated `date`s + `resample` + `asfreq` :( – jezrael Jan 17 '18 at 10:21
  • So is it possible to remove the duplicate dates? – jezrael Jan 17 '18 at 10:22
  • I'm pretty confident there are no duplicated dates, which makes it even more strange. – Alexander Thomsen Jan 17 '18 at 10:29
  • So `Mj_rank.index.is_unique` returns True? – jezrael Jan 17 '18 at 10:30
  • And `Mj_rank[Mj_rank.index.duplicated(keep=False)]` returns no rows? – jezrael Jan 17 '18 at 10:31
  • Damn! output: False and I see duplicates! – Alexander Thomsen Jan 17 '18 at 10:40
  • `Mj_rank[Mj_rank.index.duplicated(keep=False)]` also shows me 2 columns I hadn't seen before, one called level_0 and one called index – Alexander Thomsen Jan 17 '18 at 10:41
  • Why would you do both `resample` and `asfreq`? Normally you'd do one or the other. `resample` is a deferred operation (like groupby), so it needs to be followed by something like max or min -- something that tells resample how to aggregate the data -- not asfreq, which is essentially redundant here. Pandas behavior seems like what I'd expect here, unless I'm misunderstanding. – JohnE Jan 17 '18 at 18:25
  • I want the rows at every 6 months, so I can't use max or min. Using asfreq() works according to the examples in the pandas docs. I'm sure there is a better way to achieve the goal though :D. – Alexander Thomsen Jan 17 '18 at 19:45
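
One possible way to pick an existing row per 6-month period without aggregating is sketched below. This is a different technique from `resample` + `asfreq`: it groups by calendar half-year (Jan-Jun, Jul-Dec) and keeps the last available row in each group, which may not match the bins `resample('6M')` would produce. The frame `sample` and the names `year`, `half`, and `picked` are hypothetical.

```python
import pandas as pd

# Hypothetical daily data spanning the same date range as in the question.
idx = pd.date_range("2016-01-29", "2017-12-29", freq="D")
sample = pd.DataFrame({"A": [i % 2 == 0 for i in range(len(idx))]}, index=idx)
sample.index.name = "date"

# Group keys: calendar year and half-year (0 = Jan-Jun, 1 = Jul-Dec).
year = sample.index.year.to_numpy()
half = ((sample.index.month - 1) // 6).to_numpy()

# tail(1) keeps the last row of each group unchanged, with its original date.
picked = sample.groupby([year, half]).tail(1)
print(picked)
```

Because `tail(1)` selects rows rather than reindexing, it returns actual rows from the data instead of filling in dates that are missing. If the index contains duplicate dates, they would still need to be dealt with separately.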

0 Answers