2

I have a small Pandas DataFrame I'd like to resample, and I hoped you could help me :)

I cannot show it to you as it is confidential but I can describe to you a simpler version of it.

It has 4 columns:

-Date (YYYY-MM-DD format)

-Country

-Amount

-Frequency

So here is what I'd like to do:

Depending on the rows, the Frequency is either YEARLY or MONTHLY If it happens to be monthly, then nothing needs to be done.

But if it's YEARLY, I would like to change it to MONTHLY and insert 11 new rows so that in the 12 rows (the existing one and the 11 new ones) considered, the country would remain the same (so same country name repeted 12 times), the amount would be amount/12 (repeted 12 times too, you get it), and the date would be incremented by 1 month at each row.

I really wish someone could help me with that!

Thanks in advance,

Alex

EDIT:

I already change the Date Column as an index using

df.set_index("Date",inplace=True)

However, I don't really know how to use the resample function for my purpose...

Alex
  • 87
  • 1
  • 1
  • 7

2 Answers2

0

I think you missed the same step I did. You need to do a line something like this:

df.index = pd.to_datetime(df.index)

Then you should be able to just do:

df.resample('M').sum() #'Y' for Yearly

and you should be good.

I couldn't figure out why the method wouldn't work either then I read through the very long doc to figure out my data frame just had unconverted dates. It took me way to long and I hope to save people the time.

qinntt
  • 1
  • 2
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jan 24 '23 at 06:11
-1

First ensure that your dataframe has an index of type DateTimeIndex.

Then use the resample function to either upsample (higher frequency) or downsample (lower frequency) your dataframe. Then apply an aggregator (e.g. sum) to aggregate the values across the new sampling frequency.

See the resample documentation here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html

  • OK, now that you have your "Date" column as the index, use `df.index = pd.to_datetime(df.index)` to convert it to a `DateTimeIndex` and then use `resample` to adjust the sampling to the given interval. – Riley Steele Parsons Nov 12 '18 at 21:29
  • Yes but how/where do I write all the conditions I've mentioned above? – Alex Nov 13 '18 at 07:51
  • Because you need to resample based on another column’s value, it might be best to just create a new dataframe based on a mask for frequency and resample if necessary. Then, concatenate the resulting arrays once they’ve been properly sampled. – Riley Steele Parsons Nov 13 '18 at 15:51
  • I couldn't agree more with you, but that's precisely what I don't know how to do !! – Alex Nov 13 '18 at 17:47
  • For information on masking I'd point you to this question: https://stackoverflow.com/questions/38802675/create-bool-mask-from-filter-results-in-pandas. To concatenate the resulting dataframes from the masks, check out the `concat` documentation (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html). If you're still having trouble, open a new question for this. Please accept this answer if I've resolved this particular resampling question. – Riley Steele Parsons Nov 13 '18 at 17:57