I have a DataFrame with 8 columns: the time ("Tidspunkt"), the flow ("Streamflow"), the precipitation ("precip_mean_mm"), the beginning and end of each event for the precipitation and for the flow ("begin_rain_checked", "end_rain_checked", "begin_flow_checked", and "end_flow_checked"), and a number assigned to each event ("event_number").

What I want is to extract all the rows for the months October, November, December, January, February, March, and April across all the years in the DataFrame. But I only want full events. So if an event is still running in October but started in September, I don't want to include it. And if an event started in April and runs into May, I do want to include it.

So the code should take both the months in "Tidspunkt" and the "event_number" into consideration when extracting the data.

I tried to mask it, but it did not work: it gave me the same DataFrame, just with an extra column containing the month number.

# Get the month values for each row
Rain_Stream['month'] = Rain_Stream['Tidspunkt'].dt.month

# Create a mask for the period of interest (October to April)
mask = ((Rain_Stream['month'] >= 10) | (Rain_Stream['month'] <= 4))

# Create a mask for the start and end of each event
start_mask = Rain_Stream['event_number'].diff() > 0
end_mask = Rain_Stream['event_number'].diff(periods=-1) < 0

# Combine the masks to get the final mask
final_mask = ((start_mask & mask) | (end_mask & mask) | mask)

# Extract the rows that satisfy the final mask
extracted_rs = Rain_Stream[final_mask].reset_index(drop=True)
  • Please add a [MRE](https://stackoverflow.com/help/minimal-reproducible-example) (also look [here](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples)) that replicates your problem. – Timus May 04 '23 at 12:32

1 Answer

I'm not really sure about start_mask and end_mask, but at the very least your final_mask should look like this:

final_mask = ((start_mask & mask) | (end_mask & mask))

If you also OR the result with mask, the other two conditions are worthless: (start_mask & mask) | (end_mask & mask) | mask always reduces to just mask.
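
Since the question asks for whole events rather than individual rows, a different route may be to decide per event instead of per row. Below is a minimal sketch, not a tested solution for the real data. It assumes that "Tidspunkt" is already a datetime column, that every row has an "event_number", and that an event counts as belonging to the period when its first timestamp falls in October to April. The toy DataFrame and the names rain_stream, wet_months, event_start and keep are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for Rain_Stream (only the two columns needed here)
rain_stream = pd.DataFrame({
    "Tidspunkt": pd.to_datetime([
        "2022-09-29", "2022-10-01",  # event 1: starts in September, runs into October
        "2022-11-10", "2022-11-11",  # event 2: entirely in November
        "2023-04-30", "2023-05-01",  # event 3: starts in April, runs into May
        "2023-07-05",                # event 4: entirely in July
    ]),
    "event_number": [1, 1, 2, 2, 3, 3, 4],
})

# Months of interest: October through April
wet_months = [10, 11, 12, 1, 2, 3, 4]

# First timestamp of each event, broadcast back onto every row of that event
event_start = rain_stream.groupby("event_number")["Tidspunkt"].transform("min")

# Keep every row of an event whose start month is in the period of interest
keep = event_start.dt.month.isin(wet_months)
extracted_rs = rain_stream[keep].reset_index(drop=True)

print(extracted_rs)
```

With this toy data, event 1 is dropped because it started in September, event 3 is kept in full including its May row, and event 4 is dropped. If rows outside any event carry NaN or a placeholder number in "event_number", they would need to be handled separately before grouping.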