0

I wanted to know if there is a more concise way to write the code below. I basically have a dataframe with three columns: "Dates", "Ramp_1", "Ramp_2". My goal is to separate the data (Ramp_1 and Ramp_2) by month. The dates contain multiple years of data in this form: %Y-%m-%d %H:%M. I created a new column with only the months and then I used df.loc to create new dataframes:

mon=['jan','feb','mar','apr', 'may', 'jun','jul','aug','sep','oct','nov','dec']
for i in range(len(mon)):
    mon[i] = df.loc[df["month"]==(i+1), ["Ramp_1","Ramp_2"]]

I wanted to create 12 new dataframes with each one named after a month. Instead of doing that, I ended up creating a list of dataframes. So I manually wrote the code like this:

jan=mon[0]
feb=mon[1]
mar=mon[2]
apr=mon[3]
may=mon[4]
jun=mon[5]
jul=mon[6]
aug=mon[7]
sep=mon[8]
octo=mon[9]
nov=mon[10]
dec=mon[11]

My question is: is there a more concise way to write this? I know there is, but i haven't been able to figure it out!

I also tried doing this:

mon=['jan','feb','mar','apr', 'may', 'jun','jul','aug','sep','oct','nov','dec']
for i in range(len(mon)):
    name = mon[i]
    name = df.loc[df["month"]==(i+1), ["Ramp_1","Ramp_2"]]

sample of my data:

Dates  Ramp_1   Ramp_2  month
0     2016-01-01 02:00:00  -823.0  -566.47      1
1     2016-01-01 03:00:00  -899.0  -586.54      1
2     2016-01-01 04:00:00  -652.0  -473.33      1
3     2016-01-01 05:00:00  -304.0  -178.20      1
4     2016-01-01 06:00:00    99.0   273.08      1
...                   ...     ...      ...    ...
35045 2019-12-31 11:00:00  -613.0  -793.54     12
35046 2019-12-31 12:00:00  -311.0 -1159.81     12
35047 2019-12-31 13:00:00  -530.0  -964.18     12
35048 2019-12-31 14:00:00    79.0   538.85     12
35049 2019-12-31 15:00:00   181.0   574.21     12
FuturePhD
  • 15
  • 5
  • Can you provide a sample dataset as comma separated text? – PSK Jan 28 '21 at 23:42
  • Yes! One moment! – FuturePhD Jan 28 '21 at 23:46
  • Dates Ramp_1 Ramp_2 month 0 2016-01-01 02:00:00 -823.0 -566.47 1 1 2016-01-01 03:00:00 -899.0 -586.54 1 2 2016-01-01 04:00:00 -652.0 -473.33 1 3 2016-01-01 05:00:00 -304.0 -178.20 1 4 2016-01-01 06:00:00 99.0 273.08 1 ... ... ... ... ... 35045 2019-12-31 11:00:00 -613.0 -793.54 12 35046 2019-12-31 12:00:00 -311.0 -1159.81 12 35047 2019-12-31 13:00:00 -530.0 -964.18 12 35048 2019-12-31 14:00:00 79.0 538.85 12 35049 2019-12-31 15:00:00 – FuturePhD Jan 28 '21 at 23:48
  • Please consider posting you data like [this](https://meta.stackexchange.com/questions/223766/how-do-i-post-an-example-of-a-csv-to-stack-overflow). – PSK Jan 28 '21 at 23:57
  • is that better? – FuturePhD Jan 29 '21 at 00:07
  • My guess is that you don't need this loop at all, and `dataframe.groupby(...).apply(...)` will get you what you're ultimately after – Paul H Jan 29 '21 at 00:11

2 Answers2

0
import pandas as pd

# Use your data instead of this sample df
df = pd.DataFrame({'Dates':['2016-01-01 02:00:00','2016-01-01 03:00:00','2016-01-01 04:00:00',
                            '2016-01-01 05:00:00','2016-01-01 06:00:00'],
                   'Ramp_1':[-823.0,-899.0,-652.0,-304.0,99.0],
                   'Ramp_2':[-566.47,-586.54,-473.33,-178.20,273.08]})

df['Dates'] = pd.to_datetime(df['Dates'], format="%Y-%m-%d %H:%M")

# This is a list with the 12 DataFrames that you want
dfs_by_month = [df[df['Dates'].apply(lambda x: x.month)==month] for month in range(1,13)]
PSK
  • 347
  • 2
  • 13
  • Hey PSK, thanks for your comment. This leads to a similar result to what I wrote initially with a list of dataframes. I wonder if there is a way to more concisely write the second step into the code, instead of typing jan = dfs_by_month[0], feb= dfs_by_month[1],... – FuturePhD Feb 04 '21 at 22:57
  • Can you please explain why you want to name your ```DataFrames``` after a month? A dictionary might be a good option. You could create a class and use ```setattr()``` like mentioned [here](https://stackoverflow.com/questions/9561174/using-setattr-in-python). You could also use ```exec()``` but that's not recommended as mentioned [here](https://stackoverflow.com/questions/9672791/how-to-safely-use-exec-in-python#:~:text=The%20only%20safe%20way%20to,to%20drive%20your%20code%20execution.&text=If%20you're%20using%20python,and%20you%20should%20use%20those.). Really depends on why you're doing this. – PSK Feb 05 '21 at 00:25
-1

If you want to store the data frames to call later would you be happy with a dictionary?

months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
          'jul', 'aug', 'sep', 'oct', 'nov', 'dec']

data = {m: df.loc[df['month'] == n, ['Ramp_1', 'Ramp_2']]
        for n, m in enumerate(months, 1)}
Timothy Helton
  • 332
  • 4
  • 10
  • Hi Timothy, thanks for your reply. I haven't really tried using dictionaries so this was pretty neat to see. This actually also leads to a similar result to what I wrote. I wonder if there is a way to more concisely write the second step into the code: jan = data["jan"], feb = data["feb"], ... – FuturePhD Feb 04 '21 at 22:58