2

I already looked on SE and couldn't find an answer to my question. I am still new to this.

I am trying to take a purchasing csv file and break it into separate dataframes for each year.

For example, if I have a listing with full dates in MM/DD/YYYY format, I am trying to separate them into dataframes for each year. Like Ord2015, Ord2014, etc...

I tried to covert the full date into just the year, and also attempted to use slicing to only look at the last four of the date to no avail.

Here is my current (incomplete) attempt:

import pandas as pd
import csv
import numpy as np
import datetime as dt
import re
    

purch1 = pd.read_csv('purchases.csv')

#Remove unneeded fluff
del_colmn = ['pid', 'notes', 'warehouse_id', 'env_notes', 'budget_notes']
purch1 = purch1.drop(del_colmn, 1)

#break down by year only
purch1.sort_values(by=['order_date'])

Ord2015 = ()
Ord2014 = ()

for purch in purch1:
    Order2015.add(purch1['order_date'] == 2015)

Per req by @anon01... here are the results of the code you had me run. I only used a sample of four as that was all I was initially playing with... The record has almost 20k lines, so I only pulled aside a few to play with.

'{"pid":{"0":75,"2":95,"3":117,"1":82},"env_id":{"0":12454,"2":12532,"3":12623,"1":12511},"ord_date":{"0":"10\/2\/2014","2":"11\/22\/2014","3":"2\/17\/2015","1":"11\/8\/2014"},"cost_center":{"0":"Ops","2":"Cons","3":"Net","1":"Net"},"dept":{"0":"Ops","2":"Cons","3":"Ops","1":"Ops"},"signing_mgr":{"0":"M. Dodd","2":"L. Price","3":"M. Dodd","1":"M. Dodd"},"check_num":{"0":null,"2":null,"3":null,"1":82301.0},"rec_date":{"0":"10\/11\/2014","2":"12\/2\/2014","3":"3\/1\/2015","1":"11\/20\/2014"},"model":{"0":null,"2":null,"3":null,"1":null},"notes":{"0":"Shipped to east WH","2":"Rec'd by L.Price","3":"Shipped to Client (1190)","1":"Rec'd by K. Wilson"},"env_notes":{"0":"appr by K.Polt","2":"appr by S. Crane","3":"appr by K.Polt","1":"appr by K.Polt"},"budget_notes":{"0":null,"2":"OOB expense","3":"Bill to client","1":null},"cost_year":{"0":2014.0,"2":2015.0,"3":null,"1":2014.0}}'

  • what exactly do you mean "break it into separate dataframes for each year" - create one df for each date range? Typically you would leave this as a single `df` and use `groupby` or a filter operation for whatever you're trying to achieve. What is the ultimate goal? – anon01 Sep 29 '20 at 05:25
  • also: please include a sample of the dataframe; you can paste the contents of `df.sample(n=20).to_json()` for example – anon01 Sep 29 '20 at 05:28
  • As you may be pointing out, I might be overthinking it, so suggestions would be welcome. Yes... I want to create a repository where I can compare expenses for each year, then compare them to each other. To me is seemed that the best way to do this would be to take the main DF (purch1) and break it into multiple DFs based on the year (Ord2014, Ord2015, etc...). I suspect I am taking too many steps for something that will probably be much easier than I am making it out to be. – Samuel Wright Sep 29 '20 at 05:31
  • yes. Post a sample of your `df` and an example of a concrete (very explicit) calculation you want to compute and someone can surely help. – anon01 Sep 29 '20 at 05:33
  • @anon01, I added the info and an explanation. Heading to bed now... check in in the morning. Thank for the assist, in advance. – Samuel Wright Sep 29 '20 at 06:16

1 Answers1

1

You can add parse_dates to read_csv for convert column to datetimes and then create dictionary of DataFrames dfs, for selecting is used keys:

purch1 = pd.read_csv('purchases.csv', parse_dates=['ord_date'])

dfs = dict(tuple(purch1.groupby(df['ord_date'].dt.year)))
Ord2015 = dfs[2015]
Ord2016 = dfs[2016]

It is not recommended, but possible create DataFrames by years groups:

for i, g in df.groupby(purch1['ord_date'].dt.year):
    globals()['Ord' + str(i)] =  g
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • For knowledge purposes only, why is it not recommended to create the separate dataframes? I ask so that I will understand better going forward. – Samuel Wright Sep 29 '20 at 13:43
  • @SamuelWright - I only say is not recommended create variables from strings like `globals()['Ord' + str(i)]`, betetr is create dictioanry – jezrael Sep 29 '20 at 13:54
  • Thanks for that. Looking at your initial explanation of that and running it around in my head, I see where it could be problematic. I understand now. Thanks for that. I will try your suggested solution as soon as I have some bandwidth in my day and update this post. Appreciate the assist. – Samuel Wright Sep 29 '20 at 14:28
  • 1
    Just wanted to update you. That is what I needed and it worked fine using the parse_dates process. Thank you again for the help with this. Something new learned. :-) – Samuel Wright Sep 29 '20 at 16:42