The bug has been found: The code snippets posted as solutions below work. The problem regarding my results was rooted in the data source (FEC.GOV). I have found it and am now moving on. Thanks a bunch for all of the time, patience, help, etc. from the community regarding this issue!
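For anyone who hits the same symptom, a quick way to confirm that stray dates come from the source data rather than from the groupby is to filter the date-indexed frame for rows outside the expected window. A minimal sketch on a toy frame (the `df2` name and window bounds mirror the question and are assumptions):

```python
import pandas as pd

# Toy frame mimicking the date-indexed df2 from the question
df2 = pd.DataFrame(
    {"amount": [1000.0, 2000.0, 36.0]},
    index=pd.to_datetime(["1954-07-31", "2007-08-15", "1985-09-30"]),
)
df2.index.name = "date"

# Rows dated outside the filing window point back at the source files,
# not at the groupby
bad = df2[(df2.index < "2007-01-01") | (df2.index > "2014-12-31")]
print(bad)
```

Running this against the real data surfaces the offending records so they can be traced back to the original FEC files.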
Since solutions have been suggested that work on the snippets from the GitHub repo, I am providing a link to the original files (http://fec.gov/finance/disclosure/ftpdet.shtml#a2011_2012). I am using years 2008 to 2014, Data File: pas212.zip, Data Name: Contributions to Candidates (and other expenditures) from Committees. The code below can also be found at [https://github.com/Michae108/python-coding.git]. Thank you in advance for any help in resolving this issue. I've been working for three days on what should be a very simple task: I import and concatenate four "|"-separated value files, read them into a pandas DataFrame, and set the date column as a DatetimeIndex. This gives me the following output:
cmte_id trans_typ entity_typ state amount fec_id cand_id
date
2007-08-15 C00112250 24K ORG DC 2000 C00431569 P00003392
2007-09-26 C00119040 24K CCM FL 1000 C00367680 H2FL05127
2007-09-26 C00119040 24K CCM MD 1000 C00140715 H2MD05155
2007-07-20 C00346296 24K CCM CA 1000 C00434571 H8CA37137
Secondly, I want to group the index at a one-month frequency, and then sum [amount] by [trans_typ] and [cand_id].
Here is my code for doing that:
import numpy as np
import pandas as pd
import glob

# Read and concatenate the four pipe-delimited files
df = pd.concat(
    pd.read_csv(f, sep='|', header=None, low_memory=False,
                names=['1', '2', '3', '4', '5', '6', '7', '8', '9', '10',
                       '11', '12', '13', 'date', '15', '16', '17', '18',
                       '19', '20', '21', '22'],
                index_col=None, dtype={'date': str})
    for f in glob.glob('/home/jayaramdas/anaconda3/Thesis/FEC_data/itpas2_data/itpas2**.txt'))

# Drop rows missing a candidate id or a date, then parse dates (MMDDYYYY)
df.dropna(subset=['17'], inplace=True)
df.dropna(subset=['date'], inplace=True)
df['date'] = pd.to_datetime(df['date'], format='%m%d%Y')

df1 = df.set_index('date')
df2 = df1[['1', '6', '7', '10', '15', '16', '17']].copy()
df2.columns = ['cmte_id', 'trans_typ', 'entity_typ', 'state', 'amount',
               'fec_id', 'cand_id']
df2['amount'] = df2['amount'].astype(float)

# Group by calendar month, candidate, and transaction type
# (pd.Grouper(freq='M') is the current spelling of pd.TimeGrouper('1M'))
grouper = df2.groupby([pd.Grouper(freq='M'), 'cand_id', 'trans_typ'])
df3 = grouper['amount'].sum().unstack(fill_value=0)
print(df3.head())
Here is my output from running the code:
trans_typ 24A 24C 24E 24F 24K 24N 24R 24Z
date cand_id
1954-07-31 S8AK00090 0 0 0 0 1000 0 0 0
1985-09-30 H8OH18088 0 0 36 0 0 0 0 0
1997-04-30 S6ND00058 0 0 0 0 1000 0 0 0
As you can see, the date column gets messed up after I run the groupby. I am certain that my dates do not go further back than 2007. I have tried to do this simple task of grouping by one-month periods and then summing [amount] by [trans_typ] and [cand_id]. It seems like it should be simple, but I have found no solution. I have read many questions on Stack Overflow and have tried different techniques to solve the problem. Does anyone have an idea on this?
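For reference, here is a minimal, self-contained sketch of the same monthly grouping on synthetic data (the rows are made up to mimic the cleaned frame above; `pd.Grouper(freq='M')` is the current equivalent of `pd.TimeGrouper('1M')`). On clean dates the index stays in range:

```python
import pandas as pd

# Synthetic rows mimicking the cleaned df2 (all dates in 2007)
df2 = pd.DataFrame(
    {"cand_id": ["P00003392", "H2FL05127", "H2FL05127"],
     "trans_typ": ["24K", "24K", "24A"],
     "amount": [2000.0, 1000.0, 500.0]},
    index=pd.to_datetime(["2007-08-15", "2007-09-26", "2007-09-26"]),
)
df2.index.name = "date"

# Group by month-end, candidate id, and transaction type, then pivot
# the transaction types into columns
monthly = (df2.groupby([pd.Grouper(freq="M"), "cand_id", "trans_typ"])
               ["amount"].sum()
               .unstack(fill_value=0))
print(monthly)
```

Since this produces month-end dates of 2007-08-31 and 2007-09-30 as expected, the groupby itself is not what pushes dates back to 1954.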
Here is a sample of my raw data if it helps:
C00409409|N|Q2|P|29992447808|24K|CCM|PERRIELLO FOR CONGRESS|IVY|VA|22945|||06262009|500|C00438788|H8VA05106|D310246|424490|||4072320091116608455
C00409409|N|Q2|P|29992447807|24K|CCM|JOHN BOCCIERI FOR CONGRESS|ALLIANCE|OH|44601|||06262009|500|C00435065|H8OH16058|D310244|424490|||4072320091116608452
C00409409|N|Q2|P|29992447807|24K|CCM|MIKE MCMAHON FOR CONGRESS|STATEN ISLAND|NY|10301|||06262009|500|C00451138|H8NY13077|D310245|424490|||4072320091116608453
C00409409|N|Q2|P|29992447808|24K|CCM|MINNICK FOR CONGRESS|BOISE|ID|83701|||06262009|500|C00441105|H8ID01090|D310243|424490|||4072320091116608454
C00409409|N|Q2|P|29992447807|24K|CCM|ADLER FOR CONGRESS|MARLTON|NJ|08053|||06262009|500|C00439067|H8NJ03156|D310247|424490|||4072320091116608451
C00435164|N|Q2|P|29992448007|24K|CCM|ALEXI FOR ILLINOIS EXPLORATORY COMMITTEE||||||06292009|1500|C00459586|S0IL00204|SB21.4124|424495|||4071620091116385529
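To check whether an odd date comes from the raw file itself, the date field (the 14th pipe-delimited field, in MMDDYYYY format) can be pulled straight out of a record and parsed in isolation; a small sketch using the first sample line above:

```python
import pandas as pd

# One raw record from the pipe-delimited FEC file; the date is field 14
record = ("C00409409|N|Q2|P|29992447808|24K|CCM|PERRIELLO FOR CONGRESS|IVY|VA"
          "|22945|||06262009|500|C00438788|H8VA05106|D310246|424490|||"
          "4072320091116608455")

fields = record.split("|")
raw_date = fields[13]                        # 14th field, e.g. '06262009'
parsed = pd.to_datetime(raw_date, format="%m%d%Y")
print(parsed)  # 2009-06-26 00:00:00
```

Running the same check over the suspect rows shows whether the out-of-range dates are already present in the source files.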