1

I would like to transform a list of dates in the following format:

01-02-12
01-03-12
01-27-12
02-01-12
02-23-12
   .
   .
   .
01-03-13 
02-02-13

as

1
1
1
2
2
.   
.
.
13 
14

ie: index each date by month, with respect to year also.

I am not sure how to do this and can't find a similar problem, so advice would be appreciated. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Edit: In response to @Psidom. Just an example dataset with made up numbers. In the actual dataset I'm dealing with I've transformed the dates to datetime objects.

dat = pd.read_csv('matchdata-update.csv',encoding = "ISO-8859-1")   
dat['Date']=pd.to_datetime(dat['Date'],format='%m-%d-%y% I:%M%p'). 

Ideally I would like it to count a month , even if it was not observed. End goal is to index each month and to count number of rows in that insex, so if no month was observed then the number of rows counted for that index would just be 0.

Luke
  • 85
  • 4
  • 1
    So year always starts from 12? And what if there are months missing for a year, how would you count the next year, still starts from 13 or something smaller? Also is this a list of string or datetime objects? – Psidom Jan 11 '17 at 02:27
  • http://stackoverflow.com/questions/4039879/best-way-to-find-the-months-between-two-dates – DaveQ Jan 11 '17 at 02:34

2 Answers2

1

If you want to count the number of rows for every month, this should work:

dat.set_index("Date").resample("M").size()
Psidom
  • 209,562
  • 33
  • 339
  • 356
  • 1
    Excellent work. I promise to start returning the favour once I'm more up to speed. – Luke Jan 11 '17 at 02:53
  • Good to hear. But you don't have to feel obliged, a good question can benefit others who have the same problem. – Psidom Jan 11 '17 at 03:00
1

Here is a different answer using the data as given and producing the answer requested, including 0s for missing monthes.

dates = '''\
01-02-12
01-03-12
01-27-12
02-01-12
02-23-12
01-03-13 
02-02-13
'''.splitlines()

def monthnum(date, baseyear):
    "Convert date as 'mm-dd-yy' to month number starting with baseyear xx."
    m,d,y = map(int, date.split('-'))
    return m + 12 * (y-baseyear)

print(monthnum(dates[0], 12) == 1, monthnum(dates[-1], 12) == 14)

def monthnums(dates, baseyear):
    "Yield month numbers of 'mm-dd-yy' starting with baseyear."
    for date in dates:
        m,d,y = map(int, date.split('-'))
        yield m + 12 * (y-baseyear)

print(list(monthnums(dates, 12)) == [1,1,1,2,2,13,14])   

def num_per_month(mnums):
    prev, n = 1, 0
    for k in mnums:
        if k == prev:
            n += 1
        else:
            yield prev, n
            for i in range(prev+1, k):
                yield i, 0
            prev, n = k, 1
    yield prev, n

for m, n in num_per_month(monthnums(dates, 12)):
    print(m, n)

prints

True True
True
1 3
2 2
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 1
14 1
Terry Jan Reedy
  • 18,414
  • 3
  • 40
  • 52