I am trying to build a DataFrame so that I can easily write it to a CSV file; otherwise I have to do this process manually.

I'd like this to be my final output. Each person gets one row per month, starting at 1/1/2014 and ending at 12/1/2016:

      Name    date
0     ben     1/1/2014
1     ben     2/1/2014
2     ben     3/1/2014
3     ben     4/1/2014
....

36    dan     1/1/2014
37    dan     2/1/2014
38    dan     3/1/2014

Code so far:

import pandas as pd

days = [1]
months = list(range(1, 13))
years = ['2014', '2015', '2016']
listof_people = ['ben','dan','nathan', 'gary', 'Mark', 'Sean', 'Tim', 'Chris']

df = pd.DataFrame({"Name": listof_people})
for month in months:
    df.append({'date': month}, ignore_index=True)
print(df)

When I try looping to create the DataFrame, it either does not work or I get index errors (because the lists have different lengths), and I'm at a loss.

I've done a good bit of searching and found the following similar questions, but I can't adapt them to fit my case.

Filling empty python dataframe using loops

How to build and fill pandas dataframe from for loop?

I don't want anyone to feel like they are "doing my homework", so if I'm overlooking something simple, please let me know.

MattR
  • `append` does not work in place; it returns a new DataFrame, so you need to reassign: `df = df.append({'date': month}, ignore_index=True)`. – root Jan 17 '17 at 18:57
  • @root thank you! This gets me closer, but still not where I need to be. With reassigning, the months come in after the last name in the list (Chris). Adding `for index, row in df.iterrows():` before the `month` loop is helping, but how do I do this for each person? – MattR Jan 17 '17 at 18:59
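
The behavior described in these comments can be reproduced with a minimal sketch. `DataFrame.append` was removed in pandas 2.0, so this sketch uses `pd.concat` instead, which likewise returns a new frame rather than modifying the original. The two-name list and the `extra` frame are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["ben", "dan"]})
extra = pd.DataFrame({"date": [1, 2]})  # illustrative values only

# The result is discarded here, so df is left unchanged.
pd.concat([df, extra], ignore_index=True)
unchanged_len = len(df)

# Reassigning keeps the result, but the new rows land after the
# existing names, with NaN in the columns they do not share.
df = pd.concat([df, extra], ignore_index=True)
print(df)
```

Reassigning alone therefore stacks the dates below the names instead of pairing every name with every date, which is why the answers build all combinations up front.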

2 Answers

I think you can use `itertools.product` to generate all combinations, then `to_datetime` to build the `date` column:

from itertools import product
import pandas as pd

days = [1]
months = list(range(1, 13))
years = ['2014', '2015', '2016']
listof_people = ['ben','dan','nathan', 'gary', 'Mark', 'Sean', 'Tim', 'Chris']

df1 = pd.DataFrame(list(product(listof_people, months, days, years)))
df1.columns = ['Name', 'month','day','year']
print (df1)
      Name  month  day  year
0      ben      1    1  2014
1      ben      1    1  2015
2      ben      1    1  2016
3      ben      2    1  2014
4      ben      2    1  2015
5      ben      2    1  2016
6      ben      3    1  2014
7      ben      3    1  2015
8      ben      3    1  2016
9      ben      4    1  2014
10     ben      4    1  2015
...
...
df1['date'] = pd.to_datetime(df1[['month','day','year']])
df1 = df1[['Name','date']]
print (df1)
      Name       date
0      ben 2014-01-01
1      ben 2015-01-01
2      ben 2016-01-01
3      ben 2014-02-01
4      ben 2015-02-01
5      ben 2016-02-01
6      ben 2014-03-01
7      ben 2015-03-01
...
...
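
Note that the output above cycles through the years within each month, whereas the question lists each person's months chronologically. A possible variation, sketched here under the assumption that chronological order per person is wanted, puts the years before the months in the `product` call. The names `rows`, `df2` and `csv_text` are introduced only for this sketch, and the years are written as integers for simplicity.

```python
from itertools import product
import pandas as pd

months = list(range(1, 13))
years = list(range(2014, 2017))  # integers instead of the original strings
listof_people = ['ben', 'dan', 'nathan', 'gary', 'Mark', 'Sean', 'Tim', 'Chris']

# Years vary more slowly than months, so each person's rows run
# from January 2014 through December 2016.
rows = list(product(listof_people, years, months, [1]))
df2 = pd.DataFrame(rows, columns=['Name', 'year', 'month', 'day'])
df2['date'] = pd.to_datetime(df2[['year', 'month', 'day']])
df2 = df2[['Name', 'date']]

# With no path given, to_csv returns the CSV text; date_format
# controls how the datetime column is written.
csv_text = df2.to_csv(index=False, date_format='%m/%d/%Y')
print(csv_text[:60])
```

Passing a file path as the first argument to `to_csv` would write the same content to disk instead of returning it.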
jezrael
  • This helps very much, thank you! Do you know of any resources (even documentation) that explain how you arrived at this answer? I understand the solution, but I'm sure such a resource would save me from asking other questions like this one. – MattR Jan 17 '17 at 21:50
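import pandas as pd

# listof_people, years and months are the lists defined in the question
mux = pd.MultiIndex.from_product(
    [listof_people, years, months],
    names=['Name', 'Year', 'Month'])

# assign() is given a callable so that Year/Month/Day are read from the
# frame produced by reset_index(), not from the unrelated df
pd.Series(
    1, mux, name='Day'
).reset_index().assign(
    date=lambda d: pd.to_datetime(d[['Year', 'Month', 'Day']])
)[['Name', 'date']]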
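
As a related sketch, not part of the answer above, the same `MultiIndex.from_product` idea can be combined with `pd.date_range` so that the month-start dates are generated directly instead of being assembled from separate year, month and day columns. The names `dates` and `df3` are introduced here for illustration.

```python
import pandas as pd

listof_people = ['ben', 'dan', 'nathan', 'gary', 'Mark', 'Sean', 'Tim', 'Chris']

# 'MS' is the month-start frequency: one timestamp per month,
# from 2014-01-01 through 2016-12-01 inclusive.
dates = pd.date_range('2014-01-01', '2016-12-01', freq='MS')

mux = pd.MultiIndex.from_product(
    [listof_people, dates],
    names=['Name', 'date'])

# to_frame(index=False) turns the index levels into ordinary columns.
df3 = mux.to_frame(index=False)
print(df3.head())
```

This variant yields each person's rows in chronological order without any intermediate columns to drop.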

piRSquared