
I have a text file where each line contains data and a timestamp. I read the data into a data frame like this:

table = pd.read_table(file, sep='|', skiprows=[1], usecols=columns, parse_dates=dateColumns, date_parser=parsedate, converters=columnsFormat)

So far, so good.

And my result is a data frame like the example below:

Name Local  Code Date        Value
A1   Here   01   01-01-1990  1.2
A1   Here   01   01-02-1990  0.8
A1   Here   01   01-03-1990  1.6
...
A2   There  02   01-01-1990  1.1
A2   There  02   01-02-1990  0.7
A2   There  02   01-03-1990  1.3
...
An   Where  n    12-31-2013  2.1

The dates are in time order; however, I have several groups, and they don't all have the same number of elements.

What I want to do is group the data frame by Name, Local and Code, so that I have those values as the index and Date and Value as the columns of each group.

Something like the example below:

(Index)            Date        Value
(A1   Here   01)   01-01-1990  1.2
                   01-02-1990  0.8
                   01-03-1990  1.6
...
(A2   There  02)   01-01-1990  1.1
                   01-02-1990  0.7
                   01-03-1990  1.3
...
(An   Where  n)    12-31-2013  2.1

But instead of having groups like these, when I execute

table = table.groupby(['Name', 'Local', 'Code'])

I end up with groups like the ones below: the first group has all the data for day 1, the second group all the data for day 2, and so on.

Name Local  Code Date        Value
A1   Here   01   01-01-1990  1.2
A2   There  02   01-01-1990  1.1
...
A1   Here   01   01-02-1990  0.8
A2   There  02   01-02-1990  0.7
...
A1   Here   01   01-03-1990  1.6
A2   There  02   01-03-1990  1.3
...
An   Where  n    12-31-2013  2.1

Any ideas on how I may have it grouped the way I explained?

If I use table = table.groupby(['Name', 'Local', 'Code', 'Date']) I get one grouping like:

Name Local  Code Date        Value
A1   Here   01   01-01-1990  1.2
                 01-02-1990  0.8
                 01-03-1990  1.6
...
A2   There  02   01-01-1990  1.1
                 01-02-1990  0.7
                 01-03-1990  1.3
...
An   Where  n    12-31-2013  2.1

It's almost what I want; however, I would have to split it into several groups by Name, Local and Code. Is that possible?

When reading the table, do parse_dates and converters change anything in the index?

Hopefully I've made myself clear now. Thank you.

Lucas

2 Answers


As a workaround, you could set_index and then groupby the index:

In [11]: df1 = df.set_index(['Name', 'Local', 'Code'])

In [12]: g = df1.groupby(df1.index)

In [13]: for i in g: print(i)
(('A1', 'Here', 1),
                       Date  Value
Name Local Code                   
A1   Here  1     01-01-1990    1.2
           1     01-02-1990    0.8
           1     01-03-1990    1.6)
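
For reference, here is a minimal, self-contained version of that workaround. The small frame and its values are made up for illustration; only the set_index/groupby pattern is the point:

```python
import pandas as pd

# Toy frame standing in for the OP's data (values are made up).
df = pd.DataFrame({
    'Name':  ['A1', 'A1', 'A2', 'A2'],
    'Local': ['Here', 'Here', 'There', 'There'],
    'Code':  [1, 1, 2, 2],
    'Date':  ['01-01-1990', '01-02-1990', '01-01-1990', '01-02-1990'],
    'Value': [1.2, 0.8, 1.1, 0.7],
})

# Move the key columns into the index, then group on that index.
df1 = df.set_index(['Name', 'Local', 'Code'])

# Each group key is a (Name, Local, Code) tuple; each group keeps
# only Date and Value as columns.
groups = {key: grp for key, grp in df1.groupby(df1.index)}

print(groups[('A1', 'Here', 1)])
```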
Andy Hayden

Answer to your last question:

If you iterate through

groups = df.groupby(['name','local','code'])

you should get an individual data frame per group, i.e.:

for g, grp in groups:
    print(grp)
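
A quick runnable sketch of that iteration (column names and values are hypothetical, matching this answer's lower-case names):

```python
import pandas as pd

# Toy frame with made-up values.
df = pd.DataFrame({
    'name':  ['A1', 'A1', 'A2'],
    'local': ['Here', 'Here', 'There'],
    'code':  [1, 1, 2],
    'value': [1.2, 0.8, 1.1],
})

groups = df.groupby(['name', 'local', 'code'])

# Iterating yields (key_tuple, sub_frame) pairs, one per group.
for key, grp in groups:
    print(key, len(grp))

# A single group can also be pulled out directly by its key tuple.
a1 = groups.get_group(('A1', 'Here', 1))
```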
grasshopper
  • I think the OP has this, but the issue is that these aren't indexed by [name, local, code] (i.e. it's not respecting as_index at this stage), which is what the OP is asking about. – Andy Hayden Mar 19 '14 at 17:14