
I have a text file where each line contains data and a timestamp. I read the data into a data frame like this:

table = pd.read_table(file, sep='|', skiprows=[1], usecols=columns, parse_dates=dateColumns, date_parser=parsedate, converters=columnsFormat)

So far, so good.

And my result is a data frame like the example below:

Name Local  Code Date        Value
A1   Here   01   01-01-1990  1.2
A1   Here   01   01-02-1990  0.8
A1   Here   01   01-03-1990  1.6
...
A2   There  02   01-01-1990  1.1
A2   There  02   01-02-1990  0.7
A2   There  02   01-03-1990  1.3
...
An   Where  n    12-31-2013  2.1

The dates are in time order; however, I have several groups, and they don't all have the same number of elements.

What I want to do is group the data frame by Name, Local and Code, so that I have those values as the index and Date and Value as the columns of each group.

Something like the example below:

(Index)            Date        Value
(A1   Here   01)   01-01-1990  1.2
                   01-02-1990  0.8
                   01-03-1990  1.6
...
(A2   There  02)   01-01-1990  1.1
                   01-02-1990  0.7
                   01-03-1990  1.3
...
(An   Where  n)    12-31-2013  2.1

But instead of having groups like these, when I execute

table = table.groupby(['Name', 'Local', 'Code'])

I end up with groups like the ones below: the first group has all the data for day 1, the second group all the data for day 2, and so on.

Name Local  Code Date        Value
A1   Here   01   01-01-1990  1.2
A2   There  02   01-01-1990  1.1
...
A1   Here   01   01-02-1990  0.8
A2   There  02   01-02-1990  0.7
...
A1   Here   01   01-03-1990  1.6
A2   There  02   01-03-1990  1.3
...
An   Where  n    12-31-2013  2.1

Any ideas on how I may have it grouped the way I explained?

If I use table = table.groupby(['Name', 'Local', 'Code', 'Date']) I get one grouping like:

Name Local  Code Date        Value
A1   Here   01   01-01-1990  1.2
                 01-02-1990  0.8
                 01-03-1990  1.6
...
A2   There  02   01-01-1990  1.1
                 01-02-1990  0.7
                 01-03-1990  1.3
...
An   Where  n    12-31-2013  2.1

It's almost what I want; however, I would have to split it into several groups by Name, Local and Code. Is that possible?

When reading the table, do parse_dates and converters change anything in the index?

Hopefully I've made myself clear now. Thank you.

Lucas

2 Answers


As a workaround, you could set_index and then groupby the index:

In [11]: df1 = df.set_index(['Name', 'Local', 'Code'])

In [12]: g = df1.groupby(df1.index)

In [13]: for i in g: print(i)
(('A1', 'Here', 1),
                       Date  Value
Name Local Code                   
A1   Here  1     01-01-1990    1.2
           1     01-02-1990    0.8
           1     01-03-1990    1.6)
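
For reference, here is a minimal, self-contained version of that workaround. The small frame and its values are made up for illustration; only the set_index/groupby pattern is the point:

```python
import pandas as pd

# Toy frame standing in for the OP's data (values are made up).
df = pd.DataFrame({
    'Name':  ['A1', 'A1', 'A2', 'A2'],
    'Local': ['Here', 'Here', 'There', 'There'],
    'Code':  [1, 1, 2, 2],
    'Date':  ['01-01-1990', '01-02-1990', '01-01-1990', '01-02-1990'],
    'Value': [1.2, 0.8, 1.1, 0.7],
})

# Move the key columns into the index, then group on that index.
df1 = df.set_index(['Name', 'Local', 'Code'])

# Each group key is a (Name, Local, Code) tuple; each group keeps
# only Date and Value as columns.
groups = {key: grp for key, grp in df1.groupby(df1.index)}

print(groups[('A1', 'Here', 1)])
```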
Andy Hayden

Answer to your last question:

If you iterate through

groups = df.groupby(['name','local','code'])

you should get an individual data frame per group, i.e.:

for g, grp in groups:
    print(grp)
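
A quick runnable sketch of that iteration (column names and values are hypothetical, matching this answer's lower-case names):

```python
import pandas as pd

# Toy frame with made-up values.
df = pd.DataFrame({
    'name':  ['A1', 'A1', 'A2'],
    'local': ['Here', 'Here', 'There'],
    'code':  [1, 1, 2],
    'value': [1.2, 0.8, 1.1],
})

groups = df.groupby(['name', 'local', 'code'])

# Iterating yields (key_tuple, sub_frame) pairs, one per group.
for key, grp in groups:
    print(key, len(grp))

# A single group can also be pulled out directly by its key tuple.
a1 = groups.get_group(('A1', 'Here', 1))
```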
grasshopper
  • I think the OP has this, but the issue is that these aren't indexed by [name, local, code] (i.e. it's not respecting as_index at this stage), which is what the OP is asking about. – Andy Hayden Mar 19 '14 at 17:14