How to get a combined look at two different DataFrames

Question

I want to show a table which combines two DataFrames with conditional indexing. This works with one DataFrame:

room1,weather = pd.read_excel(mypath,sheetnames[0]),pd.read_excel(mypath,sheetnames[2])
selector = (room1.Time>='08:00') & (room1.Time<='18:00')
view     = ['Time','Cooling_plant_sensible_load']
room1[selector][view][:12]

which gives me something like this:

    Time    Cooling_plant_sensible_load
7   08:00   0.000
8   09:00   0.000
....
16  17:00   0.000
17  18:00   0.000
31  08:00   0.000

The weather DataFrame has a Series called Dry_Bulb_Temperature which I would like to add to the view so it shows like this

    Time    Cooling_plant_sensible_load    Dry_Bulb_Temperature
7   08:00   0.000                          18
8   09:00   0.000                          22
....
16  17:00   0.000                          19
17  18:00   0.000                          16
31  08:00   0.000                          12

I tried adding:

selector2 = (weather.Time>='08:00') & (weather.Time<='18:00')
pd.concat({'room1':room1[selector][view][:12],'wea':weather[selector2]['Dry_bulb_temperature']},axis=1)

which gave me a AttributeError: 'Series' object has no attribute '_data'

EDIT:

weather[selector2]['Dry_bulb_temperature'][:12] looks like this:

major
7        15.3
8        16.0
9        18.0
10       19.9
11       21.9
12       22.9
13       24.0
14       25.0
15       24.8
16       24.5
17       24.3
31       16.2
Name: Dry_bulb_temperature, dtype: float64

EDIT2:

The AttributeError: 'Series' object has no attribute '_data' is caused because weather[selector2]['Dry_bulb_temperature'] is a Series ~~while concat expects a DataFrame~~ which can not be concat-ed with a Dataframe, i.e. concat needs two similar types (previous comment is wrong as pointed out by @Philip below).

So I could combine the room1 DataFrame with the weather DataFrame. Is this the way to go? How do I avoid that the two 'Time' series are duplicated?

I've got a number of room(n) dataframes and was thinking that there might be a way for each to reference the same weather dataset.

can you show `weather_df[selector2]['Dry_bulb_temperature']`? — elyase, Sep 07 '13 at 04:49
`concat` expects objects the same *type*, not only `DataFrame`s. You can `concat` two `Series`, for example. — Phillip Cloud, Sep 07 '13 at 15:10
@Philip cheers, that makes a lot of sense, I'll update the question — pandita, Sep 08 '13 at 07:28

mattexx · Answer 1 · 2013-09-07T13:06:28.117

1

I'm not sure what is happening in your concat. It might be that you have a field name confused. I see both 'Dry_Bulb_Temperature' and 'Dry_bulb_temperature' in different parts of your question.

Assuming the two dataframes have the same index, I would concat the whole thing, then do your filter:

df = pd.concat([room1, weather[['Dry_Bulb_Temperature']]], axis=1)
df[(df['Time'] >= '08:00') & (df['Time'] <= '18:00')]

Less code and easier to read.

edited Sep 07 '13 at 13:06

answered Sep 07 '13 at 04:51

mattexx

6,456
3
36
47

Both dataframes have a `Time` series. However I couldn't get your pd.concat to work... it gives me the same `AttributeError` as above. I typed `df = pd.concat([room1, weather['Dry_bulb_temperature']], axis=1)` adding a `]` to your code above – pandita Sep 07 '13 at 05:41
This concats correctly: `df = pd.concat([room1, weather], axis=1)`. The problem is that `weather['Dry_bulb_temperature]` is a Series object, while concat expects a dataframe... – pandita Sep 07 '13 at 05:47
ah that's right. updated my answer, but looks like you already have your own! – mattexx Sep 07 '13 at 10:36
You can get a dataframe with one column using [['field']] as above, which gets around your duplicate column problem. Sorry I left you with non-working code before. Bad dev, bad! ;-) – mattexx Sep 07 '13 at 13:21

Andy Hayden · Accepted Answer · 2013-09-07T10:27:20.630

1

It looks like you want to do a join (which can merge a DataFrame and a Series on their index):

In [11]: df
Out[11]:
    Time  Cooling_plant_sensible_load  Dry_Bulb_Temperature
7  08:00                            0                    18
8  09:00                            0                    22

In [12]: s
Out[12]:
7    15.3
8    16.0
Name: Dry_bulb_temperature, dtype: float64

In [13]: df.join(s)
Out[13]:
    Time  Cooling_plant_sensible_load  Dry_Bulb_Temperature  Dry_bulb_temperature
7  08:00                            0                    18                  15.3
8  09:00                            0                    22                  16.0

See more in the merging, join and concating section of the docs.

Note:

You can create the Series/column using loc, avoiding chaining:

s = weather.loc[selector2, 'Dry_bulb_temperature']

edited Sep 07 '13 at 10:27

answered Sep 07 '13 at 08:46

Andy Hayden

359,921
101
625
535

Thanks @Andy this looks good too. However it doesn't work in my case since I've got overlapping Series in the dataframes. In the answer below I used `df.T.groupby(level=0).first().T` to remove duplicate columns after concat-ing the dataframes. Being new to pandas, I'm not too sure what's behind these methods though. Cheers. – pandita Sep 07 '13 at 12:41
Best way is to understand each of the functions in turn. If the example is different it'll help to give a snippet (e.g. the output of df.head().) – Andy Hayden Sep 07 '13 at 23:28

pandita · Answer 3 · 2013-09-07T07:21:55.967

Ok, I got something that works based on @mattexx initial proposal:

#pd.concat([room1, weather], axis=1)[selector][view.append('Dry_bulb_temperature')]
df = pd.concat([room1, weather], axis=1)


# Removing duplicate columns based on this link:
# http://stackoverflow.com/questions/16938441/how-to-remove-duplicate-columns-from-a-dataframe-using-python-pandas

df = df.T.groupby(level=0).first().T
selector = [(df.Time>='08:00') & (df.Time<='18:00')]
view     = ['Time','Cooling_plant_sensible_load','Dry_bulb_temperature']
df[['Time','Cooling_plant_sensible_load','Dry_bulb_temperature']][(df.Time>='08:00') & (df.Time<='18:00')][:12]

which gives:

    Time    Cooling_plant_sensible_load     Dry_bulb_temperature
7   08:00   0   15.3
8   09:00   0   16
.......................
17  18:00   0   24.3
31  08:00   0   16.2

Not sure if this is the best way to get there, but it works for now. Thanks guys for getting me on the right track.

You can use `df.loc[(df.Time >= '08:00') & (...), view]` to avoid chaining, and .head(12) to get the last 12 entries. :) — Andy Hayden, Sep 07 '13 at 08:35

How to get a combined look at two different DataFrames

3 Answers3