Pandas group and join

Question

I am new to pandas. I want to analysis the following case. Let say, A fruit market is giving the prices of the fruits daily the time from 18:00 to 22:00. For every half an hour they are updating the price of the fruits between the time lab. Consider the market giving the prices of the fruits at 18:00 as follows,

Fruit    Price
Apple     10
Banana    20

After half an hour at 18:30, the list has been updated as follows,

Fruit    Price
Apple     10
Banana    21
Orange    30
Grapes    25
Pineapple 65

I want to check has the prices of the fruits been changed of recent one[18:30] with the earlier one[18:00]. Here I want to get the result as,

Fruit   18:00   18:30
Banana    20     21

To solve this I am thinking to do the following,

1) Add time column in the two data frames.
2) Merge the tables into one.
3) Make a Pivot table with Index Fruit name and Column as ['Time','Price'].

I don't know how to get intersect the two data frames of grouped by Time. How to get the common rows of the two Data Frames.

Can you Create time and Update time. so that, you can get the value of Modified Fruit. — Anjali, Mar 11 '19 at 12:49
You don't need to pivot. Simply rename your price col for each dataframe as the proper time and do an "outer join" as argument in `pd.merge` — Nico Albers, Mar 11 '19 at 12:49

Erfan · Accepted Answer · 2019-03-11T13:25:40.383

2

You dont need to pivot in this case, we can simply use merge and use suffixes argument to get the desired results:

df_update = pd.merge(df, df2, on='Fruit', how='outer', suffixes=['_1800h', '_1830h'])

    Fruit       Price_1800h Price_1830h
0   Apple       10.0        10.0
1   Banana      20.0        21.0
2   Orange      NaN         30.0
3   Grapes      NaN         25.0
4   Pineapple   NaN         65.0

Edit

Why are we using the outer argument? We want to keep all the new data that is updated in df2. If we use inner for example, we will not get the updated fruits, like below. Unless this is the desired output by OP, which is not clear in this case.

df_update = pd.merge(df, df2, on='Fruit', how='inner', suffixes=['_1800h', '_1830h'])

    Fruit   Price_1800h Price_1830h
0   Apple   10          10.0
1   Banana  20          21.0

edited Mar 11 '19 at 13:25

answered Mar 11 '19 at 12:57

Erfan

40,971
8
66
78

That would not work in this case, since it would only give us back two rows (Apple, Banana) and not the updated other fruits. See my edit for explanation. @SergeBallesta – Erfan Mar 11 '19 at 13:23
@Erfan: Thank you for the answer. I learned a new thing from your answer. But The suffix is needed only when the merged data frames have two columns with the same name. Am I right? What If I have extra column as 'Family of Fruit'. Will it work? – Smith Dwayne Mar 12 '19 at 04:50
@SmithDwayne Family of Fruit would also be added indeed, since merge brings all the columns. Btw, you can accept an answer as correct if it helped you – Erfan Mar 12 '19 at 09:54

user2672299 · Answer 2 · 2019-03-11T17:27:28.247

0

If Fruit is the index of your data frame the following code should work. The Idea is to return rows with inequality:

df['1800'] = df1['Price']
df['1830'] = df2['Price']
print(df.loc[df['1800'] != df1['1830']])

You can also use datetime in your column heading.

edited Mar 11 '19 at 17:27

answered Mar 11 '19 at 12:59

user2672299

414
2
12

1

Some explanation of the code my also help learners. Code only answers are not always the most helpful. Please add some more text using the [edit] button. – Brian Tompsett - 汤莱恩 Mar 11 '19 at 14:39

Pandas group and join

2 Answers2