2

I'm having difficulty coming up with a simple merge logic.

 TimeSeries = A.merge(B, on = ['YEAR','MONTH'], how= 'left')

I have the following dataframe. A continuous month/year column from 2001-2018 (12 months X 17 yrs)

A:

|---------------------|------------------|
|       Month         |         Year     |
|---------------------|------------------|
|          12         |         2001     |
|---------------------|------------------|
|          01         |         2002     |
|---------------------|------------------|
|       ......        |        .....     |
|---------------------|------------------|
|          12         |         2018     |
|---------------------|------------------|

And the other data frame which has a unique ID, month, year and $ amount

B:

 |-----------------|----------------|--------------|--------|
 |       Month     |       Year     |       ID     | Amount |
 |-----------------|----------------|--------------|--------|
 |          05     |       2013     |       2      | 500K   |
 |-----------------|----------------|--------------|--------|
 |          01     |       2002     |       1      | 200K   |
 |-----------------|----------------|--------------|--------|
 |          12     |       2016     |       3      | 800K   |
 |-----------------|----------------|--------------|--------|

How would I be able to append/merge both dataframes so each ID from dataframe B has the entire time series from dataframe A. And it should merge on the correct month and year column.

Desired Output for ID. Each ID should have ~204 rows and if the month and year are same, then display the correct $ amount

 |---------------------|------------------|--------------|-----------|
 |       Month         |         Year     |       ID     |   Amount  |
 |---------------------|------------------|--------------|-----------|
 |          01         |         2001     |       1      |    NaN    |
 |---------------------|------------------|--------------|-----------|
 |          01         |         2002     |       1      |    200K   |
 |---------------------|------------------|--------------|-----------|
 |          12         |         2018     |       1      |    NaN    |
 |---------------------|------------------|--------------|-----------|
Cesar
  • 617
  • 3
  • 8
  • 17
  • 3
    Your merge seems fine, what goes wrong? – Erfan Jun 04 '19 at 13:49
  • @Erfan It only does the merge on the month and year and does not repeat the entire time series I want. So I only get 5 rows instead of 204 rows. – Cesar Jun 04 '19 at 13:53
  • Oh the issue must be that ID doesn't repeat, and is instead NaN for most rows. This is more a `reindex` – ALollz Jun 04 '19 at 13:53
  • Could you maybe reset your python interpreter and try again? the `how='left'` argument should keep your `A` dataframe's shape intact. – Erfan Jun 04 '19 at 13:55

1 Answers1

1

You want .reindex, that way ID is filled in for each row. There's no need for DataFrameA:

import pandas as pd

names = ['Month', 'Year', 'ID']
idx = pd.MultiIndex.from_product([range(1, 13, 1), range(2001, 2019, 1), B.ID.unique()], 
                                 names=names)

B = B.set_index(names).reindex(idx).reset_index()

Output:

print(B.head())

#   Month  Year  ID Amount
#0      1  2001   2    NaN
#1      1  2001   1    NaN
#2      1  2001   3    NaN
#3      1  2002   2    NaN
#4      1  2002   1   200K

B.groupby('ID').size()
#ID
#1    216
#2    216
#3    216
#dtype: int64

Performant Cartesian Product has ways to optimize the product if it's rather slow

ALollz
  • 57,915
  • 7
  • 66
  • 89