0

I have two data frames mention below.

df1 dataframe consists SaleDate column as the unique key column df1 shape is (12, 11)

the 2nd data frame mention below

df2 dataframe consists SaleDate column as the unique key column

df2 shape is (2,19)

But the dimension of each data-frame are different .

Some how I need to join 2 data-frames based on new [month-year] column which can be derived from SaleDate and add same urea price for whole month of the respective year.

Expected out put mention below

df3 data-frame consist of monthly ureaprice for each raw at the data-frame The shape of new dataframe (13,11)

***The actual df1 consist of 2 Million records and df2 consist of 360 records.

I tried to join two data-frames with left join to get above output. But, unable to achieve it.

import pandas as pd # Import Pandas for data manipulation using dataframes

    df1['month_year']=pd.to_datetime(df1['SaleDate']).dt.to_period('M')
    df2['month_year'] = pd.to_datetime(df2['SaleDate']).dt.to_period('M')

df1 = pd.DataFrame({'Factory': ['MF0322','MF0657','MF0300','MF0790'], 
               'SaleDate': ['2013-02-07','2013-03-07','2013-06-07','2013-05-07']
               'month-year':['2013-02','2013-03','2013-06','2013-05']})

df2 = pd.DataFrame({'Price': ['398.17','425.63','398.13','363','343.33','325.13'], 
                   'Month': ['2013-01-01','2013-02-01','2013-03-01','2013-04-01','2013-05-01','2013-06-01']
                   'month-year':['2013-01','2013-02','2013-03','2013-04','2013-05','2013-06']})

Final data frame

s1 = pd.merge(df1, df2, how='left', on=['month_year'])

all values pertaining for the urea-price was "NaN".

Hope to get expert advice in this regard.

hiran
  • 69
  • 5
  • Welcome to SO. Please provide a [mcve] (no images/links). If you need help with this for Pandas, see **[How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples)**. – jpp Feb 03 '19 at 14:49

2 Answers2

0

Assuming your SaleDate columns are string dtypes, you could just do: df1['month_year'] = df1['SaleDate'].apply(lambda x: x[:7]) df2['month_year'] = df2['SaleDate'].apply(lambda x: x[:7])

And I think the rest should work!

user8710949
  • 17
  • 1
  • 4
  • I need to join 2 data-frames and get each urea-price for respective [month-year] of df1 I have given sample data frames above. – hiran Feb 03 '19 at 15:28
  • I understand what you're trying to achieve. However, you posted images of the data frames so I can't actually reproduce it on my machine to test out possible fixes. In order to do that, I would have to manually copy all your data by hand. Are you seriously expecting someone on SO to do that? Let us know what df1 and df2 look like after you add in your `month_year` columns. This will help us determine what the problem is. – user8710949 Feb 03 '19 at 15:40
0

I copied your code, without month_year column:

df1 = pd.DataFrame({'Factory': ['MF0322','MF0657','MF0300','MF0790'],
    'SaleDate': ['2013-02-07','2013-03-07','2013-06-07','2013-05-07']})
df2 = pd.DataFrame({'Price': ['398.17','425.63','398.13','363','343.33','325.13'],
    'Month': ['2013-01-01','2013-02-01','2013-03-01','2013-04-01','2013-05-01',
    '2013-06-01']})

Then I created month_year column in both DataFrames:

df1['month_year'] = pd.to_datetime(df1['SaleDate']).dt.to_period('M')
df2['month_year'] = pd.to_datetime(df2['Month']).dt.to_period('M')

and merged them:

s1 = pd.merge(df1, df2, how='left', on=['month_year'])

When I executed print(s1) I got:

  Factory    SaleDate month_year   Price       Month
0  MF0322  2013-02-07    2013-02  425.63  2013-02-01
1  MF0657  2013-03-07    2013-03  398.13  2013-03-01
2  MF0300  2013-06-07    2013-06  325.13  2013-06-01
3  MF0790  2013-05-07    2013-05  343.33  2013-05-01

As you can see, Price column is correct, equal to Price for respective month (according to SaleDate).

So generally your code is OK.

Check for other sources of errors. E.g. in your code snippet:

  • you first set month_year in each DataFrame,
  • then you create both DataFrames again, destroying the previous content.

Copy my code (and nothing more) and confirm that it gives the same result. Maybe the source of your problem is in some totally other place?

Note that e.g. your df2 has Month column, not SaleDate. Maybe this is the root cause?

Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41
  • Thank you for the prompt reply. Once I execute the above it give exact results. But, when I work with the csv files , it give "NaN" for the column price . Is it possible to test my 2 csv files? – hiran Feb 03 '19 at 17:24
  • Note that the field to be read as a price should have a **dot** between integer and decimal part. Maybe in some place you have a **comma**? Find rows in your DataFrame with these *NaN* values and check corresponding rows in the source file. – Valdi_Bo Feb 03 '19 at 17:35