6

Have two dataframes, one has few information (df1) and other has all data (df2). What I am trying to create in a new column in df1 that finds the Total2 values and populates the new column accordingly based on the Names. Note that the Names visible in df1 will always find a match in Names of df2. I am wondering if there is some function in Pandas that already does this? My end goal is to create a bar chart.

alldatapath = "all_data.csv"
filteredpath = "filtered.csv"

import pandas as pd

df1 = pd.read_csv(
    filteredpath,     # file name
    sep=',',                    # column separator
    quotechar='"',              # quoting character
    na_values="NA",                # fill missing values with 0
    usecols=[0,1],     # columns to use
    decimal='.')                # symbol for decimals

df2 = pd.read_csv(
    alldatapath,     # file name
    sep=',',                    # column separator
    quotechar='"',              # quoting character
    na_values="NA",                # fill missing values with 0
    usecols=[0,1],     # columns to use
    decimal='.')                # symbol for decimals

df1 = df1.head(5) #trim to top 5

print(df1)
print(df2)

output (df1):

         Name  Total
0  Accounting      3
1   Reporting      1
2     Finance      1
3       Audit      1
4    Template      2

output (df2):

          Name   Total2
0    Reporting    100
1   Accounting    120
2      Finance    400
3        Audit    500
4  Information     50
5     Template   1200
6      KnowHow   2000

Final Output (df1) should be something like:

         Name  Total  Total2(new column)
0  Accounting      3    120
1   Reporting      1    100
2     Finance      1    400
3       Audit      1    500
4    Template      2   1200
Mr Lister
  • 45,515
  • 15
  • 108
  • 150
Gonzalo
  • 1,084
  • 4
  • 20
  • 40

1 Answers1

14

Need map by Series first for new column:

df1['Total2'] = df1['Name'].map(df2.set_index('Name')['Total2'])
print (df1)
         Name  Total  Total2
0  Accounting      3     120
1   Reporting      1     100
2     Finance      1     400
3       Audit      1     500
4    Template      2    1200

And then set_index with DataFrame.plot.bar:

df1.set_index('Name').plot.bar()
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • thanks! I wiill study these fuctions to apply it to my global code. – Gonzalo Oct 17 '17 at 11:44
  • Do you know what to do when we want to copy all the columns for matching index in df2? e.g. in this case we only have "total2", what if there were 5 more columns and we want to include all 6 columns to df1? – Aisha Jan 13 '21 at 09:00
  • @Aisha -- Not sure if understand, can you check [this](https://stackoverflow.com/q/53010406) ? – jezrael Jan 13 '21 at 09:10