I have two CSV files.

The first CSV is a small dataset that looks like:

CSV ONE.csv

COLUMN A    COLUMN B    COLUMN C    COLUMN D    COLUMN E
    1          XYZ          A            B           D
    2          YZX.12       E            F           G
    3          ZYX.567      H            I           J 

The second CSV is a much larger dataset that looks like:

CSV TWO.csv

COLUMN A   COLUMN B   COLUMN C   COLUMN D   COLUMN E   COLUMN F   COLUMN G
   1        ZYX.567       A          B          D         AAA        ABB
   2        SAMPLE A      E          F          G         BBB        ACA  
   3        SAMPLE B      H          I          J         CCC        BBC 
   4        XYZ           A          B          D         ABA        BBA
   5        SAMPLE C      E          F          G         ABC        BAB
   6        YZX.12        H          I          J         CCA        CAC

I want my output to be CSV TWO, but only the rows whose COLUMN B value also appears in COLUMN B of CSV ONE:

COLUMN A   COLUMN B   COLUMN C   COLUMN D   COLUMN E   COLUMN F   COLUMN G
   1        ZYX.567       A          B          D         AAA        ABB
   4        XYZ           A          B          D         ABA        BBA
   6        YZX.12        H          I          J         CCA        CAC

I am using pandas DataFrames. Any help you can provide will be much appreciated. Thank you.

Amie Johnson

1 Answer

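This is a merge problem: do an inner merge on COLUMN B and select only the relevant columns from the right dataframe (df2), so that COLUMN F and COLUMN G are added to the matching rows of df1.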

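import pandas as pd

# df1 holds CSV ONE and df2 holds CSV TWO; the inner merge keeps only
# the COLUMN B values that occur in both frames.
df_merged = pd.merge(df1, df2[['COLUMN B', 'COLUMN F', 'COLUMN G']], on='COLUMN B', how='inner')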

print(df_merged)
   COLUMN A COLUMN B COLUMN C COLUMN D COLUMN E COLUMN F COLUMN G
0         1      XYZ        A        B        D      ABA      BBA
1         2   YZX.12        E        F        G      CCA      CAC
2         3  ZYX.567        H        I        J      AAA      ABB
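
Note that the merge above keeps COLUMN A, C, D and E from df1. If the goal is instead the rows of CSV TWO exactly as they are (COLUMN A values 1, 4 and 6 in the question's expected output), one alternative is to filter df2 with isin rather than merge. This is a minimal sketch, assuming the sample data from the question is built inline instead of being read from the CSV files; the names mask and filtered are only illustrative.

```python
import pandas as pd

# Stand-in for CSV ONE (only the columns needed for the filter).
df1 = pd.DataFrame({
    "COLUMN A": [1, 2, 3],
    "COLUMN B": ["XYZ", "YZX.12", "ZYX.567"],
})

# Stand-in for CSV TWO, using the sample rows from the question.
df2 = pd.DataFrame({
    "COLUMN A": [1, 2, 3, 4, 5, 6],
    "COLUMN B": ["ZYX.567", "SAMPLE A", "SAMPLE B", "XYZ", "SAMPLE C", "YZX.12"],
    "COLUMN C": ["A", "E", "H", "A", "E", "H"],
    "COLUMN D": ["B", "F", "I", "B", "F", "I"],
    "COLUMN E": ["D", "G", "J", "D", "G", "J"],
    "COLUMN F": ["AAA", "BBB", "CCC", "ABA", "ABC", "CCA"],
    "COLUMN G": ["ABB", "ACA", "BBC", "BBA", "BAB", "CAC"],
})

# Boolean mask: True where df2's COLUMN B value also occurs in df1's COLUMN B.
mask = df2["COLUMN B"].isin(df1["COLUMN B"])

# Keep only the matching rows; all of df2's columns and its row order are preserved.
filtered = df2[mask]
print(filtered)
```

With this data the filter keeps the rows whose COLUMN A is 1, 4 and 6, which matches the output asked for in the question.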
Erfan
  • I'm using: df_merged = pd.merge(df1, df2[['group_list', 'Unnamed: 0', 'index']], on='group_list', how='inner'), but I get the error: KeyError: "None of [Index(['group_list', 'Unnamed: 0', 'index'], dtype='object')] are in the [index]". Do you know what I am doing wrong? – Amie Johnson Mar 11 '19 at 17:43
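
A KeyError of this kind generally means the requested labels were not found in the object being indexed, so a reasonable first step is to compare the requested names with the columns the frame actually has. The sketch below is only a diagnostic aid, not a confirmed explanation of the error in the comment: missing_columns is a hypothetical helper, and the small df2 is made-up data standing in for the real frame.

```python
import pandas as pd

def missing_columns(df, wanted):
    """Return the names in `wanted` that are not columns of `df` (hypothetical helper)."""
    return [name for name in wanted if name not in df.columns]

# Made-up stand-in for the real df2; the actual frame may look different.
df2 = pd.DataFrame({"group_list": ["a", "b"], "index": [0, 1]})

wanted = ["group_list", "Unnamed: 0", "index"]

# Show what the frame really contains, then which requested names are absent.
print(type(df2).__name__)
print(list(df2.columns))
print(missing_columns(df2, wanted))
```

If the helper reports missing names, the column selection needs adjusting before the merge. If it reports none, it may be worth checking that df2 is a DataFrame rather than a Series and that the column names contain no stray whitespace.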