
I have a csv file with the following setup:

  col0 col1     col2 col3  col4
    0    A       12   22    23
    1    B       40   59    63
    2    F       76   81    99
    3    J       10   11    12
    4    A       20   15    43

df = pd.read_csv('foo.csv', dtype='object', sep=r'\s*,\s*', engine='python', encoding='ascii')

# col0 represents pandas default indexing
# This dataframe contains 230 rows

I have a second csv file with the following setup:

  col0 colx coly colz colz1
    0    A    12  211   313
    1    J    44  222   134
    2    A    33  344   277
    3    B   456  344   333

df2 = pd.read_csv('foo2.csv', dtype='object', sep=r'\s*,\s*', engine='python', encoding='ascii')

# I have 14 rows in this dataframe. 

The goal is to check col1 against colx; if the elements match, I would like the code to append coly, colz, and colz1 to the first dataframe. For example,

  col0 col1     col2 col3  col4  col5  col6  col7
    0    A       12   22    23    12   211   313
    1    B       40   59    63
    2    F       76   81    99
    3    J       10   11    12    44   222   134
    4    A       20   15    43

# I have left some rows blank just to highlight 'A' and 'J'.

Here is one of the more reasonable solutions I could come up with.

def add_multiple_columns(x):
    df2 = pd.read_csv('foo2.csv', dtype='object', sep=r'\s*,\s*', engine='python')
    df2 = df2.apply(lambda y: y['coly', 'colz', 'colz1'] if x['col1'] == y['colx'] else None, axis=1)
    df2 = df2.dropna(axis=0, how='all')
    if df2.empty:
        df2 = 0
    else:
        df2 = df2.to_string(index=False)
    return df2


df['col5', 'col6', 'col7'] = df.apply(add_multiple_columns, axis=1)

This code results in such a messy traceback that I am not even sure I would want to post the whole thing. The only things I understood were that there is a TypeError ('str' object cannot be interpreted as an integer) and a KeyError: ('col1', 'occurred at index 0').

Please keep in mind that I don't have an equal number of rows in the two dataframes, so there will be multiple instances where rows in the first dataframe will need to have the same quantities from the second dataframe appended to them. How do I make this work?


1 Answer


I saw that you had multiple rows in df2's colx that were A, and you matched only to the first row with A, so I dropped duplicates and kept the first value. This does a left merge, which will give you the behavior you want. I noticed that in your output you didn't match on B; it should match on that, right?

import pandas as pd

df = pd.read_csv('foo.csv', dtype='object', sep=r'\s*,\s*', engine='python', encoding='ascii')
df2 = pd.read_csv('foo2.csv', dtype='object', sep=r'\s*,\s*', engine='python', encoding='ascii')
temp = df2[['colx', 'coly', 'colz', 'colz1']]  # this drops col0
temp = temp.drop_duplicates(subset='colx', keep='first')  # drop duplicate values for colx
df3 = df.merge(temp, left_on='col1', right_on='colx', how='left')
df3 = df3.rename(columns={'coly': 'col5', 'colz': 'col6', 'colz1': 'col7'})
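If instead you want to keep every matching row from df2 (you mentioned repeated values), a minimal sketch is to skip drop_duplicates and merge directly; note this duplicates rows of df so that each row appears once per match in df2. Column names here are assumed to be the ones from your example.

import pandas as pd

df = pd.read_csv('foo.csv', dtype='object', sep=r'\s*,\s*', engine='python', encoding='ascii')
df2 = pd.read_csv('foo2.csv', dtype='object', sep=r'\s*,\s*', engine='python', encoding='ascii')

# Keep all matches: an 'A' row in df is repeated once per 'A' row in df2.
all_matches = df.merge(df2[['colx', 'coly', 'colz', 'colz1']],
                       left_on='col1', right_on='colx', how='left')
# Drop the now-redundant key column and rename to match your desired output.
all_matches = all_matches.drop(columns='colx').rename(
    columns={'coly': 'col5', 'colz': 'col6', 'colz1': 'col7'})
print(all_matches.head())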