I have a csv file with the following setup:
col0 col1 col2 col3 col4
0 A 12 22 23
1 B 40 59 63
2 F 76 81 99
3 J 10 11 12
4 A 20 15 43
df = pd.read_csv('foo.csv', dtype='object', sep=r'\s*,\s*', engine='python', encoding='ascii')
# col0 represents pandas default indexing
# This dataframe contains 230 rows
I have a second csv file with the following setup:
col0 colx coly colz colz1
0 A 12 211 313
1 J 44 222 134
2 A 33 344 277
3 B 456 344 333
df2 = pd.read_csv('foo2.csv', dtype='object', sep=r'\s*,\s*', engine='python', encoding='ascii')
# I have 14 rows in this dataframe.
The goal is to check col1 against colx: if the elements match, I would like the code to append coly, colz, and colz1 to the first dataframe. For example,
col0 col1 col2 col3 col4 col5 col6 col7
0 A 12 22 23 12 211 313
1 B 40 59 63
2 F 76 81 99
3 J 10 11 12 44 222 134
4 A 20 15 43
# I have left some rows blank just to highlight 'A' and 'J'.
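For convenience, the sample rows above can be reconstructed inline (this is just the visible sample, not the full files; the real frames have 230 and 14 rows respectively):

```python
import pandas as pd

# Inline reconstruction of the two sample frames shown above,
# so the problem can be reproduced without foo.csv / foo2.csv.
# Everything is kept as strings, matching dtype='object' in read_csv.
df = pd.DataFrame(
    {"col1": ["A", "B", "F", "J", "A"],
     "col2": ["12", "40", "76", "10", "20"],
     "col3": ["22", "59", "81", "11", "15"],
     "col4": ["23", "63", "99", "12", "43"]},
    dtype="object",
)
df2 = pd.DataFrame(
    {"colx": ["A", "J", "A", "B"],
     "coly": ["12", "44", "33", "456"],
     "colz": ["211", "222", "344", "344"],
     "colz1": ["313", "134", "277", "333"]},
    dtype="object",
)
```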
Here is one of the more reasonable solutions I could come up with:
def add_multiple_columns(x):
    df2 = pd.read_csv('foo2.csv', dtype='object', sep=r'\s*,\s*', engine='python')
    df2 = df2.apply(lambda y: y['coly', 'colz', 'colz1'] if x['col1'] == y['colx'] else None, axis=1)
    df2 = df2.dropna(axis=0, how='all')
    if df2.empty:
        df2 = 0
    else:
        df2 = df2.to_string(index=False)
    return df2

df['col5', 'col6', 'col7'] = df.apply(add_multiple_columns, axis=1)
This code results in such a messy traceback that I am not even sure I want to post the whole thing. The only parts I understood were a TypeError: 'str' object cannot be interpreted as an integer and a KeyError: ('col1', 'occurred at index 0').
Please keep in mind that the two dataframes do not have an equal number of rows, so there will be multiple instances where rows in the first dataframe need to have the same quantities from the second dataframe appended to them. How do I make this work?
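To be concrete about the duplicate-key wrinkle: 'A' appears twice in the second file, and as the desired output above shows, I only want the first match attached. A minimal sketch (with made-up subsets of the columns above) of why a plain left merge is not quite what I want:

```python
import pandas as pd

# Minimal sketch: a naive left merge on the key column.
# Because "A" appears twice on the right, the merge duplicates
# the matching left-hand rows instead of attaching one match each.
df = pd.DataFrame({"col1": ["A", "B", "F", "J", "A"]}, dtype="object")
df2 = pd.DataFrame({"colx": ["A", "J", "A"],
                    "coly": ["12", "44", "33"]}, dtype="object")

merged = df.merge(df2, left_on="col1", right_on="colx", how="left")
# merged has 7 rows: each "A" row in df matched two rows in df2,
# while "B" and "F" got NaN in the appended columns.
```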