I have two data frames.
df1:
filename|data
fileA|1
fileB|33
fileC|343
df2:
path|filesize|filetype
/tmp/fileA.csv|123|csv
/tmp/fileB.csv|123|csv
/tmp/fileC.csv|3534|csv
/tmp/fileD.csv|234|csv
I want the result to be
filename|data|path
fileA|1|/tmp/fileA.csv
fileB|33|/tmp/fileB.csv
fileC|343|/tmp/fileC.csv
fileD|3243|/tmp/fileD.csv
This seems extremely simple but I can't seem to get it to work with .assign(). I need to match each row that is in df1.filename with what is in df2.filepath and then add df1['filepath'] to df1.
I tried the following, but it complains that Series is not "hashable"
df1.assign(path = lambda x: df2[df2.path.str.contains(x.filename + ".csv")][path])
{TypeError}'Series' objects are mutable, thus they cannot be hashed
I tested to make sure my df1.assign() was correct by doing
df1.assign(path = lambda x: x.filename)
and it worked and just appended the filename on the df1 (which is what I would expect).
I'm assuming that the problem area is the `contains(x.filename + ".csv") being a "Series". If I change it to x.filename.values I then get
{TypeError}unhashable type: 'numpy.ndarray'. I don't understand what "x" is. I assume its a Series object, but no idea how to tell which "row" its associated with if it is.
I could brute-force this and just loop over df1 but df1 is 2M+ records and loops seem to be generally frowned upon for performance reasons with pandas. Can someone point me into what I'm doing wrong?