
In my code I use merge/join in many places. Recently I ran into a join that was probably producing a Cartesian product (probably for only one of the 5000 files to process). Since the code runs on a 64-bit system/Python, this join keeps running until it fills all memory, blocking every process/user on this hardware node. Since no actual error is raised, it is also very hard to debug.

Is there an easy way to test the validity of a join/merge that I could use in an assert statement?
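One way to catch a runaway join before it eats the memory is to predict the result size from per-key counts, which is cheap compared to the merge itself: for each key value the merge emits (rows with that key on the left) × (rows with that key on the right). A minimal sketch, assuming pandas and join keys without missing values (NaN keys are dropped by `groupby` here, but pandas' merge does match NaN to NaN, so the prediction would be off). The helper names `expected_merge_rows` and `safe_merge` and the `max_rows` limit are hypothetical; `validate=` is a real `DataFrame.merge` argument that raises a `MergeError` if the keys are not unique on the side(s) you claim.

```python
import pandas as pd

def expected_merge_rows(left, right, on, how="inner"):
    """Hypothetical helper: predict len(left.merge(right, on=on, how=how))
    without building the merged frame. Assumes no missing values in the keys."""
    # Number of rows per key value on each side
    lc = left.groupby(on).size()
    rc = right.groupby(on).size()
    # Align the two count series on the key; keys missing on one side get 0
    counts = pd.concat([lc, rc], axis=1, keys=["l", "r"]).fillna(0)
    l, r = counts["l"], counts["r"]
    matched = (l * r).sum()          # rows produced by keys present on both sides
    left_only = l[r == 0].sum()      # unmatched left rows (kept by left/outer)
    right_only = r[l == 0].sum()     # unmatched right rows (kept by right/outer)
    if how == "inner":
        return int(matched)
    if how == "left":
        return int(matched + left_only)
    if how == "right":
        return int(matched + right_only)
    if how == "outer":
        return int(matched + left_only + right_only)
    raise ValueError(f"unsupported how={how!r}")

def safe_merge(left, right, on, how="inner", max_rows=10_000_000, **kwargs):
    """Hypothetical wrapper: refuse to merge if the result would be too large."""
    n = expected_merge_rows(left, right, on, how)
    if n > max_rows:
        raise ValueError(f"merge on {on!r} would produce {n} rows (limit {max_rows})")
    return left.merge(right, on=on, how=how, **kwargs)
```

Usage, as an assert or via the wrapper:

```python
left = pd.DataFrame({"k": [1, 1, 2, 3], "a": range(4)})
right = pd.DataFrame({"k": [1, 1, 1, 2, 4], "b": range(5)})
assert expected_merge_rows(left, right, "k") <= 10 * max(len(left), len(right))
merged = safe_merge(left, right, on="k", how="left")
# If you know the right-hand keys should be unique, let pandas check it:
# left.merge(right, on="k", validate="many_to_one")  # raises MergeError here
```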

Thanks,

Luc

user1708646
  • In my opinion the only way to do it is to understand the different join types ([this](http://pandas.pydata.org/pandas-docs/stable/merging.html#brief-primer-on-merge-methods-relational-algebra) gives the SQL equivalents) – herrfz Feb 19 '13 at 08:31
  • This is strange; I thought joins and merges were actually references rather than creating new data (in memory). What does Wes mean by ["cold"](http://stackoverflow.com/a/8992714/1240268)? – Andy Hayden Feb 19 '13 at 10:52
  • merge/join creates a new DataFrame; that's the memory problem. I'm thinking of a test that checks the contents of the join fields: if the join columns do not match (comparing set(index_col.values) of the two join columns), this could be used to decide whether to merge or just raise an error... – user1708646 Feb 19 '13 at 15:09
  • You can create a function that takes the 2 dataframes and the names of the join columns, then use loops to find the matching ones. – r.burak Feb 14 '21 at 13:23
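The set comparison suggested in the comments could look roughly like the sketch below. It is an illustration, not a library API: `key_report` is a hypothetical helper, and it assumes a single join column per side. Besides the overlap counts it reports whether each key column is unique, because a Cartesian-style blow-up only happens when the same key value is repeated on both sides.

```python
import pandas as pd

def key_report(left, right, left_on, right_on):
    """Hypothetical helper: compare the join keys of two frames before merging.
    Assumes one join column per side; missing keys are ignored."""
    lkeys = set(left[left_on].dropna().unique())
    rkeys = set(right[right_on].dropna().unique())
    return {
        "common": len(lkeys & rkeys),       # key values that will match
        "left_only": len(lkeys - rkeys),    # key values with no partner on the right
        "right_only": len(rkeys - lkeys),   # key values with no partner on the left
        "left_unique": left[left_on].is_unique,
        "right_unique": right[right_on].is_unique,
    }
```

For example, to refuse a merge when nothing matches or when both sides have repeated keys:

```python
rep = key_report(left, right, "k", "k")
assert rep["common"] > 0, "join columns do not overlap"
assert rep["left_unique"] or rep["right_unique"], "many-to-many join"
```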
