I am learning about the two functions merge and join in pandas.
I have a lot of SQL experience so I am trying to understand how are similar things done in Pandas.
So... OK, I was watching a lecture about pandas merge/join but it raised more questions in me than it answered I'm afraid. I have two main questions.
- How are merge and join different in pandas, even their docs seem quite similar? Not sure why pandas needs both. Even this page doesn't answer that, it seems.
https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_sql.html
Can I join (like in SQL join) but based on inequalities
between two columns or based on more complicated logical conditions e.g.select * from t1 join t2 on t1.a < t2.b+1 and upper(t1.s1) > upper(t2.s2) select * from t1 join t2 on t1.a < t2.b+1 or upper(t1.s1) > upper(t2.s2 || t2.s3) select * from t1 join t2 on t1.a < t2.b+10 or len(t1.s1) > coalesce(t2.c, 5) select * from t1 join t2 on ( t1.a * t1.a < t2.b + 10 ) or ( upper(t1.s1) > REGEXP_REPLACE(t2.s2,'(.*) (.*)','\2, \1') )
How can we translate these three SQL queries e.g. to pandas joins/merges?
Note 1: a,b,c are some numeric columns, while s1, s2, s3 are string columns
The SQL flavor I used was Postgres in particular but my question is generic.
The regexp expression in the 4th example replaces the first and last name (if we assume t2.s2 has a full name in it e.g. 'John Smith'). ||
is string concatenation in this SQL dialect.
I am just trying to understand what are my limits in pandas (if any)
when compared to SQL w.r.t. using merge/join.
Note 2: I realize I can pre-massage my two data frames (left/right) so that I simplify the above join conditions. But say we don't want to do this for whatever reason. Then... is it possible to express the above SQL join semantics in pandas.