Finding this one difficult and would like to see if you can help, pls:
I have two df
's, dfA
and dfB
. In each there is a telnum
, file
, and datetime
column, but only dfA
contains a name
column and only dfA
has complete telephone number string values unlike dfB
which sometimes has partially complete telephone numbers up to x
of n
digits, e.g. sometimes the int'l calling code is missing, other times a '0' is present instead of the int'l calling code, and other times neither int'l calling code or the leading '0' is present.
dfA
also contains more data than dfB
(<10 rows), but between the two there are complete timestamp
, file
, and telnum
columns always with a date-time and string value respectively (even though dfB['telnum']
is not fully complete as stated above).
What I want to do is extract the rows from dfA
where dfB['telnum']
matches dfA['telnum']
but because dfB['telnum']
is not always complete then I need to check for matches as a substring of dfA['telnum']
too.
I would like the result to be dfResult
but where the returned results have dfA
values on the left and dfB
on the right so that I can see the different file
and datetime
values.
Any ideas?
EDIT:
I think I need an inner merge, e.g.
pandas.merge(dfA, dfB, on='telnum', how='inner')
However, because dfB['telnum']
isn't always complete telnum
string, the results are not complete. How can I get those that match between the two by checking if dfB['telnum']
is a substring of dfA['telnum']
as well?