0

Long time user and first time poster here.

I'm new to pandas and python. I'm working with a data frame that contains either a single value, or a string of values separated by a semicolon; the second data frame is a list of subcontractors and the company that oversees the subcontractor. I'm trying to figure out how to merge these data frames based on the two columns that could contain the subcontractors name. In order to cut down on a lot of the steps I was trying with the pandas library, I've been trying pandasql.

I've tried

on_resource_name = '''
    SELECT
    w.*, v.[Company Name], v.[Oversite Company] as [Survey Submitter Oversite Co]
    FROM
    wo_sat w
    LEFT JOIN vendors v ON
    v.[Company Name] LIKE  '%'||w.[Embedded Data - Resource Name]||'%' OR 
    v.[Company Name] LIKE '%'||w.[Embedded Data - Source Company]||'%' OR
    v.[Company Name] LIKE  w.[Embedded Data - Resource Name]||'%' OR 
    v.[Company Name] LIKE w.[Embedded Data - Source Company]||'%'      
    ;
'''

For some reason, the wild cards seem to be ignored and only exact matches are being returned. For example, if a cell contains only (SODEXO), Sodexo is returned, but on a cell that contains, (SODEXO; Jerry Thompson; Bill Murray) no value is matched.

Any help would be greatly appreciated!

upe
  • 1,862
  • 1
  • 19
  • 33
  • In your example do the values have parentheses which makes a difference? – Parfait May 27 '20 at 20:43
  • No, the values do not have parenthesis. When I try without the closing parentheses, like this v.[Company Name] LIKE '%||w.[Embedded Data - Resource Name]||%', I get (sqlite3.OperationalError) near "%": syntax error – idratherbehiking May 27 '20 at 21:48
  • Please post actual sample of data of both data frames for a [reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – Parfait May 27 '20 at 22:40

0 Answers0