-1

I have two data frame where I have * asterisk wild card in string which need to be compared with full string such that * asterisk will match any number of characters.

This same is working in Excel vlookup using below fomula, need same to be done using Python / Pandas Dataframe tried using pd.merge but * asterisk is considered as String in python instead of wildcard

Left_dataframe Right_dataframe Formula in excel working Output expected
Compare * data Compare with the data VLOOKUP(A2,B:B,1,0) Compare with the data

Tried using using Python / Pandas Dataframe tried using pd.merge but * asterisk is considered as String in python instead of wildcard and merge in not working

Timus
  • 10,974
  • 5
  • 14
  • 28
  • 4
    Can you make a minimum reproducible example of the data? https://stackoverflow.com/help/minimal-reproducible-example – oskros Jan 23 '23 at 08:34
  • Please provide enough code so others can better understand or reproduce the problem. – Community Jan 23 '23 at 09:26

1 Answers1

0

It is difficult to propose a solution. In the absence of 'min reproducible example', you might consider the following:

#import libraries
import pandas as pd
import re  ##not used initially

#create dataframes
left_data = {'Left_dataframe': 'Compare * data'}
right_data = {'Right_dataframe': 'Compare with the data'}

df_left = pd.DataFrame(left_data, index=[0])
df_right = pd.DataFrame(right_data, index=[0])

#check df
df_left
df_right

#compare cell value | escape *
df_output_where = df_right.where(df_right.Right_dataframe.str.contains('/*'), df_left)
df_output_where


'''
[Disclaimer] Below might not be the most pythonic way.
The below add-on checks for the '*' asterisk within the first df, and in addition,
check if the first 'word' in the two df matches.
If so, it returns the value in the second df.
'''

## pattern to use
str_pattern5 = r'(\w+) (.*) (\w+)'

## startswith first word in column (returned by re.match())
## assist from https://stackoverflow.com/a/56073916/20107918
## assist from https://stackoverflow.com/a/61713961/20107918
compare_left = df_left.Left_dataframe.apply(lambda x: x.startswith(re.match(str_pattern5, (df_left.Left_dataframe.values)[0]).group(1))) #.group(1))
compare_right = df_right.Right_dataframe.apply(lambda x: x.startswith(re.match(str_pattern5, (df_right.Right_dataframe.values)[0]).group(1))) #.group(1))

## return value from the right dataframe where
## the right df, compare_right == compare_left

df_output_compare02 = df_right.Right_dataframe if ((df_left.Left_dataframe.str.contains('/*')).all() and (compare_right == compare_left).all()) else None
df_output_compare02

#### NB: 
#df_output_where_compare = df_right.where(((df_left.Left_dataframe.str.contains('/*')) and (compare_right == compare_left)))  ##truth value 
#df_output_where_compare = df_right.Right_dataframe.where((df_left.Left_dataframe.str.contains('/*')).all() and (compare_right == compare_left).all(), df_left)  #ValueError: Array conditional must be same shape as self
#df_outpute_where_compare

PS: Below does not return the desired output

#### PS: does not return the desired output

#perform **pd.merge**
df_output = df_left.merge(df_right, how='outer', left_on='Left_dataframe', right_on='Right_dataframe')
df_output

Further consideration:
string If performing a check on * as a pattern, do it as regex.
You may import re for better regex handling.

semmyk-research
  • 333
  • 1
  • 9