-1

I'm trying to join 2 data frames using regex. In one data frame is Postcode Area (e.g. BA, M) in the other is Postcode District (e.g. BA1, M18). I want to join on the Postcode Area. My regex is ([A-Z][A-Z]?). How to I actually join the two data frames on this regex?

My regex is ([A-Z][A-Z]?).

eshirvana
  • 23,227
  • 3
  • 22
  • 38
  • [How to merge pandas table by regex](https://stackoverflow.com/questions/49522818/how-to-merge-pandas-table-by-regex) – Ryan Aug 01 '23 at 20:10
  • I don’t understand - I have a set regex to be applied as the join condition. The circumstances in that post seem very different with 3 different regex functions stored within the data frame to perform the join on – oscarwitch Aug 01 '23 at 21:33
  • Hi Oscar! Welcome to StackOverflow. I think it's best to do the matches beforehand and then join – Mark Aug 01 '23 at 23:52
  • From the wiki of the pandas tag: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Paul H Aug 02 '23 at 00:00

1 Answers1

0

You could do something like this:

df1 = pd.DataFrame(
    {
        "PostcodeArea" : ["BA", "M", "AB"],
        "Location" : ["Bath", "Manchester", "Aberdeen"],

    }
)
df2 = pd.DataFrame(
    {
        "PostcodeDistrict" : ["BA1", "M1", "AB1"],
    }
)

df2.assign(PostcodeArea = df2['PostcodeDistrict'].str.extract(r"([A-Z]+)")).merge(df1, on="PostcodeArea")

# Output:
  PostcodeDistrict PostcodeArea    Location
0              BA1           BA        Bath
1               M1            M  Manchester
2              AB1           AB    Aberdeen
Mark
  • 7,785
  • 2
  • 14
  • 34