I have a string with space-separated addresses and I want to separate the number from the street name.

So if we have:

Street Blah Blah 34

or

34 Street Blah Blah

I want a regex to match the "Street Blah Blah" and another to match "34"

It can get more complex with addresses like this:

Überbrückerstraße 24a.

where it should return "24a" and the rest as a street or

Järnvägstationg. 3/B

where it should return 3/B and the rest as a street etc.

I am currently doing this in C#: I split each string on spaces, return whichever token contains at least one digit as the number, and return all the rest as the street.

However, I was wondering whether it would be more elegant and more efficient to do this with a regex.
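
For reference, a minimal sketch of that split-based approach might look like the following. The class name SplitBasedParser and the method Parse are hypothetical, and taking the first token that contains a digit is an assumption, since the actual code is not shown here.

```csharp
using System;
using System.Linq;

public static class SplitBasedParser
{
    // Hypothetical helper illustrating the split-on-space approach described above.
    public static (string Number, string Street) Parse(string input)
    {
        string[] tokens = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Assumption: the first token containing at least one digit is the premise number.
        int idx = Array.FindIndex(tokens, t => t.Any(char.IsDigit));
        string number = idx >= 0 ? tokens[idx] : "";

        // Every other token, in its original order, forms the street.
        string street = string.Join(" ", tokens.Where((t, i) => i != idx));
        return (number, street);
    }
}
```

With this sketch, "Ueckerstr. 20 b" comes back as number "20" and street "Ueckerstr. b", because the suffix "b" is a separate token without a digit.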

I've been fiddling with regex but I couldn't find a robust way so far. Any ideas?

Here is some unit test data: input street, expected premise number, and expected street:

    [TestCase("Järvägstationg. 3/B", "3/B", "Järvägstationg.")]
    [TestCase("Überbrückerstraße 24a", "24a", "Überbrückerstraße")]
    [TestCase("Street Blah Blah 34", "34", "Street Blah Blah")]
    [TestCase("34 Street Blah Blah", "34", "Street Blah Blah")]
    [TestCase("Ueckerstr. 20 b", "20 b", "Ueckerstr.")]
    [TestCase("Elmshornerstraße 163", "163", "Elmshornerstraße")]
    [TestCase("Hallgartenerstrasse Moritzstr.", "", "Hallgartenerstrasse Moritzstr.")]
    [TestCase("19 Green Lane", "19", "Green Lane")]

I think out of these the

Ueckerstr. 20 b

is the trickiest, in which case, I don't mind if that one fails for now.

Nick
  • You misspelled Järvägstationg, there is supposed to be an n in there: Jär**n**vägstationg ;-) – Andreas Jun 07 '16 at 10:54
  • kind of a duplicate: Performance on [split](http://stackoverflow.com/questions/3601465/string-split-vs-regex-split) – aloisdg Jun 07 '16 at 11:00
  • @aloisdg from your link :`(using a character that will not exist anywhere else in the string)` – Thomas Ayoub Jun 07 '16 at 11:01
  • @ThomasAyoub indeed the problem is a bit different. So maybe OP should just run his horses and do a [benchmark](https://github.com/PerfDotNet/BenchmarkDotNet) – aloisdg Jun 07 '16 at 11:04

3 Answers

http://www.phpliveregex.com/p/fWT

 var match = Regex.Match(input, @"(.*)\s(\d+.*)"); // Regex.Match takes the input first, then the pattern
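
As a rough sketch of how the two capture groups could be read back, assuming a hypothetical TrailingNumberParser wrapper that is not part of the original answer:

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingNumberParser
{
    // Hypothetical wrapper around the pattern above.
    public static (string Number, string Street) Parse(string input)
    {
        // Group 1 is everything before the last whitespace that is followed by a digit;
        // group 2 starts at that digit and runs to the end of the string.
        Match m = Regex.Match(input, @"(.*)\s(\d+.*)");

        // Assumption: when nothing matches, treat the whole input as the street.
        return m.Success ? (m.Groups[2].Value, m.Groups[1].Value) : ("", input);
    }
}
```

Note that the pattern only matches when some whitespace is directly followed by a digit, so an input such as "34 Street Blah Blah" produces no match and falls through to the fallback.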
Jamiec
Andreas
  • Possibly because this asked for a C# regex answer and you didn't give one (not my DV btw). Silly reason for a DV if so, as it's the regex you gave which matters. – Jamiec Jun 07 '16 at 11:10
  • Well the c# has been edited in the question. If the downvoter wants to downvote so bad maybe it should be the question that did not include language? – Andreas Jun 07 '16 at 11:12
  • Indeed - I changed your answer to C# :) – Jamiec Jun 07 '16 at 11:13
  • @Jamiec yes agree, the regex can be translated. There is nothing special in my regex that is only php. However, I do not have any knowledge of C#. – Andreas Jun 07 '16 at 11:14
  • @Jamiec Thank you!! I really mean it. Finally someone who is not trigger-happy on the downvote and actually contributes to SO instead. – Andreas Jun 07 '16 at 11:16
  • On a more grumpy day I would have just DV you and moved on. Sorry :( – Jamiec Jun 07 '16 at 11:17
  • Ah thanks for that. For the record, it wasn't me who downvoted. I am now going through the responses and checking the code performance etc etc. – Nick Jun 07 '16 at 12:10

Splitting on @"(?<=^\d[^ ]*) | (?=\d)" might work for you. It will not, however, work for Hallgartenerstrasse Moritzstr.: with no number to split on, the whole string is returned as element 0 of the result rather than element 1:

Test:

using System;
using System.Text.RegularExpressions;

public class Example {
    public static void Main() {
        string[] inputs = {
            "Überbrückerstraße 24a",
            "34 Street Blah Blah",
            "Hallgartenerstrasse Moritzstr.",
            "Ueckerstr. 20 b"
        };
        foreach (string input in inputs) {
            string pat = @"(?<=^\d[^ ]*) | (?=\d)";
            string[] matches = Regex.Split(input, pat);
            foreach (string match in matches) {
                Console.Write("<{0}>", match);
            }
            Console.Write("\n");
        }
    }
}

Will output:

<Überbrückerstraße><24a>
<34><Street Blah Blah>
<Hallgartenerstrasse Moritzstr.>
<Ueckerstr.><20 b>
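
One possible way to turn the split result into a number/street pair, including the case with no number, is sketched below. SplitRegexParser and Parse are hypothetical names, and limiting the split to two parts is an assumption added here, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitRegexParser
{
    // Same separator pattern as in the test program above.
    private static readonly Regex Separator = new Regex(@"(?<=^\d[^ ]*) | (?=\d)");

    // Hypothetical helper: maps the split parts to (number, street).
    public static (string Number, string Street) Parse(string input)
    {
        // Split at the first separator only, so at most two parts come back.
        string[] parts = Separator.Split(input, 2);

        // No separator found: the input has no recognisable number.
        if (parts.Length < 2) return ("", input);

        // Whichever part starts with a digit is treated as the number.
        bool numberFirst = parts[0].Length > 0 && char.IsDigit(parts[0][0]);
        return numberFirst ? (parts[0], parts[1]) : (parts[1], parts[0]);
    }
}
```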
Andreas Louv
  • This works fine for all addresses where the street precedes the number, but fails for an address such as "19 Green Lanes" for instance – Nick Jun 07 '16 at 13:08
  • @Nick You didn't specify that in your original question. None of the other answers cater for that either. – Andreas Louv Jun 07 '16 at 13:09
  • Yes I have just updated the question with specific case scenarios. Apologies, I thought it would be clear that I wanted to extract the number regardless of where it is in the string. – Nick Jun 07 '16 at 13:14

If your input strings follow the same format, you can use:

(?<street>.*) (?<number>.*)

See Live demo

Then access it with:

var address = "Überbrückerstraße 24a.";
var m = Regex.Matches(address, @"(?<street>.*) (?<number>.*)");
var street = m[0].Groups["street"].Value;
var streetNumber = m[0].Groups["number"].Value;
Console.WriteLine(string.Format("Street Name: {0}, at {1}", street, streetNumber));

outputs:

Street Name: Überbrückerstraße, at 24a.

See live C#


Given the test data you added afterwards, I would use:

^(\d.*?) (.*)|(.*) (\d.*)|(.+)

where:

  • ^(\d.*?) (.*) matches the string with the number at the beginning;
  • (.*) (\d.*) matches the string with the number at the end;
  • (.+) matches a string that contains no number. It must come last, or it will match every input.

See Demo
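
A minimal sketch of how the five groups of that pattern could be mapped back to a number and a street. The AddressParser class and its Parse method are hypothetical names introduced for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AddressParser
{
    // Groups 1-2: number first; groups 3-4: number last; group 5: no number.
    private static readonly Regex Pattern =
        new Regex(@"^(\d.*?) (.*)|(.*) (\d.*)|(.+)");

    // Hypothetical helper: returns the premise number and the street.
    public static (string Number, string Street) Parse(string input)
    {
        Match m = Pattern.Match(input);

        // Assumption: an input with no match (e.g. empty) is returned as the street.
        if (!m.Success) return ("", input);

        if (m.Groups[1].Success) return (m.Groups[1].Value, m.Groups[2].Value);
        if (m.Groups[3].Success) return (m.Groups[4].Value, m.Groups[3].Value);
        return ("", m.Groups[5].Value);
    }
}
```

Checking Groups[n].Success tells which alternative matched, since the groups of the other alternatives stay unset.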

Thomas Ayoub
  • Unfortunately we never know what the user will input there, so it could be street then number, or number then street, or some weird alien mixture – Nick Jun 07 '16 at 12:11
  • @Nick then you should edit your question with more information about your inputs. Keep in mind that sometimes it isn't worth parsing such cases – Thomas Ayoub Jun 07 '16 at 12:14
  • I have added test data to the question – Nick Jun 07 '16 at 13:09