I have a document whose lines contain email addresses and IP addresses. I need to split the document so that each IP address, email address, or word in the file is stored as a separate element of an array.

Is there a way to use a regex or StringTokenizer to find the email and IP addresses? I know how to use a regex or StringTokenizer to separate the words in a sentence line by line. I'm just not sure how to find email and IP addresses, because the file may contain illegal characters such as @, \ and // that should not be included in the array.

For example my document contains:

You can contact test@test.com, the address is 192.168.1.1.

My array should contain:

You

can

contact

test@test.com

the

address

is

192.168.1.1

user100000001
  • Are you looking to tokenize the string and then run the regex against each token to find the subsequent matches? I think this is close to what you need: http://regexr.com/3gspa, creating a regex that will match all based on the RFC but I think this should work – Dan King Oct 04 '17 at 00:43
  • Yes, I'd like to tokenize the string. I had a problem because I couldn't use String Tokenizer to retrieve the tokens as the IP address and email address would be split into separate tokens because of the symbols they include. Thanks for your help! – user100000001 Oct 05 '17 at 22:39

2 Answers

Here is a regexr with some examples and a regex that should work for you.

The regex is below (the email portion is copied from here; I'm not positive it copied and pasted correctly):

(([^<>()\[\]\.,;:\s@\"]+(\.[^<>()\[\]\.,;:\s@\"]+)*)|(\".+\"))@(([^<>()\[\]\.,;:\s@\"]+\.)+[^<>()\[\]\.,;:\s@\"]{2,})|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
Dan King

The regex for an email address is:

[\w!#$%&'*+/=?^_`{|}~-]+(?:\.[\w!#$%&'*+/=?^_`{|}~-]+)*@(?:[\w](?:[\w-]*[\w])?\.)+[\w](?:[\w-]*[\w])?

And the regex for an IP address is:

((?:(?:25[0-5]|2[0-4]\d|((1\d{2})|([1-9]?\d)))\.){3}(?:25[0-5]|2[0-4]\d|((1\d{2})|([1-9]?\d))))
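
As a rough check of what this IP pattern accepts, here is a small sketch that compares it with the looser `\d{1,3}` form from the other answer. The class and method names (`Ipv4Check`, `isStrict`, `isLoose`) are made up for illustration, and the backslashes are doubled because the pattern sits in a Java string literal.

```java
import java.util.regex.Pattern;

class Ipv4Check {
    // Octet-range pattern quoted above, with backslashes doubled for a Java string literal.
    static final Pattern STRICT = Pattern.compile(
        "((?:(?:25[0-5]|2[0-4]\\d|((1\\d{2})|([1-9]?\\d)))\\.){3}"
        + "(?:25[0-5]|2[0-4]\\d|((1\\d{2})|([1-9]?\\d))))");

    // Looser form: any one to three digits per group, so 999 is accepted.
    static final Pattern LOOSE = Pattern.compile("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}");

    // matches() requires the whole input to fit the pattern.
    static boolean isStrict(String s) { return STRICT.matcher(s).matches(); }
    static boolean isLoose(String s) { return LOOSE.matcher(s).matches(); }

    public static void main(String[] args) {
        System.out.println(isStrict("192.168.1.1")); // true
        System.out.println(isStrict("999.1.1.1"));   // false
        System.out.println(isLoose("999.1.1.1"));    // true
    }
}
```

Whether the stricter check matters depends on whether the file can contain dotted numbers that are not real addresses.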

You can then use java.util.regex.Matcher and collect each match by calling matcher.group(0), like this:

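 import java.util.ArrayList;
 import java.util.List;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;

 Pattern p = Pattern.compile("<your regex here>");
 Matcher m = p.matcher(str);   // str is the text to scan
 List<String> strs = new ArrayList<>();
 while (m.find())              // move to the next match, if any
     strs.add(m.group(0));     // group(0) is the whole matched text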

This should work, but I haven't tested it yet.
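
To get the exact array from the question (words as well as addresses), one possible approach is to put the alternatives into a single pattern, trying email first, then IP, then plain word. The sketch below assumes a deliberately simplified email pattern (not RFC-complete) and treats a "word" as a run of letters and digits; the class name `AddressTokenizer` and the method `tokenize` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AddressTokenizer {
    // Simplified email pattern (assumption: good enough for the sample, not RFC-complete).
    private static final String EMAIL = "[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+";
    // One octet in the range 0-255.
    private static final String OCTET = "(?:25[0-5]|2[0-4]\\d|1\\d{2}|[1-9]?\\d)";
    // Four dot-separated octets, bounded so they are not cut out of a longer number.
    private static final String IPV4 = "\\b" + OCTET + "(?:\\." + OCTET + "){3}\\b";
    // Assumption: a word is a run of ASCII letters and digits.
    private static final String WORD = "[A-Za-z0-9]+";

    // Order matters: the first alternative that matches at a position wins.
    private static final Pattern TOKEN = Pattern.compile(EMAIL + "|" + IPV4 + "|" + WORD);

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group()); // characters that match nothing (", ", stray @ or \) are skipped
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "You can contact test@test.com, the address is 192.168.1.1.";
        // prints [You, can, contact, test@test.com, the, address, is, 192.168.1.1]
        System.out.println(tokenize(line));
    }
}
```

Because find() only returns matched text, punctuation and stray symbols between tokens never reach the list, which is the main reason to prefer this over StringTokenizer here.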

Ray Eldath