String Tokenizer/Regex to find email address/IP Address in a file

Question

I have a document whose lines contain email addresses and IP addresses. I need to split the document into tokens so that each word, email address, and IP address is stored as a separate element of an array.

Is there a way to use a regex or StringTokenizer to do this? I know how to use either one to split a line into words, but I am not sure how to keep email and IP addresses intact. The file may also contain stray characters such as @, \ and //, which should not end up in the array.

For example my document contains:

You can contact test@test.com, the address is 192.168.1.1.

My array should contain:

You

can

contact

test@test.com

the

address

is

192.168.1.1

Are you looking to tokenize the string and then run the regex against each token to find the matches? I think this is close to what you need: http://regexr.com/3gspa. It does not match every address the RFC allows, but I think it should work. — Dan King, Oct 04 '17 at 00:43
Yes, I'd like to tokenize the string. I had a problem because I couldn't use String Tokenizer to retrieve the tokens as the IP address and email address would be split into separate tokens because of the symbols they include. Thanks for your help! — user100000001, Oct 05 '17 at 22:39
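The approach discussed in these comments (tokenize first, then deal with the symbols) could be sketched roughly as follows. This is only an illustration, not something either commenter posted: the class name WhitespaceTokens and the method tokenize are made up, and it assumes that addresses never contain whitespace and that unwanted symbols only appear at the ends of a token or on their own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper, named here only for illustration.
class WhitespaceTokens {
    // Split on whitespace only, so '@' and '.' inside a token are left alone,
    // then trim non-word characters from both ends of each token.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line); // default delimiters are whitespace
        while (st.hasMoreTokens()) {
            String token = st.nextToken().replaceAll("^\\W+|\\W+$", "");
            if (!token.isEmpty()) { // drops tokens such as "@" or "//"
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(
            tokenize("You can contact test@test.com, the address is 192.168.1.1."));
    }
}
```

Each surviving token could then be checked against an email or IP regex with String.matches if the two kinds need to be told apart.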

Dan King · Answer 1 · 2017-10-04T00:53:43.160

0

Here is a regexr with some examples and a regex that should work for you.

The regex is below (the email portion is copied from here; I'm not positive it copied and pasted correctly):

(([^<>()\[\]\.,;:\s@\"]+(\.[^<>()\[\]\.,;:\s@\"]+)*)|(\".+\"))@(([^<>()[\]\.,;:\s@\"]+\.)+[^<>()[\]\.,;:\s@\"]{2,})|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
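For use from Java, the pattern above needs some care, so here is a rough sketch rather than a drop-in solution. The class name AddressFinder, the method findAddresses, and the ATOM helper constant are invented for this example. Two things differ from the pattern as posted: the backslashes are doubled for a Java string literal, and the inner [ is escaped everywhere, because Java's Pattern treats an unescaped [ inside a character class as the start of a nested class. Note that the IP part does not check that each number is at most 255.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named here only for illustration.
class AddressFinder {
    // One "atom" character of the email pattern; '[' and ']' are escaped for Java.
    private static final String ATOM = "[^<>()\\[\\]\\.,;:\\s@\"]";
    private static final String EMAIL =
        "((" + ATOM + "+(\\." + ATOM + "+)*)|(\".+\"))@((" + ATOM + "+\\.)+" + ATOM + "{2,})";
    // Four dot-separated groups of 1 to 3 digits (no range check).
    private static final String IP = "\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}";
    private static final Pattern ADDRESS = Pattern.compile(EMAIL + "|" + IP);

    // Collect every email address or IP-like string found in the text.
    static List<String> findAddresses(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = ADDRESS.matcher(text);
        while (m.find()) {
            found.add(m.group()); // the whole match
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(
            findAddresses("You can contact test@test.com, the address is 192.168.1.1."));
    }
}
```

This only pulls out the addresses; the ordinary words in the line are skipped.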

edited Oct 04 '17 at 00:53

answered Oct 04 '17 at 00:48

Dan King


This helps. Thanks! – user100000001 Oct 05 '17 at 22:40

score 0 · Accepted Answer · answered Oct 04 '17 at 02:28

The regex for an email address is:

[\w!#$%&'*+/=?^_`{|}~-]+(?:\.[\w!#$%&'*+/=?^_`{|}~-]+)*@(?:[\w](?:[\w-]*[\w])?\.)+[\w](?:[\w-]*[\w])?

And the regex for an IP address is:

((?:(?:25[0-5]|2[0-4]\d|((1\d{2})|([1-9]?\d)))\.){3}(?:25[0-5]|2[0-4]\d|((1\d{2})|([1-9]?\d))))

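You can use java.util.regex.Matcher and call matcher.group(0) in a loop to collect each match, like this:

 import java.util.ArrayList;
 import java.util.List;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 // str is the text to search
 Pattern p = Pattern.compile("<your regex here>");
 Matcher m = p.matcher(str);
 List<String> strs = new ArrayList<>();
 while (m.find())
     strs.add(m.group(0)); // group(0) is the entire match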

These should work, but I have not tested them yet.
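To get the array from the question (words as well as addresses), one possible way is to combine the two patterns above with a plain-word alternative and loop with find(). The following is a sketch under that assumption, not a tested production tokenizer: the class name AddressAwareTokenizer, the method tokenize, and the trailing \w+ alternative are additions for illustration, and the redundant capturing groups of the IP pattern have been folded into a single OCTET constant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named here only for illustration.
class AddressAwareTokenizer {
    // Email pattern from the answer, with backslashes doubled for Java.
    private static final String EMAIL =
        "[\\w!#$%&'*+/=?^_`{|}~-]+(?:\\.[\\w!#$%&'*+/=?^_`{|}~-]+)*"
        + "@(?:\\w(?:[\\w-]*\\w)?\\.)+\\w(?:[\\w-]*\\w)?";
    // One number from 0 to 255.
    private static final String OCTET = "(?:25[0-5]|2[0-4]\\d|1\\d{2}|[1-9]?\\d)";
    private static final String IP = "(?:" + OCTET + "\\.){3}" + OCTET;
    // Alternatives are tried left to right at each position,
    // so an address is preferred over a plain word.
    private static final Pattern TOKEN = Pattern.compile(EMAIL + "|" + IP + "|\\w+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group(0));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(
            tokenize("You can contact test@test.com, the address is 192.168.1.1."));
    }
}
```

Characters that match none of the three alternatives, such as a lone @, \ or //, are simply skipped by find(), so they never reach the list.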
