
I'm trying to extract some values from a download manager's output. The format is:

[#8760e4 4.3MiB/40MiB(10%) CN:2 DL:4.9MiB ETA:7s]

What I'd like to extract from the above example is an array that looks like this:

['4.3','MiB','40','MiB','10%','4.9','MiB','7','s']

I've tried splitting this in various combinations, but nothing seems to work. Does anyone know how to do this, or have any suggestions?

Thank you!

dzm

3 Answers


You can do

var str = "[#8760e4 4.3MiB/40MiB(10%) CN:2 DL:4.9MiB ETA:7s]";
var arr = str.match(/ ([\d\.]+)(\w+)\/([\d\.]+)(\w+)\(([^\)]+)\).*:([\d\.]+)(\w+).*:([\d\.]+)(\w+)/).slice(1);

With your string, it gives

["4.3", "MiB", "40", "MiB", "10%", "4.9", "MiB", "7", "s"]

But it really depends on which strings are possible. With only one example it's impossible to be sure. My advice would be to:

  1. ensure you understand my regex (read it step by step)
  2. test and adapt it using your knowledge of the domain
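One way to do step 2 is to run the regex over every sample line you can collect and look at what comes back. This is a minimal sketch: the sample list is just the line from the question plus the longer ETA format dzm mentions in the comments, and `extract` is a hypothetical helper name. Since `match` returns `null` when nothing matches, the helper checks for that before calling `.slice(1)`.

```javascript
// The regex from this answer, unchanged.
const looseRe = / ([\d\.]+)(\w+)\/([\d\.]+)(\w+)\(([^\)]+)\).*:([\d\.]+)(\w+).*:([\d\.]+)(\w+)/;

// Sample lines: the one from the question plus a longer ETA format.
const samples = [
  "[#8760e4 4.3MiB/40MiB(10%) CN:2 DL:4.9MiB ETA:7s]",
  "[#8760e4 4.3MiB/40MiB(10%) CN:2 DL:4.9MiB ETA:1h30m7s]",
];

// Hypothetical helper: captured groups, or null if the line doesn't match at all.
function extract(line) {
  const m = line.match(looseRe);
  return m ? m.slice(1) : null;
}

for (const line of samples) {
  console.log(line, "=>", extract(line));
}
```

For the second line the regex still matches, but the last two groups come back as "1" and "h30m7s": no error, just values that aren't what you want. That kind of silent mismatch is what this sort of test is meant to surface.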

Here's an explanation: the parts between parentheses are capturing groups, and what they capture is what ends up in the array. Here are some of them:

  • ([\d\.]+): this group is made of digits and dots (if you want to ensure there's at most one dot, use (\d+\.?\d*))
  • (\w+): one or more word characters (letters, digits or underscores)
  • ([^\)]+): one or more characters that aren't a closing parenthesis
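To see what the at-most-one-dot variant from the first bullet changes, the two number patterns can be compared on a few tokens. This sketch tests them in isolation; the ^ and $ anchors are added here only so the whole token has to match, and are not part of the answer's regex.

```javascript
// Digits and dots in any order and quantity (the group used in the answer).
const digitsAndDots = /^[\d\.]+$/;
// One or more digits, then at most one dot, then optional digits.
const atMostOneDot = /^\d+\.?\d*$/;

const tokens = ["4.3", "40", "4.", "1.2.3", "...", ".5"];
for (const t of tokens) {
  console.log(t, digitsAndDots.test(t), atMostOneDot.test(t));
}
```

Both patterns accept 4.3, 40 and 4.; only the first accepts 1.2.3 and .... The second also rejects .5, because \d+ requires a leading digit, which may or may not matter for your data.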

Be careful: if the format gets too complex or deeply nested, regexes won't be the right tool and you'll need real parsing logic.


EDIT

Following your comments, here's how to handle more complex strings.

Suppose you use this regex:

/ ([\d\.]+)(\w+)\/([\d\.]+)(\w+)\(([^\)]+)\).*:([\d\.]+)(\w+) ETA:(\d+h)?(\d+m)?(\d+s)?/

then

"[#8760e4 4.3MiB/40MiB(10%) CN:2 DL:4.9MiB ETA:1h30m7s]"

would give

["4.3", "MiB", "40", "MiB", "10%", "4.9", "MiB", "1h", "30m", "7s"]

and

"[#8760e4 4.3MiB/40MiB(10%) CN:2 DL:4.9MiB ETA:7s]"

would give

["4.3", "MiB", "40", "MiB", "10%", "4.9", "MiB", undefined, undefined, "7s"]

I changed the end of the regex. A group like (\d+h)? means "some digits followed by h, optional"; when such a group doesn't match, its slot in the array is undefined.
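Since the comments mention converting the ETA to a total number of seconds, the three optional groups can be turned into that directly. A minimal sketch, assuming the ETA only ever uses the h, m and s units seen in the examples; `etaSeconds` is a hypothetical helper name. `parseInt` stops at the trailing unit letter, and an unmatched (undefined) group is counted as 0.

```javascript
// The regex from the EDIT, unchanged.
const etaRe = / ([\d\.]+)(\w+)\/([\d\.]+)(\w+)\(([^\)]+)\).*:([\d\.]+)(\w+) ETA:(\d+h)?(\d+m)?(\d+s)?/;

// Hypothetical helper: total ETA in seconds, or null if the line doesn't match.
function etaSeconds(line) {
  const m = line.match(etaRe);
  if (!m) return null;
  // m[8], m[9], m[10] are the hour, minute and second groups (e.g. "1h", "30m", "7s");
  // a group that didn't match is undefined, so count it as 0.
  const [h, min, s] = [m[8], m[9], m[10]].map(g => (g ? parseInt(g, 10) : 0));
  return h * 3600 + min * 60 + s;
}

console.log(etaSeconds("[#8760e4 4.3MiB/40MiB(10%) CN:2 DL:4.9MiB ETA:1h30m7s]")); // 5407
console.log(etaSeconds("[#8760e4 4.3MiB/40MiB(10%) CN:2 DL:4.9MiB ETA:7s]"));      // 7
```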

Denys Séguret
  • How about an explanation of what your regular expression does as well? :) – subZero Oct 24 '13 at 15:38
  • Gross... but +1 in any case :P Almost looks like you had that regex already in your clipboard! – Lix Oct 24 '13 at 15:38
  • Well... I always have the right regex in my clipboard. I'm just lucky. – Denys Séguret Oct 24 '13 at 15:39
  • @dystroy - [good luck](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) ;) – Lix Oct 24 '13 at 15:40
  • This works perfectly! I have other checks in place that ensure the data is in this format; otherwise it won't attempt to match/split it. Thanks so much! – dzm Oct 24 '13 at 15:49
  • I'll just post my regexp here since you were faster: http://regex101.com/r/eT3uK2 The explanations in the answer are valid for mine as well. PS: There's an extra 2 captured in mine, not sure if you wanted that :). – Tibos Oct 24 '13 at 15:49
  • So.. I've done some more tests, the data does change in one spot. At the end instead of `ETA:7s` this could be `ETA:1h30m7s` any suggestions to split those times? Thank you! – dzm Oct 24 '13 at 16:08
  • I've changed the last expression to `([\d\.(\w+)]+)` this gives me the last array value of `1h30m7s` that could be sufficient, then I can work with that to determine the total seconds. – dzm Oct 24 '13 at 16:12
  • I would match the whole string including `DL:` `ETA:` etc so it doesn't fail silently and give you wrong results – Vitim.us Oct 24 '13 at 16:30

I would like to suggest a different regex. Using .* is usually not a good idea: if your input changes for some reason, the regex can still match and silently return wrong, misleading results. Instead, match the whole string so you can check that it has the format you're expecting.

Here's my regex. Its output is slightly different from what the OP asked for, though.

Test string: [#8760e4 4.3MiB/40MiB(10%) CN:2 DL:4.9MiB ETA:7s]

/\[(#\w+) (\d+\.?\d*\w+)\/(\d+\.?\d*\w+)\((\d+%)\) CN:(\d+) DL:(\d+\.?\d*\w+) ETA:(\w+)\]/


Regex broken down

regex part          matched part   captured part
-------------------------------------------------
\[                  [
(#\w+)              #8760e4        #8760e4
\s
(\d+\.?\d*\w+)      4.3MiB         4.3MiB
\/                  /
(\d+\.?\d*\w+)      40MiB          40MiB
\((\d+%)\)          (10%)          10%
\s
CN:(\d+)            CN:2           2
\s
DL:(\d+\.?\d*\w+)   DL:4.9MiB      4.9MiB
\s
ETA:(\w+)           ETA:7s         7s
\]                  ]

Output:

["#8760e4", "4.3MiB", "40MiB", "10%", "2", "4.9MiB", "7s"]
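In JavaScript this regex is used the same way as in the first answer; the difference is that a line which doesn't fit the expected format produces no match at all instead of partially wrong values. A minimal sketch: `parseStrict` is a hypothetical name, and throwing on a non-matching line is just one way to make the failure visible. Inside a JavaScript regex literal the / between the two sizes must be escaped as \/, and the decimal point is written \. so it only matches a literal dot.

```javascript
// The strict pattern: the whole bracketed line has to match.
const strictRe = /\[(#\w+) (\d+\.?\d*\w+)\/(\d+\.?\d*\w+)\((\d+%)\) CN:(\d+) DL:(\d+\.?\d*\w+) ETA:(\w+)\]/;

// Hypothetical helper: return the seven captured fields, or throw if the line
// doesn't have the expected format.
function parseStrict(line) {
  const m = line.match(strictRe);
  if (!m) {
    throw new Error("Unexpected status line: " + line);
  }
  return m.slice(1);
}

console.log(parseStrict("[#8760e4 4.3MiB/40MiB(10%) CN:2 DL:4.9MiB ETA:7s]"));
// [ '#8760e4', '4.3MiB', '40MiB', '10%', '2', '4.9MiB', '7s' ]
```

The same function also accepts ETA:1h30m7s, because (\w+) captures the whole 1h30m7s, while a line missing a field (for example without CN:2) throws instead of returning shifted values.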
Vitim.us

First, split the string on spaces. Discard the first element and take the second (4.3MiB/40MiB(10%)). Split it at the first uppercase letter and take the first part, i.e. 4.3. Then split the remainder (MiB/40MiB(10%)) on / and take the first part, which gives MiB. Split what's left at the uppercase letter again to get 40, and finally split on non-alphanumeric characters... and so on for the remaining fields.
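The step-by-step splitting described above can be sketched roughly as follows. This is a loose interpretation, not the author's exact code: "split by uppercase" is implemented as splitting a token at its first uppercase letter, the percentage is split on the parentheses (rather than on any non-alphanumeric character) so that the % is kept, and `splitAt` and `splitFields` are hypothetical helper names.

```javascript
// Hypothetical helper: split a token at the first character matching `re`,
// e.g. splitAt("4.3MiB", /[A-Z]/) gives ["4.3", "MiB"].
function splitAt(s, re) {
  const i = s.search(re);
  return i === -1 ? [s, ""] : [s.slice(0, i), s.slice(i)];
}

// Hypothetical helper following the "split step by step" idea.
function splitFields(line) {
  // Split on spaces and discard the first element ("[#8760e4").
  const parts = line.split(" ").slice(1);
  // parts: ["4.3MiB/40MiB(10%)", "CN:2", "DL:4.9MiB", "ETA:7s]"]

  const [done, rest] = parts[0].split("/");              // "4.3MiB", "40MiB(10%)"
  const [doneNum, doneUnit] = splitAt(done, /[A-Z]/);    // "4.3", "MiB"
  const [totalNum, tail] = splitAt(rest, /[A-Z]/);       // "40", "MiB(10%)"
  const [totalUnit, percent] = tail.split(/[()]/);       // "MiB", "10%"

  const speed = parts[2].split(":")[1];                  // "4.9MiB"
  const [speedNum, speedUnit] = splitAt(speed, /[A-Z]/); // "4.9", "MiB"

  const eta = parts[3].split(":")[1].replace("]", "");   // "7s"
  const [etaNum, etaUnit] = splitAt(eta, /\D/);          // "7", "s"

  return [doneNum, doneUnit, totalNum, totalUnit, percent,
          speedNum, speedUnit, etaNum, etaUnit];
}

console.log(splitFields("[#8760e4 4.3MiB/40MiB(10%) CN:2 DL:4.9MiB ETA:7s]"));
```

For the question's line this returns the requested array. Like the first regex, it only splits the ETA into one number and one unit, so ETA:1h30m7s would come back as "1" and "h30m7s".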

X-Pippes