
I am working on a simple CSS parser in Python. Right now I want to extract all values from this string: "1px solid rgb(255, 255, 255)". My current pattern (which is not working) is "\S+[^rgb]+". When I use it on the string "1px solid rgb(255, 255, 255)", I get the following:

...
>>> re.findall(r"\S+[^rgb]+", string)
['1px solid ', 'rgb(255, 255, 255)']

What I want instead is:

['1px', 'solid', 'rgb(255, 255, 255)']
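A likely reason the pattern above misbehaves: `[^rgb]` is a character class that matches any single character except the letters `r`, `g` and `b`. It does not mean "anything but the word rgb". A quick check in Python 3 (the variable name `s` is just for illustration):

```python
import re

s = "1px solid rgb(255, 255, 255)"

# \S+ grabs "1px"; then [^rgb]+ keeps going across the space and "solid"
# (none of those characters is r, g or b) and only stops at the "r" of "rgb".
print(re.findall(r"\S+[^rgb]+", s))  # ['1px solid ', 'rgb(255, 255, 255)']
```

So the space is never used as a separator here; only the letters r, g and b stop the match.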

P.S. Is there also a better way to parse a CSS declaration? Currently my pattern is "[\s]?(\S+)[\s]?:[\s]?(.+)[\s]?;". Parsing "color: red;" with it gives me:

("color", "red")
JadedTuna
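For the P.S.: a single `name: value;` declaration can be split without a regex at all. This is a minimal sketch, assuming exactly one declaration per string and no comments or `!important` handling; `parse_declaration` is a hypothetical helper name, not part of any library:

```python
def parse_declaration(text):
    """Split one CSS declaration such as 'color: red;' into (name, value).

    Assumes a single declaration; comments, '!important' and several
    ';'-separated declarations are NOT handled.
    """
    # partition() splits on the first ':' only, so a value that itself
    # contains ':' (e.g. inside url(...)) is kept intact.
    name, sep, value = text.partition(":")
    if not sep:
        raise ValueError("not a declaration: %r" % text)
    # Strip surrounding whitespace and one trailing ';' if present.
    value = value.strip()
    if value.endswith(";"):
        value = value[:-1].rstrip()
    return name.strip(), value


print(parse_declaration("color: red;"))  # ('color', 'red')
print(parse_declaration("  border : 1px solid rgb(255, 255, 255) ;"))
# ('border', '1px solid rgb(255, 255, 255)')
```

Splitting on the first colon is simpler to reason about than the optional `[\s]?` pieces, since surrounding whitespace is removed with `strip()` afterwards.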

2 Answers


You can try this:

(\S+)[ ]+(?:(\S+)[ ]+)?(rgb\([^)]+\))

http://regex101.com/r/vA4kH1
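To see what that pattern actually captures, here is a quick check with `re.search` on the inputs that come up in the comments below. It has exactly three groups with an optional middle one, so it can return at most three tokens, and on a longer value the search simply starts further into the string:

```python
import re

pattern = re.compile(r"(\S+)[ ]+(?:(\S+)[ ]+)?(rgb\([^)]+\))")

for s in ["1px solid rgb(255, 255, 255)",
          "1px rgb(255, 255, 255)",
          "1px solid blah rgb(255, 255, 255)"]:
    m = pattern.search(s)
    # groups() reports None for the optional group when it did not take part.
    print(m.groups() if m else None)

# ('1px', 'solid', 'rgb(255, 255, 255)')
# ('1px', None, 'rgb(255, 255, 255)')
# ('solid', 'blah', 'rgb(255, 255, 255)')
```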

EDIT: Whatever you're trying to do, a regex is probably not the right tool for it, because CSS syntax can be unpredictable. For something more robust, you can use tinycss, a CSS parser for Python:

http://pythonhosted.org/tinycss/

One last edit...

Since your solution uses findall, which returns each match as a separate list element, you only need rgb() in the pattern once; the spaces between tokens are simply skipped. The value pattern below should work and is cleaner than what you have. Also note that you don't want to use "." inside your rgb() expression: regexes are greedy by default and match as much as they can, so if a line contains rgb() 1px rgb(), ".+" would run from the first "rgb(" all the way to the last ")". Try this: r"rgb\([^)]+\)|\S+"
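A small demonstration of the greedy-matching point, plus one side effect of capture groups with `re.findall` (the sample strings are made up for illustration):

```python
import re

s = "rgb(1, 2, 3) 1px rgb(4, 5, 6)"

# Greedy '.+' runs to the LAST ')' on the line, swallowing "1px" as well.
print(re.findall(r"rgb\(.+\)", s))
# ['rgb(1, 2, 3) 1px rgb(4, 5, 6)']

# '[^)]+' cannot cross a ')', so each rgb(...) is matched on its own.
print(re.findall(r"rgb\([^)]+\)", s))
# ['rgb(1, 2, 3)', 'rgb(4, 5, 6)']

# A lazy quantifier '.+?' also stops at the first ')'.
print(re.findall(r"rgb\(.+?\)", s))
# ['rgb(1, 2, 3)', 'rgb(4, 5, 6)']

# With capture groups, findall returns one tuple of groups per match
# (empty string for the group that did not match) instead of plain strings.
print(re.findall(r"(rgb\([^)]+\))|(\S+)", "1px rgb(0, 0, 0)"))
# [('', '1px'), ('rgb(0, 0, 0)', '')]
```

That last case is why the alternatives are written without capture groups when the goal is a flat list of tokens.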

sdanzig
  • I am not sure how it is supposed to work. It just extracts all `(num, num, num)` from the text – JadedTuna Oct 25 '13 at 21:09
  • Oh, I thought you meant values as in the numeric values. What exactly do you mean by "values" in your sample string? – sdanzig Oct 25 '13 at 21:11
  • Oh, sorry. My fault. Please check my modified question; I wrote which output I actually need – JadedTuna Oct 25 '13 at 21:12
  • Whoops, I ran into another problem. Whenever I try to use it with the string `"1px rgb(255, 255, 255)"` it gives me an empty list. – JadedTuna Oct 25 '13 at 21:21
  • Unlucky. If I use more values (e.g. `"1px solid blah rgb(255, 255, 255)"`), it produces `["solid", "blah", "rgb(255, 255, 255)"] # No '1px' here`. – JadedTuna Oct 25 '13 at 21:29
  • Spent too much time on this already, just to see a change in requirements. Points on stack overflow don't get you slave labor :) – sdanzig Oct 25 '13 at 21:55
  • Sorry :). Please check out my answer. I posted working (I hope) code there – JadedTuna Oct 25 '13 at 21:59

Ok. I got it working (I hope). Here is the final code.


EDIT

After a long and boring read through the manual, I finally got it working properly: "rgb\([^)]*\)|\S+"
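A quick check of that final pattern against the inputs that came up in the comments. Because `re.findall` tries both alternatives at every position, a single `rgb(...)` alternative catches it before or after other tokens. `split_value` is a hypothetical helper name; note that functions like `rgba(...)` or `hsl(...)` with spaces inside are not covered and would still be split by `\S+`:

```python
import re

# Try rgb(...) first at each position; otherwise take a run of non-whitespace.
VALUE_TOKEN = re.compile(r"rgb\([^)]*\)|\S+")


def split_value(value):
    """Split a CSS value string into tokens, keeping rgb(...) whole."""
    return VALUE_TOKEN.findall(value)


print(split_value("1px solid rgb(255, 255, 255)"))
# ['1px', 'solid', 'rgb(255, 255, 255)']
print(split_value("1px rgb(255, 255, 255)"))
# ['1px', 'rgb(255, 255, 255)']
print(split_value("1px solid blah rgb(255, 255, 255)"))
# ['1px', 'solid', 'blah', 'rgb(255, 255, 255)']
print(split_value("rgb(0, 0, 0) solid 1px"))
# ['rgb(0, 0, 0)', 'solid', '1px']
```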

JadedTuna
  • I don't understand why you repeated the rgb() expression, and have those .'s before and after. But yeah, for your case, definitely easier to match one token at a time. I actually did give an attempt to have a more flexible expression out of curiosity, but my effort fell flat: http://stackoverflow.com/questions/19600204/why-arent-these-non-capturing-regex-groups-working-right – sdanzig Oct 25 '13 at 22:14
  • @sdanzig, I repeat rgb(...) twice to make it match `rgb(...)` **before** and **after** other text (like `"solid", "1px"`) – JadedTuna Oct 25 '13 at 22:16