I have a string like this, where symbol and property vary:

a = '/stock/%(symbol)s/%(property)s'

I have another string like this, where AAPL and price vary:

b = '/stock/AAPL/price'

I'm trying to generate a dict like this:

c = {
    'symbol': 'AAPL',
    'property': 'price'
}

With string formatting, I could do this:

> a % c == b
True

But I'm trying to go the other direction. Time for some regex magic?

nathancahill
  • Are you sure you don't want your dictionary to be `D = {'AAPL' : price}` so you can look up price by symbol? Otherwise you will need a new dictionary for each stock. – beroe Aug 21 '13 at 16:26
  • I'm assuming (unlike other answers so far) that your first-string doesn't *necessarily* say `symbol` and/or `property`, e.g., it might read `/zog/%(evil)s=%(level)s,%(flavor)s`. Is that the case? – torek Aug 21 '13 at 16:33
  • Do you have control of the format of `a`? If you use a more modern interpolation style, certain things become easier. – DSM Aug 21 '13 at 17:50
  • @DSM I might be able to control it. What format would be easier? – nathancahill Aug 21 '13 at 17:51
  • By control, I mean the ```%(symbol)s``` part. The slashes aren't changeable. – nathancahill Aug 21 '13 at 17:52
  • @nathancahill: well, if it were '/stock/{symbol}/{property}', I mean. Then you could use `string.Formatter` to extract the names without regex, see [here](http://stackoverflow.com/a/14061832/487339). – DSM Aug 21 '13 at 17:55
  • The names, yes, but I still think a solution like @Ashwini's would be needed to extract the name-value pairs. – nathancahill Aug 21 '13 at 17:59
  • @torek Yes, that's the case, thanks for pointing that out. – nathancahill Aug 21 '13 at 18:04
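
DSM's suggestion can be sketched as follows, assuming the template is rewritten in `str.format` style (`'/stock/{symbol}/{property}'`). The helper name `field_names` is made up for illustration. As noted in the comments, this only recovers the names, not the name-value pairs.

```python
from string import Formatter


def field_names(template):
    """Return the replacement-field names of a str.format-style template."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion)
    # tuples; field_name is None when only literal text remains.
    return [name for _, name, _, _ in Formatter().parse(template) if name]


print(field_names('/stock/{symbol}/{property}'))
```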

3 Answers

A solution with regular expressions:

>>> import re
>>> b = '/stock/AAPL/price'
>>> result = re.match('/.*?/(?P<symbol>.*?)/(?P<property>.*)', b)
>>> result.groupdict()
{'symbol': 'AAPL', 'property': 'price'}

You can tune the regular expression further but, in essence, this is the idea.
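
One way the pattern might be tightened, as a sketch: it assumes the prefix is always `/stock/` and that values never contain a slash. The names `STOCK_PATH` and `parse_stock_path` are hypothetical, and `fullmatch` requires Python 3.4 or later.

```python
import re

# [^/]+ keeps each captured value inside a single path segment, and
# fullmatch rejects strings with extra leading or trailing text.
STOCK_PATH = re.compile(r'/stock/(?P<symbol>[^/]+)/(?P<property>[^/]+)')


def parse_stock_path(path):
    """Return the captured fields as a dict, or None if the path does not match."""
    match = STOCK_PATH.fullmatch(path)
    return match.groupdict() if match else None


print(parse_stock_path('/stock/AAPL/price'))
```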

moliware

Assuming well-behaved input, you could just split the string and zip the pieces into a dict:

keys = ('symbol', 'property')
b = '/stock/AAPL/price'
dict(zip(keys, b.split('/')[2:4]))

tdelaney
  • I came up with the letter-for-letter same solution. `str.split` is almost always going to be many times more time-efficient than the `re`-based equivalent. – Kirk Strauser Aug 21 '13 at 17:53
  • @KirkStrauser - yeah, there's a hundred ways to parse strings, but I like the simple solutions. – tdelaney Aug 21 '13 at 17:55
  • As long as it's *always* slashes, and slashes don't appear in the output from some key(s). If the output might begin with, e.g., `/nyse/stock/` (vs say `/ftse/stock/` and just `/stock/`) sometimes, you'd need to adjust the indices too. In short, much depends on input constraints. – torek Aug 21 '13 at 18:10
  • @torek - agreed. As more details of the input are learned, the script could be updated. But split and the dict constructor are fast, so it's a good start. – tdelaney Aug 21 '13 at 18:19
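
To avoid hard-coded keys and indices, the split idea could be extended by splitting the template as well. This is only a sketch: it assumes every `%(name)s` placeholder occupies a whole slash-delimited segment and that values contain no slashes. `PLACEHOLDER` and `parse_by_segments` are illustrative names.

```python
import re

# Matches a segment that consists of exactly one %(name)s placeholder.
PLACEHOLDER = re.compile(r'%\((\w+)\)s')


def parse_by_segments(template, path, sep='/'):
    """Pair template segments with path segments; return None on mismatch."""
    t_parts = template.split(sep)
    p_parts = path.split(sep)
    if len(t_parts) != len(p_parts):
        return None
    result = {}
    for t_seg, p_seg in zip(t_parts, p_parts):
        m = PLACEHOLDER.fullmatch(t_seg)
        if m:
            result[m.group(1)] = p_seg   # placeholder segment: record the value
        elif t_seg != p_seg:
            return None                  # literal segment differs
    return result


print(parse_by_segments('/stock/%(symbol)s/%(property)s', '/stock/AAPL/price'))
```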

This is similar to @moliware's solution, but there's no hard-coding of keys required in this solution:

import re

class mydict(dict):
    def __missing__(self, key):
        self.setdefault(key, '')
        return ''

def solve(a, b):
    dic = mydict()
    a % dic
    strs = a
    for x in dic:
        esc = re.escape(x)
        strs = re.sub(r'(%\({}\).)'.format(esc), '(?P<{}>.*)'.format(esc), strs)
    return re.search(strs, b).groupdict()

if __name__ == '__main__':
    a = '/stock/%(symbol)s/%(property)s'
    b = '/stock/AAPL/price'
    print(solve(a, b))
    a = "Foo %(bar)s spam %(eggs)s %(python)s"
    b = 'Foo BAR spam 10 3.x'
    print(solve(a, b))

Output:

{'symbol': 'AAPL', 'property': 'price'}
{'bar': 'BAR', 'eggs': '10', 'python': '3.x'}

As @torek pointed out, the answer can be wrong when the output is ambiguous (no delimiter between keys).

For example:

a = 'leading/%(A)s%(B)s/trailing'
b = 'leading/helloworld/trailing'

Here, looking at just b, it's hard to tell the actual value of either A or B.
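
The ambiguity can be seen directly by writing out the regex for this template by hand. In this small sketch the only thing that changes between the two patterns is the greediness of the first group, yet both produce a valid match:

```python
import re

b = 'leading/helloworld/trailing'

# Greedy first group: A takes as much as it can, leaving B empty.
greedy = re.search(r'leading/(?P<A>.*)(?P<B>.*)/trailing', b).groupdict()

# Non-greedy first group: A takes as little as it can, so B gets everything.
lazy = re.search(r'leading/(?P<A>.*?)(?P<B>.*)/trailing', b).groupdict()

print(greedy)  # {'A': 'helloworld', 'B': ''}
print(lazy)    # {'A': '', 'B': 'helloworld'}
```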

Ashwini Chaudhary
  • Note: you'll need `dic=mydict()` and I get `'/stock/None/None'` as the value in the call used to populate `dic`. I'd just do `dic = collections.defaultdict(str)` though. (Oops, you fixed the missing `dic=` part while I was typing the comment.) – torek Aug 21 '13 at 17:18
  • BTW this works really well (it's the way to go here) but there are indistinguishable variations for which this just picks "any solution that works", e.g., `a = 'leading/%(A)s%(B)s/trailing'` and `b = 'leading/helloworld/trailing'`. This chooses `A='helloworld'` and `B=''`. (And if string `b` cannot be generated by format `a` regardless of dictionary values, the `re.search()` returns None.) – torek Aug 21 '13 at 17:27
  • @torek Good test case. I think this one can be considered ambiguous too, because `B` can be either `''` or `'helloworld'`, or `A` can be `''` or `'helloworld'`. (So a space or some other character is required between two keys to get the correct answer.) Another issue is that returning a `str` for missing keys would raise an error for `%d` or other directives; I am not sure how to fix that. – Ashwini Chaudhary Aug 21 '13 at 17:34
  • @torek I think I could use a couple of try-except blocks to catch those type mismatch errors and pass some other default value. – Ashwini Chaudhary Aug 21 '13 at 17:47
  • The last example won't be a problem, there will always be some sort of delimiter between keys. – nathancahill Aug 21 '13 at 17:54
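
A possible variation on the answer above, offered only as a sketch: it finds the `%(name)s` placeholders with a regex rather than by formatting against a dict, and it escapes the literal text so that characters such as `.` in the template are matched literally. It assumes only `%(name)s` directives are used, that each name is a valid identifier appearing once, and that a delimiter separates adjacent keys. `template_to_regex` and `extract` are hypothetical names.

```python
import re

# Matches %(name)s placeholders only; directives such as %(n)d are not handled.
PLACEHOLDER = re.compile(r'%\((\w+)\)s')


def template_to_regex(template):
    """Turn a %(name)s template into a compiled regex with named groups."""
    parts = []
    pos = 0
    for m in PLACEHOLDER.finditer(template):
        parts.append(re.escape(template[pos:m.start()]))  # literal text
        parts.append('(?P<{}>.+?)'.format(m.group(1)))    # captured value
        pos = m.end()
    parts.append(re.escape(template[pos:]))               # trailing literal
    return re.compile(''.join(parts))


def extract(template, text):
    """Return the name-value pairs as a dict, or None if text does not fit."""
    match = template_to_regex(template).fullmatch(text)
    return match.groupdict() if match else None


print(extract('/stock/%(symbol)s/%(property)s', '/stock/AAPL/price'))
print(extract('Foo %(bar)s spam %(eggs)s %(python)s', 'Foo BAR spam 10 3.x'))
```

Using `fullmatch` means a string that the template could not have produced yields `None` rather than a partial match.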