
Hi, I'm trying to extract the moderator names from a simple piece of JSON like this:

{
  "_links": {},
  "chatter_count": 2,
  "chatters": {
    "moderators": [
      "nightbot",
      "vivbot"
    ],
    "staff": [],
    "admins": [],
    "global_mods": [],
    "viewers": []
  }
}

I've been trying to grab the moderators using \"moderators\":\s*[(\s*\"\w*\"\,)\s*] but with no success. I'm using regex rather than JSON parsing mostly for the challenge.

seiqooq
  • You have JSON that you can parse... Why use regex? – OneCricketeer Aug 20 '16 at 14:02
  • Possible duplicate of [Parse JSON in Python](http://stackoverflow.com/questions/7771011/parse-json-in-python) – OneCricketeer Aug 20 '16 at 14:05
  • Hi @cricket_007 , it's mostly for the challenge & practice. – seiqooq Aug 20 '16 at 14:05
  • I wouldn't practice regex on JSON. Its structure is well defined and better tools exist to get the data that you want. – OneCricketeer Aug 20 '16 at 14:07
  • Regex is a poor choice for a non-regular language such as JSON – FujiApple Aug 20 '16 at 14:13
  • This doesn't seem like a coding question. There are plenty of online Python regex debuggers/testers/editors. You can play around with patterns and get immediate feedback - all while studying the docs. And as mentioned above regex isn't the right tool to parse json so whatever you learn may not be applicable next time you use it. – wwii Aug 20 '16 at 14:22
  • If you read this it will spoil your challenge - pattern: the word moderators followed by a colon, a space, and a left bracket, - then multiple characters that are NOT a right bracket - then a right bracket. You want to capture the multiple characters that are NOT a right bracket. – wwii Aug 20 '16 at 14:30
  • As stated, I am aware of the downfalls of using regex for this. – seiqooq Aug 20 '16 at 14:32
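
The pattern wwii describes in the comments can be sketched roughly as follows. This is only an illustration: the names `sample`, `pattern` and `captured` are made up here, and `sample` is simply the JSON from the question. The literal brackets are escaped because an unescaped `[...]` in a regex is read as a character class, which is one reason the pattern in the question does not match.

```python
import re

# The JSON text from the question (the name `sample` is an assumption)
sample = '''{
  "_links": {},
  "chatter_count": 2,
  "chatters": {
    "moderators": [
      "nightbot",
      "vivbot"
    ],
    "staff": [],
    "admins": [],
    "global_mods": [],
    "viewers": []
  }
}'''

# "moderators", a colon, a space, an escaped "[", then a captured run of
# characters that are not "]", then an escaped "]"
pattern = re.compile(r'"moderators": \[([^\]]*)\]')

match = pattern.search(sample)
captured = match.group(1) if match else ""
print(captured)
```

The capture is still one raw chunk of text (quotes, commas and whitespace included), so the individual names have to be pulled out in a second step.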

1 Answer

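import re

# `string` is assumed to hold the raw JSON text from the question
moderators = []
# First pass: capture everything after "moderators" up to the closing bracket
first = re.compile(r'moderators.*?\[([^\]]*)', re.I)
# Second pass: pull each double-quoted name out of that capture
second = re.compile(r'"(.*?)"')

for block in first.findall(string):
    moderators.extend(second.findall(block))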

This should do the trick.

The first regular expression captures everything between the square brackets that follow moderators. The second regular expression extracts each quoted string from that capture.

I split it into two regular expressions for readability and ease of writing.
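
To make the two passes concrete, here is a small sketch that prints the intermediate result of each step. The shortened `sample` string and the names `blocks` and `names` are assumptions made for this illustration, not part of the question's data.

```python
import re

# A shortened, single-line version of the question's JSON (assumed sample)
sample = '{"chatters": {"moderators": ["nightbot", "vivbot"], "staff": []}}'

first = re.compile(r'moderators.*?\[([^\]]*)', re.I)
second = re.compile(r'"(.*?)"')

# Step 1: the raw text between the brackets after "moderators"
blocks = first.findall(sample)
print(blocks)  # ['"nightbot", "vivbot"']

# Step 2: each quoted name inside that raw text
names = []
for block in blocks:
    names.extend(second.findall(block))
print(names)  # ['nightbot', 'vivbot']
```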

Now, using the json module, you could do something much simpler:

import json
a = json.loads(string)
moderators = a['chatters']['moderators']
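
If the input might be malformed or might lack those keys, a slightly more defensive variant could look like the sketch below. The function name `moderators_from_json` and the choice to return an empty list on failure are assumptions for illustration only.

```python
import json

def moderators_from_json(text):
    """Return the moderators list, or [] if the text is not usable JSON."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []
    # Only a top-level JSON object can contain the "chatters" key
    if not isinstance(data, dict):
        return []
    chatters = data.get("chatters", {})
    if not isinstance(chatters, dict):
        return []
    return chatters.get("moderators", [])

sample = '{"chatters": {"moderators": ["nightbot", "vivbot"], "staff": []}}'
print(moderators_from_json(sample))  # ['nightbot', 'vivbot']
```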
engineer14
  • also, the `re` module keeps only the last match of a repeated capture group, so there is no single regex you could use to get the individual members of moderators, unless you manually repeat the capture group for however many members you think moderators could have. Now, if you look up the third-party regex module for Python 3.x, it does support repeated capture groups. – engineer14 Aug 20 '16 at 23:34
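
The limitation mentioned in that comment can be seen with a short sketch using only the standard `re` module. The single-line `text` and the pattern `repeated` are assumptions written for this illustration.

```python
import re

# A single-line stand-in for the question's data (assumed sample)
text = '"moderators": ["nightbot", "vivbot"]'

# The capture group ("\w*" between quotes) sits inside a repeated
# non-capturing group, so it matches once per array element
repeated = re.compile(r'"moderators":\s*\[(?:\s*"(\w*)"\s*,?)*\s*\]')

match = repeated.search(text)
# Only the final repetition of the group is retained
print(match.group(1))  # vivbot
```

The whole array is matched, but "nightbot" is lost from the group, which is why the answer uses a second pass with findall.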