Couldn't find the right Regex code to extract the exact numbers

Question

I have extracted a string containing 64-bit Steam IDs and a friend list using web scraping. I want to get the unique steamids so that I can store them in a different file. I used regex, but I think I made a mistake in the notation.

This is the string.

{"friendslist":{"friends":[{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}]}}

I used regex as this:

import re
re.findall("[^:[0-9]+[0-9]+", soup.text)

However, I got this result:

['"7656xxxxxxx80x76',
'"76561xxxxxxx4xx89',
'"765xxxxxxxxxxx3194']

How can I get rid of the double quotes (") at the beginning of the numbers?

remove it - `'"765xxxxxxxxxxx3194'.replace('"', '')` - or slice it - `'"765xxxxxxxxxxx3194'[1:]` — furas, Sep 03 '19 at 12:07

score 1 · Accepted Answer · answered Sep 03 '19 at 11:48


You have a JSON string, so use the json module:

import json

text = '{"friendslist":{"friends":[{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}]}}'

data = json.loads(text)

for friend in data["friendslist"]['friends']:
    print(friend['steamid'])

Result:

7656xxxxxxx80x76
76561xxxxxxx4xx89
765xxxxxxxxxxx3194
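
Since the question also asks for the unique IDs and for storing them in a separate file, a minimal sketch of those two steps could look like the following. The helper names `unique_steamids` and `save_ids` are assumptions made for illustration and do not come from the original post; the sample string is a shortened copy of the one in the question, with one entry repeated on purpose.

```python
import json


def unique_steamids(text):
    """Return the steamid values found in the JSON text, without duplicates."""
    data = json.loads(text)
    ids = [friend["steamid"] for friend in data["friendslist"]["friends"]]
    # dict.fromkeys keeps the first occurrence of each ID and preserves order
    return list(dict.fromkeys(ids))


def save_ids(ids, path):
    """Write one ID per line to the given path (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(ids) + "\n")


if __name__ == "__main__":
    text = ('{"friendslist":{"friends":['
            '{"steamid":"7656xxxxxxx80x76","relationship":"friend"},'
            '{"steamid":"76561xxxxxxx4xx89","relationship":"friend"},'
            '{"steamid":"7656xxxxxxx80x76","relationship":"friend"}]}}')
    print(unique_steamids(text))
```

Calling `save_ids(unique_steamids(text), "steamids.txt")` would then write the result to a file; the file name is only an example.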


furas


I guess I named the string wrong when I was asking. Thanks! Just out of curiosity, can I still use `re.findall()` to catch the numbers? Oh, @Kellen already answered it! – EEylul Sep 03 '19 at 11:54
You can use `re.findall()` only on `text`, because `data` is not a string: it is a dict containing a dict containing a list of dicts. You could apply `re.findall()` to the strings in the innermost dicts, but that would need a `for` loop like the one in the answer. – furas Sep 03 '19 at 12:06

score 0 · Answer 2 · answered Sep 03 '19 at 11:44

I wrote a recursive function that takes the data and a key, then builds a list of the results:

data = {"friendslist":{"friends":[{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}]}}
def getDataFromNestedDict(data, dictKey):
    # Walk nested dicts and lists, collecting every value stored under dictKey
    if isinstance(data, dict):
        if dictKey in data:
            steamDataList.append(data[dictKey])
        # Recurse into every value; anything that is not a dict or list is ignored
        for value in data.values():
            getDataFromNestedDict(value, dictKey)
    elif isinstance(data, list):
        for item in data:
            getDataFromNestedDict(item, dictKey)

steamDataList = []  # results are accumulated here by the function above
getDataFromNestedDict(data, 'steamid')
print(steamDataList)

output:

['7656xxxxxxx80x76', '76561xxxxxxx4xx89', '765xxxxxxxxxxx3194']
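
The same traversal can also be written without a module-level list, which makes it easier to call more than once. The following is only a sketch of that variation: the name `iter_values` is hypothetical, and the sample data is a shortened copy of the dict above.

```python
def iter_values(node, key):
    """Yield every value stored under `key` anywhere in nested dicts/lists."""
    if isinstance(node, dict):
        for k, value in node.items():
            if k == key:
                yield value
            # Keep descending: the value itself may contain the key again
            yield from iter_values(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from iter_values(item, key)


data = {"friendslist": {"friends": [
    {"steamid": "7656xxxxxxx80x76", "relationship": "friend"},
    {"steamid": "76561xxxxxxx4xx89", "relationship": "friend"},
]}}

print(list(iter_values(data, "steamid")))
```

Because the function yields results instead of appending to a shared list, repeated calls do not mix their outputs.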

score 0 · Answer 3 · answered Sep 03 '19 at 11:51


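The regex you're providing isn't doing what you expect. The first `[` is closed by the first `]`, so `[^:[0-9]` is a single negated character class: it matches any character that is not `:`, `[`, or a digit. That includes the `"`, which is why the quote ends up in each match.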

Using lookahead/behind to find the double quotes:

(?<=\")(\d+[x\d]+\d)(?=\")

@Furas is right, though. You should just be parsing the JSON instead.
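
For anyone who still wants to try the regex route, the sketch below runs the lookaround pattern from this answer next to a key-anchored alternative on a shortened copy of the string from the question. The pattern `"steamid":"([^"]+)"` is an assumption introduced here, chosen because it works on the anonymized IDs as well as on all-digit ones; it is not taken from any of the answers.

```python
import re

text = ('{"friendslist":{"friends":['
        '{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},'
        '{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830}]}}')

# Lookbehind/lookahead version: the quotes must be present but are not captured.
lookaround = re.findall(r'(?<=\")(\d+[x\d]+\d)(?=\")', text)

# Alternative (assumption): anchor on the key name and capture up to the closing quote.
anchored = re.findall(r'"steamid":"([^"]+)"', text)

print(lookaround)
print(anchored)
```

Both calls return only the IDs here, and the `friend_since` timestamps are skipped because they are not wrapped in quotes.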


Kellen


score 0 · Answer 4 · answered Sep 03 '19 at 11:56


I recommend following @furas's answer (use a JSON parser).

But if you really want to use Regex: [^ ["]+[0-9]+[0-9]+


ecavard

