-3

I have extracted a string containing 64-bit Steam IDs and a friends list using web scraping. I want to get the unique Steam IDs so that I can store them in a different file. I used a regex, but I think I have a mistake in the notation.

This is the string.

{"friendslist":{"friends":[{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}]}}

I used regex as this:

import re
re.findall("[^:[0-9]+[0-9]+", soup.text)

However, I got this result:

['"7656xxxxxxx80x76',
'"76561xxxxxxx4xx89',
'"765xxxxxxxxxxx3194']

How can I get rid of the double quotes (") at the beginning of the numbers?

EEylul

4 Answers

1

You have a JSON string, so use the json module:

import json

text = '{"friendslist":{"friends":[{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}]}}'

data = json.loads(text)

for friend in data["friendslist"]['friends']:
    print(friend['steamid'])

Result:

7656xxxxxxx80x76
76561xxxxxxx4xx89
765xxxxxxxxxxx3194
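
Since the question also asks for unique IDs stored in a separate file, here is a minimal sketch of one way to do that on top of the json approach. The helper names `unique_steamids` and `save_ids`, the file name `steamids.txt`, and the repeated friend entry in the sample are assumptions made for illustration; they are not part of the original question.

```python
import json

# Masked sample from the question; the first friend is repeated on purpose
# (an assumption added here) so that de-duplication has something to do.
text = (
    '{"friendslist":{"friends":['
    '{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},'
    '{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},'
    '{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},'
    '{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}'
    ']}}'
)


def unique_steamids(raw_json):
    """Return the steamid values in first-seen order, without duplicates."""
    data = json.loads(raw_json)
    ids = (friend["steamid"] for friend in data["friendslist"]["friends"])
    # dict.fromkeys keeps insertion order and drops repeated keys.
    return list(dict.fromkeys(ids))


def save_ids(ids, path):
    """Write one ID per line to the given file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(ids) + "\n")


if __name__ == "__main__":
    ids = unique_steamids(text)
    print(ids)
    save_ids(ids, "steamids.txt")  # file name is an assumption
```

If the order of the IDs does not matter, `set(...)` would also remove duplicates; `dict.fromkeys` is used here only to keep the order in which the friends appear.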
furas
  • I guess I named the string wrong when I was asking. Thanks! Just out of curiosity, can I still use `re.findall()` to catch the numbers? Oh, @Kellen already answered it! – EEylul Sep 03 '19 at 11:54
  • You can use `re.findall()` only on `text`, because `data` is not a string: it is a dict containing a dict containing a list of dicts. You could apply `re.findall()` to the strings in the innermost dicts, but that would need a `for` loop like the one in the answer. – furas Sep 03 '19 at 12:06
0

I wrote a recursive function that takes the data and a key and builds a list of the matching values:

data = {"friendslist":{"friends":[{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}]}}

steamDataList = []  # collects every value found under the requested key

def getDataFromNestedDict(data, dictKey):
    """Recursively walk nested dicts and lists, collecting values stored under dictKey."""
    if isinstance(data, dict):
        if dictKey in data:
            steamDataList.append(data[dictKey])
        # Recurse into every value; anything that is not a dict or list is ignored.
        for value in data.values():
            getDataFromNestedDict(value, dictKey)
    elif isinstance(data, list):
        for item in data:
            getDataFromNestedDict(item, dictKey)

getDataFromNestedDict(data, 'steamid')
print(steamDataList)

output:

['7656xxxxxxx80x76', '76561xxxxxxx4xx89', '765xxxxxxxxxxx3194']
SM Abu Taher Asif
0

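The regex you're providing isn't doing what you expect. The first [ opens a character class that runs up to the first ], so [^:[0-9] means "any character that is not a colon, a [ or a digit". That is why the leading double quote ends up in each match.

Use lookbehind and lookahead assertions to anchor on the double quotes without including them in the match: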

(?<=\")(\d+[x\d]+\d)(?=\")

@Furas is right, though. You should just be parsing the JSON instead.
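
For anyone who still wants the regex route, here is a short sketch of the lookaround pattern applied with `re.findall()` to the masked sample from the question. The second pattern, which captures whatever follows the `"steamid":` key, is an alternative added here for comparison and is not from the original answer; the names `LOOKAROUND` and `KEYED` are likewise made up for illustration.

```python
import re

# Masked sample string from the question.
text = (
    '{"friendslist":{"friends":['
    '{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},'
    '{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},'
    '{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}'
    ']}}'
)

# Lookbehind/lookahead: the quotes must surround the ID but are not captured.
# The [x\d] class is only there because the sample IDs are masked with "x".
LOOKAROUND = re.compile(r'(?<=")(\d+[x\d]+\d)(?=")')

# Alternative (assumption): anchor on the key name and capture the quoted value.
KEYED = re.compile(r'"steamid":"([^"]+)"')

print(LOOKAROUND.findall(text))
print(KEYED.findall(text))
```

Both calls return the three IDs without quotes. The `friend_since` timestamps are skipped because they are not wrapped in double quotes.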

Kellen
0

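I recommend you follow the answer of @furas (use a JSON parser).

But if you really want to use a regex: [^ ["]+[0-9]+[0-9]+

Note that this pattern also matches the friend_since timestamps (with a leading colon), so the results still need filtering.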

ecavard