2

I'm trying to extract a string from a JSON response using regex in Python, but with no success.

{"ao":["jskl|_xx2|020|b503414ff19853ce357413fafe7c612a0b6b0ba3f592f9b551bdc8d0dbdbbd34:J26U1IfsvZ0kiJwLm3xoVhZNN/Xr+Z2gRkJA===|true|900"]}

I'm trying to get

b503414ff19853ce357413fafe7c612a0b6b0ba3f592f9b551bdc8d0dbdbbd34:J26U1IfsvZ0kiJwLm3xoVhZNN/Xr+Z2gRkJA=== 

from the string. However, the | in the string won't allow me to use the methods I have seen on Stack Overflow because it keeps missing the |. I would appreciate any help.

ggorlen
sakow0
  • The `|` is a special character in the regex language, so it needs to be [escaped](https://stackoverflow.com/questions/4202538/escape-regex-special-characters-in-a-python-string) before it can be matched. – metatoaster May 13 '19 at 00:46
  • Welcome to SO! It's helpful to post your code attempt so we can help guide you. Otherwise, have you tried `your_json_dict_name["ao"][0].split("|")[3]`? – ggorlen May 13 '19 at 00:47
  • @ggorlen hello, this worked perfectly and that without regex, didnt know it was possible like this thank you very much ! – sakow0 May 13 '19 at 00:51

3 Answers

1

There's no need to reinvent json.loads() with regex. Parse your JSON string into a dictionary with json.loads() and index into it to reach the string you're interested in. Once you've extracted the string, split it on the pipe character and take the element at index 3 (the fourth field):

your_json_dict_name["ao"][0].split("|")[3]

Here's a full example:

import json

raw_json_str = r'{"ao":["jskl|_xx2|020|b503414ff19853ce357413fafe7c612a0b6b0ba3f592f9b551bdc8d0dbdbbd34:J26U1IfsvZ0kiJwLm3xoVhZNN/Xr+Z2gRkJA===|true|900"]}'
json_dict = json.loads(raw_json_str)

print(json_dict["ao"][0].split("|")[3])

Output:

b503414ff19853ce357413fafe7c612a0b6b0ba3f592f9b551bdc8d0dbdbbd34:J26U1IfsvZ0kiJwLm3xoVhZNN/Xr+Z2gRkJA===
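If the response might be malformed or shorter than expected, the same idea can be wrapped in a small helper. This is only a sketch: the function name `extract_field` and its default arguments are hypothetical, chosen to match the sample above.

```python
import json


def extract_field(raw_json, key="ao", item=0, field=3, sep="|"):
    """Return one sep-delimited field of a string stored in a JSON array.

    Returns None when the JSON is invalid or the key, item, or field is missing.
    """
    try:
        value = json.loads(raw_json)[key][item]
    except (json.JSONDecodeError, KeyError, IndexError, TypeError):
        return None
    if not isinstance(value, str):
        return None
    parts = value.split(sep)
    return parts[field] if len(parts) > field else None


raw = r'{"ao":["jskl|_xx2|020|b503414ff19853ce357413fafe7c612a0b6b0ba3f592f9b551bdc8d0dbdbbd34:J26U1IfsvZ0kiJwLm3xoVhZNN/Xr+Z2gRkJA===|true|900"]}'
token = extract_field(raw)
print(token)
```

For the sample string this prints the same value as the example above; for invalid JSON or an empty "ao" array it returns None instead of raising.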
ggorlen
0

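Here it is. Inside a character class, special characters such as | are treated literally, so [|] matches a literal pipe. Note that this pattern relies on the value starting with "b" and ending with "===":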

import re
text = '{"ao":["jskl|_xx2|020|b503414ff19853ce357413fafe7c612a0b6b0ba3f592f9b551bdc8d0dbdbbd34:J26U1IfsvZ0kiJwLm3xoVhZNN/Xr+Z2gRkJA===|true|900"]}'


match = re.search(r'[|]b.*===[|]', text).group()[1:-1]
print(match)

output:

b503414ff19853ce357413fafe7c612a0b6b0ba3f592f9b551bdc8d0dbdbbd34:J26U1IfsvZ0kiJwLm3xoVhZNN/Xr+Z2gRkJA===
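If you'd rather not depend on the value starting with "b" and ending with "===", a position-based pattern is one alternative. The sketch below is an assumption about what a more general version could look like, not part of the original answer; the helper name `fourth_field` is hypothetical. It also shows three ways to write a literal pipe: a backslash escape, a character class, and `re.escape`.

```python
import re

text = '{"ao":["jskl|_xx2|020|b503414ff19853ce357413fafe7c612a0b6b0ba3f592f9b551bdc8d0dbdbbd34:J26U1IfsvZ0kiJwLm3xoVhZNN/Xr+Z2gRkJA===|true|900"]}'

# Three equivalent spellings of a literal pipe inside a pattern.
literal_pipes = [r'\|', r'[|]', re.escape('|')]


def fourth_field(s, pipe):
    """Skip three pipe-terminated fields, then capture the fourth one."""
    # [^|]* matches a run of characters that are not a pipe.
    pattern = re.compile(r'(?:[^|]*' + pipe + r'){3}([^|]*)' + pipe)
    match = pattern.search(s)
    return match.group(1) if match else None


tokens = [fourth_field(text, pipe) for pipe in literal_pipes]
print(tokens[0])
```

All three spellings give the same capture here. A capturing group also avoids slicing the surrounding pipes off with [1:-1].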
Mahmoud Elshahat
0

For starters, I don't quite understand why you aren't using json.loads on this string. That way you could treat the JSON as a map, go to the "ao" key, and apply the regex to the strings inside the array.

But putting that aside, if you still wish to extract the data from the JSON as a string, you could use regex groups and escape the "|" character with a backslash ("\|").

That would look something like this:

.*?\[\"(.*?\|){3}(.*?)\|.*

Then you can access group 2 to get your desired result. That assumes the JSON always has the same layout.

If the array under the "ao" property has more than one string, this won't capture the value from the second one. I would therefore suggest converting the string into a map beforehand and then processing each string on its own.
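A rough sketch of that suggestion, assuming every string under "ao" uses the same pipe-delimited layout. The helper name `fourth_fields`, the compiled pattern, and the two-item sample input are made up for illustration:

```python
import json
import re

# Group 1 is repeated three times (one pipe-terminated field each);
# group 2 captures the fourth field, up to the next pipe.
FIELD_RE = re.compile(r'(.*?\|){3}(.*?)\|')


def fourth_fields(raw_json):
    """Return the fourth pipe-delimited field of every string under "ao"."""
    data = json.loads(raw_json)
    results = []
    for item in data.get("ao", []):
        match = FIELD_RE.match(item)
        if match:  # strings with too few fields are skipped
            results.append(match.group(2))
    return results


# Hypothetical input with two entries, to show that each one is handled.
sample = '{"ao":["a|b|c|first|x|1","d|e|f|second|y|2","too|short"]}'
fields = fourth_fields(sample)
print(fields)
```

Because each array element is matched separately, the pattern no longer has to account for the surrounding JSON syntax.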

Good luck

Eden Eliel