How to convert an string built up of a list of mongo ObjectIds into a python list of just ids

Question

I have a dataframe with a column containing string representation of a list of ObjectIds. i.e:

"[ObjectId('5d28938629fe749c7c12b6e3'), ObjectId('5caf4522a30528e3458b4579')]"

And I want to convert it from string literal to a python list of just ids like:

['5d28938629fe749c7c12b6e3', '5caf4522a30528e3458b4579']

json.loads & ast.literal_eval both fails because string contains ObjectId

Oh right, I should have mentioned it. I fetch data from mongodb version 2.6 — Hossein Jazayeri, Jul 13 '19 at 20:51
In this case it's easier to supply an example dataframe, read more [here](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — Erfan, Jul 13 '19 at 20:52

frankegoesdown · Accepted Answer · 2019-07-13T21:19:01.047

I share this regex: https://regex101.com/r/m5rW2q/1

You can click on codegenerator for example:

import re

regex = r"ObjectId\('(\w+)'\)"

test_str = "[ObjectId('5d28938629fe749c7c12b6e3'), ObjectId('5caf4522a30528e3458b4579')]"

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

output:

Match 1 was found at 1-37: ObjectId('5d28938629fe749c7c12b6e3')
Group 1 found at 11-35: 5d28938629fe749c7c12b6e3
Match 2 was found at 39-75: ObjectId('5caf4522a30528e3458b4579')
Group 1 found at 49-73: 5caf4522a30528e3458b4579

for your example:

import re 
regex = r"ObjectId\('(\w+)'\)" 

test_str = "[ObjectId('5d28938629fe749c7c12b6e3'), ObjectId('5caf4522a30528e3458b4579')]" 

matches = re.finditer(regex, test_str, re.MULTILINE) 
[i.groups()[0] for i in matches]

output:

['5d28938629fe749c7c12b6e3', '5caf4522a30528e3458b4579']

And all about regex you can find here: https://docs.python.org/3/library/re.html

Thanks, `r"ObjectId\('(\w+)'\)"` works & I used `re.findall` instead to avoid loop — Hossein Jazayeri, Jul 22 '19 at 20:41

score 1 · Answer 2 · answered Jul 13 '19 at 21:00

1

Well, you can use replace

a = "[ObjectId('5d28938629fe749c7c12b6e3'), ObjectId('5caf4522a30528e3458b4579')]"
a.replace('ObjectId(', '').replace(")","")
#Output:
"['5d28938629fe749c7c12b6e3', '5caf4522a30528e3458b4579']"

answered Jul 13 '19 at 21:00

Francesco Zambolin

601
6
9

This is not safe because second replace `.replace(")","")` might malform data because I'm not guaranteed to only have strings with `ObjectId()` – Hossein Jazayeri Jul 13 '19 at 21:11

score 1 · Answer 3 · answered Jul 14 '19 at 01:54

1

Locate the rows; split at '; select items 1 and 3 from list:

my_df.loc[my_df["my_column"].str.contains("ObjectId"),"my_column"].str.split("'")[0][1:4:2]

Gives exactly a list of two elements:

['5d28938629fe749c7c12b6e3', '5caf4522a30528e3458b4579']

answered Jul 14 '19 at 01:54

Francesco Zambolin

601
6
9

How to convert an string built up of a list of mongo ObjectIds into a python list of just ids

3 Answers3