I have a JSON file like this:

{
    "title": "Pilot",
    "image": [
        {
            "resource": "http://images2.nokk.nocookie.net/__cb20110227141960/notr/images/8/8b/pilot.jpg",
            "description": "not yet implemented"
        }
    ],
    "content": "<p>The pilot ...</p>"
},
{
    "title": "Special Christmas (Part 1)",
    "image": [
        {
            "resource": "http://images1.nat.nocookie.net/__cb20090519172121/obli/images/e/ed/SpecialChristmas.jpg",
            "description": "not yet implemented"
        }
    ],
    "content": "<p>Last comment...</p>"
}

I need to rewrite every resource value in the file, so if a string has this format:

"http://images1.nat.nocookie.net/__cb20090519172121/obli/images/e/ed/SpecialChristmas.jpg"

the result should be:

"../img/SpecialChristmas.jpg"

Could someone tell me how to match that pattern in order to modify the file?

I tried something like this recommendation:

https://stackoverflow.com/a/4128192/521728

but I don't know how to adapt it to my situation.

Thanks in advance!

Boel
  • Are there any non-image resources, or are they all going to be images of the form `"../img/*"`? – Ben S. Oct 11 '13 at 00:02
  • Is the file so big that it's prohibitive to just `json.load` it, treat it as a dict, and then `json.dump` it? – kojiro Oct 11 '13 at 00:17

3 Answers

If they're all going to be images in "../img", I believe that you can do it like this:

resourceVal = "http://images1.nat.nocookie.net/__cb20090519172121/obli/images/e/ed/SpecialChristmas.jpg"
lastSlash = resourceVal.rfind('/')
result = "../img" + resourceVal[lastSlash:]

If there are other kinds of resources, this might be a little more complicated - let me know and I will try to edit this answer to help.
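
As a minimal sketch of applying this to every resource value with the json module (the approach suggested in the comments): it assumes the file holds a JSON array of objects shaped like the ones in the question. The function name `localize_resources` and the inline sample are hypothetical, used only for illustration.

```python
import json


def localize_resources(entries, prefix="../img"):
    """Rewrite each image's "resource" URL to prefix + "/" + file name, in place."""
    for entry in entries:
        for image in entry.get("image", []):
            url = image["resource"]
            # Keep everything from the last slash onward, e.g. "/pilot.jpg".
            image["resource"] = prefix + url[url.rfind("/"):]
    return entries


# Assumption: the top level is a JSON array; the snippet in the question
# would need to be wrapped in [ ] before json.loads accepts it.
sample = json.loads("""[
  {"title": "Pilot",
   "image": [{"resource": "http://images2.nokk.nocookie.net/__cb20110227141960/notr/images/8/8b/pilot.jpg",
              "description": "not yet implemented"}],
   "content": "<p>The pilot ...</p>"}
]""")

print(json.dumps(localize_resources(sample), indent=4))
```

For a real file, the same function could be fed the result of `json.load` and the output written back with `json.dump`.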

Ben S.

Here's my answer. It's not quite as succinct, but you can swap the pattern passed to re.search for any regex you want.

import re

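# Copy test.json to new.json, rewriting every line that contains ".jpg".
with open("test.json") as src, open("new.json", "wt") as out:
    for line in src:
        if re.search(r"\.jpg", line):
            # The last "/"-separated piece is the file name; it still
            # carries the closing quote, the comma and the newline.
            filename = line.split("/")[-1]
            out.write('\t"resource": "../img/' + filename)
        else:
            out.write(line)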
emhart

I'd use regex with groups:

from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO
import re

reader = StringIO("""{
    "title": "Pilot",
    "image": [
        {
            "resource": "http://images2.nokk.nocookie.net/__cb20110227141960/notr/images/8/8b/pilot.jpg",
            "description": "not yet implemented"
        }
    ],
    "content": "<p>The pilot ...</p>"
},
{
    "title": "Special Christmas (Part 1)",
    "image": [
        {
            "resource": "http://images1.nat.nocookie.net/__cb20090519172121/obli/images/e/ed/SpecialChristmas.jpg",
            "description": "not yet implemented"
        }
    ],
    "content": "<p>Last comment...</p>"
}""")

# to open a file just use reader = open(filename)

text = reader.read()
pattern = r'"resource": ".+/(.+)\.jpg"'  # escape the dot so it only matches a literal "."
replacement = r'"resource": "../img/\g<1>.jpg"'  # raw string so \g<1> reaches re.sub unchanged
text = re.sub(pattern, replacement, text)

print(text)

To explain the pattern: it looks for any text starting with "resource": " that then has one or more characters up to a forward slash, followed by one or more characters before .jpg". Because .+ is greedy, the slash it stops at is the last one on the line, so the group holds only the file name. The brackets () mean I want what is found inside them as a group. As I only have one set of brackets, I can access it in my replacement with '\g<1>'. (Note that '\g<0>' would refer to the whole match: '"resource": etc'.)
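
To make the group numbering concrete, here is a small sketch run against a single line from the question's data. The variable names are illustrative only, and the pattern is written with the dot escaped.

```python
import re

line = '"resource": "http://images1.nat.nocookie.net/__cb20090519172121/obli/images/e/ed/SpecialChristmas.jpg",'
pattern = r'"resource": ".+/(.+)\.jpg"'

match = re.search(pattern, line)
whole = match.group(0)  # the entire matched text, from "resource" to the closing quote
name = match.group(1)   # only the captured file name, without the extension

# \g<1> in the replacement inserts the captured file name.
rewritten = re.sub(pattern, r'"resource": "../img/\g<1>.jpg"', line)

print(name)
print(rewritten)
```

Here group 0 covers everything except the trailing comma, which is why the comma survives the substitution untouched.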

rtrwalker