I'm trying to learn how to use regex in Python to scrape recipe ingredients from a website. Using regex101, I have `r".recipeIngredient(.*)"`, which matches only `"recipeIngredient": [`. I'm trying to get it to match the whole list:

      "recipeIngredient": [
      "1 ¾ cups HONEY MAID Graham Cracker Crumbs",      
      "⅓ cup butter, melted",
      "1 ¼ cups sugar, divided",
      "3 (8 ounce) packages PHILADELPHIA Cream Cheese, softened",
      "1 cup BREAKSTONE'S or KNUDSEN Sour Cream",       
      "2 teaspoons vanilla",
      "3 medium (blank)s eggs",
      "1 (21 ounce) can cherry pie filling"
    ]

Is there a way to set parameters with regex to match everything between the two brackets `[` and `]`, but only after `"recipeIngredient"`? Or would it be better for me to write a for loop to establish those parameters? For context, here is more of the surrounding JSON:

        "cookTime": "P0DT0H0M",
        "totalTime": "P0DT6H25M",
        "recipeYield": "16 servings",
        "recipeIngredient": [
          "1 ¾ cups HONEY MAID Graham Cracker Crumbs",      
          "⅓ cup butter, melted",
          "1 ¼ cups sugar, divided",
          "3 (8 ounce) packages PHILADELPHIA Cream Cheese, softened",
          "1 cup BREAKSTONE'S or KNUDSEN Sour Cream",       
          "2 teaspoons vanilla",
          "3 medium (blank)s eggs",
          "1 (21 ounce) can cherry pie filling"
        ],
        "recipeInstructions": [
          {
            "@type": "HowToStep",
            "text": "Heat oven to 350 degrees F.\n"
          },
Selcuk
andy399
  • Why are you trying to parse JSON with regexes? Python has a `json` module built-in. [JSON is not a regular language](https://cstheory.stackexchange.com/a/4017), and trying to parse it with regex [invokes TH̘Ë͖́̉ ͠P̯͍̭O̚​N̐Y̡](https://stackoverflow.com/a/1732454/364696). – ShadowRanger Sep 30 '20 at 00:56
  • Possible duplicate of [How to parse JSON in Python? - Stack Overflow](https://stackoverflow.com/questions/7771011/how-to-parse-json-in-python) – user202729 Sep 30 '20 at 00:59

1 Answer

As mentioned in the comments, it is strongly recommended to parse the JSON with the `json` module rather than use a regex. But if you must use a regex:

> Using regex101 I have r".recipeIngredient(.*)" which is matching just "recipeIngredient": [

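By default, `.` matches any character except a newline, so `.*` stops at the end of the first line. You can use the `re.DOTALL` flag if you want `.` to match newlines as well.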

    >>> import re
    >>> text = """
    ... "recipeIngredient": [
    ...     "1¾ cups HONEY MAID Graham Cracker Crumbs",
    ...     "⅓ cup butter, melted",
    ...     "1¼ cups sugar, divided",
    ...     "3 (8 ounce) packages PHILADELPHIA Cream Cheese, softened",
    ...     "1 cup BREAKSTONE'S or KNUDSEN Sour Cream",
    ...     "2 teaspoons vanilla",
    ...     "3 medium (blank)s eggs",
    ...     "1 (21 ounce) can cherry pie filling"
    ... ]
    ... """
    >>> # With re.DOTALL, the greedy .* runs past newlines to the end of the text
    >>> re.search(r".recipeIngredient(.*)", text, re.DOTALL).group()
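To make the effect of the flag concrete, here is a minimal sketch that runs the same pattern with and without `re.DOTALL`. The shortened `text` value is a made-up sample for illustration, not the real page content.

```python
import re

# Hypothetical, shortened sample; the real text would come from the page.
text = '"recipeIngredient": [\n    "2 teaspoons vanilla"\n]'
pattern = r".recipeIngredient(.*)"

# Without the flag, . does not match "\n", so .* stops at the end of line 1.
without_flag = re.search(pattern, text).group(1)

# With re.DOTALL, . also matches "\n", so the greedy .* runs to the end.
with_flag = re.search(pattern, text, re.DOTALL).group(1)

print(repr(without_flag))
print(repr(with_flag))
```

Note that the greedy `.*` keeps going to the end of the input, so on the full document it would also swallow everything after the ingredient list.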

> Is there a way to set parameters with regex to match everything between two []

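This should work, as long as no ingredient contains a `]`, because the non-greedy `(.*?)` stops at the first closing bracket:

    >>> re.search(r'"recipeIngredient" *: *\[(.*?)\]', text, re.DOTALL).group(1).strip()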
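For comparison, here is a rough sketch of the `json` approach suggested in the comments, next to a regex fallback that hands the captured text back to `json` for decoding. It assumes the scraped text is a complete JSON object; the `raw` sample below is hypothetical and trimmed, and fetching the page is left out.

```python
import json
import re

# Hypothetical, trimmed sample standing in for the JSON scraped from the page.
raw = """
{
  "recipeYield": "16 servings",
  "recipeIngredient": [
    "⅓ cup butter, melted",
    "2 teaspoons vanilla"
  ],
  "recipeInstructions": []
}
"""

# Preferred: parse the whole document and look the key up directly.
ingredients = json.loads(raw)["recipeIngredient"]

# Regex fallback: capture the text between the brackets, then let json
# decode the array so that quotes and escapes are handled correctly.
match = re.search(r'"recipeIngredient"\s*:\s*\[(.*?)\]', raw, re.DOTALL)
from_regex = json.loads("[" + match.group(1) + "]")

print(ingredients)
print(from_regex == ingredients)
```

Either way you end up with a plain Python list of strings. The `json.loads(raw)` route does not depend on where the brackets fall, so it is the safer choice whenever the full JSON is available.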
jeremyr