
I have a JSON file with repetitive parts, and I'm trying to write a script that removes a certain block of text from multiple files. A Python script would be preferred; otherwise, from my searching, sed could work too, though I know nothing about it. Here is a sample of the format of my JSON file:

    {
      "Animal": {
        "Type_species": "Reptile"
      },
      "FindMe": "https://www.merriam-webster.com/dictionary/amphibian",
      "Description": "Most are cold blooded."
    },
    {
      "Animal": {
        "Type_species": "Mammal"
      },
      "FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
      "Description": "There Are Approximately 5,000 Mammal Species."
    },
    {
      "Animal": {
        "Type_species": "Amphibian"
      },
      "FindMe": "https://en.wikipedia.org/wiki/Amphibian",
      "Description": "Most amphibians have thin, moist skin that helps them to breathe"
    },
1. How would I remove the following block from the JSON file?
    {
      "Animal": {
        "Type_species": "Mammal"
      },
      "FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
      "Description": "There Are Approximately 5,000 Mammal Species."
    },
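If the whole file is valid JSON whose top level is a list of objects like these (an assumption: the fragment above has a trailing comma and no enclosing brackets), the block can be removed by parsing the file and dropping every element equal to the unwanted dictionary. A minimal sketch, with the data inlined as a string for illustration:

```python
import json

# Assumption: the file is a JSON array of objects shaped like the sample above.
text = """
[
  {"Animal": {"Type_species": "Reptile"},
   "FindMe": "https://www.merriam-webster.com/dictionary/amphibian",
   "Description": "Most are cold blooded."},
  {"Animal": {"Type_species": "Mammal"},
   "FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
   "Description": "There Are Approximately 5,000 Mammal Species."},
  {"Animal": {"Type_species": "Amphibian"},
   "FindMe": "https://en.wikipedia.org/wiki/Amphibian",
   "Description": "Most amphibians have thin, moist skin that helps them to breathe"}
]
"""

# The exact block to remove, written as a Python dict.
unwanted = {
    "Animal": {"Type_species": "Mammal"},
    "FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
    "Description": "There Are Approximately 5,000 Mammal Species.",
}

animals = json.loads(text)
# Two dicts are equal when all keys and values match; key order does not matter.
kept = [entry for entry in animals if entry != unwanted]
print(json.dumps(kept, indent=2))
```

Comparing whole dictionaries only removes a block that matches exactly, so any change to the URL or description would leave it in place.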

My other question is: 2. How do I adapt the script to account for different "FindMe" URLs across multiple files? For example, a second file would have the block below, and so on for the other files:

    {
      "Animal": {
        "Type_species": "Mammal"
      },
      "FindMe": "https://kids.nationalgeographic.com/animals/mammals/facts/arctic-fox",
      "Description": "There Are Approximately 5,000 Mammal Species."
    },
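One way to handle URLs that vary from file to file is to match only the part they share, for example with str.startswith on the parsed "FindMe" value, and then loop over the files. A sketch under stated assumptions: each file holds a top-level JSON list, and the prefix, the glob pattern, and the function names are hypothetical:

```python
import glob
import json

# Hypothetical prefix: every entry whose "FindMe" URL starts with it is removed, so
# ".../animals/mammals/" and ".../animals/mammals/facts/arctic-fox" both match.
PREFIX = "https://kids.nationalgeographic.com/animals/mammals/"


def remove_matching(entries, prefix=PREFIX):
    """Return the entries whose "FindMe" URL does not start with prefix."""
    return [e for e in entries if not e.get("FindMe", "").startswith(prefix)]


def process_file(path, prefix=PREFIX):
    """Rewrite one file in place; assumes its top level is a JSON list."""
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)  # read everything before reopening for writing
    with open(path, "w", encoding="utf-8") as f:
        json.dump(remove_matching(entries, prefix), f, indent=2)


def process_all(pattern="*.json", prefix=PREFIX):
    """Apply process_file to every file matching the (hypothetical) glob pattern."""
    for path in glob.glob(pattern):
        process_file(path, prefix)
```

Calling process_all("some_folder/*.json") would clean every matching file. Because str.startswith also accepts a tuple, several unrelated prefixes can be passed at once, e.g. prefix=("https://a.example/", "https://b.example/").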

I think a regex could help, but I'm having trouble understanding regexes and using them within a script.
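A regex is easiest to apply to the parsed "FindMe" value rather than to the raw multi-line text of the file. A minimal sketch; the pattern below is a hypothetical example that accepts http or https and anything after the mammals path:

```python
import re

# Hypothetical pattern: "http" or "https", then the mammals section of the site,
# then anything at all. Dots are escaped so they match only a literal ".".
MAMMAL_URL = re.compile(r"^https?://kids\.nationalgeographic\.com/animals/mammals/")

entries = [
    {"FindMe": "https://kids.nationalgeographic.com/animals/mammals/"},
    {"FindMe": "https://kids.nationalgeographic.com/animals/mammals/facts/arctic-fox"},
    {"FindMe": "https://en.wikipedia.org/wiki/Amphibian"},
]

# Pattern.match only tries to match at the start of the string.
# Keep the entries whose URL does NOT match.
kept = [e for e in entries if not MAMMAL_URL.match(e["FindMe"])]
print(kept)
```

For a fixed prefix, str.startswith does the same job without a regex; the regex only pays off when the URLs vary in ways a single prefix cannot describe.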

Any help is appreciated, thank you.

Update: I would like the end result to look like this:

    {
      "Animal": {
        "Type_species": "Reptile"
      },
      "FindMe": "https://www.merriam-webster.com/dictionary/amphibian",
      "Description": "Most are cold blooded."
    },
    {
      "Animal": {
        "Type_species": "Amphibian"
      },
      "FindMe": "https://en.wikipedia.org/wiki/Amphibian",
      "Description": "Most amphibians have thin, moist skin that helps them to breathe"
    },
A. H.

2 Answers


Assuming your complete JSON contains a list of dictionaries (as your sample suggests), here stored under a "data" key, then:

    JSON = {"data": [
        {
            "Animal": {
                "Type_species": "Reptile"
            },
            "FindMe": "https://www.merriam-webster.com/dictionary/amphibian",
            "Description": "Most are cold blooded."
        },
        {
            "Animal": {
                "Type_species": "Mammal"
            },
            "FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
            "Description": "There Are Approximately 5,000 Mammal Species."
        },
        {
            "Animal": {
                "Type_species": "Amphibian"
            },
            "FindMe": "https://en.wikipedia.org/wiki/Amphibian",
            "Description": "Most amphibians have thin, moist skin that helps them to breathe"
        }
    ]}

    # Keep only the dictionaries whose species is not "Mammal"
    JSON['data'] = [d for d in JSON['data'] if d['Animal']['Type_species'] != 'Mammal']

    print(JSON)
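The list comprehension on the filtering line is shorthand for an ordinary loop that builds a new list. The same filter written out step by step, on a hypothetical trimmed-down input that keeps only the key the filter reads:

```python
# Hypothetical small input in the same shape as JSON['data'] above.
data = [
    {"Animal": {"Type_species": "Reptile"}},
    {"Animal": {"Type_species": "Mammal"}},
    {"Animal": {"Type_species": "Amphibian"}},
]

# Equivalent to: [d for d in data if d['Animal']['Type_species'] != 'Mammal']
kept = []
for d in data:                                   # visit each dictionary in the list
    if d["Animal"]["Type_species"] != "Mammal":  # look up the nested species value
        kept.append(d)                           # keep everything that is not a mammal

print(kept)
```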
  • Can you explain the syntax and what is being done? I'd like to understand it better. – A. H. Nov 30 '21 at 16:26
  • I recommend section [5.1.3](https://docs.python.org/3.9/tutorial/datastructures.html) –  Dec 01 '21 at 07:36
  • Thanks DarkKnight. How do I keep the output/print the output in the structured json format it was previously in? I need to read in a formatted json file, remove the text and then save the file with the removed text. – A. H. Dec 01 '21 at 16:28
  • Check out [this answer](https://stackoverflow.com/questions/12309269/how-do-i-write-json-data-to-a-file) –  Dec 01 '21 at 18:02
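To read a formatted file, remove the entries, and save the result with indentation again, json.load and json.dump(..., indent=4) can be combined. A minimal sketch, assuming the {"data": [...]} layout from the answer; the function name remove_species is hypothetical, and in-memory io.StringIO objects stand in for real files so the example runs anywhere:

```python
import io
import json


def remove_species(src, dst, species="Mammal"):
    """Read JSON from src, drop entries of the given species, write indented JSON to dst.

    Assumes the {"data": [...]} layout used in the answer.
    """
    doc = json.load(src)
    doc["data"] = [d for d in doc["data"] if d["Animal"]["Type_species"] != species]
    # indent=4 puts each key on its own line, close to the original layout.
    json.dump(doc, dst, indent=4)
    dst.write("\n")


# Demo with in-memory files; for real files, pass the objects returned by open().
src = io.StringIO('{"data": [{"Animal": {"Type_species": "Mammal"}}, '
                  '{"Animal": {"Type_species": "Reptile"}}]}')
dst = io.StringIO()
remove_species(src, dst)
print(dst.getvalue())
```

With real files, read the input completely before opening the output for writing: open(path, "w") truncates the file immediately, so opening the same path for reading and writing at once would lose the data. Writing to a new file name avoids the problem.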

This might work for you (GNU sed):

    sed '/^\s*{/{:a;N;/^\(\s*\){.*\n\1},/!ba;/"Type_species": "Mammal"/d}' file

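This gathers each animal's block (from the line holding the opening { to the line holding the matching }, at the same indentation) into the pattern space, then deletes the whole block if it contains "Type_species": "Mammal".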
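Since the question prefers Python, here is a rough line-based translation of the same idea. It is a sketch that treats the file as plain text rather than parsing JSON; the needle string and the closing-line rule are assumptions modeled on the sed command. Unlike the sed one-liner, it also accepts a closing } with no trailing comma, so removing the last entry of a list this way can leave a dangling comma on the previous entry, which a JSON parser would reject.

```python
import re

# A line that opens a block: optional indentation, then "{". Group 1 is the indentation.
OPEN_RE = re.compile(r"^(\s*)\{")


def drop_blocks(text, needle='"Type_species": "Mammal"'):
    """Remove every {...} block whose text contains needle.

    A block runs from a line that starts with "{" (after optional whitespace) to the
    first later line consisting of the same indentation followed by "}," or "}".
    """
    lines = text.splitlines(keepends=True)
    out = []
    i = 0
    while i < len(lines):
        m = OPEN_RE.match(lines[i])
        if m is None:
            out.append(lines[i])  # not the start of a block: copy it unchanged
            i += 1
            continue
        indent = m.group(1)
        j = i + 1
        # Scan forward to the closing line at the same indentation as the opener.
        while j < len(lines) and lines[j].rstrip() not in (indent + "},", indent + "}"):
            j += 1
        block = lines[i:j + 1]  # includes the closing line (or runs to end of text)
        if needle not in "".join(block):
            out.extend(block)
        i = j + 1
    return "".join(out)


sample = (
    "{\n"
    '  "Animal": {\n'
    '    "Type_species": "Reptile"\n'
    "  },\n"
    '  "Description": "Most are cold blooded."\n'
    "},\n"
    "{\n"
    '  "Animal": {\n'
    '    "Type_species": "Mammal"\n'
    "  },\n"
    '  "Description": "There Are Approximately 5,000 Mammal Species."\n'
    "},\n"
)
print(drop_blocks(sample), end="")
```

The inner "Animal" object is never treated as its own block because its { does not start a line, and its closing }, is indented more deeply than the outer opener.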

potong