
I am relatively new to MongoDB and I'd like to use R to learn how to insert data into it. For practice, I created a small dataset of US states and their ZIP codes, structured like this:

{
  "_id": {
    "$oid": "6415b4a15b8b78d4a80cd861"
  },
  "slug": "alabama",
  "name": "Alabama",
  "abbr": "AL",
  "capital_city": "Montgomery",
  "largest_city": "Huntsville",
  "established": "Dec 14, 1819",
  "population": "5,024,279",
  "total_area_miles": "52,420",
  "total_area_km": "135,767",
  "land_area_miles": "50,645",
  "land_area_km": "131,171",
  "water_area_miles": "1,775",
  "water_area_km": "4,597",
  "number_of_representatives": "7",
  "date_update": {
    "$date": "2023-03-18T12:54:57.771Z"
  },
  "zipcode_url": "https://www.zipdatamaps.com/en/us/zip-list/state/zip-codes-in-alabama",
  "zipcodes": [
    {
      "zipcode": "35004",
      "city": "Moody",
      "county": "Saint Clair County",
      "type": "Standard"
    },
    {
      "zipcode": "35005",
      "city": "Adamsville",
      "county": "Jefferson County",
      "type": "Standard"
    }
  ]
}
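For reference, a minimal sketch of how a document shaped like the one above can be built from R with jsonlite: a named list becomes a JSON object, and a data frame becomes an array of row objects, which is exactly the shape of the `zipcodes` field. The data frame contents are copied from the example, but the database name `geo`, the collection name `states` and the server URL are hypothetical, and the mongolite calls are left as comments because they need a running server.

```r
library(jsonlite)

# ZIP code table; in practice this would come from the scraper.
zips <- data.frame(
  zipcode = c("35004", "35005"),
  city    = c("Moody", "Adamsville"),
  county  = c("Saint Clair County", "Jefferson County"),
  type    = c("Standard", "Standard"),
  stringsAsFactors = FALSE
)

# A named list serializes to a JSON object; the data frame inside it
# serializes to an array of row objects (jsonlite's default "rows" mode).
state <- list(
  slug = "alabama",
  name = "Alabama",
  abbr = "AL",
  capital_city = "Montgomery",
  zipcodes = zips
)

# auto_unbox = TRUE writes length-1 vectors as scalars ("AL", not ["AL"]).
state_json <- toJSON(state, auto_unbox = TRUE)

# With a live server (URL, database and collection names are assumptions):
# con <- mongolite::mongo(collection = "states", db = "geo",
#                         url = "mongodb://localhost:27017")
# con$insert(state_json)
```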

Each state document has a zipcodes array holding some details for each ZIP code. Now I want to add the contents of a data frame to each ZIP code entry as an array, organized by year and month. How can I do this in R so that the insert first finds the current year (creating it if it doesn't exist), then finds the current month within that year (creating it if it doesn't exist), and finally inserts each record only if its id isn't already there? That way, if the program crashes or I re-run it, no duplicates are created.
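Matching on keys such as `"2023"` and `"03"` is awkward, because those are values being used as field names. A minimal sketch under a different assumption: each entry in a ZIP code's `data` array stores its own `year` and `month`, e.g. `{"year": 2023, "month": 3, "id": 1, "name": "bar"}`. The filter matches the state only when the chosen ZIP code element has no entry with the same year, month and id; the positional `zipcodes.$` then points at that element, and `$push` creates `data` on first use. If the entry already exists, the update matches nothing and changes nothing, so a re-run does not create duplicates. The function names are hypothetical, and `con` is assumed to be a `mongolite::mongo()` collection handle.

```r
library(jsonlite)

# Build the filter and update documents that push one entry into
# zipcodes[<matching zipcode>].data, but only when no entry with the same
# year/month/id exists there yet. Assumes entries carry year and month fields.
build_push_if_absent <- function(abbr, zipcode, year, month, id, name) {
  entry <- list(year = year, month = month, id = id, name = name)
  query <- list(
    abbr = abbr,
    zipcodes = list(`$elemMatch` = list(
      zipcode = zipcode,
      # $not + $elemMatch: no existing entry with this year/month/id
      data = list(`$not` = list(`$elemMatch` = list(year = year, month = month, id = id)))
    ))
  )
  # "zipcodes.$" is the element matched by the $elemMatch above;
  # $push creates the "data" array if it does not exist yet.
  update <- list(`$push` = list(`zipcodes.$.data` = entry))
  list(
    query  = toJSON(query, auto_unbox = TRUE),
    update = toJSON(update, auto_unbox = TRUE)
  )
}

# Send the update through a mongolite connection (hypothetical helper).
# An update that matches no document is harmless, which makes reruns safe.
push_if_absent <- function(con, abbr, zipcode, year, month, id, name) {
  docs <- build_push_if_absent(abbr, zipcode, year, month, id, name)
  con$update(docs$query, docs$update)
}
```

Calling this once per data frame row is still a loop on the R side, but each call is a single-document update, which MongoDB applies atomically.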

The final data should look something like this:

{
  "_id": {
    "$oid": "6415b4a15b8b78d4a80cd861"
  },
  "slug": "alabama",
  "name": "Alabama",
  "abbr": "AL",
  "capital_city": "Montgomery",
  "largest_city": "Huntsville",
  "established": "Dec 14, 1819",
  "population": "5,024,279",
  "total_area_miles": "52,420",
  "total_area_km": "135,767",
  "land_area_miles": "50,645",
  "land_area_km": "131,171",
  "water_area_miles": "1,775",
  "water_area_km": "4,597",
  "number_of_representatives": "7",
  "date_update": {
    "$date": "2023-03-18T12:54:57.771Z"
  },
  "zipcode_url": "https://www.zipdatamaps.com/en/us/zip-list/state/zip-codes-in-alabama",
  "zipcodes": [
    {
      "zipcode": "35004",
      "city": "Moody",
      "county": "Saint Clair County",
      "type": "Standard",
      "data": [
        {
          "2023": [
            {
              "03": [
                {
                  "id": 1,
                  "name": "bar"
                },
                {
                  "id": 2,
                  "name": "cat"
                },
                {
                  "id": 3,
                  "name": "city"
                }
              ]
            }
          ]
        }
      ]
    },
    {
      "zipcode": "35005",
      "city": "Adamsville",
      "county": "Jefferson County",
      "type": "Standard",
      "data": [
        {
          "2023": [
            {
              "03": [
                {
                  "id": 1,
                  "name": "apple"
                },
                {
                  "id": 2,
                  "name": "banana"
                },
                {
                  "id": 3,
                  "name": "foo"
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}

I'd also welcome suggestions for a better way to store the yearly data.
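One commonly used alternative, sketched here as an assumption rather than the only option: keep the monthly records in their own collection, one document per record (for example `{"abbr": "AL", "zipcode": "35004", "year": 2023, "month": 3, "id": 1, "name": "bar"}`). The combination of zipcode, year, month and id then identifies a record, so an upsert on that key is idempotent, and a unique index on the same key rejects duplicates outright. The collection name `zip_data` and index name are hypothetical; the index command would be sent once with mongolite's `con$run()`, and `con$update(..., upsert = TRUE)` does the upserts.

```r
library(jsonlite)

# createIndexes command for a unique index on the natural key
# (zipcode, year, month, id); send it once with con$run().
unique_index_cmd <- function(collection) {
  toJSON(list(
    createIndexes = collection,
    indexes = list(list(
      key = list(zipcode = 1, year = 1, month = 1, id = 1),
      name = "zip_year_month_id",
      unique = TRUE
    ))
  ), auto_unbox = TRUE)
}

# Upsert every row of a data frame with columns abbr, zipcode, year,
# month, id and name. Rows already stored are updated in place rather
# than duplicated, so re-running after a crash is safe.
upsert_records <- function(con, df) {
  for (i in seq_len(nrow(df))) {
    row <- df[i, ]
    key <- list(zipcode = row$zipcode, year = row$year,
                month = row$month, id = row$id)
    fields <- list(abbr = row$abbr, name = row$name)
    con$update(
      toJSON(key, auto_unbox = TRUE),
      toJSON(list(`$set` = fields), auto_unbox = TRUE),
      upsert = TRUE
    )
  }
  invisible(nrow(df))
}
```

On an insert, MongoDB copies the equality fields of the filter into the new document, so the stored record contains the key fields as well as the `$set` fields.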

And here's my R code:

library(mongolite)

conn <- get_db_conn()  # my helper; returns a mongolite connection

for (x in nc_zipcodes[[1]]$url[1]) {
  ## query the source for this URL (example.com is a placeholder)
  results <- get_information(query = sprintf("https://www.example.com/%s", x))

  ## store the results in the MongoDB database
  # This is where I'm stuck: none of the approaches I found on Google worked for me.

  ## wait 5-30 seconds between requests
  Sys.sleep(ceiling(runif(1, min = 5, max = 30)))
}
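A minimal sketch of what could go in the empty "store the results" slot, under several assumptions: `results` is a data frame with hypothetical columns `zipcode`, `id` and `name`; `conn` is a mongolite collection handle; and entries carry their own `year` and `month` fields instead of `"2023"`/`"03"` keys. It stamps the current year and month and builds one `$addToSet` update with `$each` per ZIP code, so all rows for a ZIP code go in with a single call. Caveat: `$addToSet` skips an element only when an identical document (same fields, values and field order) is already present; if `name` can change between runs, a re-run would add a second entry with the same `id`.

```r
library(jsonlite)

# Turn one scrape result into $addToSet updates, one per zipcode.
# Column names of `results` and the function name are hypothetical.
build_monthly_updates <- function(results, abbr, date = Sys.Date()) {
  year  <- as.integer(format(date, "%Y"))
  month <- as.integer(format(date, "%m"))
  lapply(split(results, results$zipcode), function(rows) {
    # Each entry carries its own year/month instead of using them as keys;
    # a fixed column order keeps re-sent entries identical for $addToSet.
    entries <- data.frame(year = year, month = month,
                          id = rows$id, name = rows$name,
                          stringsAsFactors = FALSE)
    list(
      # "zipcodes.$" in the update refers to the element matched here.
      query  = toJSON(list(abbr = abbr, zipcodes.zipcode = rows$zipcode[1]),
                      auto_unbox = TRUE),
      update = toJSON(list(`$addToSet` = list(
                        `zipcodes.$.data` = list(`$each` = entries))),
                      auto_unbox = TRUE)
    )
  })
}
```

Inside the loop, `for (u in build_monthly_updates(results, abbr = "AL")) conn$update(u$query, u$update)` would replace the placeholder comment, with `"AL"` standing in for the state being processed.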
 
user1828605
  • Which library do you want to use to connect to Mongo, e.g. mongolite, RMongo or rmongodb? – Andre Wildberg Mar 18 '23 at 14:15
  • I'm using `mongolite` – user1828605 Mar 18 '23 at 14:28
  • Kinda hard to pin down a single concise answer. I think a good start is described here https://stackoverflow.com/questions/52673292/use-rs-mongolite-to-correctly-insert-update-add-data-to-existing-collection which tackles your ID question. – Andre Wildberg Mar 20 '23 at 19:04
  • @AndreWildberg Thanks for the suggestion. So, based on the answer in the SO question you linked, there is no way to update or insert data directly into an array in the collection? So I'll have to write a loop? – user1828605 Apr 05 '23 at 13:30
