I am relatively new to MongoDB, and I'd like to use R to learn how to insert data into it. For learning, I created a small dataset of states and their zip codes, structured as follows:
{
"_id": {
"$oid": "6415b4a15b8b78d4a80cd861"
},
"slug": "alabama",
"name": "Alabama",
"abbr": "AL",
"capital_city": "Montgomery",
"largest_city": "Huntsville",
"established": "Dec 14, 1819",
"population": "5,024,279",
"total_area_miles": "52,420",
"total_area_km": "135,767",
"land_area_miles": "50,645",
"land_area_km": "131,171",
"water_area_miles": "1,775",
"water_area_km": "4,597",
"number_of_representatives": "7",
"date_update": {
"$date": "2023-03-18T12:54:57.771Z"
},
"zipcode_url": "https://www.zipdatamaps.com/en/us/zip-list/state/zip-codes-in-alabama",
"zipcodes": [
{
"zipcode": "35004",
"city": "Moody",
"county": "Saint Clair County",
"type": "Standard"
},
{
"zipcode": "35005",
"city": "Adamsville",
"county": "Jefferson County",
"type": "Standard"
}
]
}
In each document, I have a zipcodes array that holds some data for each zip code. Now I want to insert a data frame into each element of the zipcodes array as an array, grouped by year and month. How do I do this in R so that the insert first finds the current year (creating it if it doesn't exist), then finds the current month (creating it if it doesn't exist), and then inserts each row only if its id doesn't already exist? That way, if the program crashes or I re-run it, no duplicates are created.
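To make the intended behaviour concrete, here is a minimal sketch of it using plain R lists, with no MongoDB involved. The helper name `upsert_rows` and the objects `df` and `zip_data` are hypothetical and exist only for this illustration; what I am looking for is the equivalent of this logic applied to the stored document.

```r
# Pure-R illustration of the intended behaviour; no database is touched.
# `upsert_rows` is a hypothetical helper, not part of any package.
upsert_rows <- function(data, rows, year, month) {
  # Create the year container if it does not exist yet.
  if (is.null(data[[year]])) data[[year]] <- list()
  # Create the month container (an empty data frame) if it does not exist yet.
  if (is.null(data[[year]][[month]])) {
    data[[year]][[month]] <- rows[0, , drop = FALSE]
  }
  existing <- data[[year]][[month]]
  # Keep only rows whose id is not already stored, so re-runs add nothing.
  new_rows <- rows[!(rows$id %in% existing$id), , drop = FALSE]
  data[[year]][[month]] <- rbind(existing, new_rows)
  data
}

df <- data.frame(id = 1:3, name = c("bar", "cat", "city"),
                 stringsAsFactors = FALSE)
year  <- format(Sys.Date(), "%Y")  # current year as a string, e.g. "2023"
month <- format(Sys.Date(), "%m")  # current month, zero-padded, e.g. "03"

zip_data <- list()
zip_data <- upsert_rows(zip_data, df, year, month)
zip_data <- upsert_rows(zip_data, df, year, month)  # simulated re-run
```

After the simulated re-run, the month entry still holds only the three original rows, which is the duplicate-free result I want from the database insert as well.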
The final data should look something like this:
{
"_id": {
"$oid": "6415b4a15b8b78d4a80cd861"
},
"slug": "alabama",
"name": "Alabama",
"abbr": "AL",
"capital_city": "Montgomery",
"largest_city": "Huntsville",
"established": "Dec 14, 1819",
"population": "5,024,279",
"total_area_miles": "52,420",
"total_area_km": "135,767",
"land_area_miles": "50,645",
"land_area_km": "131,171",
"water_area_miles": "1,775",
"water_area_km": "4,597",
"number_of_representatives": "7",
"date_update": {
"$date": "2023-03-18T12:54:57.771Z"
},
"zipcode_url": "https://www.zipdatamaps.com/en/us/zip-list/state/zip-codes-in-alabama",
"zipcodes": [
{
"zipcode": "35004",
"city": "Moody",
"county": "Saint Clair County",
"type": "Standard",
"data": [
{
"2023": [
{
"03": [
{
"id": 1,
"name": "bar"
},
{
"id": 2,
"name": "cat"
},
{
"id": 3,
"name": "city"
}
]
}
]
}
]
},
{
"zipcode": "35005",
"city": "Adamsville",
"county": "Jefferson County",
"type": "Standard",
"data": [
{
"2023": [
{
"03": [
{
"id": 1,
"name": "apple"
},
{
"id": 2,
"name": "banana"
},
{
"id": 3,
"name": "foo"
}
]
}
]
}
]
}
]
}
Of course, I'd welcome and appreciate it if you could suggest a better way to store the yearly data too.
And here's my R code:
conn <- get_db_conn()
sapply(nc_zipcodes[[1]]$url[1:1], function(x) {
  ## run the query for this URL
  results <- get_information(query = sprintf("https://www.example.com", x))
  ## store the results in the MongoDB database
  # I don't know what to do here. I found many suggestions on Google, but none of them worked for me.
  ????
  ## pause for a random 5 to 30 seconds between requests
  Sys.sleep(ceiling(runif(1, min = 5, max = 30)))
})