I'm building a web scraper with Node.js, Puppeteer and Mongoose. I'm getting the data from the web page and I'm able to save it to the database. The next step is to check whether a document already exists in the database. I've been searching and trying many approaches without success. Here is the part of my code that saves the data to the database:
try {
  const newCar = new Car({
    make: make,
    model: model,
    year: year,
    km: km,
    price: price
  });
  let saveCar = await newCar.save();
  console.log(saveCar);
  console.log('car saved!');
} catch (err) {
  console.log('err' + err);
}
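One way to combine the existence check and the insert is Mongoose's findOneAndUpdate with upsert: true and MongoDB's $setOnInsert operator. The sketch below is a hedged illustration, not the asker's code: it is keyed on carUrl on the assumption that each listing URL identifies exactly one car (the schema shown here has no such field, so it would need to be added), upsertCar is a hypothetical helper name, and the model is passed in as a parameter.

```javascript
// Sketch: insert a car only if no document with the same carUrl exists.
// Assumes the schema has a carUrl field; `upsertCar` is a hypothetical name.
// `Car` is the Mongoose model, passed in so no require path is hard-coded.
async function upsertCar(Car, carData) {
  const { carUrl, ...fields } = carData;
  // With upsert: true, a filter that matches nothing creates a new document.
  // The equality field from the filter (carUrl) is copied into it, and
  // $setOnInsert writes `fields` only in that insert case.
  // By default the document as it was *before* the operation is returned,
  // so null means "nothing matched, a new document was inserted".
  const previous = await Car.findOneAndUpdate(
    { carUrl },
    { $setOnInsert: fields },
    { upsert: true }
  );
  return previous === null; // true if inserted, false if it already existed
}
```

Without a unique index on carUrl, two upserts running at the same time can still both insert, so pairing this with a unique index is the safer choice.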
In my schema, I've enabled the timestamps option:
const mongoose = require('mongoose');
const Schema = mongoose.Schema;

const carSchema = new Schema({
  make: {
    type: String
  },
  model: {
    type: String
  },
  year: {
    type: String
  },
  km: {
    type: String
  },
  price: String
}, { timestamps: true });

module.exports = mongoose.model('Car', carSchema);
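A more robust option is to let MongoDB itself reject duplicates. Assuming the schema gains an identifying field declared as hash: { type: String, unique: true }, MongoDB maintains a unique index on it and a second insert with the same value fails with a duplicate key error (code 11000). Note that unique is not a Mongoose validator: it only takes effect once the index exists (Mongoose builds indexes at startup when autoIndex is on), and index creation fails if the collection already contains duplicates. saveUnlessDuplicate below is a hypothetical helper.

```javascript
// Sketch: assumes the schema declares a unique index, e.g.
//   hash: { type: String, unique: true }
// so the MongoDB server rejects a second document with the same hash.
// `saveUnlessDuplicate` is a hypothetical name; `Car` is the Mongoose model.
async function saveUnlessDuplicate(Car, carData) {
  try {
    return await new Car(carData).save();
  } catch (err) {
    if (err && err.code === 11000) {
      // Duplicate key error: this car is already in the database.
      return null;
    }
    throw err; // any other error is a real failure
  }
}
```

Because the check happens inside the database, this also holds up if several scraper instances save at the same time.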
I hope someone can point me in the right direction. Is there a way to use the createdAt timestamp to check whether a document is already in the database, and skip it when scraping?
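The createdAt timestamp records when a document was inserted, not which car it describes, so scraping the same listing twice produces two documents with different createdAt values and nothing to compare. Duplicate detection needs a value derived from the listing itself, such as its URL. The sketch below uses Node's built-in crypto module to turn a URL into a fixed-length key; listingHash and the example URLs are hypothetical. Storing the URL itself in a unique field would work equally well; the hash only gives a compact key.

```javascript
const crypto = require('crypto');

// Hypothetical helper: derive a stable identifier from a listing URL.
// The same URL always gives the same hash, whenever it is scraped.
function listingHash(url) {
  return crypto.createHash('md5').update(url).digest('hex');
}

const a = listingHash('https://example.com/cars/123');
const b = listingHash('https://example.com/cars/123');
const c = listingHash('https://example.com/cars/456');
console.log(a === b); // true: same listing, same hash
console.log(a === c); // false: different listing, different hash
```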
EDIT: I've been trying to solve this using an MD5 hash of the URL. This is my code:
const hash = md5(assetsUrl);
const existingCar = Car.find({
  'hash': { $exists: true }
});

if (!existingCar) {
  try {
    const newCar = new Car({
      make: make,
      model: model,
      year: year,
      km: kmInt,
      price: priceInt,
      currency: currencyString,
      carUrl: carUrl,
      imageUrl: imageUrls,
      hash: hash
    });
    let saveCar = await newCar.save();
    console.log(saveCar);
    console.log('car saved!');
  } catch (err) {
    console.log('err' + err);
  }
} else {
  console.log('car already in db');
}
This doesn't work: the code falls through to the else block every time. What am I missing here?
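Three things in the code above send it to the else branch. Car.find(...) is called without await, so existingCar is a Mongoose Query object, which is always truthy and makes !existingCar false. Even when awaited, find() resolves to an array, and an empty array is also truthy. Finally, { hash: { $exists: true } } matches any car that has a hash field at all, not the car with this particular hash. A corrected sketch follows; saveIfNew is a hypothetical name, carData is assumed to hold the scraped fields including hash, and it must be awaited from async code as in the original.

```javascript
// Sketch of a corrected check. `Car` is the Mongoose model; `carData` holds
// the scraped fields including `hash`. `saveIfNew` is a hypothetical name.
async function saveIfNew(Car, carData) {
  // Look for *this* hash, not "any document that has a hash field".
  // findOne() returns a Query; `await` executes it and yields the matching
  // document, or null when there is none.
  const existingCar = await Car.findOne({ hash: carData.hash });
  if (existingCar) {
    console.log('car already in db');
    return null;
  }
  const savedCar = await new Car(carData).save();
  console.log('car saved!');
  return savedCar;
}
```

Car.exists({ hash }) is an alternative to findOne; its resolved value differs between Mongoose versions, but it is falsy when nothing matches.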