Background
I'm currently trying to parse a .txt com zone file. It is structured like so
blahblah.com xx xx examplens.com
Currently my code is structured like so
extern crate regex;
use regex::Regex;
use rusqlite::{params, Connection, Result};
#[derive(Debug)]
struct Domain {
id: i32,
domain: String,
}
use std::io::stdin;
fn main() -> std::io::Result<()> {
let mut reader = my_reader::BufReader::open("data/com_practice.txt")?;
let mut buffer = String::new();
let conn: Connection = Connection::open("/Users/alex/Projects/domain_randomizer/data/domains.db").unwrap();
while let Some(line) = reader.read_line(&mut buffer) {
let regexed_i_think = rexeginton_the_domain_only(line?.trim());
println!("{:?}", regexed_i_think);
sqliting(regexed_i_think, &conn).unwrap();
}
let mut stmt = conn.prepare("SELECT id, domain FROM domains").unwrap();
let domain_iter = stmt.query_map(params![], |row| {
Ok(Domain {
id: row.get(0)?,
domain: row.get(1)?,
})
}).unwrap();
for domain in domain_iter {
println!("Found person {:?}", domain.unwrap());
}
Ok(())
}
pub fn sqliting(domain_name: &str, conn: &Connection) -> Result<()> {
let yeah = Domain {
id: 0,
domain: domain_name.to_string()
};
conn.execute(
"INSERT INTO domains (domain) VALUES (?1)",
params![yeah.domain],
)?;
Ok(())
}
mod my_reader {
use std::{
fs::File,
io::{self, prelude::*},
};
pub struct BufReader {
reader: io::BufReader<File>,
}
impl BufReader {
pub fn open(path: impl AsRef<std::path::Path>) -> io::Result<Self> {
let file = File::open(path)?;
let reader = io::BufReader::new(file);
Ok(Self { reader })
}
pub fn read_line<'buf>(
&mut self,
buffer: &'buf mut String,
) -> Option<io::Result<&'buf mut String>> {
buffer.clear();
self.reader
.read_line(buffer)
.map(|u| if u == 0 { None } else { Some(buffer) })
.transpose()
}
}
}
pub fn rexeginton_the_domain_only(full_line: &str) -> &str {
let regex_str = Regex::new(r"(?m).*?.com").unwrap();
let final_str = regex_str.captures(full_line).unwrap().get(0).unwrap().as_str();
return final_str;
}
Issue
So I am Parsing a single domain at a time, each time making an Insert. As I've gathered, Inserting would be far more efficient if I was making thousands of Inserts in a single transaction. However, I'm not quite sure what an efficient approach is to refactoring my parsing around this.
Question
How should I reorient my parsing process around my Insertion process? Also how can I actively gauge the speed and efficiency of the process in the first place so I can compare and contrast in an articulate manner?