With the csv crate and the latest Rust version 1.31.0, I would want to read CSV files with ANSI (Windows 1252) encoding, as easily as in UTF-8.
Things I have tried (with no luck), after reading the whole file in a Vec<u8>
:
Indeed, in my company, we have a lot of CSV files, ANSI encoded.
Also, I would want, if possible, not to load the entire file in a Vec<u8>
but a reading line by line (CRLF
ending), as many of the files are big (50 Mb or more…).
In the file Cargo.toml, I have this dependency:
[dependencies]
csv = "1"
test.csv consist of the following content saved as Windows-1252 encoding:
Café;au;lait
Café;au;lait
The code in main.rs file:
extern crate csv;
use std::error::Error;
use std::fs::File;
use std::io::BufReader;
use std::path::Path;
use std::process;
fn example() -> Result<(), Box<Error>> {
let file_name = r"test.csv";
let file_handle = File::open(Path::new(file_name))?;
let reader = BufReader::new(file_handle);
let mut rdr = csv::ReaderBuilder::new()
.delimiter(b';')
.from_reader(reader);
// println!("ANSI");
// for result in rdr.byte_records() {
// let record = result?;
// println!("{:?}", record);
// }
println!("UTF-8");
for result in rdr.records() {
let record = result?;
println!("{:?}", record);
}
Ok(())
}
fn main() {
if let Err(err) = example() {
println!("error running example: {}", err);
process::exit(1);
}
}
The output is:
UTF-8
error running example: CSV parse error: record 0 (line 1, field: 0, byte: 0): invalid utf-8: invalid UTF-8 in field 0 near byte index 3
error: process didn't exit successfully: `target\debug\test-csv.exe` (exit code: 1)
When using rdr.byte_records()
(uncommenting the relevant part of code), the output is:
ANSI
ByteRecord(["Caf\\xe9", "au", "lait"])