Use a proper (and fast) CSV parser. With univocity-parsers the entire process should take only a few seconds.
First, create a RowProcessor that receives each row parsed from the input, transforms it, and writes the result to the given output:
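import com.univocity.parsers.common.ParsingContext;
import com.univocity.parsers.common.processor.RowProcessor;
import com.univocity.parsers.csv.CsvWriter;
import com.univocity.parsers.csv.CsvWriterSettings;
import java.io.File;
import java.util.Map;

public RowProcessor createProcessor(final File output, final String encoding){
    CsvWriterSettings outputSettings = new CsvWriterSettings();
    //configure the CSV writer - format and other settings.
    //create a writer for the output you want with the given settings.
    final CsvWriter writer = new CsvWriter(output, encoding, outputSettings);
    return new RowProcessor(){
        private Map<String, String> roleMap;
        private Map<String, String> deptMap;

        @Override
        public void processStarted(ParsingContext context) {
            //build the lookup maps once, before the first row arrives.
            roleMap = buildMapOfRoles();
            deptMap = buildMapOfDepartments();
        }

        @Override
        public void rowProcessed(String[] row, ParsingContext context) {
            //replace the codes in columns 2 and 3 with their names, then write the row.
            row[2] = roleMap.get(row[2]);
            row[3] = deptMap.get(row[3]);
            writer.writeRow(row);
        }

        @Override
        public void processEnded(ParsingContext context) {
            //flush and close the output once the input is exhausted.
            writer.close();
        }
    };
}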
Then run the parser with this:
String encoding = "UTF-8";
File input = new File("/path/to/input.csv");
File output = new File("/path/to/output.csv");
RowProcessor processor = createProcessor(output, encoding);
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.setProcessor(processor);
//configure the parser settings as needed.
//then run the parser. It will submit all rows to the processor created above.
new CsvParser(parserSettings).parse(input, encoding);
All rows will be submitted to your processor, which writes each transformed row directly to the output.
Here is my amazing implementation of buildMapOfRoles and buildMapOfDepartments:
private Map<String, String> buildMapOfRoles(){
Map<String,String> out = new HashMap<>();
out.put("2", "Operator");
out.put("1", "Assistant");
return out;
}
private Map<String, String> buildMapOfDepartments(){
Map<String,String> out = new HashMap<>();
out.put("3", "Grinding");
out.put("5", "HR");
return out;
}
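If it helps to see the per-row transformation in isolation, here is a minimal sketch that runs without the library. The class name RowMappingSketch and the sample row are made up for illustration. Using getOrDefault is also an assumption on my part: the processor above uses get, which would write null for a code that is missing from the map, while this variant leaves such a code unchanged.

```java
import java.util.HashMap;
import java.util.Map;

public class RowMappingSketch {

    // Replaces the codes in columns 2 and 3 with their names.
    // Assumption: a code missing from the map is kept as-is instead of becoming null.
    static String[] transform(String[] row, Map<String, String> roles, Map<String, String> depts) {
        String[] out = row.clone();
        out[2] = roles.getOrDefault(row[2], row[2]);
        out[3] = depts.getOrDefault(row[3], row[3]);
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> roles = new HashMap<>();
        roles.put("2", "Operator");
        roles.put("1", "Assistant");

        Map<String, String> depts = new HashMap<>();
        depts.put("3", "Grinding");
        depts.put("5", "HR");

        // Hypothetical input row: id, name, role code, department code.
        String[] row = {"7", "Jane", "2", "5"};

        // prints 7,Jane,Operator,HR
        System.out.println(String.join(",", transform(row, roles, depts)));
    }
}
```

If you would rather fail fast on an unknown code, throwing an exception from the transformation is a reasonable alternative to either get or getOrDefault.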
This will produce the exact output you expect. Hope this helps.
Disclaimer: I'm the author of this library. It's open source and free (Apache 2.0 license).