
I'm parsing a Twitter Archive .csv and having some problems, since the fields are separated by commas but the tweet field (the text field) can also contain commas. So I've tried this:

String line;
// br is a BufferedReader over the archive .csv file
while ((line = br.readLine()) != null) {
    String[] split = line.split("\\s*,\\s*");
    for (int i = 0; i < split.length; i++) {
        if (split[i] != null && !split[i].isEmpty()) {
            // adding some stuff
            line.split("\\s*\"\\s*"); // second split, meant for the tweet field
            // adding some other stuff
        }
    }
}

The idea is to change the way the line is split once it reaches the tweet field. But the second split doesn't work at all, and tweets that contain commas are not added.

What should I do? Thanks a lot!
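One way to handle commas inside the tweet text without a second split is to walk the line once and track whether you are currently inside a quoted field. Below is a minimal sketch, assuming the archive wraps the text field in double quotes, escapes embedded quotes by doubling them (`""`, as RFC 4180 describes), and never has a line break inside a quoted field. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplitter {

    /**
     * Splits one CSV line on commas that are not inside double quotes.
     * A doubled quote ("") inside a quoted field is read as one literal quote.
     * Assumption: the whole record is on this one line.
     */
    public static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    // "" inside quotes is an escaped quote; a single " ends the quoted part
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"');
                        i++;
                    } else {
                        inQuotes = false;
                    }
                } else {
                    current.append(c); // commas here belong to the tweet text
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString()); // comma outside quotes: field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String line = "123,\"Hello, world\",\"say \"\"hi\"\"\",2016-12-13";
        // prints [123, Hello, world, say "hi", 2016-12-13] (four fields)
        System.out.println(splitCsvLine(line));
    }
}
```

This is deliberately lenient (a stray quote in an unquoted field simply toggles the state), and because it works per line, a tweet containing a line break would be cut in half by `br.readLine()`. Those are the kinds of corner cases a CSV library already handles.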

kadota
  • maybe you can split at every comma which is not inside a pair of `"` – Aelop Dec 13 '16 at 11:49
  • where is the csv? – Jobin Dec 13 '16 at 11:50
  • Seriously: when you are dealing with "third party" CSV files (so, you don't control that content, and it can go to the "full range" of what CSV allows for) then do **not** try to write your own parser. Use existing tools that do that for you. Re-inventing this wheel is an absolute waste of time (and **much** harder than calling some splits). – GhostCat Dec 13 '16 at 11:52
  • @Aelop Getting a CSV parser right is **hard work**. There are many subtle corner cases one has to reason about; and a *robust* solution that works with "real" CSV files ... will need a lot of work. – GhostCat Dec 13 '16 at 11:53
  • @GhostCat I'll go for openCSV then. Thanks! – kadota Dec 13 '16 at 12:05
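Aelop's suggestion (split only at commas that are not inside a pair of quotes) can be written as a single regex whose lookahead requires an even number of `"` characters between the comma and the end of the line. A sketch under the same assumptions as above (one record per line, balanced quotes); the class name and constant are hypothetical:

```java
import java.util.Arrays;

public class RegexQuoteSplit {

    // Matches a comma only if the rest of the line contains an even number of
    // double quotes, i.e. the comma is not inside a quoted section.
    private static final String COMMA_OUTSIDE_QUOTES =
            ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    public static String[] split(String line) {
        // limit -1 keeps trailing empty fields ("a,b," gives 3 fields, not 2)
        return line.split(COMMA_OUTSIDE_QUOTES, -1);
    }

    public static void main(String[] args) {
        String line = "123,\"Hello, world\",2016-12-13";
        // prints [123, "Hello, world", 2016-12-13]; the quotes stay in the field
        System.out.println(Arrays.toString(split(line)));
    }
}
```

The surrounding quotes are kept and doubled `""` are not unescaped, and the lookahead rescans the rest of the line at every comma. For anything beyond a quick fix, a real parser such as OpenCSV, as GhostCat recommends, is the safer route.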
