Suppose we have a .csv file like below:

Country,num1,num2,remarks
USA, 1, 1, string 1
USA, 1, 2, "string 2, string 3, string 1"

I need to split each line for a Map-Reduce task. The problem, as you can see, is that when the remarks field contains commas, the provider of the file wraps the string in double quotes (I can see the double quotes when I open the file in a text editor). Is there any way to split up the remarks field?

My final purpose is to create keys with values like below:

USA, string 1
USA, string 2
USA, string 3
USA, string 1

Assuming that I have a variable called line which contains the whole line as a string, I've tried something like this:

String[] temp = line.split(",");

but in this case temp[3] contains only "string 2 (the part of the remarks before the first comma, including the opening quote) and not the whole value

string 2, string 3, string 1
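
To make the failure concrete, here is a minimal sketch of the naive split applied to the second data line from the sample above. The class and method names (`NaiveSplitDemo`, `naiveSplit`) are made up for illustration only.

```java
public class NaiveSplitDemo {
    // Splits on every comma, including the ones inside the quoted remarks field.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        String line = "USA, 1, 2, \"string 2, string 3, string 1\"";
        String[] temp = naiveSplit(line);
        // The quoted field is broken apart, so there are more tokens than columns.
        System.out.println(temp.length);
        // Only the first fragment of the remarks ends up at index 3.
        System.out.println(temp[3]);
    }
}
```

The quoted remarks are spread over temp[3], temp[4] and temp[5], which is why the field cannot be read back from a single index.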
Dimitris
  • Use a proper CSV reader and `String.split(",")` the remarks field? – Rob Audenaerde May 02 '20 at 17:49
  • Splitting on `,` is the poor man's CSV parser, bound to fail for complex data. Use an existing CSV parsing library. It's not worth re-inventing the wheel for standard data-interchange formats. – plalx May 02 '20 at 18:12
  • You may choose to open your CSV file with a spreadsheet editor and save it as a tab delimited file and then use the tab as your separator instead of the comma – Trevor May 02 '20 at 18:57

1 Answer

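After a long search I found a similar question here that helped.

In practice, this regex can be used. It splits only on commas that are followed by an even number of double quotes, i.e. commas that are outside a quoted field: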

String[] tokens = line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1);

so that "string 2, string 3, string 1" is treated as a single field.
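
As a rough sketch of how this could be combined with a second split to get the key/value pairs from the question, something like the following should work. The class and method names (`RemarksSplitter`, `toPairs`) are made up for illustration, and it assumes the four-column layout from the sample file, with no line breaks inside quoted fields.

```java
import java.util.ArrayList;
import java.util.List;

public class RemarksSplitter {
    // A comma followed by an even number of double quotes up to the end of
    // the line, i.e. a comma that is not inside a quoted field.
    private static final String COMMA_OUTSIDE_QUOTES =
            ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Returns one "country, remark" string per remark found on the line.
    static List<String> toPairs(String line) {
        List<String> pairs = new ArrayList<>();
        String[] tokens = line.split(COMMA_OUTSIDE_QUOTES, -1);
        if (tokens.length < 4) {
            return pairs; // assumption: silently skip malformed lines
        }
        String country = tokens[0].trim();
        String remarks = tokens[3].trim();
        // Strip the surrounding quotes added by the file provider, if present.
        if (remarks.length() >= 2 && remarks.startsWith("\"") && remarks.endsWith("\"")) {
            remarks = remarks.substring(1, remarks.length() - 1);
        }
        // Inside the remarks field a plain comma split is enough.
        for (String remark : remarks.split(",")) {
            pairs.add(country + ", " + remark.trim());
        }
        return pairs;
    }

    public static void main(String[] args) {
        toPairs("USA, 1, 1, string 1").forEach(System.out::println);
        toPairs("USA, 1, 2, \"string 2, string 3, string 1\"").forEach(System.out::println);
    }
}
```

For the two sample lines this yields the four pairs listed in the question. For files with embedded line breaks or other CSV edge cases, a CSV parsing library, as suggested in the comments, is probably the safer choice.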

Thanks community!

Dimitris