
I want to extract the email addresses from one field of a line of CSV data, but that field holds a list of addresses that itself contains commas.

Because of those commas, `split` breaks the field apart as well, and since `split` is the only approach I know, I can't keep the field's contents together.

Example data:

    12/01/2017, billybob123, billybob@bobthebomb.com, roxy@roxmysox.com, joey@rosytosy.com, AB, tom@tomsticles.com, \\123\abc

My code:

    # Open the file for reading ('data.csv' is a placeholder name)
    open my $read, '<', 'data.csv' or die "Cannot open data.csv: $!";

    while ( my $fileLine = <$read> ) {
        chomp $fileLine;
        my @row = split ',', $fileLine;
        print "$row[3]\n\n";
    }

I print `$row[3]`. The result I want is

roxy@roxmysox.com, joey@rosytosy.com

but I get

roxy@roxmysox.com

The number of comma-separated email addresses in this field varies from line to line.

Borodin
fexlneb
  • How can you do that? If you have `A,B,C,D`, how can any tool know that you need the fields `A`, `B,C` and `D`? Not unless there is _some_ criterion for what a "field" is. – zdim Dec 01 '17 at 19:11
  • USE A STANDARD CSV PARSING LIBRARY! Seriously, we get this question several times a week. Parsing CSV is not trivial, and the code has already been written and tested. Do yourself a favor: look on CPAN for a CSV library and use it. It will handle all the corner cases. In your specific case, though, the CSV looks broken: if a field contains commas it MUST be enclosed in quotes, or there will always be ambiguity. – Jim Garrison Dec 01 '17 at 19:13
  • If you have control over the input, enclose each field in `"`. Then it's easy. See for instance [this post](https://stackoverflow.com/q/47544385/4653379) from a few days ago. – zdim Dec 01 '17 at 19:13
  • I recommend having the database (MySQL? TAB is its default) output the file with TAB as the separator. That simplifies things a lot. – Olivier Dulac Dec 01 '17 at 19:16
  • CSV data fields should be separated by a single comma, not a comma and a space. – Borodin Dec 01 '17 at 20:32
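To illustrate the quoting advice in the comments, here is a minimal sketch that assumes the input can be changed so the address list is wrapped in double quotes (the `$line` below is a hypothetical, hand-edited version of the sample record). Text::CSV from CPAN is the usual choice for real CSV; this sketch uses `parse_line` from the core module Text::ParseWords only so it runs without installing anything. Note that `parse_line` is not a full CSV parser (for example, it treats backslash as an escape rather than handling doubled `""` quotes).

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);   # core module; a stand-in for Text::CSV here

# Hypothetical input: the question's record, but with the multi-address
# field enclosed in double quotes as the comments recommend.
my $line = '12/01/2017,billybob123,billybob@bobthebomb.com,'
         . '"roxy@roxmysox.com, joey@rosytosy.com",AB,tom@tomsticles.com';

# Split on commas (plus any surrounding whitespace) that are outside quotes;
# keep = 0 strips the enclosing quotes from the returned fields.
my @row = parse_line('\s*,\s*', 0, $line);

print "$row[3]\n";    # roxy@roxmysox.com, joey@rosytosy.com
```

Once the field is quoted, the commas inside it are no longer separators, so `$row[3]` holds the whole address list.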

1 Answer


First, your CSV input is broken: fields that contain commas must be enclosed in quotes. If you can't change the input format, then I suggest the following approach:

  1. Use a standard CSV parser. This will give you each email in a separate field.
  2. For the email field(s), start at the proper index (the third field in your example) and accumulate field values for as long as each field "looks" like an email address. Regular expressions for matching email addresses are easy to find on the web.
  3. When you encounter a field that does not look like an email, verify that it looks the way you expect and then take that field and the following ones as if they were the fourth and subsequent fields regardless of their actual index positions.
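The three steps above can be sketched in Perl roughly as follows. This is an illustration under several assumptions, not the answerer's code: `split_email_run` is a hypothetical helper name, the email regex is deliberately loose (not RFC 5322), the start index is 3 because that is where the asker's desired output begins, and the check that the next field is two uppercase letters (like `AB`) is an assumed "looks the way you expect" test. Core `Text::ParseWords::parse_line` stands in for a real CSV parser such as Text::CSV.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);   # core module; a stand-in for Text::CSV

# Deliberately loose "looks like an email" test (an assumption, not RFC 5322).
my $EMAIL_RE = qr/^[^\s@,]+@[^\s@,]+\.[^\s@,]+$/;

# Hypothetical helper: collect consecutive email-looking fields starting at
# index $start, and return them plus every field that follows the run.
sub split_email_run {
    my ($fields, $start) = @_;
    my @emails;
    my $i = $start;
    while ($i < @$fields && $fields->[$i] =~ $EMAIL_RE) {
        push @emails, $fields->[$i];
        $i++;
    }
    my @rest = @{$fields}[$i .. $#$fields];
    return (\@emails, \@rest);
}

# The sample record from the question; a single-quoted heredoc keeps the
# backslashes in the last field literal.
chomp(my $line = <<'END');
12/01/2017, billybob123, billybob@bobthebomb.com, roxy@roxmysox.com, joey@rosytosy.com, AB, tom@tomsticles.com, \\123\abc
END

# Step 1: split into fields; keep = 1 leaves backslashes untouched.
my @fields = parse_line('\s*,\s*', 1, $line);

# Step 2: accumulate email-looking fields from the assumed start index.
my ($emails, $rest) = split_email_run(\@fields, 3);

# Step 3: make sure the first non-email field looks as expected (assumed
# here to be a two-letter uppercase code) before trusting the rest.
die "Unexpected field after the address list\n"
    unless @$rest && $rest->[0] =~ /^[A-Z]{2}$/;

print join(', ', @$emails), "\n";   # roxy@roxmysox.com, joey@rosytosy.com
print join(' | ', @$rest), "\n";
```

The fields in `@$rest` then play the role of the "fourth and subsequent" fields, whatever their actual index in `@fields`.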
Jim Garrison