
I have a comma-separated CSV file containing NASDAQ symbols. I use a Scanner to read the file:

  s = new Scanner(new File("C:\\nasdaq_companylist.csv")).useDelimiter("\\s*,\\s*");    

I'm getting an exception at the second field. The problem is that this field, like some other fields in the file, contains commas too, for example "1-800 FLOWERS.COM, Inc.":

FLWS,"1-800 FLOWERS.COM, Inc.",2.8,76022800,n/a,1999,Consumer Services,Other Specialty Stores,http://www.nasdaq.com/symbol/flws    
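To make the failure concrete, here is a small sketch (the class name `DelimiterDemo` is hypothetical) that feeds the sample row above to a Scanner with the same `\\s*,\\s*` delimiter. The quoted company name comes back as two tokens, so the row yields 10 tokens for 9 columns, and the third read, `nextDouble()`, lands on the token `Inc."` and throws `InputMismatchException`.

```java
import java.util.ArrayList;
import java.util.InputMismatchException;
import java.util.List;
import java.util.Scanner;

public class DelimiterDemo {
    // The sample row from the question.
    public static final String ROW = "FLWS,\"1-800 FLOWERS.COM, Inc.\",2.8,76022800,n/a,1999,"
            + "Consumer Services,Other Specialty Stores,http://www.nasdaq.com/symbol/flws";

    // Collects every token the Scanner produces with the question's delimiter.
    public static List<String> tokens(String row) {
        List<String> out = new ArrayList<>();
        try (Scanner s = new Scanner(row).useDelimiter("\\s*,\\s*")) {
            while (s.hasNext()) {
                out.add(s.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The quoted name shows up as two separate tokens.
        System.out.println(tokens(ROW));
        try (Scanner s = new Scanner(ROW).useDelimiter("\\s*,\\s*")) {
            s.next();        // FLWS
            s.next();        // "1-800 FLOWERS.COM  (only the first half of the name)
            s.nextDouble();  // the next token is Inc." which is not a double
        } catch (InputMismatchException e) {
            System.out.println("InputMismatchException, as described in the question");
        }
    }
}
```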

How can I avoid this problem? My code is:

List<Stock> theList = new ArrayList<Stock>();
StringBuilder sb = new StringBuilder();

// get the title (header) line
String title = s.nextLine();
System.out.println("title: " + title);

while (s.hasNext())
{
    String symbol = s.next();
    String name = s.next();
    double lastSale = s.nextDouble();
    long marketCap = s.nextLong();
    String adr = s.next();
    String ipoYear = s.next();
    String sector = s.next();
    String industry = s.next();
    String summaryQuote = s.next();
    theList.add(new Stock(symbol, lastSale));
}

Thanks

Toren
  • Possible duplicate of [Dealing with commas in a CSV file](http://stackoverflow.com/questions/769621/dealing-with-commas-in-a-csv-file) – dogbane Nov 07 '11 at 09:44
  • You may want to look at this question http://stackoverflow.com/questions/1757065/java-splitting-a-comma-separated-string-but-ignoring-commas-in-quotes – Bojan Dević Nov 07 '11 at 10:38

4 Answers


Unless this is homework, you should not parse CSV yourself. Use one of the existing libraries, for example this one: http://commons.apache.org/sandbox/csv/

Or google "java csv parser" and choose another.

But if you wish to implement the logic yourself, you should use the negative lookahead feature of regular expressions (see http://download.oracle.com/javase/1,5.0/docs/api/java/util/regex/Pattern.html).
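A sketch of the lookahead idea, assuming quotes inside each line are always balanced. The split regex below is a commonly used formulation written with a positive lookahead: a comma counts as a separator only if an even number of quotes follows it up to the end of the line, i.e. only if it sits outside a quoted field. The class name `LookaheadSplit` is hypothetical.

```java
import java.util.Arrays;

public class LookaheadSplit {
    // Comma followed by zero or more pairs of quotes, then no more quotes up to
    // the end of the line: such a comma is outside any quoted field.
    private static final String SEPARATOR = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    public static String[] split(String line) {
        // limit -1 keeps trailing empty fields such as the last one in "a,b,"
        return line.split(SEPARATOR, -1);
    }

    public static void main(String[] args) {
        String row = "FLWS,\"1-800 FLOWERS.COM, Inc.\",2.8,76022800,n/a,1999,"
                + "Consumer Services,Other Specialty Stores,http://www.nasdaq.com/symbol/flws";
        System.out.println(Arrays.toString(split(row)));
    }
}
```

The quote characters stay in the fields, and the lookahead rescans the rest of the line at every comma, which is fine for one row but quadratic on very long lines.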

AlexR

Your safest bet is to use a CSV parsing library. Your comma is enclosed in quotes, so you'd need to implement logic that recognizes quoted commas. However, you'd also need to plan for other situations, like quotes within a quoted field, escape sequences, etc. Better to use a ready-made, tested solution; search on Google and you'll find some. CSV files can be tricky to handle on your own.
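To give an idea of the extra logic meant above, here is a minimal character-by-character sketch (class and method names are hypothetical). It treats commas inside quotes as data and a doubled quote `""` inside a quoted field as one literal quote, and it strips the enclosing quotes. It assumes each record fits on one line, so quoted fields containing line breaks are not handled, which is one more reason a tested library is the safer choice.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplitter {
    public static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"');   // "" inside quotes is an escaped quote
                        i++;               // skip the second quote of the pair
                    } else {
                        inQuotes = false;  // closing quote of the field
                    }
                } else {
                    cur.append(c);         // commas in here are data, not separators
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote of a field
            } else if (c == ',') {
                fields.add(cur.toString()); // separator: the field is complete
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());        // last field on the line
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("FLWS,\"1-800 FLOWERS.COM, Inc.\",2.8,76022800"));
    }
}
```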

MadWizard

As others have correctly pointed out, rolling your own CSV parser is not a good idea, as it will usually leave huge security holes in your system.

That said, I use this regex:

"((?:\"[^\"]*?\")*|[^\"][^,]*?)([,]|$)"

which does a good job with well-formed CSV data. You will need to use it with a Pattern and a Matcher.
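A minimal sketch of driving that regex with `Pattern` and `Matcher` over one line of the file (the class and method names are hypothetical). The loop stops once group 2 is empty, i.e. once the separator was the end of the line; without that check, `find()` reports one extra empty match at the very end of the input, because the first alternative can match an empty string right before `$`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCsvSplit {
    // The regex from the answer, written as a Java string literal.
    private static final Pattern FIELD =
            Pattern.compile("((?:\"[^\"]*?\")*|[^\"][^,]*?)([,]|$)");

    public static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            fields.add(m.group(1));      // field text, quote characters still included
            if (m.group(2).isEmpty()) {  // separator was end of line: record is done
                break;
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitLine("FLWS,\"1-800 FLOWERS.COM, Inc.\",2.8,76022800"));
    }
}
```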

This is what it does:

/*
 ( - Field Group
   (?: - Non-capturing (because the outer group will do the capturing) consume of quoted strings
    \"  - Start with a quote
    [^\"]*? - Non-greedy match on anything that is not a quote
    \" - End with a quote
   )* - And repeat
  | - Or
   [^\"] - Not starting with a quote
   [^,]*? - Non-greedy match on anything that is not a comma
 ) - End field group
 ( - Separator group
  [,]|$ - Comma separator or end of line
 ) - End separator group 
*/

Note that it parses the data into two groups: the field and the separator. It also leaves the quote characters in the field; you may wish to remove them and replace "" with ", etc.
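A minimal sketch of that clean-up step (the class `Unquote` and its method are hypothetical names): strip one pair of enclosing quotes and turn each doubled quote back into a single one. Fields that are not enclosed in quotes are returned unchanged.

```java
public class Unquote {
    public static String unquote(String field) {
        // Only touch fields that are wrapped in a pair of quotes.
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            // Drop the outer quotes, then collapse "" to ".
            return field.substring(1, field.length() - 1).replace("\"\"", "\"");
        }
        return field;
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"1-800 FLOWERS.COM, Inc.\""));
    }
}
```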

OldCurmudgeon
  • Paul, thank you for sharing the regex. Since I saw your answer, I have started working on my own regex. – Toren Nov 09 '11 at 21:14

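I hope you can remove the \\s* from your regular expression, so that the spaces after commas inside quoted fields are kept when the pieces are glued back together. Then you can have: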

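while (s.hasNext()) {
    String symbol = s.next();
    // A token that starts with a quote is the first piece of a quoted field
    // that was split at an embedded comma: glue the following tokens back on
    // until the closing quote shows up. A token consisting of a single quote
    // character starts and ends with the same quote, so it is not closed yet.
    if (symbol.startsWith("\"")) {
        while ((!symbol.endsWith("\"") || symbol.length() == 1) && s.hasNext()) {
            symbol += "," + s.next();
        }
    }
...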
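Filled out as a self-contained method for a single line, the idea above might look like this sketch (class and method names are hypothetical). It uses a plain `,` delimiter, as suggested, so the space in `FLOWERS.COM, Inc.` survives when the two halves are joined again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class RejoinQuoted {
    public static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        // Plain "," delimiter (no \\s*), so whitespace inside fields is preserved.
        try (Scanner s = new Scanner(line).useDelimiter(",")) {
            while (s.hasNext()) {
                String field = s.next();
                if (field.startsWith("\"")) {
                    // Re-attach tokens until the quoted field is closed again.
                    while ((!field.endsWith("\"") || field.length() == 1) && s.hasNext()) {
                        field += "," + s.next();
                    }
                }
                out.add(field);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields("FLWS,\"1-800 FLOWERS.COM, Inc.\",2.8,76022800"));
    }
}
```

One limitation of this token-gluing approach: a quoted field whose piece before an embedded comma ends in an escaped quote (for example `"a"",b"`) is treated as closed too early.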
Joop Eggen