I am getting a StackOverflowError when validating a CSV line that contains a large string field.

Regex:

(?![^\",][^,]*\")(\"(\"\"|[^\"])*\"|[^\",]*),[0-9]*

TargetString:

"The Nuvi 1450LMT is a portable global positioning system receiver from Garmin that offers a step-up from the company's standard 1450 and 1450T models. Including free lifetime map and traffic updates, this model can be updated once every three months to ensure to the most-up-to-date location information. A built-in FM signal transmitter can provide up-to-the-minute traffic information concerning accidents, construction and other forms of road blockage, providing users with sufficient time to select an alternate route. A back-lighted 5-inch touchscreen TFT display is included that provides clear visual instruction, complete with "Lane Assist" technology that provides virtual first-person instruction on precisely what lanes to use. Comprehensive "City Navigator" maps are included for Canada, the US and Mexico, with two and three-dimensional support and over 6 million user-selected points of interest. Pedestrian navigation is also fully supported on the 1450LMT, with the "CityXplorer" service offering bus, rail, tram and other public transportation information for a wide variety of major cities. Fuel-effecient routes can be determined with the "EcoRoute" mode, while "HotFix" predictive satellite technology helps to maintain the most accurate locational information even when signal is temporarily lost. Photo navigation is supported through Garmin's "Photo Connect" service, and additional car marker and narration voices can be downloaded via the "Garmin Garage" website. 

Features

  • 5-inch backlit TFT color touchscreen
  • Free lifetime traffic updates
  • Free maps
  • MicroSD card support
  • Voice prompts
  • Lane assist function
  • Auto Re-route
  • Route avoidance
  • FM traffic compatibility
  • EcoRoute routing
  • Custom Points Of Interest
  • Garmin garage car marker and voice customization",9

    Can someone help me optimize it? Could it be optimized using possessive quantifiers?

  • oksayt
    Trupti Swain

    2 Answers

    I think the best advice would be to not use regexes to parse CSV files. However you formulate the regex, there is the possibility of an unbounded number of branch points, and hence a stack overflow for pathological input strings.

    A better approach is to select and use a decent CSV library for Java. Check the answers to this question:

    Can you recommend a Java library for reading (and possibly writing) CSV files?
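
    To illustrate why a dedicated parser sidesteps the problem, here is a minimal hand-rolled scanner. It is not one of the libraries referred to above, only a sketch of a character-by-character scan. The names CsvRecordScanner, splitRecord and isValidLine are hypothetical, and the rule that a valid line is one text field followed by one numeric field is an assumption taken from the regex in the question. Because it walks the input in a loop rather than recursing, its stack usage does not grow with the length of the field.

```java
import java.util.ArrayList;
import java.util.List;

final class CsvRecordScanner {

    /** Splits one CSV record into fields; throws IllegalArgumentException on malformed quoting. */
    static List<String> splitRecord(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        int n = record.length();
        int i = 0;
        while (true) {
            field.setLength(0);
            if (i < n && record.charAt(i) == '"') {
                i++; // skip the opening quote
                boolean closed = false;
                while (i < n) {
                    char c = record.charAt(i++);
                    if (c != '"') {
                        field.append(c); // commas and newlines are plain data inside quotes
                    } else if (i < n && record.charAt(i) == '"') {
                        field.append('"'); // a doubled quote is an escaped quote
                        i++;
                    } else {
                        closed = true; // a lone quote closes the field
                        break;
                    }
                }
                if (!closed) {
                    throw new IllegalArgumentException("unterminated quoted field");
                }
                if (i < n && record.charAt(i) != ',') {
                    throw new IllegalArgumentException("unexpected character after closing quote at index " + i);
                }
            } else {
                while (i < n && record.charAt(i) != ',') {
                    char c = record.charAt(i++);
                    if (c == '"') {
                        throw new IllegalArgumentException("quote inside unquoted field at index " + (i - 1));
                    }
                    field.append(c);
                }
            }
            fields.add(field.toString());
            if (i >= n) {
                return fields;
            }
            i++; // skip the comma that separates two fields
        }
    }

    /** Assumed rule: exactly two fields, the second made of one or more ASCII digits. */
    static boolean isValidLine(String line) {
        try {
            List<String> fields = splitRecord(line);
            String number = fields.size() == 2 ? fields.get(1) : "";
            return !number.isEmpty() && number.chars().allMatch(c -> c >= '0' && c <= '9');
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(splitRecord("\"a \"\"b\"\", c\",9"));
        System.out.println(isValidLine("\"a \"b\" c\",9"));
    }
}
```

    With this scanner, a line whose inner quotes are not doubled is reported as malformed rather than exhausting the stack.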

    Community
    Stephen C
    You can make that error go away by adding a few plus signs:

    "(?![^\",][^,]*\")(\"(\"\"|[^\"]+)*\"|[^\",]+),[0-9]+"
                                    ^           ^       ^
    

    Note that those are just regular plus signs, not possessive modifiers. The second and third plus signs replaced asterisks, but it's the first one that makes the real difference. That [^\"]+ is what consumes most of the text, and it was doing so one character at a time before I added that plus sign.
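
    A small sketch for trying the revised pattern: PLUS_FIX is the regex above copied verbatim, while POSSESSIVE is a hypothetical variant (not part of this answer) that uses the possessive quantifiers the question asks about. The class name, the helper longQuotedLine and the sample strings are all assumptions made for illustration. One reason to consider the possessive form is that a nested quantifier such as ([^"]+)* can backtrack heavily on input that does not match, and possessive quantifiers rule that out.

```java
import java.util.regex.Pattern;

final class PlusFixDemo {

    // The revised pattern from the answer, as a Java string literal.
    static final Pattern PLUS_FIX = Pattern.compile(
            "(?![^\",][^,]*\")(\"(\"\"|[^\"]+)*\"|[^\",]+),[0-9]+");

    // Hypothetical variant: same structure, but possessive quantifiers and a non-capturing group.
    static final Pattern POSSESSIVE = Pattern.compile(
            "(?![^\",][^,]*\")(\"(?:\"\"|[^\"]++)*+\"|[^\",]++),[0-9]++");

    /** Builds a line of the form "xxx...x",9 with bodyLength filler characters and no inner quotes. */
    static String longQuotedLine(int bodyLength) {
        StringBuilder sb = new StringBuilder(bodyLength + 4);
        sb.append('"');
        for (int i = 0; i < bodyLength; i++) {
            sb.append('x');
        }
        return sb.append("\",9").toString();
    }

    public static void main(String[] args) {
        String escaped = "\"say \"\"hi\"\" twice\",9";  // inner quotes doubled
        String unescaped = "\"say \"hi\" twice\",9";    // inner quotes left as-is
        String longLine = longQuotedLine(50_000);
        for (Pattern p : new Pattern[] {PLUS_FIX, POSSESSIVE}) {
            System.out.println("escaped:   " + p.matcher(escaped).matches());
            System.out.println("unescaped: " + p.matcher(unescaped).matches());
            System.out.println("long line: " + p.matcher(longLine).matches());
        }
    }
}
```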

    But it still won't match; it will just fail more quickly. That regex is for matching CSV fields with properly escaped quotes, and if I understand you correctly, your problem is that they're not escaped. That's a much more challenging problem, but I wonder if you really need to deal with those inner quotes at all. Won't this work?

    ".*?",\d+
    

    ...or as a Java string literal:

    "\".*?\",\\d+"
    

    Or are you trying to correct the string by escaping the quotes yourself?
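
    A minimal sketch of the simpler pattern in use, assuming the whole record is held in one String. The Pattern.DOTALL flag is an addition made here, not part of the pattern as written above: without it, the dot does not match line terminators, and the sample record in the question spans several lines. The class name and sample strings are hypothetical.

```java
import java.util.regex.Pattern;

final class LazyQuotedFieldDemo {

    // The simpler pattern as a Java string literal; DOTALL (an assumption) lets '.' cross line breaks.
    static final Pattern LAZY = Pattern.compile("\".*?\",\\d+", Pattern.DOTALL);

    // The same pattern without the flag, kept only for comparison.
    static final Pattern LAZY_NO_DOTALL = Pattern.compile("\".*?\",\\d+");

    /** True if the whole line is a quoted field followed by a comma and one or more digits. */
    static boolean isValidLine(String line) {
        return LAZY.matcher(line).matches();
    }

    public static void main(String[] args) {
        String multiLine = "\"He said \"hi\"\nand left\",9"; // unescaped inner quotes, two lines
        System.out.println(isValidLine(multiLine));
        System.out.println(LAZY_NO_DOTALL.matcher(multiLine).matches());
    }
}
```

    Note that this check is deliberately loose: it accepts anything between the opening quote and the final quote-comma-digits sequence, so it does not verify that inner quotes are escaped.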

    Alan Moore