
I am stuck on a problem where I need to split a string containing multiple sentences at the comma delimiter. The difficulty is that the sentences themselves can also contain commas, which should not be treated as delimiters. These commas appear at random positions, so a regex approach does not seem workable either. I think I am going to have to use an NLP approach for this.

I tried regex and pattern matching, but they could not handle all the cases, which is why I think I will have to go with an NLP approach.

Here is an example input: 'Reduction of manual triaging effort, Built browser based test application, Real-Time Monitoring of Audio and Video frame by frame,Test Assessment, Test Design, Test Optimization, Test Report and Test Bench creation, Providing high-level and low-level requirements for Automation of key modules, Test Analysis and Use-case generation and Scope identification for the CFRs, Network protocols testing and Security testing using in-home solutions and external third party tools, Effective Escape Defect Analysis strategy'

Expected output: [Reduction of manual triaging effort, Built browser based test application, Real-Time Monitoring of Audio and Video frame by frame, Test Assessment, Test Design, Test Optimization, Test Report and Test Bench creation, Providing high-level and low-level requirements for Automation of key modules, Test Analysis and Use-case generation and Scope identification for the CFRs, Network protocols testing and Security testing using in-home solutions and external third party tools, Effective Escape Defect Analysis strategy]
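For reference, the plain comma split that the question describes as insufficient can be sketched as follows. This is only a baseline, not a solution: it treats every comma as a delimiter and cannot tell a separating comma from one that belongs to an item. The helper name `naive_split` is hypothetical, and the sample is a shortened piece of the example input.

```python
import re

# Shortened sample taken from the example input in the question.
text = (
    "Reduction of manual triaging effort, Built browser based test application, "
    "Real-Time Monitoring of Audio and Video frame by frame,Test Assessment, Test Design"
)


def naive_split(s):
    """Split on every comma and trim the whitespace around each piece."""
    return [part for part in re.split(r"\s*,\s*", s.strip()) if part]


parts = naive_split(text)
for part in parts:
    print(part)
```

Any item that legitimately contains a comma would be cut into several pieces by this approach, which is the limitation described above.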

  • You need to use a library like spaCy or NLTK to tokenize the input sentence into individual words. – Abdulmajeed Mar 08 '23 at 17:17
  • Can you please help me solve it? I tried using NLTK but I am not getting the desired results. – Shashank Joshi Mar 08 '23 at 17:19
  • I can help you, but can you provide the sentence you're trying to split and the desired output? – Abdulmajeed Mar 08 '23 at 17:24
  • I have updated the question, with input and output – Shashank Joshi Mar 08 '23 at 18:53
  • Cool, I will look into it now – Abdulmajeed Mar 08 '23 at 19:01
  • I don't think there's an easy solution to your problem because your data is not formatted properly. For example, any string that includes a comma in a CSV file needs to be wrapped within double quotes (`""`) so that it can be parsed correctly. Here's an example thread: https://stackoverflow.com/questions/4617935/is-there-a-way-to-include-commas-in-csv-columns-without-breaking-the-formatting . Unless you have access to the raw data in another format (e.g., line-by-line), you may be out of luck here. – umit1010 Mar 19 '23 at 06:46
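
The quoting convention mentioned in the last comment can be sketched with Python's standard `csv` module. This only works if the data can be regenerated with quotes around items that contain commas. The sample string and the helper name `parse_quoted` are hypothetical and are not taken from the asker's actual data.

```python
import csv
import io

# Hypothetical input: the item that contains commas is wrapped in double quotes.
raw = '"Test Assessment, Test Design, Test Optimization", Built browser based test application'


def parse_quoted(line):
    """Parse one comma-separated line, keeping commas inside double quotes."""
    # skipinitialspace lets a quoted field follow ", " instead of a bare ",".
    reader = csv.reader(io.StringIO(line), skipinitialspace=True)
    return [field.strip() for field in next(reader)]


fields = parse_quoted(raw)
for field in fields:
    print(field)
```

With the input as posted, which has no quoting, the parser has no way to know which commas belong to an item, so this does not remove the ambiguity in the original string.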
