
I have the following data in a CSV file:

 Given Data:
 (12,'hello','this girl,is lovely(adorable \r\n actually)',goodbye),
 (13,'hello','this fruit,is super tasty (sweet actually)',goodbye)

I want to print the given data as 2 rows, each running from the opening ( to the closing ), and ignore the delimiter , and any ( ) that appear inside a ' ' field.

How can I do this using awk or sed on Linux?

Expected result as below:

 Expected Result: 
 row 1 = 12,'hello','this girl,is lovely(adorable actually)',goodbye
 row 2 = 13,'hello','this fruit,is super tasty (sweet actually)',goodbye

UPDATE: I just noticed that there is a comma between the 2 rows. How can I separate the data into 2 rows using the , that comes after ) and before (?

Derek Lee
  • What have you tried? Most of us here are happy to help you improve your craft, but are less happy acting as short order unpaid programming staff. Show us your work so far in an [MCVE](http://stackoverflow.com/help/mcve), the result you were expecting and the results you got, and we'll help you figure it out. – ghoti Jan 22 '18 at 00:51
  • @ghoti I tried awk -F"[()]" '{print $2}' test.csv but it didn't work; these rows are in my test.csv – Derek Lee Jan 22 '18 at 01:23
  • Derek, I'm hoping to see your attempt to solve the problem, rather than just a bit of code in a comment. I want to help you *understand* a solution, not just get you past a programming hurdle without helping you grow your skills. Add what you've tried to your question, describe the process you think you need to follow to solve the overall issue, and tell us where you got stuck following that strategy. – ghoti Jan 22 '18 at 03:32

1 Answer


You can use the following awk command to achieve your goal:

awk -i.bak '{str=substr($0,2,length($0)-2); gsub("\\\\r ?|\\\\n ?","",str); print "row "NR" = "str;}' file.in

Tested on your input: [screenshot of the output]

Explanations:

  • -i.bak is intended to edit the file in place while keeping a backup copy (sed-style syntax). Note that GNU awk provides in-place editing through its inplace extension (-i inplace), so depending on your awk you may have to adapt or drop this option.
  • {str=substr($0,2,length($0)-2); gsub("\\\\r ?|\\\\n ?","",str); print "row "NR" = "str;} first removes the opening and closing parentheses of the line, then deletes the literal \r and \n sequences, and finally prints the result in the format you want.
  • If your file has a header line, add the condition NR>1 before the {...} block: 'NR>1{str=substr($0,2,length($0)-2); gsub("\\\\r ?|\\\\n ?","",str); print "row "NR" = "str;}'
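To make the individual steps easier to follow, here is a minimal sketch of the same transformation without the in-place option, so the result simply goes to standard output. The function name clean_rows and the temporary sample file are made up for this illustration, and the gsub pattern is written as a regex literal instead of a string; the input is the two rows from the question, one per line, without the comma between them.

```shell
# Hypothetical helper for this sketch: clean_rows is not part of the answer.
clean_rows() {
    awk '{
        str = substr($0, 2, length($0) - 2)  # drop the first and the last character
        gsub(/\\r ?|\\n ?/, "", str)         # delete the literal text \r and \n, plus one following space
        print "row " NR " = " str
    }' "$@"
}

# Sample input: the rows from the question. The \r\n here are literal characters.
sample=$(mktemp)
cat > "$sample" <<'EOF'
(12,'hello','this girl,is lovely(adorable \r\n actually)',goodbye)
(13,'hello','this fruit,is super tasty (sweet actually)',goodbye)
EOF

out=$(clean_rows "$sample")
printf '%s\n' "$out"
rm -f "$sample"
```

Because nothing is edited in place here, you can redirect the output to a new file once the result looks right.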

Following the change in your requirements, I have adapted the awk command so that the , between the rows is treated as part of the record separator (row separator):

awk -i.bak 'BEGIN{RS=",\n|\n"}{str=substr($0,2,length($0)-2); gsub("\\\\r ?|\\\\n ?","",str); print "row "NR" = "str;}' file.in

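where BEGIN{RS=",\n|\n"} sets the record separator to either a comma followed by a newline or a newline alone. A multi-character RS is treated as a regular expression by gawk and mawk; POSIX awk only guarantees a single-character RS.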
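The effect of the record separator can be seen in isolation with a small comparison. This is only a sketch: the variable names default_rs and with_rs and the temporary file are assumptions for the example, and it assumes an awk that accepts a regular expression as RS (gawk or mawk).

```shell
# Sample input with the comma between the rows, as in the updated question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
(12,'hello','this girl,is lovely(adorable \r\n actually)',goodbye),
(13,'hello','this fruit,is super tasty (sweet actually)',goodbye)
EOF

# Default RS (newline): the trailing comma belongs to record 1,
# so substr() removes the comma and leaves the closing parenthesis behind.
default_rs=$(awk '{ print substr($0, 2, length($0) - 2) }' "$sample")

# Regex RS: a comma followed by a newline, or a bare newline, ends a record,
# so each record is exactly one parenthesised row.
with_rs=$(awk 'BEGIN { RS = ",\n|\n" } { print substr($0, 2, length($0) - 2) }' "$sample")

printf 'default RS: %s\n' "$(printf '%s\n' "$default_rs" | sed -n '1p')"
printf 'regex RS:   %s\n' "$(printf '%s\n' "$with_rs" | sed -n '1p')"
rm -f "$sample"
```

With the default separator the first row keeps a stray ) at the end; with the regex separator both parentheses are removed.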

Allan
  • I just noticed another problem: there is a comma between the 2 data rows. How can I use it as the delimiter then? – Derek Lee Jan 22 '18 at 01:15
  • My output is exactly what you described in your post! What is your desired output then? – Allan Jan 22 '18 at 01:23
  • If you see my update, I added a comma between the 2 ( ) values. So instead of a newline as the row separator, the comma becomes the row separator – Derek Lee Jan 22 '18 at 01:25
  • I have adapted my answer! Let me know if this helps you. By the way, after the comma you have a carriage return, right? – Allan Jan 22 '18 at 02:34
  • Thanks for your answer, now it really helps. Thanks :) – Derek Lee Jan 22 '18 at 18:54