2

I have a list of strings that was printed to the console, and I need to convert it back into a list of quoted strings.

Assume the sample file looks like this:

List(UT_LVL_17_CD, UT_LVL_20_CD, 2018 1Q, 2018 2Q, 2018 3Q, 2018 4Q, 2018 FY)
List(UT_LVL_17_CD,UT_LVL_20_CD,2018 1Q,2018 2Q,018 3Q,2018 4Q,2018 FY)
List( UT_LVL_17_CD,    UT_LVL_20_CD,2018 1Q,2018 2Q, 2018 3Q, 2018 4Q, 2018 FY )

For all three variants above, the output should be:

List("UT_LVL_17_CD", "UT_LVL_20_CD", "2018 1Q", "2018 2Q", "2018 3Q", "2018 4Q", "2018 FY")

Note that spaces at the start, at the end, or between elements are acceptable:

List(  "UT_LVL_17_CD", "UT_LVL_20_CD", "2018 1Q", "2018 2Q", "2018 3Q", "2018 4Q",    "2018 FY" )

but not at the edges inside a quoted string value, as in:

"     UT_LVL_17_CD"
"UT_LVL_20_CD   ",

The spaces that are already inside an element should be preserved, e.g. "2018 4Q".

I tried the following, but it does not produce the correct result:

$ perl -pe ' s/(?<=\()|(?=,)|(?=\))/\"/sg ' list.txt
List("UT_LVL_17_CD", UT_LVL_20_CD", 2018 1Q", 2018 2Q", 2018 3Q", 2018 4Q", 2018 FY")
List("UT_LVL_17_CD",UT_LVL_20_CD",2018 1Q",2018 2Q",018 3Q",2018 4Q",2018 FY")
List(" UT_LVL_17_CD",    UT_LVL_20_CD",2018 1Q",2018 2Q", 2018 3Q", 2018 4Q", 2018 FY ")
$
markp-fuso
stack0114106

6 Answers

3
perl -wpe'
    s{ \(\K ([^)]+) }
     { join ", ", map { s/^\s+|\s+$//g; qq("$_") } split /,/, $1 }ex
' file
zdim
2

Another option is to use the \G anchor and match runs of word characters, optionally followed by repeated groups of spaces and word characters.

(?:\G(?!^),|\bList\((?=[^()\r\n]*\)))\K\h*(\w+(?:\h+\w+)*)\h*

Explanation

  • (?: Non capture group
    • \G(?!^), Assert the position at the end of the previous match, but not at the start (as \G can match at those 2 positions)
    • | Or
    • \bList\((?=[^()\r\n]*\)) Word boundary, then match List( and assert a closing ) on the same line
  • ) Close non capture group
  • \K\h* Forget what has been matched so far (so that the matched List( and the commas are not removed), then match optional horizontal whitespace to be removed
  • ( Capture group 1
    • \w+(?:\h+\w+)* Match 1+ word chars, optionally followed by repeated groups of horizontal whitespace and word chars
  • )\h* Close group 1 and match optional trailing spaces to be removed

Regex demo

In the replacement use group 1 between double quotes "\1"

The fourth bird
  • yes.. it works.. but finding it difficult to understand – stack0114106 Dec 23 '20 at 12:10
  • do you have any other simple examples for understanding the \G in regex – stack0114106 Dec 23 '20 at 12:39
  • @stack0114106 You can see for example https://stackoverflow.com/questions/21971701/when-is-g-useful-application-in-a-regex or http://www.rexegg.com/regex-anchors.html#G or https://www.regular-expressions.info/continue.html – The fourth bird Dec 23 '20 at 12:46
  • I tried in the \G style for this question.. https://stackoverflow.com/questions/65435848/concatenating-hierarchical-paths-from-the-root#65436025 can you have a look at it – stack0114106 Dec 24 '20 at 09:00
1

Try this:

(?<=\(|,)\s*(.*?)\s*(?=\)|,)

This regex captures each element in group 1 without its leading and trailing whitespace; replace each match with that group wrapped in double quotes.
See the demo.

aziz k'h
1

See if the following works for you:

[(,]\K\s*(.*?)\s*(?=[),])

See the online demo


  • [(,] - Match a comma or opening parenthesis.
  • \K - Reset the starting point of the reported match.
  • \s* - Match zero or more whitespace characters.
  • (.*?) - 1st capture group, capturing any characters with a lazy quantifier.
  • \s* - Match zero or more whitespace characters.
  • (?=[),]) - Positive lookahead asserting that a comma or closing parenthesis follows.

As per the linked demo, replace with "\1".

JvdV
1

Yet another variant:

$ perl -pe 's/\(\s+/(/; /([^(]+\()(.+?)\s*\)/; $_="$1\"".join("\",\"",split(/\s*,\s*/,$2))."\")\n"; ' file
List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","2018 3Q","2018 4Q","2018 FY")
List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","018     3Q","2018 4Q","2018 FY")
List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","2018 3Q","2018 4Q","2018 FY")

Input test file:

$ cat file
List(UT_LVL_17_CD, UT_LVL_20_CD, 2018 1Q, 2018 2Q, 2018 3Q, 2018 4Q, 2018 FY)
List(UT_LVL_17_CD,    UT_LVL_20_CD,2018 1Q,2018 2Q,018     3Q,2018 4Q,2018 FY)
List( UT_LVL_17_CD,    UT_LVL_20_CD,2018 1Q,2018 2Q, 2018 3Q, 2018 4Q, 2018 FY )
terdon
1

OP mentions that leading/trailing spaces are acceptable ... I take this to mean that it's also acceptable to strip out unnecessary leading/trailing spaces.

Sample input:

$ cat string.dat
List(UT_LVL_17_CD, UT_LVL_20_CD, 2018 1Q, 2018 2Q, 2018 3Q, 2018 4Q, 2018 FY)
List(UT_LVL_17_CD,UT_LVL_20_CD,2018 1Q,2018 2Q,018 3Q,2018 4Q,2018 FY)
List( UT_LVL_17_CD,    UT_LVL_20_CD,2018 1Q,2018 2Q, 2018 3Q, 2018 4Q, 2018 FY )

One not-so-compact awk idea:

awk -F'[()]' '                         # input field delimiters are "(" and ")"
{ printf "%s(", $1                     # print field #1 + "("
  n=split($2,a,",")                    # split field #2 by ",", save in array a[]
  pfx=""                               # initial prefix is ""
  for (i=1 ; i<=n ; i++)               # loop through a[] elements
      { gsub(/^ *| *$/,"",a[i])        # strip leading/trailing spaces
        printf "%s\"%s\"", pfx, a[i]   # print prefix + current a[] element wrapped in double quotes
        pfx=","                        # set prefix to "," for rest of a[] elements
      }
   printf ")\n"                        # print final ")"
}
' string.dat

This generates:

List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","2018 3Q","2018 4Q","2018 FY")
List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","018 3Q","2018 4Q","2018 FY")
List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","2018 3Q","2018 4Q","2018 FY")
markp-fuso