reformat list string with spaces

Question

I have a list of strings that was printed to the console, and I need to convert it back into a list of quoted strings.

Assume the sample file looks like this:

List(UT_LVL_17_CD, UT_LVL_20_CD, 2018 1Q, 2018 2Q, 2018 3Q, 2018 4Q, 2018 FY)
List(UT_LVL_17_CD,UT_LVL_20_CD,2018 1Q,2018 2Q,018 3Q,2018 4Q,2018 FY)
List( UT_LVL_17_CD,    UT_LVL_20_CD,2018 1Q,2018 2Q, 2018 3Q, 2018 4Q, 2018 FY )

For all three variants above, the output should be:

List("UT_LVL_17_CD", "UT_LVL_20_CD", "2018 1Q", "2018 2Q", "2018 3Q", "2018 4Q", "2018 FY")

Note that spaces at the start, at the end, or between elements are acceptable:

List(  "UT_LVL_17_CD", "UT_LVL_20_CD", "2018 1Q", "2018 2Q", "2018 3Q", "2018 4Q",    "2018 FY" )

but not inside the quoted values, like this:

"     UT_LVL_17_CD"
"UT_LVL_20_CD   ",

Spaces that are already inside an element must be preserved, e.g. "2018 4Q".
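To make these acceptance rules concrete, here is a rough Perl sketch of a checker (not part of the original question). The helper name `is_acceptable` is hypothetical, and it assumes every element is a non-empty double-quoted string with no embedded `"` characters.

```perl
use strict;
use warnings;

# One quoted element: the first and last characters inside the quotes must not
# be whitespace, but inner spaces (as in "2018 4Q") are allowed.
my $elem = qr/"[^"\s](?:[^"]*[^"\s])?"/;

# Whole line: whitespace is tolerated after "(", around commas and before ")".
my $valid = qr/^List\(\s*$elem(?:\s*,\s*$elem)*\s*\)$/;

# Hypothetical helper: returns 1 if the line meets the rules above, else 0.
sub is_acceptable {
    my ($line) = @_;
    return $line =~ $valid ? 1 : 0;
}

for my $candidate (
    'List("UT_LVL_17_CD", "UT_LVL_20_CD", "2018 1Q", "2018 2Q", "2018 3Q", "2018 4Q", "2018 FY")',
    'List(  "UT_LVL_17_CD", "UT_LVL_20_CD", "2018 1Q", "2018 2Q", "2018 3Q", "2018 4Q",    "2018 FY" )',
    'List("     UT_LVL_17_CD", "UT_LVL_20_CD   ")',
) {
    printf "%-3s %s\n", (is_acceptable($candidate) ? 'ok' : 'bad'), $candidate;
}
```

The first two lines are the forms the question accepts; the third has padding inside the quotes and is rejected.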

I tried something like the following, but I can't get the correct result:

$ perl -pe ' s/(?<=\()|(?=,)|(?=\))/\"/sg ' list.txt
List("UT_LVL_17_CD", UT_LVL_20_CD", 2018 1Q", 2018 2Q", 2018 3Q", 2018 4Q", 2018 FY")
List("UT_LVL_17_CD",UT_LVL_20_CD",2018 1Q",2018 2Q",018 3Q",2018 4Q",2018 FY")
List(" UT_LVL_17_CD",    UT_LVL_20_CD",2018 1Q",2018 2Q", 2018 3Q", 2018 4Q", 2018 FY ")
$

Same idea: [`(?<=[\(,])\s*(.*?)\s*(?=[,\)])`](https://regex101.com/r/pvIqff/2) — Hao Wu, Dec 23 '20 at 06:52
@JvdV It's just scratch work, so performance is not a concern. – stack0114106, Dec 23 '20 at 07:04

zdim · Answer 1 · 2020-12-23T09:04:57.157

3

perl -wpe'
    s{ \(\K ([^)]+) }
     { join ", ", map { s/^\s+|\s+$//g; qq("$_") } split /,/, $1 }ex
' file
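For readability, here is roughly the same split, trim and quote approach written out as a small Perl script instead of a one-liner. This is a sketch: `quote_list` is a hypothetical helper name, the sample lines are copied from the question, and it assumes each line holds a single `List(...)` with no nested parentheses.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the one-liner: grab everything between "(" and
# ")", split it on commas, trim each piece, wrap it in double quotes and join
# the pieces back together with ", ".
sub quote_list {
    my ($line) = @_;
    $line =~ s{ \( \K ([^)]+) }
              { join ", ", map { my $e = $_; $e =~ s/^\s+|\s+$//g; qq("$e") } split /,/, $1 }ex;
    return $line;
}

my @samples = (
    'List(UT_LVL_17_CD, UT_LVL_20_CD, 2018 1Q, 2018 2Q, 2018 3Q, 2018 4Q, 2018 FY)',
    'List(UT_LVL_17_CD,UT_LVL_20_CD,2018 1Q,2018 2Q,2018 3Q,2018 4Q,2018 FY)',
    'List( UT_LVL_17_CD,    UT_LVL_20_CD,2018 1Q,2018 2Q, 2018 3Q, 2018 4Q, 2018 FY )',
);
print quote_list($_), "\n" for @samples;
```

Because the elements are rejoined with `", "`, every variant comes out with one space after each comma, regardless of how it was spaced on input.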

edited Dec 23 '20 at 09:04

answered Dec 23 '20 at 07:00


The last value appears as `"2018 FY "`; it should be `"2018 FY"`. – stack0114106 Dec 23 '20 at 07:06
@stack0114106 Fixed. I can't see anything nicer than simply stripping leading/trailing spaces inside `map`; I'll look again a little later... – zdim Dec 23 '20 at 07:19
np.. have a nice day – stack0114106 Dec 23 '20 at 07:20
Edit: I have to strip the trailing space in the `map` anyway, and that cleans everything up, so I removed the other `\s`. – zdim Dec 23 '20 at 10:15

The fourth bird · Answer 2 · 2020-12-23T10:51:45.473

2

Another option is to use the \G anchor and match word characters, optionally followed by repeated groups of spaces and word characters.

(?:\G(?!^),|\bList\((?=[^()\r\n]*\)))\K\h*(\w+(?:\h+\w+)*)\h*

Explanation

(?: Non-capture group
- \G(?!^), Assert the position at the end of the previous match, but not at the start of the string (as \G can match at those two positions), then match a comma
- | Or
- \bList\((?=[^()\r\n]*\)) Word boundary, then match List( and assert a closing ) on the same line
) Close non-capture group
\K\h* Forget what has been matched so far (so that List( and the commas are not removed), then match optional spaces to be removed
( Capture group 1
- \w+(?:\h+\w+)* Match 1+ word chars, optionally repeated by spaces and word chars
)\h* Close group 1 and match optional trailing spaces to be removed

Regex demo

In the replacement, use group 1 wrapped in double quotes: "\1"
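A minimal Perl sketch of applying this pattern with `s///g`, assuming Perl 5.10 or later (for `\K` and `\h`). In Perl the replacement is written as `"$1"`; `quote_list_g` is a hypothetical helper name and the samples come from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: the \G pattern from the answer, used in a global
# substitution. Inside s///g, \G marks where the previous match ended, so after
# the first element each match must start at the comma following it.
sub quote_list_g {
    my ($line) = @_;
    $line =~ s/(?:\G(?!^),|\bList\((?=[^()\r\n]*\)))\K\h*(\w+(?:\h+\w+)*)\h*/"$1"/g;
    return $line;
}

for my $line (
    'List(UT_LVL_17_CD, UT_LVL_20_CD, 2018 1Q, 2018 2Q, 2018 3Q, 2018 4Q, 2018 FY)',
    'List( UT_LVL_17_CD,    UT_LVL_20_CD,2018 1Q,2018 2Q, 2018 3Q, 2018 4Q, 2018 FY )',
) {
    print quote_list_g($line), "\n";
}
```

The spaces around each element are consumed by the two `\h*` and replaced, so the output has no spaces after the commas. Because elements are matched with `\w`, an element containing punctuation such as `-` or `.` would break the `\G` chain, and the remaining elements on that line would stay unquoted.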

edited Dec 23 '20 at 10:51

answered Dec 23 '20 at 10:42


Yes, it works, but I find it difficult to understand. – stack0114106 Dec 23 '20 at 12:10
Do you have any other simple examples for understanding \G in regex? – stack0114106 Dec 23 '20 at 12:39
@stack0114106 See, for example, https://stackoverflow.com/questions/21971701/when-is-g-useful-application-in-a-regex or http://www.rexegg.com/regex-anchors.html#G or https://www.regular-expressions.info/continue.html – The fourth bird Dec 23 '20 at 12:46
I tried the \G style for this question: https://stackoverflow.com/questions/65435848/concatenating-hierarchical-paths-from-the-root#65436025 Can you have a look at it? – stack0114106 Dec 24 '20 at 09:00
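As a smaller illustration of what `\G` does (my own example, not taken from the linked pages), compare a global match in Perl with and without it:

```perl
use strict;
use warnings;

my $s = "123abc456";

# Without \G, //g finds every digit anywhere in the string.
my @anywhere = $s =~ /(\d)/g;        # 1 2 3 4 5 6

# With \G, each match must start exactly where the previous one ended,
# so matching stops at the first non-digit ("a") and "456" is never reached.
my @contiguous = $s =~ /\G(\d)/g;    # 1 2 3

print "anywhere:   @anywhere\n";
print "contiguous: @contiguous\n";
```

In the answer's pattern, the `\G(?!^),` branch plays the same role: after the first element, a match is only allowed to start at the comma right after the previous one.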

score 1 · Answer 3 · answered Dec 23 '20 at 06:50


Try this:

(?<=\(|,)\s*(.*?)\s*(?=\)|,)

With this regex, group 1 captures each element without the spaces at its start and end; in the replacement, wrap that group in double quotes.
See the demo.
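A Perl sketch of using this pattern in a substitution; `quote_list_lookaround` is a hypothetical helper name and the sample lines come from the question. Both alternatives in the lookbehind are exactly one character long, so the lookbehind stays fixed-length.

```perl
use strict;
use warnings;

# Hypothetical helper: the lookbehind requires "(" or "," just before the
# match, the lazy (.*?) stops at the first ")" or "," (checked by the
# lookahead), and the surrounding \s* are consumed so they are dropped.
sub quote_list_lookaround {
    my ($line) = @_;
    $line =~ s/(?<=\(|,)\s*(.*?)\s*(?=\)|,)/"$1"/g;
    return $line;
}

for my $line (
    'List(UT_LVL_17_CD, UT_LVL_20_CD, 2018 1Q, 2018 2Q, 2018 3Q, 2018 4Q, 2018 FY)',
    'List( UT_LVL_17_CD,    UT_LVL_20_CD,2018 1Q,2018 2Q, 2018 3Q, 2018 4Q, 2018 FY )',
) {
    print quote_list_lookaround($line), "\n";
}
```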


aziz k'h


JvdV · Accepted Answer · 2020-12-23T09:02:02.040

See if the following works for you:

[(,]\K\s*(.*?)\s*(?=[),])

See the online demo

[(,] - Match a comma or an opening parenthesis.
\K - Reset the starting point of the reported match.
\s* - Match zero or more whitespace characters.
(.*?) - 1st capture group: match any characters, lazily (as few as possible).
\s* - Match zero or more whitespace characters.
(?=[),]) - Positive lookahead for a comma or a closing parenthesis.

As per the linked demo, replace with "\1".
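The same pattern in Perl, as a hedged sketch: `quote_list_k` is a hypothetical helper and the inputs are the question's samples. Perl normally writes the backreference in the replacement as `$1`; `"\1"` also works on the right-hand side of `s///`, but `use warnings` reports `\1 better written as $1`.

```perl
use strict;
use warnings;

# Hypothetical helper: [(,] is matched, but \K drops it from the part that is
# replaced, so only the element and its surrounding whitespace are swapped
# for the quoted, trimmed group 1.
sub quote_list_k {
    my ($line) = @_;
    $line =~ s/[(,]\K\s*(.*?)\s*(?=[),])/"$1"/g;
    return $line;
}

for my $line (
    'List(UT_LVL_17_CD, UT_LVL_20_CD, 2018 1Q, 2018 2Q, 2018 3Q, 2018 4Q, 2018 FY)',
    'List(UT_LVL_17_CD,UT_LVL_20_CD,2018 1Q,2018 2Q,2018 3Q,2018 4Q,2018 FY)',
    'List( UT_LVL_17_CD,    UT_LVL_20_CD,2018 1Q,2018 2Q, 2018 3Q, 2018 4Q, 2018 FY )',
) {
    print quote_list_k($line), "\n";
}
```

Compared with a lookbehind, `\K` consumes the delimiter, so the next global match resumes right at the following comma.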

score 1 · Answer 5 · answered Dec 23 '20 at 13:33

Yet another variant:

$ perl -pe 's/\(\s+/\(/; s/\s+\)/\)/; /([^(]+\()(.+)\)/; $_="$1\"".join("\",\"",split(/,\s*/,$2))."\")\n"; ' file
List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","2018 3Q","2018 4Q","2018 FY")
List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","018     3Q","2018 4Q","2018 FY")
List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","2018 3Q","2018 4Q","2018 FY")

Input test file:

$ cat file
List(UT_LVL_17_CD, UT_LVL_20_CD, 2018 1Q, 2018 2Q, 2018 3Q, 2018 4Q, 2018 FY)
List(UT_LVL_17_CD,    UT_LVL_20_CD,2018 1Q,2018 2Q,018     3Q,2018 4Q,2018 FY)
List( UT_LVL_17_CD,    UT_LVL_20_CD,2018 1Q,2018 2Q, 2018 3Q, 2018 4Q, 2018 FY )

Great idea, splitting on `/,\s*/` after the initial cleanup. – stack0114106, Dec 23 '20 at 13:41

markp-fuso · Answer 6 · 2020-12-23T20:22:35.167

The OP mentions that leading/trailing spaces are acceptable; I take this to mean that it's also acceptable to strip out unnecessary leading/trailing spaces.

Sample input:

$ cat string.dat
List(UT_LVL_17_CD, UT_LVL_20_CD, 2018 1Q, 2018 2Q, 2018 3Q, 2018 4Q, 2018 FY)
List(UT_LVL_17_CD,UT_LVL_20_CD,2018 1Q,2018 2Q,018 3Q,2018 4Q,2018 FY)
List( UT_LVL_17_CD,    UT_LVL_20_CD,2018 1Q,2018 2Q, 2018 3Q, 2018 4Q, 2018 FY )

One not-so-compact awk idea:

awk -F'[()]' '                         # input field delimiters are "(" and ")"
{ printf "%s(", $1                     # print field #1 + "("
  n=split($2,a,",")                    # split field #2 by ",", save in array a[]
  pfx=""                               # initial prefix is ""
  for (i=1 ; i<=n ; i++)               # loop through a[] elements
      { gsub(/^ *| *$/,"",a[i])        # strip leading/trailing spaces
        printf "%s\"%s\"", pfx, a[i]   # print prefix + current a[] element wrapped in double quotes
        pfx=","                        # set prefix to "," for rest of a[] elements
      }
   printf ")\n"                        # print final ")"
}
' string.dat

This generates:

List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","2018 3Q","2018 4Q","2018 FY")
List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","018 3Q","2018 4Q","2018 FY")
List("UT_LVL_17_CD","UT_LVL_20_CD","2018 1Q","2018 2Q","2018 3Q","2018 4Q","2018 FY")
