How to perform a sed transform within a matching part of a line

Question

It's easy to do a sed transform within a line matching a certain pattern, but what if we only want to transform something in a certain part of the line?

Simple example

Suppose we want to make all characters uppercase in all lines starting with #. We could do that with a command of the following form.

sed '/^#/ y/abcdef/ABCDEF/'

Suppose we only want to turn the first word in these lines uppercase. How would we go about that using a sed translation?

More advanced application

I want to interchange slashes with backslashes in the graph part of the output of git --no-pager log --all --graph --decorate --oneline --color=always | tac.

Before

| * | | 279e9ad (tag: v0.0.4.334, origin/DR) asdfasdf
| | |/ /
| |/| / /
| | |/ / /
| | |\ \ \
| | * | |   1fc7ab7 (tag: v0.0.4.337) Merge branch 'DR' into NextMajor
| | | * | d24e21d (tag: v0.0.4.341, origin/DR-01) DR-010728 Updated unit tests
| | |\ \
| | * |   8c01099 (tag: v0.0.4.338, tag: 0.0.4_MILESTONE_RELEASE) Merge

After

| * | | 279e9ad (tag: v0.0.4.334, origin/DR) asdfasdf
| | |\ \
| |\| \ \
| | |\ \ \
| | |/ / /
| | * | |   1fc7ab7 (tag: v0.0.4.337) Merge branch 'DR' into NextMajor
| | | * | d24e21d (tag: v0.0.4.341, origin/DR-01) DR-010728 Updated unit tests
| | |/ /
| | * |   8c01099 (tag: v0.0.4.338, tag: 0.0.4_MILESTONE_RELEASE) Merge

Notice that any slashes in the commit messages are kept the same, but the slashes in the graphical part are transformed.

Show us a sample of the output of that command, along with what you would like to transform it into. It's not clear to me how it relates to your original requirement. — Tom Fenech, Mar 02 '16 at 16:10
I notice that `origin/DR-01` has changed to `origin\DR-01` in your example - is this intended? By the way, I think that you should maybe just get rid of your original example and focus on the specific issue related to your git output, as it would make your question clearer. — Tom Fenech, Mar 02 '16 at 16:25
So show us exactly what you want, otherwise someone is going to give you an answer that does the same thing! — Tom Fenech, Mar 02 '16 at 16:32

Ed Morton · Accepted Answer · 2016-03-03T13:34:37.050

Keep it simple and just use awk, e.g. with GNU awk for the 3rd arg to match():

$ cat tst.awk        
{
    match($0,/([| *\/\\]+)(.*)/,a)
    gsub(/\//,RS,a[1])
    gsub(/\\/,"/",a[1])
    gsub(RS,"\\",a[1])
    print a[1] a[2]
}

$ awk -f tst.awk file
| * | | 279e9ad (tag: v0.0.4.334, origin/DR) asdfasdf
| | |\ \
| |\| \ \
| | |\ \ \
| | |/ / /
| | * | |   1fc7ab7 (tag: v0.0.4.337) Merge branch 'DR' into NextMajor
| | | * | d24e21d (tag: v0.0.4.341, origin/DR-01) DR-010728 Updated unit tests
| | |/ /
| | * |   8c01099 (tag: v0.0.4.338, tag: 0.0.4_MILESTONE_RELEASE) Merge

With any awk and comments added in case it's not obvious what the script does:

$ cat tst.awk        
{
    match($0,/[| *\/\\]+/)              # find the segment of text you want
    tgt = substr($0,RSTART,RLENGTH)     # save that segment in a variable tgt
    gsub(/\//,RS,tgt)                   # change all /s to newlines in tgt
    gsub(/\\/,"/",tgt)                  # change all \s to /s in tgt
    gsub(RS,"\\",tgt)                   # change all newlines to \s in tgt
    print tgt substr($0,RSTART+RLENGTH) # print tgt plus rest of the line
}

We use newlines as the tmp value during the character swap since a newline is guaranteed not to be present already in the line.
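To see why the placeholder matters, here is a minimal sketch comparing a naive two-step swap with the three-step swap through RS. The function names and the sample line are made up for illustration, and the whole line is swapped here rather than just the graph part.

```shell
# Naive two-step swap: the second gsub also rewrites the backslashes
# that the first gsub just produced, so every slash ends up as "/".
naive_swap() {
    awk '{ gsub(/\//, "\\"); gsub(/\\/, "/"); print }'
}

# Three-step swap through a placeholder (RS, a newline by default),
# which cannot already occur inside a single input line.
placeholder_swap() {
    awk '{ gsub(/\//, RS); gsub(/\\/, "/"); gsub(RS, "\\"); print }'
}

printf '%s\n' '| |/| \ \' | naive_swap
printf '%s\n' '| |/| \ \' | placeholder_swap
```

With this input the naive version should print `| |/| / /` (the original backslashes and the freshly made ones are indistinguishable), while the placeholder version should print `| |\| / /`.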

To turn the first word of each line that starts with # to uppercase, btw, might just be:

awk '/^#/{$1=toupper($1)}1' file

or:

awk '/^#/{$2=toupper($2)}1' file

depending on your input data, definition of a word, and white space requirements.
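A small sketch of how those two one-liners differ, using made-up input lines and hypothetical function names. Note that assigning to a field makes awk rebuild the line with single spaces between fields, which is the white space caveat mentioned above.

```shell
# Uppercase the first field of lines starting with "#".
upper_field1() {
    awk '/^#/{$1=toupper($1)}1'
}

# Uppercase the second field of lines starting with "#".
upper_field2() {
    awk '/^#/{$2=toupper($2)}1'
}

# "#foo" is a single field, so the $1 variant fits this layout.
printf '%s\n' '#foo   bar' 'foo bar' | upper_field1

# Here "#" is a field of its own, so the word is $2.
printf '%s\n' '# foo   bar' 'foo bar' | upper_field2
```

The first command should print `#FOO bar` and the second `# FOO bar`; in both cases the run of three spaces is collapsed to one, and the line without a leading # passes through untouched.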

If the text you want to match can contain control characters, as it sounds like from your comments, then just allow that in the regexp, e.g.:

    match($0,/([[:space:][:cntrl:]|*\/\\]+)(.*)/,a)

edited Mar 03 '16 at 13:34

answered Mar 02 '16 at 17:39

Ed Morton

This does not seem to work with the flag `--color=always`. – chtenb Mar 03 '16 at 13:24
I don't know what that means, awk doesn't have a `--color=always` flag. If you have some input set that the above doesn't work for then edit your question to show that input and the expected output. The script works for the posted sample input or anything else that starts with a sequence of blank, /, \, | and/or * characters. – Ed Morton Mar 03 '16 at 13:26
Oh, I see from the comments under a different answer that you're running some command that injects control chars into the text - just change ` ` in the bracket expression to include control chars (e.g. `^[:print:]`) and the script will work. – Ed Morton Mar 03 '16 at 13:32
Thanks! For the record, the OP includes that I'm trying to process the command `git --no-pager log --all --graph --decorate --oneline --color=always | tac`. – chtenb Mar 03 '16 at 13:33
yes it does but for those of us who don't have `git` and don't know anything about what it outputs, all we have to go on is the sample input you provide in your question and we tend to ignore whatever command you tell us produces that input file since it's irrelevant. I've added a change to the `match()` function at the bottom of my answer with something that I think will work, it just depends what those coloring chars/sequences are in your input file. – Ed Morton Mar 03 '16 at 13:36
I'm actually very surprised to hear that since I just did a quick google of shell command coloring and found it's implemented by neither spaces nor control characters but instead escape sequences that include regular characters so idk why the update I made would have solved your problem. Glad to hear it did though! – Ed Morton Mar 03 '16 at 13:44

score 1 · Answer 2 · answered Mar 02 '16 at 19:10

Here's a simple sed solution that should be portable (i.e. it should work in sed variants other than GNU). It swaps slashes that follow any character other than a lowercase letter, which is enough for your sample data at least.

sed -e 's:\([^a-z]\)/:\1\\:g;t' -e 's:\([^a-z]\)\\:\1/:g' file

The breakdown of this goes a little like this:

s:\([^a-z]\)/:\1\\:g - replace forward slashes with backslashes
t - If we just did a substitution, skip to the end (avoiding the next substitution)
s:\([^a-z]\)\\:\1/:g - replace backslashes with forward slashes.

The reason to split this into two -e expressions is that some variants of sed require the branch name to be at the end of a line in the script. The end of a -e expression is deemed equivalent to the end of a line.
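A short sketch of the command in action, wrapped in a hypothetical function and fed a few lines modelled on the sample data. The last input line is made up to show what the `t` branch implies for a line holding both kinds of slash.

```shell
# The two-expression sed command from above, wrapped in a function.
swap_graph_slashes() {
    sed -e 's:\([^a-z]\)/:\1\\:g;t' -e 's:\([^a-z]\)\\:\1/:g'
}

# Line 1: only forward slashes, so the first substitution fires and
#         t skips the second one (which would otherwise undo it).
# Line 2: only backslashes, so the first substitution does nothing
#         and the second one runs.
# Line 3: the slash follows the lowercase "n", so nothing changes.
# Line 4: both kinds of slash on one line (made-up input).
printf '%s\n' '| | |/ /' '| | |\ \' '* (origin/DR) msg' '|/| \' |
    swap_graph_slashes
```

The first three lines should come out as `| | |\ \`, `| | |/ /` and `* (origin/DR) msg`. The fourth should come out as `|\| \`: once the first substitution has fired, `t` skips the second, so the original backslash is left alone. The sample data has no such mixed line, but it is worth keeping in mind.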

ghoti

The command `git --no-pager log --all --graph --decorate --oneline --color=always | tac | sed -e 's:\([^a-z]\)/:\1\\:g;t' -e 's:\([^a-z]\)\\:\1/:g'` doesn't seem to work. The slashes aren't touched. – chtenb Mar 03 '16 at 10:45
@ChieltenBrinke - switch to `--color=never` and it works. The ANSI/vt100/xterm codes that are sprinkled between characters of different colours are getting in the way of sed's interpretation of the line. To see what's *really* happening, try looking at a dump of just your `git` command, piped through `cat -et`. When colour is set with something like `^[[33m`, sed sees the `m`. If you have the option of not using colour, the sed script in my answer will work. If not, then something more complex will be required. – ghoti Mar 03 '16 at 12:39
Adding quantifiers like this seems to work: `git --no-pager log --all --graph --decorate --oneline --color=always | tac | sed -e 's:\([^a-z]*\)/:\1\\:g;t' -e 's:\([^a-z]*\)\\:\1/:g'`. – chtenb Mar 03 '16 at 12:41
If you permit "zero or more non-letters" before a (back)slash, you permit the conversion of `origin/DR-01` to `origin\DR-01`. That's not the fix. – ghoti Mar 03 '16 at 12:43
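The effect of the colour codes described in the comments above can be reproduced without git. This is only a sketch: the escape sequences below are hypothetical stand-ins of the `^[[33m` form, not captured git output, and `cat -v` is used to make them visible.

```shell
# Same sed command as in the answer, wrapped in a function.
swap_non_letter() {
    sed -e 's:\([^a-z]\)/:\1\\:g;t' -e 's:\([^a-z]\)\\:\1/:g'
}

# Plain input: the slash follows "|", so it is swapped.
printf '|/\n' | swap_non_letter

# Coloured input (hypothetical codes): each symbol is wrapped in
# ESC[..m and ESC[m, so the slash now follows the "m" that ends a code
# and the pattern [^a-z] no longer matches in front of it.
# cat -v renders ESC as ^[ so the codes show up in the output.
printf '\033[31m|\033[m\033[32m/\033[m\n' | swap_non_letter | cat -v
```

The first command should print `|\`. The second should print `^[[31m|^[[m^[[32m/^[[m`, with the slash untouched, which matches the behaviour reported with `--color=always`.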

Tom Fenech · Answer 3 · 2016-03-02T16:00:24.880

If your version of sed supports it, you can use \U to transform text to uppercase:

sed -r 's/(^# *)([^ ]*)/\1\U\2/'

This captures the first part of any line starting with # (including optional spaces), then anything up to the next space character. The second capture group is transformed to uppercase.
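A quick way to try this out, assuming GNU sed (both -r and \U are GNU extensions). The function name and the input lines are made up for illustration.

```shell
# GNU sed only: \U uppercases everything that follows it in the
# replacement, which here is just the second capture group.
upper_first_word_gnu() {
    sed -r 's/(^# *)([^ ]*)/\1\U\2/'
}

printf '%s\n' '# first word here' '#second' 'not a comment' |
    upper_first_word_gnu
```

This should print `# FIRST word here`, `#SECOND` and `not a comment`; the last line has no leading #, so the pattern does not match and the line is left alone.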

If your sed doesn't support \U, then you can always use perl:

perl -pe 's/(^#\s*)([\S]*)/$1\U$2/'

I've used \s and \S in this version, which are equivalent to [[:space:]] (space characters) and [^[:space:]] (non-space characters) respectively. You might want to use a slightly different pattern depending on the specifics of the files you're working with.

It would be more useful to show us an example of your input and desired output. — Tom Fenech, Mar 02 '16 at 16:04

score 0 · Answer 4 · answered Mar 02 '16 at 15:57

This might work for you (GNU sed):

sed '/^#/s/\w\+/\U&/' file

or:

sed '/^#/!b;s/\w\w*/&\n/;h;y/abcdef/ABCDEF/;G;s/\n.*\n//' file
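For readers wondering what the second command does, here is the same script spread over several lines with comments. This is one reading of it, assuming GNU sed (\w and \n in the replacement are GNU extensions); the function name and the input are made up.

```shell
upper_first_word_hold() {
    sed '
        # Lines that do not start with "#" are printed unchanged.
        /^#/!b
        # Put a newline marker right after the first word.
        s/\w\w*/&\n/
        # Save a copy of the marked line in the hold space.
        h
        # Translate a-f to upper case across the whole pattern space.
        y/abcdef/ABCDEF/
        # Append a newline plus the saved copy to the pattern space.
        G
        # Delete from the first newline to the last one. That removes
        # the translated tail and the untranslated head, leaving the
        # translated first word followed by the original rest.
        s/\n.*\n//
    '
}

printf '%s\n' '# face the cab' 'face the cab' | upper_first_word_hold
```

With this input the output should be `# FACE the cab` followed by an unchanged `face the cab`. As in the question, y/abcdef/ABCDEF/ only covers the letters a to f; the two lists would need to be extended to handle the full alphabet.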

potong

Would you mind explaining what the latter complex expression does? – chtenb Mar 02 '16 at 16:04
