191

I have a list of words:

bau
ceu
diu
fou
gau

I want to turn that list into:

byau
cyeu
dyiu
fyou
gyau

I unsuccessfully tried the command:

:%s/(\w)(\w\w)/\1y\2/g

Given that this doesn't work, what do I have to change to make the regex capture groups work in Vim?

Christian
  • possible duplicate of [Matching an expression including arbitrary lines with regex in Vim](http://stackoverflow.com/questions/17471929/matching-an-expression-including-arbitrary-lines-with-regex-in-vim) and http://stackoverflow.com/questions/18627893/vim-match-errors-out-with-regular-expression-ffelrf – Ingo Karkat Nov 11 '13 at 08:50
  • It's a little bit off-topic so I put it here as a comment but… I'd do `:%norm ay`. – romainl Nov 11 '13 at 09:12
  • In your case (if it's exactly like described), it's an option to: move to 2nd column with `l`, enter Visual Block mode with `Ctrl+v`, mark whole column with `Shift+g` followed by `l`, then enter Insert mode with `Shift+i` and input 'y'. 7 keystrokes including finishing `Esc` to exit Insert mode. Not posting as an answer because it's not really about capture groups (which is what I searched for when I found this). :-) – LAFK 4Monica_banAI_modStrike Aug 21 '16 at 11:23

5 Answers

309

One way to fix this is to escape the parentheses, since in Vim's default mode unescaped parentheses match literal characters:

:%s/\(\w\)\(\w\w\)/\1y\2/g

Slightly shorter (and more magic-al) is to use \v, which makes every ASCII character in the rest of the pattern special except '0'-'9', 'a'-'z', 'A'-'Z' and '_', so the parentheses no longer need escaping:

:%s/\v(\w)(\w\w)/\1y\2/g
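
For comparison, `sed` draws a similar line between basic and extended regular expressions. The sketch below is only an analogy, not Vim itself: the function names are hypothetical, and `[[:alnum:]_]` stands in for `\w` on the assumption that it is more portable across `sed` implementations.

```shell
# Basic regular expressions: groups need backslashes,
# much like Vim's default ("magic") mode.
insert_y_bre() {
  sed 's/\([[:alnum:]_]\)\([[:alnum:]_]\{2\}\)/\1y\2/'
}

# Extended regular expressions (-E): bare parentheses group,
# much like a Vim pattern that starts with \v.
insert_y_ere() {
  sed -E 's/([[:alnum:]_])([[:alnum:]_]{2})/\1y\2/'
}

# Both read lines on stdin and insert "y" after the first character
# of the first three-character match on each line.
printf 'bau\nceu\ndiu\nfou\ngau\n' | insert_y_ere
```

Both functions should produce the same result; only the amount of escaping differs.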

johnsyweb
65

You can also use this shorter pattern:

:%s/^./&y
  • % is the range for the whole file, and s is the substitute command.
  • ^. matches the first character of each line.
  • & in the replacement stands for the matched text, so &y puts a y right after it.
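
The same idea carries over to `sed`, where `&` in the replacement also stands for the whole match. A minimal sketch (the function name `insert_y_amp` is made up for illustration):

```shell
insert_y_amp() {
  # ^. matches the first character of each line; & re-inserts it before "y".
  sed 's/^./&y/'
}

printf 'bau\nceu\n' | insert_y_amp
```

No capture groups are involved here, so there is nothing to escape.
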
Peter Perháč
Juan
  • It's amazing how, after more than 10 years and quite a bit of expertise in Vim, I still learn new tricks like using "&" to add rather than to substitute. Thanks. – Kiteloopdesign Nov 01 '22 at 08:15
  • @Kiteloopdesign `&` is actually just another name for `\0`, which is the capture group containing the entire sequence that was matched. – cuddlebugCuller Apr 12 '23 at 05:07
56

If you don't want to escape the capturing groups with backslashes (this is what you've missed), prepend \v to switch Vim's regular expression engine to very magic mode:

:%s/\v(\w)(\w\w)/\1y\2/g
Ingo Karkat
  • Ingo, sorry for placing a question in the wrong place: this works fine in `:exmode`; is there a way to do it in gvim's find/replace dialogue box? – JJoao May 05 '15 at 16:30
  • @JJoao: No, the find/replace box is for literal search and replacement only. You shouldn't be using that, anyway; it's just training wheels for Notepad users. – Ingo Karkat May 06 '15 at 06:50
  • Ingo, thank you (it is not for me: I am happy with exmode, but for linguist collaborators in a dictionary project): it almost works. With `\v...` the regexp works fine; in the replacement string, `&` works but `\ ` are protected (`\1\r` are lost) – JJoao May 06 '15 at 08:11
  • @JJoao: Yes, that's what I found out while playing with it, too. I'm still skeptical whether using Vim without Ex mode is a good idea, but you could easily build your own search-and-replace dialog (internally powered by `:s`) via `inputdialog()` and a bit of Vimscript. – Ingo Karkat May 06 '15 at 08:32
  • Ingo: Thank you again; I agree with your skeptical opinion. Inputdialog + `:s` + Vimscript is probably the way gvim's find/replace is built. For me the `\1 \r ` treatment is a gvim bug. I will try to post it on some Vim-specific list. In the meantime I will try my own Vimscript inputdialog. – JJoao May 06 '15 at 09:10
17

You also have to escape the grouping parentheses:

:%s/\(\w\)\(\w\w\)/\1y\2/g

That does the trick.

Christian
Henkersmann
  • Coming from Sublime Text 3, this is horrible. Why is the syntax like this? It doesn't make sense to escape characters that aren't literal, normal text. – Unknow0059 Jan 02 '21 at 20:09
  • @Unknow0059 The parentheses in this case aren't literal text. They are metacharacters that delimit the groups to save for the replace expression. Placing a non-escaped paren in an expression will match the literal character, as one would expect (this was what tripped up the OP). – Azure Heights Mar 02 '21 at 20:38
  • I'm a regular vim user and I also think this is terrible. @Unknow0059 – icedwater May 19 '21 at 09:48
  • @Unknow0059 Because vim is older than the normal regex syntax that we all use nowadays. Most people that use vim just use the `\v` version described in other answers though, rather than escape every little thing in their regex – CoffeeTableEspresso Aug 07 '22 at 17:49
6

In Vim, on a selection, the following

:'<,'>s/^\(\w\+ - \w\+\).*/\1/

or

:'<,'>s/\v^(\w+ - \w+).*/\1/

reduces

Space - Commercial - Boeing

to

Space - Commercial

Similarly,

apple - banana - cake - donuts - eggs

is reduced to

apple - banana

Explanation

  • ^ matches the start of the line.
  • (, + and ) must be escaped with \ as in the first regex (see the accepted answer), or the pattern must start with \v (see @ingo-karkat's answer).
  • \w\+ matches a whole word (\w alone matches only a single character); in this example, I search for a word followed by - followed by another word.
  • .* after the capturing group matches the remaining text, so the substitution discards it.

Addendum. This is a bit off-topic, but I would suggest that Vim is not well suited to more complex regular expressions and captures. [I am doing something similar to the following, which is how I found this thread.]

In those instances, it is likely better to dump the lines to a text file and either edit it "in place"

sed -i ...

or in a redirect

sed ... > out.txt

In a terminal (or BASH script, ...):


echo 'Space Sciences - Private Industry - Boeing' | sed -r 's/^((\w+ ){1,2}- (\w+ ){1,2}).*/\1/'

Space Sciences - Private Industry 

cat in.txt

Space Sciences - Private Industry - Boeing

sed -r 's/^((\w+ ){1,2}- (\w+ ){1,2}).*/\1/' ~/in.txt > ~/out.txt

cat ~/out.txt 

Space Sciences - Private Industry

## Caution: without -i, sed never modifies the source file; if you forget the > redirect, the result is only printed to the terminal.
## Each > redirect overwrites the output file; use >> to append
## the results of subsequent runs (preserving the previous output).
 
## To edit "in place" (`-i` argument/flag):

sed -i -r 's/^((\w+ ){1,2}- (\w+ ){1,2}).*/\1/' ~/in.txt

cat in.txt

Space Sciences - Private Industry 

In

sed -r 's/^((\w+ ){1,2}- (\w+ ){1,2}).*/\1/'

the {1,2} allows one or two repetitions of a word followed by a space; in general, {x,y} matches between x and y repetitions of the preceding group. See https://www.gnu.org/software/sed/manual/html_node/Regular-Expressions.html .

Here, since my phrases are separated by -, I can simply tweak those parameters to get what I want.
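
As a rough sketch of that kind of tweaking, the upper bound can be turned into a parameter. The helper name `first_two_phrases` and its argument are hypothetical, and `[[:alnum:]_]` replaces `\w` on the assumption that it is more portable:

```shell
# Hypothetical helper: keep the first two " - "-separated phrases,
# allowing between 1 and $1 words in each phrase.
first_two_phrases() {
  max_words=$1
  # Double quotes let the shell substitute ${max_words}; \1 reaches sed unchanged.
  sed -E "s/^(([[:alnum:]_]+ ){1,${max_words}}- ([[:alnum:]_]+ ){1,${max_words}}).*/\1/"
}

# Two words per phrase are allowed, so the third phrase is stripped.
echo 'Space Sciences - Private Industry - Boeing' | first_two_phrases 2

# Only one word per phrase is allowed, so the pattern does not match
# and the line passes through unchanged.
echo 'Space Sciences - Private Industry - Boeing' | first_two_phrases 1
```

Note that a line the pattern does not match is printed as-is rather than dropped, which is easy to overlook when the bounds are too tight.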

Victoria Stuart