Change word order in a long string using a regular expression

Question

I have a long string in which I want to change the word order. I want to use regular expressions because I have multiple elements to change and I want to learn at the same time. Here is an example of my strings:

vec1 <- c("Internet-Devices Used to Access Internet Past 30 Days [Desktop Computer-Owned by Self]", 
     "Internet-Devices Used to Access Internet Past 30 Days [Tablet-Owned by Other HH Member]", 
     "Internet-Devices Used to Access Internet Past 30 Days [Laptop Computer-Made Available by Your Employer]",
     "Radio Stations-Listened to Past Week-Quebec City [FM-CFEL-102.1 (blvd 102.1)]")

vec1
[1] "Internet-Devices Used to Access Internet Past 30 Days [Desktop Computer-Owned by Self]"                 
[2] "Internet-Devices Used to Access Internet Past 30 Days [Tablet-Owned by Other HH Member]"                
[3] "Internet-Devices Used to Access Internet Past 30 Days [Laptop Computer-Made Available by Your Employer]"
[4] "Radio Stations-Listened to Past Week-Quebec City [FM-CFEL-102.1 (blvd 102.1)]"

I want it to become:

[1] "Internet-Devices Used to Access Internet Past 30 Days -Owned by Self[Desktop Computer]"                 
[2] "Internet-Devices Used to Access Internet Past 30 Days -Owned by Other HH Member[Tablet]"                
[3] "Internet-Devices Used to Access Internet Past 30 Days -Made Available by Your Employer[Laptop Computer]"
[4] "Radio Stations-Listened to Past Week-Quebec City [FM-CFEL-102.1 (blvd 102.1)]"

So I think the algorithm should work like this:

1. Find the part of the string that follows "Past 30 Days" and stops at the hyphen,
2. Copy this extracted string to just before the last character of the main string,
3. Delete the extracted string from step 1 from the main string (but not the copy you just added).

For step 1, I asked a similar question yesterday (Ignore part of a string when splitting using regular expression in R) and used it to arrive at this regular expression, (?<=Past 30 Days ).+(?![^-]), which works on regex101.com but not in R (it doesn't stop at the hyphen):

reg1 <- regexec(pattern = "(?<=Past 30 Days ).+(?![^-])", vec1, perl=T)
ext1 <- unname(mapply(function(xx,yy) substr(xx, yy, yy+attr(yy,"match.length")), vec1, reg1))
ext1
[1] "[Desktop Computer-Owned by Self]"                  "[Tablet-Owned by Other HH Member]"                
[3] "[Laptop Computer-Made Available by Your Employer]" ""

As you can see, it doesn't stop at the hyphen.

And for the second step, I was thinking of something like this:

vec2 <- unname(mapply(gsub, ext1, vec1, MoreArgs = list(pattern="]")))
vec2
[1] "Internet-Devices Used to Access Internet Past 30 Days [Desktop Computer-Owned by Self[Desktop Computer-Owned by Self]"                                  
[2] "Internet-Devices Used to Access Internet Past 30 Days [Tablet-Owned by Other HH Member[Tablet-Owned by Other HH Member]"                                
[3] "Internet-Devices Used to Access Internet Past 30 Days [Laptop Computer-Made Available by Your Employer[Laptop Computer-Made Available by Your Employer]"
[4] "Radio Stations-Listened to Past Week-Quebec City [FM-CFEL-102.1 (blvd 102.1)"

This does pretty much what I want, except that it removes the "]" from the last element of the vector and does not insert the right string (because of the problem in step 1).

Finally, I remove the initial part of the string:

unname(mapply(gsub, paste0(stringr::str_sub(ext1, end=-2),"["), vec2, MoreArgs = list(replacement="[", fixed=T)))
[1] "Internet-Devices Used to Access Internet Past 30 Days [Desktop Computer-Owned by Self]"                 
[2] "Internet-Devices Used to Access Internet Past 30 Days [Tablet-Owned by Other HH Member]"                
[3] "Internet-Devices Used to Access Internet Past 30 Days [Laptop Computer-Made Available by Your Employer]"
[4] "Radio Stations-Listened to Past Week-Quebec City [FM-CFEL-102.1 (blvd 102.1)"

This kind of works, but I have the same two problems as in step 2.

My whole code seems pretty heavy and complicated. Is there a better way of doing this?

Note:

I'm not looking for a super robust solution
I never have nested brackets
My strings always end with brackets

score 2 · Accepted Answer · answered Nov 15 '17 at 15:26

You may use

(Past 30 Days\s*)([^-]*)([^]]+)

and replace with \1\3\2. See the regex demo.

Details

- (Past 30 Days\s*) - Group 1 (referred to with the \1 backreference in the replacement pattern):
  - Past 30 Days - a literal substring
  - \s* - zero or more whitespace characters
- ([^-]*) - Group 2: zero or more characters other than -
- ([^]]+) - Group 3: one or more characters other than ].
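
To see what each group captures, here is a minimal sketch that applies the same pattern to the first string from the question with base R's regexec() and regmatches(). The variable names x, m and groups are illustrative only.

```r
# First string from the question, used as sample input.
x <- "Internet-Devices Used to Access Internet Past 30 Days [Desktop Computer-Owned by Self]"

# regexec() records the full match and each capturing group;
# regmatches() turns those positions into substrings.
m <- regmatches(x, regexec("(Past 30 Days\\s*)([^-]*)([^]]+)", x))[[1]]

# Drop the full match and keep groups 1 to 3.
groups <- m[-1]
print(groups)
# [1] "Past 30 Days "     "[Desktop Computer" "-Owned by Self"
```

The closing ] is not part of any group, so it stays where it is when the groups are reordered as \1\3\2. A string that does not contain "Past 30 Days", such as the radio station entry, does not match at all and is returned unchanged.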

See an R demo online:

vec1 <- c("Internet-Devices Used to Access Internet Past 30 Days [Desktop Computer-Owned by Self]", 
     "Internet-Devices Used to Access Internet Past 30 Days [Tablet-Owned by Other HH Member]", 
     "Internet-Devices Used to Access Internet Past 30 Days [Laptop Computer-Made Available by Your Employer]",
     "Radio Stations-Listened to Past Week-Quebec City [FM-CFEL-102.1 (blvd 102.1)]")
gsub("(Past 30 Days\\s*)([^-]*)([^]]+)", "\\1\\3\\2", vec1)
# [1] "Internet-Devices Used to Access Internet Past 30 Days -Owned by Self[Desktop Computer]"                 
# [2] "Internet-Devices Used to Access Internet Past 30 Days -Owned by Other HH Member[Tablet]"                
# [3] "Internet-Devices Used to Access Internet Past 30 Days -Made Available by Your Employer[Laptop Computer]"
# [4] "Radio Stations-Listened to Past Week-Quebec City [FM-CFEL-102.1 (blvd 102.1)]"
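
As a side note on the step 1 attempt in the question: the greedy .+ consumes everything up to the end of the string, and the negative lookahead (?![^-]) is satisfied there because no character follows. The sketch below contrasts that pattern with a negated character class, which cannot cross the hyphen. It is an illustration, not part of the solution above, and the names greedy and bounded are made up for the example.

```r
# First string from the question, used as sample input.
x <- "Internet-Devices Used to Access Internet Past 30 Days [Desktop Computer-Owned by Self]"

# Pattern from the question: .+ runs to the end of the string,
# where the negative lookahead succeeds because nothing follows.
greedy <- regmatches(x, regexpr("(?<=Past 30 Days ).+(?![^-])", x, perl = TRUE))

# A negated character class stops before the first hyphen.
bounded <- regmatches(x, regexpr("(?<=Past 30 Days )[^-]+", x, perl = TRUE))

print(greedy)
# [1] "[Desktop Computer-Owned by Self]"
print(bounded)
# [1] "[Desktop Computer"
```

Lookbehind needs perl = TRUE in R; the capturing-group pattern above works with the default regex engine.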

Way, but way, simpler than my solution. I've heard about the power of regular expression, but now I'm witnessing it first hand! Thanks! — Bastien, Nov 15 '17 at 15:39
Well, that is just a couple of capturing groups with backreferences. — Wiktor Stribiżew, Nov 15 '17 at 15:42
