In the Winner column of my dataframe, I want to remove all the text starting from the left parenthesis.

Searching stackoverflow.com, I found this response and applied its stringr solution in my code, but it does not work: my code leaves the input unchanged.

Input:

Year    Lg  Winner                  Team
1956    NL  Don Newcombe (1 | MVP)  Brooklyn (1)
1957    NL  Warren Spahn (1 | HOF | ASG)    Milwaukee (1)
1958    AL  Bob Turley (1 | ASG)    New York (1)

Here is how I want the output to look:

Year    Lg  Winner                  Team
1956    NL  Don Newcombe            Brooklyn (1)
1957    NL  Warren Spahn            Milwaukee (1)
1958    AL  Bob Turley              New York (1)

dput(dfx):

structure(list(Year = 1956:1958, Lg = structure(c(2L, 2L, 1L), .Label = c("AL", 
"NL"), class = "factor"), Winner = structure(c(2L, 3L, 1L), .Label = c("Bob Turley (1 | ASG)", 
"Don Newcombe (1 | MVP)", "Warren Spahn (1 | HOF | ASG)"
), class = "factor"), Team = structure(1:3, .Label = c("Brooklyn (1)", 
"Milwaukee (1)", "New York (1)"), class = "factor")), class = "data.frame", row.names = c(NA, 
-3L))

Code:

library(stringr)
dfnoparens <- dfx
str_replace(dfnoparens$Winner, " \\(.*\\)", "")
head(dfnoparens)
akrun
Metsfan
  • `m <- regexpr('^[^\\(]*', dfnoparens$Winner);regmatches(dfnoparens$Winner, m)`. – Rui Barradas May 27 '20 at 16:06
  • You need to assign it, i.e. `dfnoparens$Winner <- str_replace(dfnoparens$Winner, " \\(.*\\)", "")`; otherwise no need for packages, a simple `sub` would do. – arg0naut91 May 27 '20 at 16:07
  • In base R, you can use `sub("\\(.*", "", dfx$Winner)`. – G5W May 27 '20 at 16:10
  • When I assigned it using the code below, the result did not change: `library(stringr); dfnoparens <- dfx; dfnew <- str_replace(dfnoparens$Winner, " \\(.*\\)", ""); head(dfnew)` – Metsfan May 27 '20 at 16:20
  • You have to assign it to the right place, @Metsfan. `dfnoparens` is just a copy of your original `dfx`; try `dfnoparens$newwinner <- str_replace(dfnoparens$Winner, " \\(.*\\)", "")` and then `head(dfnoparens)`. – Chuck P May 27 '20 at 18:06
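
The point made in the comments, that the replacement returns a new vector and has to be assigned back to the column, can be sketched as follows. This is only an illustration: it rebuilds just the Year and Winner columns of `dfx` by hand and uses base `sub` in place of `str_replace` so that no package is needed.

```r
# Rebuild only the relevant part of the question's data (Winner is a factor, as in dfx).
dfx <- data.frame(
  Year = 1956:1958,
  Winner = factor(c("Don Newcombe (1 | MVP)",
                    "Warren Spahn (1 | HOF | ASG)",
                    "Bob Turley (1 | ASG)"))
)

dfnoparens <- dfx

# sub() returns a new character vector; it does not modify its input.
# Called like this, the result is printed and then thrown away.
sub(" \\(.*\\)", "", dfnoparens$Winner)

# Assigning the result back to the column is what changes the data frame.
dfnoparens$Winner <- sub(" \\(.*\\)", "", dfnoparens$Winner)
dfnoparens$Winner
```

After the assignment, `dfnoparens$Winner` is a character vector rather than a factor, and `dfx` itself is untouched.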

4 Answers

Using the test data from the question (only the relevant column):

x <- c('Don Newcombe (1 | MVP)', 'Warren Spahn (1 | HOF | ASG)', 'Bob Turley (1 | ASG)')

Use regexpr/regmatches.

m <- regexpr('^[^\\(]*', x)
y <- regmatches(x, m)
y
#[1] "Don Newcombe " "Warren Spahn " "Bob Turley "

The output strings still have the white space that preceded the left parenthesis; if needed, remove it now.

trimws(y)
#[1] "Don Newcombe" "Warren Spahn" "Bob Turley"
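
To apply the same steps to the data frame column rather than to a plain vector, something like the following should work. It is a sketch that assumes the column is a factor, as in the question's `dput` output, and converts it to character first; the one-column `df` here is rebuilt by hand for illustration.

```r
# One-column stand-in for the question's data frame (Winner stored as a factor).
df <- data.frame(
  Winner = factor(c("Don Newcombe (1 | MVP)",
                    "Warren Spahn (1 | HOF | ASG)",
                    "Bob Turley (1 | ASG)"))
)

# Work on a character copy of the column.
w <- as.character(df$Winner)

# Match everything from the start of the string up to (not including) the first "(".
m <- regexpr('^[^\\(]*', w)

# Extract the matches, trim the trailing space, and assign back to the column.
df$Winner <- trimws(regmatches(w, m))
df$Winner
```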
Rui Barradas

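We can use trimws with its whitespace argument (R 3.6.0 or later), a regular expression describing what to strip: here, any spaces followed by everything from the left parenthesis onward.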

 trimws(x, whitespace = "\\s*\\(.*")
 #[1] "Don Newcombe" "Warren Spahn" "Bob Turley"  

data

x <- c('Don Newcombe (1 | MVP)', 'Warren Spahn (1 | HOF | ASG)', 'Bob Turley (1 | ASG)')
akrun
df <- structure(list(Year = 1956:1958, 
                     Lg = structure(c(2L, 2L, 1L), .Label = c("AL", "NL"), class = "factor"), 
                     Winner = structure(c(2L, 3L, 1L), 
                                        .Label = c("Bob Turley (1 | ASG)", "Don Newcombe (1 | MVP)", 
                                                   "Warren Spahn (1 | HOF | ASG)"), class = "factor"),
                     Team = structure(1:3, .Label = c("Brooklyn (1)", "Milwaukee (1)", "New York (1)"), 
                                      class = "factor")), class = "data.frame", row.names = c(NA,-3L))

Here is a strsplit solution.

df$Winner <- unlist(lapply(strsplit(as.character(df$Winner)," (",fixed=TRUE), `[[`, 1))
df
  Year Lg       Winner          Team
1 1956 NL Don Newcombe  Brooklyn (1)
2 1957 NL Warren Spahn Milwaukee (1)
3 1958 AL   Bob Turley  New York (1)

Use str_extract from the stringr library:

library(stringr)
df$Winner <- str_extract(df$Winner, ".*(?=\\s\\(\\d)")

This solution uses positive lookahead in (?=...); the lookahead can be glossed as "Match anything (.*) that occurs prior to a white space (\\s) followed by an opening round bracket (\\() followed by a number (\\d)".
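
One thing to keep in mind with an extract-based approach is that a string with no " (digit" sequence has nothing to extract (`str_extract` returns `NA` in that case). The sketch below illustrates the same lookahead in base R, using `regexpr` with `perl = TRUE` instead of `str_extract`; the third entry, "No Parens", is a made-up value added only to show the no-match case.

```r
# Two names from the question plus one hypothetical entry without a parenthesis.
x <- c("Don Newcombe (1 | MVP)", "Warren Spahn (1 | HOF | ASG)", "No Parens")

# Same lookahead pattern; perl = TRUE is needed for (?=...) in base R.
m <- regexpr(".*(?=\\s\\(\\d)", x, perl = TRUE)

# regexpr() reports -1 where the pattern does not match.
hit <- as.vector(m) != -1L

# regmatches() returns only the matched substrings, so fill a full-length result.
out <- rep(NA_character_, length(x))
out[hit] <- regmatches(x, m)
out

# Fall back to the original string where nothing was extracted.
cleaned <- ifelse(is.na(out), x, out)
cleaned
```

A replace-based call such as `sub("\\s*\\(.*", "", x)` avoids the fallback step, since strings that do not match are returned unchanged.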

Result:

df
  Year Lg       Winner          Team
1 1956 NL Don Newcombe  Brooklyn (1)
2 1957 NL Warren Spahn Milwaukee (1)
3 1958 AL   Bob Turley  New York (1)
Chris Ruehlemann