0

I'm brand new to R and am having an issue with a data frame.

I have a data frame, dataf, that looks like this:

#         PlayerName           playerValue
#5     Tammy Abraham          10,00 Mill. €  
#6     Abdul Rahman Baba      8,00 Mill. €  
#7     Mario Pasalic          8,00 Mill. €  
#8     Lewis Baker            5,50 Mill. €  
#9     Ola Aina               4,00 Mill. €  
#10    Jamal Blackman         500 Th. €  

Then I use the line:

dataf$playerValue <- gsub(",", ".", gsub("[[:space:]].*", "", dataf$PlayerValue))

The output of this is:

#         PlayerName           playerValue        playerValue
#5     Tammy Abraham          10,00 Mill. €           10
#6     Abdul Rahman Baba      8,00 Mill. €            8
#7     Mario Pasalic          8,00 Mill. €            8
#8     Lewis Baker            5,50 Mill. €            5.5
#9     Ola Aina               4,00 Mill. €            4
#10    Jamal Blackman         500 Th. €               500

Is there any way to change the final value from 500 to 0.5? Obviously 500 thousand is smaller than 4 million, but here the number 500 is going to be treated as larger than 4.

Also, how do I exclude the original PlayerValue column? When I run my code, the column is printed twice: once with the original string and once with the converted value.

Thank you for any help.

Sotos
Paul R
  • As far as the printing twice goes, note that `playerValue` isn't the same as `PlayerValue`. You have a typo. As far as your other question, I don't see any shortcut to using `ifelse()` with a condition that checks for `"Th"`, but maybe someone with more regular expression expertise than I has a slicker approach. – John Coleman Dec 12 '18 at 12:48

3 Answers

1

Here is an idea: extract the number from each string and, if the word Mill is not found in the string, divide it by 1000, i.e.

Assume the data frame,

 playerName         playerValue
1  Tammy Abraham    10,00 Mill. €
2 Jamal Blackman    500 Th. €

then,

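# Keep the digits and the decimal comma (so 5,50 becomes 5.5, not 5)
v1 <- as.numeric(sub(',', '.', gsub('[^0-9,]', '', df$playerValue)))
# Values without 'Mill' are in thousands: scale them to millions
v1[!grepl('Mill', df$playerValue)] <- v1[!grepl('Mill', df$playerValue)] / 1000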
v1
#[1] 10.0  0.5
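
As a quick check against all six values from the question, the same idea can be wrapped in a small helper. This is only a sketch: the function name `to_millions` is made up for illustration, and it assumes that every value is marked either "Mill." or "Th." and uses a decimal comma.

```r
# Hypothetical helper: convert strings such as "5,50 Mill. €" or "500 Th. €"
# to a numeric value in millions of euros.
to_millions <- function(x) {
  # Keep only digits and the decimal comma, then switch the comma to a dot
  num <- as.numeric(sub(",", ".", gsub("[^0-9,]", "", x), fixed = TRUE))
  # Assumption: anything not marked "Mill" is given in thousands
  ifelse(grepl("Mill", x, fixed = TRUE), num, num / 1000)
}

values <- c("10,00 Mill. €", "8,00 Mill. €", "8,00 Mill. €",
            "5,50 Mill. €", "4,00 Mill. €", "500 Th. €")
result <- to_millions(values)
result
# values: 10, 8, 8, 5.5, 4, 0.5
```

Assigning the result back to the original column name (for example `dataf$playerValue <- to_millions(dataf$playerValue)`) would replace the strings in place instead of adding a second column.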

Here is a very similar question although not exactly the same

Sotos
0

You can use regex to extract the number and then scale it according to whether the value is in millions or thousands. The result below is expressed in thousands of euros.

# dummy data
dataf <- data.frame(playerValue = c("500 Th. € ","4,00 Mill. € "))
# Extract the leading number (a decimal comma is allowed) with regex
num <- gsub("^(\\d+(?:,\\d+)?)\\s.*", "\\1", dataf$playerValue, perl = TRUE)
num <- as.numeric(sub(",", ".", num, fixed = TRUE))
# Millions are scaled to thousands; "Th." values are already in thousands
mult <- ifelse(grepl("Mill", dataf$playerValue, fixed = TRUE), 10^3, 1)
# Final result
num * mult
# returns
# [1]  500 4000
niko
  • I think you will find this post helpful: https://stackoverflow.com/questions/14543627/extracting-numbers-from-vectors-of-strings – Antonios Dec 12 '18 at 12:53
0

Here is a simple answer using strsplit and ifelse.

# Dummy data
df <- data.frame(playerValue = c("500 Th. € ","4,00 Mill. € "), stringsAsFactors = FALSE)

# Splitting number and scale into two columns
splits <- strsplit(df$playerValue, split = " ")
splits <- do.call(rbind, splits)

# Replacing commas
splits[,1] <- gsub(",", ".", splits[,1])

# Adding to dataframe
df$value <- as.numeric(splits[,1])
df$scale <- splits[,2]

# Calculating new values
df$new_value <- ifelse(df$scale == "Th.", df$value/1000, df$value)
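
To address the second part of the question (ending up with one column instead of two), one option is to overwrite the original column rather than add helper columns. A minimal sketch, assuming the column is spelled `playerValue` exactly (R names are case-sensitive) and that only "Th." and "Mill." occur as scales:

```r
# Dummy data, same shape as above
df <- data.frame(playerValue = c("500 Th. € ", "4,00 Mill. € "),
                 stringsAsFactors = FALSE)

# Split each string on spaces: the first piece is the number, the second the scale
parts <- strsplit(df$playerValue, split = " ", fixed = TRUE)
value <- as.numeric(sub(",", ".", sapply(parts, `[`, 1), fixed = TRUE))
scale <- sapply(parts, `[`, 2)

# Assigning to the same name replaces the column, so nothing is printed twice
df$playerValue <- ifelse(scale == "Th.", value / 1000, value)
df
```

If the helper columns from the code above have already been added, setting them to `NULL` (for example `df$scale <- NULL`) removes them again.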
Esben Eickhardt