
I have a data frame (data) in R with thousands of rows and 10 columns. Nine of the columns contain factors with several levels.

Here is a small portion of the data frame.

A gr1
10 303.90
11 304.1
12 303.6
13 303.90 obs
14 303.90k

As an example, one factor has a level "303.90" and another level "303.90 obs". I want to change "303.90 obs" to "303.90". I am using the following command to rename the level:

data[] = as.data.frame(lapply(data, function(x) {x = gsub("303.90 obs","303.90", fixed = T, x)}))

But this does not change the level "303.90 obs" to "303.90"; it just stays the same. Yet the same command works for other strings, e.g. "303.9" gets changed to "303.90" when I use:

data[] = as.data.frame(lapply(data, function(x) {x = gsub("303.9 obs","303.90", fixed = T, x)}))

Any suggestions as to why this might be?

gwarr
  • First, remove `x=` from the `lapply`. Second, please provide a small sample of data: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – emilliman5 Mar 22 '18 at 14:34
  • So you also want `303.90k` to be changed to `303.90`? Then you need a regular expression. – Tobias Mar 22 '18 at 15:00
  • I think my code might be replacing the "303.90" part of "303.90 obs" but still printing the rest of the string. Could this be the case? – gwarr Mar 22 '18 at 15:35
  • Yes, 303.90k should also be changed. I will try regular expressions, but I think gsub should work for this example anyway. I need to figure out why it is not working before moving on to regular expressions. – gwarr Mar 22 '18 at 15:41
  • @gwarr: I updated my script. Please check if this works. – Tobias Mar 23 '18 at 07:26
  • I suspect this is an issue with the file I'm importing, because I'm not able to convert the level "303.90 obs". I have tried building an example data.frame from scratch in R, including factors with several levels in a similar format "303.90 ...", and I was able to convert those successfully with all the methods mentioned here. I have to take a closer look at the file I'm importing. Maybe there are some hidden characters? – gwarr Mar 23 '18 at 11:46
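
One way to check the hidden-character hypothesis from the last comment is to look at the code points behind each level name. This is only a sketch, assuming the culprit is a non-breaking space (U+00A0) sitting where an ordinary space is expected. The factor `f` is made up for illustration and is not the OP's data.

```r
# Hypothetical factor: the second value contains a non-breaking space (U+00A0)
# instead of a regular space, so it only looks like "303.90 obs".
f <- factor(c("303.90", "303.90\u00a0obs", "304.1"))

# A fixed-string search for "303.90 obs" (regular space) finds no level.
found <- any(grepl("303.90 obs", levels(f), fixed = TRUE))

# Inspect the code points of each level: 32 is a regular space, 160 is U+00A0.
code_points <- lapply(levels(f), utf8ToInt)
names(code_points) <- levels(f)
print(code_points)

# Swap non-breaking spaces for regular ones, trim the ends, then retry.
cleaned <- trimws(gsub("\u00a0", " ", levels(f), fixed = TRUE))
cleaned <- gsub("303.90 obs", "303.90", cleaned, fixed = TRUE)
levels(f) <- cleaned  # levels that now share a name are merged
print(levels(f))
```

If the printed code points show something other than 32 between "303.90" and "obs", the pattern passed to gsub has to contain that same character, or the levels have to be normalised first as above. Other invisible characters (tabs, zero-width spaces) would show up in the same way but are not handled by this sketch.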

1 Answer


I'm not that familiar with lapply, so my solution simply loops over the columns of the data frame and keeps only the numeric part of each value. It works on the example data below.

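# Example data: two numeric columns and two columns of messy labels
col1 <- 1:10
col2 <- 21:30
col3 <- c("503.90", "303.90 obs", "803.90sfsdf sf", "203.90 obs", "303.90", "103.90 obs", "303.90", "403.90 obs", "803.90sfsdf sf", "303.90 obs")
col4 <- c("303.90", "303.90 obs", "303.90", "203.90 obs", "303.90", "107.40fghfg", "303.90", "303.90 obs", "303.90", "303.90 obs")

data <- data.frame(col1, col2, col3, col4)

# Make sure the label columns are factors (no longer the default since R 4.0)
data$col3 <- as.factor(data$col3)
data$col4 <- as.factor(data$col4)

# For each label column, keep only the first number of the form digits.digits
for (i in 3:4) {
  matchedExpression <- regexpr(pattern = "\\d+\\.\\d+", text = data[, i])
  # regmatches() drops values without a match, so every value must contain a number
  data[, i] <- regmatches(x = as.character(data[, i]), m = matchedExpression)
  data[, i] <- as.factor(data[, i])
}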

EDIT

The OP changed the description. To change all of these levels to 303.90, a regex is the better solution. However, more information from the OP is needed to give a general solution, e.g. is 303.90 the only value that should be changed?

EDIT2

Updated the script since the OP provided more information, e.g. columns can contain levels other than 303.90.
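
The same extraction can also be applied to the level names instead of to every row, which is one way to answer the original question about renaming levels. A minimal sketch, assuming every level that should be cleaned starts with a number such as 303.90. The helper name `clean_levels` and the small data frame are hypothetical.

```r
# Hypothetical helper: keep only the leading number of each level name.
# Levels without a leading number are left unchanged by sub().
clean_levels <- function(f) {
  levels(f) <- sub("^\\s*(\\d+\\.\\d+).*$", "\\1", levels(f))
  f
}

data <- data.frame(
  col1 = 1:5,
  col3 = factor(c("503.90", "303.90 obs", "803.90sfsdf sf", "303.90", "303.90k"))
)

# Apply the helper to factor columns only; other columns are untouched.
is_fac <- vapply(data, is.factor, logical(1))
data[is_fac] <- lapply(data[is_fac], clean_levels)

print(levels(data$col3))
```

Assigning to `levels()` merges levels that end up with the same name, so "303.90 obs" and "303.90k" collapse into "303.90" without converting the column to character and back.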

Tobias
  • This method is not working for me. The "303.90 obs" is not being changed to "303.90". – gwarr Mar 22 '18 at 15:14
  • When you copy&paste the script and execute it, you will see that it works. If my code does not work on your dataframe you did not provide enough information. See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Tobias Mar 22 '18 at 15:18