I have a data.frame

test_data <- read.csv("https://stepik.org/media/attachments/course/724/test_data_01.csv", 
                       stringsAsFactors = FALSE)

This data.frame looks like this:

 V1      V2       V3      V4       V5
1 -2. 5935    II 2   0.4984 ST 123E -2.154 4
2  -0.2888 ST 123E   1.5636 ST 123E   0.1053
3 -0.828 6 ST 123E -0.9 791  HP 129 -0. 4989
4  -0. 322 ST 123E  -0.3013  HP 129  -0.4032
5  -0.5588 ST 123E   1.2694  HP 129  0.703 9

My goals: 1) pick out only the truly numeric columns (V1, V3, V5)

num_test<-test_data[sapply(test_data, function(x) grepl("[A-Za-z]", x, perl = T))==F]

2) in those truly numeric columns (V1, V3, V5), remove the whitespace and then convert them to numeric

str_remove_all(num_test," ")

But I don't understand how to return the data.frame with these changes. It should look like this:

V1      V2      V3      V4      V5
1 -2.5935    II 2  0.4984 ST 123E -2.1544
2 -0.2888 ST 123E  1.5636 ST 123E  0.1053
3 -0.8286 ST 123E -0.9791  HP 129 -0.4989
4 -0.3220 ST 123E -0.3013  HP 129 -0.4032
5 -0.5588 ST 123E  1.2694  HP 129  0.7039

Thanks!

r2evans
Ekaterina
  • (1) `perl=TRUE` is unnecessary here, you're just slowing it down. (2) `(...) == F` should really be `! (...)` or `isFALSE(...)`. (3) Other than adding `as.numeric` to your whitespace-removal, what part of this doesn't work? Is it just reassigning these values back into the original frame? – r2evans Jun 18 '20 at 17:13

1 Answer

Since you're using str_remove_all, I'm inferring tidyverse. Try this:

library(dplyr)
test_data %>%
  mutate_at(vars(V1, V3, V5), ~ as.numeric(gsub("\\s", "", .)))
#        V1      V2      V3      V4      V5
# 1 -2.5935    II 2  0.4984 ST 123E -2.1544
# 2 -0.2888 ST 123E  1.5636 ST 123E  0.1053
# 3 -0.8286 ST 123E -0.9791  HP 129 -0.4989
# 4 -0.3220 ST 123E -0.3013  HP 129 -0.4032
# 5 -0.5588 ST 123E  1.2694  HP 129  0.7039

gsub works just fine by itself here. If you prefer stringr, then:

library(stringr)
test_data %>%
  mutate_at(vars(V1, V3, V5), ~ as.numeric(str_replace_all(., "\\s", "")))

Edit

To determine automatically which columns have no letter-like data, use mutate_if:

test_data %>%
  mutate_if(~ !any(grepl("[A-Za-z]", .)),
            ~ as.numeric(str_replace_all(., "\\s", "")))
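
For the "reassigning back into the original frame" part, here is a minimal base R sketch of the same idea without dplyr. The inline test_data is a hypothetical stand-in that copies the five rows shown in the question (so it runs without the CSV), and is_num is just a name I picked:

```r
# Hypothetical stand-in for the CSV: the five rows shown in the question
test_data <- data.frame(
  V1 = c("-2. 5935", "-0.2888", "-0.828 6", "-0. 322", "-0.5588"),
  V2 = c("II 2", "ST 123E", "ST 123E", "ST 123E", "ST 123E"),
  V3 = c("0.4984", "1.5636", "-0.9 791", "-0.3013", "1.2694"),
  V4 = c("ST 123E", "ST 123E", "HP 129", "HP 129", "HP 129"),
  V5 = c("-2.154 4", "0.1053", "-0. 4989", "-0.4032", "0.703 9"),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per column: TRUE when no value in it contains a letter
is_num <- vapply(test_data, function(x) !any(grepl("[A-Za-z]", x)), logical(1))

# Strip whitespace, convert, and write the result back into the same columns
test_data[is_num] <- lapply(test_data[is_num],
                            function(x) as.numeric(gsub("\\s", "", x)))

str(test_data)
```

Assigning into test_data[is_num] is what keeps the other columns untouched. One caveat with the letter test: a value written in scientific notation (such as 1e-3) contains a letter, so its column would be left as text.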
r2evans
  • It's a good idea! But I need a function that changes only the truly numeric columns, meaning columns that don't contain any letters. That's why I wrote this code: num_test<-test_data[sapply(test_data, function(x) grepl("[A-Za-z]", x, perl = T))==F] – Ekaterina Jun 18 '20 at 17:24
  • But there may be a dataset where the truly numeric columns are not V1, V3, V5 but, for example, V2, V7, V10. In this exercise I should find a way to determine these columns automatically – Ekaterina Jun 18 '20 at 17:48
  • Okay, Ekaterina, see my *real* edit (of my answer this time, not your question ... sry about that). – r2evans Jun 18 '20 at 17:51