
I am working in the R programming language. Suppose I have the following data:

> head(my_data)
  survey_1_var_1 survey_1_var_2 survey_1_var_3 survey_2_var_4 survey_2_var_5 survey_2_var_6 survey_3_var_7 survey_3_var_8 survey_3_var_9 g
1       15.22394       0.000000      16.657620       0.000000       6.646745       9.146625        0.00000        0.00000      0.0000000 C
2        0.00000      21.144729       0.000000       0.000000      13.974305       0.000000       10.83326        0.00000     11.0154182 A
3       28.21113       0.000000      -3.157330       7.730749      -1.919841      19.842216       18.18518       13.45900     10.6051849 C
4        0.00000       0.000000      -2.125495       0.000000       0.000000      16.317981       11.52731       15.25231      0.0000000 C
5        0.00000       0.000000      -1.331926      16.843596       0.000000     -13.215788       10.61635        0.00000     -0.8529851 B
6      -11.25795       7.150576       0.000000       0.000000       0.000000       8.292532       11.43462        0.00000      0.0000000 A

My question: is there a way to replace all non-zero values with 1?

I can do this the long way:

my_data$survey_1_var_1 = ifelse(my_data$survey_1_var_1 > 0, 1, 0)
my_data$survey_1_var_2 = ifelse(my_data$survey_1_var_2 > 0, 1, 0)

etc.

But is there a way to do this all at once?

Thanks!

stats_noob
  • How about `my_data %>% mutate(across(where(is.numeric), ~ifelse(.x > 0, 1, 0)))`? You need to load dplyr first. This will apply the ifelse function to every numeric column of your data (using everything() would also convert the character column g). – jpiversen Dec 22 '21 at 07:42

4 Answers


You may try

library(dplyr)

my_data %>%
  mutate(across(where(is.numeric), ~ifelse(.x >0, 1, 0)))

  survey_1_var_1 survey_1_var_2 survey_1_var_3 survey_2_var_4 survey_2_var_5 survey_2_var_6 survey_3_var_7 survey_3_var_8 survey_3_var_9 g
1              1              0              1              0              1              1              0              0              0 C
2              0              1              0              0              1              0              1              0              1 A
3              1              0              0              1              0              1              1              1              1 C
4              0              0              0              0              0              1              1              1              0 C
5              0              0              0              1              0              0              1              0              0 B
6              0              1              0              0              0              1              1              0              0 A
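Note that `.x > 0` maps negative values to 0 (for example, the -3.15733 in row 3 of `survey_1_var_3` becomes 0 above). If "non-zero" is meant literally, `!= 0` is the test to use. A minimal sketch comparing the two tests on a made-up vector `x`:

```r
# Hypothetical example vector containing negative, zero, and positive values
x <- c(-3.2, 0, 7.5, -0.1, 0)

# Positive-only test: negative values are mapped to 0
pos_only <- ifelse(x > 0, 1, 0)

# Non-zero test: any value other than exactly 0 is mapped to 1
non_zero <- ifelse(x != 0, 1, 0)

print(pos_only)  # 0 0 1 0 0
print(non_zero)  # 1 0 1 1 0
```

Inside the dplyr call, the non-zero version would read `~ ifelse(.x != 0, 1, 0)`.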
Park

Using dplyr and across syntax:

library(dplyr)

df %>%
  mutate(across(starts_with("survey"), ~ ifelse(. > 0, 1, 0)))
MonJeanJean
  • Thank you so much for your answer! I found a way to adapt your code so that it converts all these variables to factors: `df %>% mutate(across(starts_with("survey"), ~ as.factor(ifelse(. > 0, 1, 0))))` – stats_noob Dec 22 '21 at 08:03
  • @stats555 It looks like these do not catch the negative numbers though. Negative numbers are assigned a 0. – AndrewGB Dec 22 '21 at 08:15
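One caveat with `as.factor()` here: a column that happens to contain only zeros (or only ones after recoding) ends up with a single level. If every column should share the levels 0 and 1, `factor()` with explicit `levels` is one way to guarantee that. A minimal sketch on a made-up all-zero column:

```r
# Hypothetical column in which every value is zero
all_zero <- c(0, 0, 0)

# as.factor() only creates levels for values that actually occur
f1 <- as.factor(ifelse(all_zero > 0, 1, 0))
print(levels(f1))  # "0"

# factor() with explicit levels keeps both "0" and "1"
f2 <- factor(ifelse(all_zero > 0, 1, 0), levels = c(0, 1))
print(levels(f2))  # "0" "1"
```

In the dplyr call from the comment above, this would read `~ factor(ifelse(. > 0, 1, 0), levels = c(0, 1))`.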

I'd go one step further: convert the values to logical (TRUE for non-zero, FALSE for 0), then convert them back to integers with a unary + prefix:

library(dplyr)
df %>% mutate(across(where(is.numeric), ~ +as.logical(.x)))

Output:

  survey_1_var_1 survey_1_var_2 survey_1_var_3 survey_2_var_4 survey_2_var_5 survey_2_var_6 survey_3_var_7 survey_3_var_8 survey_3_var_9 g
1              1              0              1              0              1              1              0              0              0 C
2              0              1              0              0              1              0              1              0              1 A
3              1              0              1              1              1              1              1              1              1 C
4              0              0              1              0              0              1              1              1              0 C
5              0              0              1              1              0              1              1              0              1 B
6              1              1              0              0              0              1              1              0              0 A
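The trick works because `as.logical()` maps any non-zero number (negative values included) to `TRUE` and `0` to `FALSE`, and unary `+` applied to a logical vector returns an integer vector. A quick check on a made-up vector that also includes a missing value:

```r
# Hypothetical input with negative, zero, positive, and missing values
x <- c(-11.25, 0, 7.15, NA)

flags <- as.logical(x)  # TRUE FALSE TRUE NA
ones  <- +flags         # 1 0 1 NA, stored as integer

print(flags)
print(ones)
print(class(ones))  # "integer"
```

Missing values stay `NA` rather than being silently turned into 0 or 1.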
U13-Forward

Here is a base R option too.

isnum <- sapply(df, is.numeric)

df[,isnum] <- as.data.frame(ifelse(df[,isnum] > 0 | df[,isnum] < 0, 1, 0))

Output

  survey_1_var_1 survey_1_var_2 survey_1_var_3 survey_2_var_4 survey_2_var_5 survey_2_var_6 survey_3_var_7 survey_3_var_8 survey_3_var_9 g
1              1              0              1              0              1              1              0              0              0 C
2              0              1              0              0              1              0              1              0              1 A
3              1              0              1              1              1              1              1              1              1 C
4              0              0              1              0              0              1              1              1              0 C
5              0              0              1              1              0              1              1              0              1 B
6              1              1              0              0              0              1              1              0              0 A
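A variant of the same base R idea, sketched under the assumption that integer 0/1 columns are acceptable: applying the test column by column with `lapply()` avoids the round trip through a matrix that `ifelse()` on a data frame produces. Here `toy` is a small made-up data frame:

```r
# Small hypothetical data frame: two numeric columns and a character group column
toy <- data.frame(
  a = c(15.2, 0, -3.2),
  b = c(0, 21.1, 0),
  g = c("C", "A", "C"),
  stringsAsFactors = FALSE
)

# Identify numeric columns, as in the answer above
isnum <- sapply(toy, is.numeric)

# Replace every non-zero value with 1L and every zero with 0L, column by column
toy[isnum] <- lapply(toy[isnum], function(col) as.integer(col != 0))

print(toy)
```

The character column `g` is left untouched because it is excluded by `isnum`.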

Data

df <- structure(
  list(
    survey_1_var_1 = c(15.22394, 0, 28.21113, 0, 0,-11.25795),
    survey_1_var_2 = c(0, 21.144729, 0, 0, 0, 7.150576),
    survey_1_var_3 = c(16.65762, 0,-3.15733,-2.125495,-1.331926, 0),
    survey_2_var_4 = c(0, 0, 7.730749, 0, 16.843596, 0),
    survey_2_var_5 = c(6.646745, 13.974305,-1.919841, 0, 0, 0),
    survey_2_var_6 = c(9.146625, 0, 19.842216, 16.317981,-13.215788, 8.292532),
    survey_3_var_7 = c(0, 10.83326, 18.18518, 11.52731, 10.61635, 11.43462),
    survey_3_var_8 = c(0, 0, 13.459, 15.25231, 0, 0),
    survey_3_var_9 = c(0, 11.0154182, 10.6051849, 0,-0.8529851, 0),
    g = c("C", "A", "C", "C", "B", "A")
  ),
  class = "data.frame",
  row.names = c(NA,-6L)
)
AndrewGB