
I have the following data:

Year    <- c("2021","2021","2021","2021","2021","2021")
Month   <- c("8","8","8","8","8","8")
Day <- c("10","15","18","20","22","25")
Hour <- c("171110","171138","174247","183542","190156","190236")
Id_Type <-  c("2","2","1","","1","")
Code_Intersecction <- c("340","","","210","750","980")

Data = data.frame(Year,Month,Day,Hour,Id_Type,Code_Intersecction)

I need to count the empty strings ("") in each column of the data frame. If the share of empty values in a column is 5% or more, the column should get the value 1, otherwise 0. For that I use the following:

Data_Null = as.data.frame(purrr::map_dbl(Data, .f = function(x){ifelse(round(sum(x == '')/nrow(Data)*100L,3) >= 5, 1, 0)}))
colnames(Data_Null) = "Null"

The problem is that the resulting data frame has only one column instead of two: the variable names end up as row names, and only the 0/1 value is an actual column.
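A minimal sketch of why only one column appears, rebuilding `Data` from above. It uses base R's `vapply()` in place of `purrr::map_dbl()` (both return a named numeric vector here), and `as.data.frame()` turns those names into row names rather than a column. Printing the percentages also lets you check the 5% rule by hand: 2 empty strings out of 6 rows is about 33.3%.

```r
# Rebuild the example data from the question
Data <- data.frame(
  Year = rep("2021", 6),
  Month = rep("8", 6),
  Day = c("10", "15", "18", "20", "22", "25"),
  Hour = c("171110", "171138", "174247", "183542", "190156", "190236"),
  Id_Type = c("2", "2", "1", "", "1", ""),
  Code_Intersecction = c("340", "", "", "210", "750", "980"),
  stringsAsFactors = FALSE
)

# Percentage of empty strings per column; vapply() keeps the column names
pct_empty <- vapply(Data, function(x) round(sum(x == "") / nrow(Data) * 100, 3), numeric(1))
print(pct_empty)

# Same 0/1 rule as the question; the names end up as row names, not as a column
Data_Null <- as.data.frame(ifelse(pct_empty >= 5, 1, 0))
colnames(Data_Null) <- "Null"
print(rownames(Data_Null))
print(ncol(Data_Null))
```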


How can I make it appear as follows, with the variable names in their own column next to the 0/1 value?


redondo

3 Answers


In base R, we can use colMeans on a logical matrix and then convert the resulting named vector to a two-column data.frame with stack:

stack(+(colMeans(Data == "") > 0.05))[2:1]

Explanation: Data == "" returns a logical matrix. colMeans takes the mean of each logical column, which is the proportion of TRUE values (the percentage divided by 100). Comparing that with 0.05 (5 percent) gives a logical vector, and the unary + coerces it to integer 0/1. as.integer also does the coercion, but it drops the names, which stack needs; + keeps them. The output of colMeans is a named vector, and the names are carried through these steps. stack converts the named vector to a two-column data.frame (values and ind), and indexing with [2:1] reorders the columns so that the second column (ind) appears first.
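The steps described above can be run one at a time to inspect the intermediate values. This is a sketch that rebuilds `Data` from the question; the names `is_empty`, `prop_empty`, `flag` and `result` are only for illustration.

```r
Data <- data.frame(
  Year = rep("2021", 6),
  Month = rep("8", 6),
  Day = c("10", "15", "18", "20", "22", "25"),
  Hour = c("171110", "171138", "174247", "183542", "190156", "190236"),
  Id_Type = c("2", "2", "1", "", "1", ""),
  Code_Intersecction = c("340", "", "", "210", "750", "980"),
  stringsAsFactors = FALSE
)

# Step 1: element-wise comparison gives a logical matrix (TRUE where the cell is "")
is_empty <- Data == ""
# Step 2: column means of a logical matrix are the proportion of TRUE per column
prop_empty <- colMeans(is_empty)
# Step 3: compare with 0.05 (5%) and coerce TRUE/FALSE to 1/0 with unary +
flag <- +(prop_empty > 0.05)
# Step 4: stack() builds a data frame with columns 'values' and 'ind';
# [2:1] puts 'ind' (the column names of Data) first
result <- stack(flag)[2:1]
print(result)
```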

Output:

                 ind values
1               Year      0
2              Month      0
3                Day      0
4               Hour      0
5            Id_Type      1
6 Code_Intersecction      1

With the tidyverse, the equivalent uses enframe (from tibble):

library(dplyr)
library(tidyr)
library(purrr)
library(tibble)  # enframe() comes from tibble
map(Data, ~ +(round(mean(.x == ""), 3) * 100 >= 5)) %>%
  enframe(name = 'Variables') %>%
  unnest(value)
# A tibble: 6 × 2
  Variables          value
  <chr>              <int>
1 Year                   0
2 Month                  0
3 Day                    0
4 Hour                   0
5 Id_Type                1
6 Code_Intersecction     1
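A possible shorter variant, assuming purrr and tibble are installed: `map_int()` returns a named integer vector instead of a list, so `enframe()` can build the two-column tibble directly and `unnest()` is not needed. Naming the value column `Null` to match the question is a choice made here, not part of the original answer.

```r
library(purrr)
library(tibble)

Data <- data.frame(
  Year = rep("2021", 6),
  Month = rep("8", 6),
  Day = c("10", "15", "18", "20", "22", "25"),
  Hour = c("171110", "171138", "174247", "183542", "190156", "190236"),
  Id_Type = c("2", "2", "1", "", "1", ""),
  Code_Intersecction = c("340", "", "", "210", "750", "980"),
  stringsAsFactors = FALSE
)

# One 0/1 flag per column, returned as a named integer vector
flags <- map_int(Data, ~ as.integer(round(mean(.x == ""), 3) * 100 >= 5))
# enframe() turns the names into one column and the values into another
result <- enframe(flags, name = "Variables", value = "Null")
print(result)
```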
akrun
  • Can you explain what the + sign means in stack(+(colMeans(Data == "") > 0.05))[2:1]? – redondo Jan 17 '22 at 19:09
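A small illustration of the unary `+`, using a made-up named logical vector in place of `colMeans(Data == "") > 0.05`. Arithmetic on a logical vector coerces `TRUE`/`FALSE` to `1L`/`0L` while keeping the names; `as.integer()` gives the same values but drops the names, which is why `+` is the convenient choice before `stack()`.

```r
# Hypothetical named logical vector, standing in for colMeans(Data == "") > 0.05
flags <- c(Year = FALSE, Id_Type = TRUE)

# Unary + is arithmetic: TRUE/FALSE become 1L/0L and the names are kept
plus_flags <- +flags
# as.integer() gives the same values but strips the names
int_flags <- as.integer(flags)

print(plus_flags)
print(int_flags)
```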

Use tibble::rownames_to_column to move the row names of Data_Null into a column:

tibble::rownames_to_column(Data_Null, var ="Variables")

# A tibble: 6 x 2
  Variables           Null
  <chr>              <dbl>
1 Year                   0
2 Month                  0
3 Day                    0
4 Hour                   0
5 Id_Type                1
6 Code_Intersecction     1
Maël

Base R:

Data_Null$Variables <- rownames(Data_Null)  # the row names of Data_Null hold the variable names
TarJae
  • There are a couple more steps you'll probably want to do: change the order of the columns, since this appends the new column as the last column, and drop the row names, as they're no longer included in the OP's desired output – camille Jan 17 '22 at 18:37
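A sketch of the extra steps suggested in the comment above, in base R. It rebuilds `Data_Null` from the question (with `vapply()` standing in for `purrr::map_dbl()`), copies the row names into a `Variables` column, moves that column to the front, and resets the row names.

```r
Data <- data.frame(
  Year = rep("2021", 6),
  Month = rep("8", 6),
  Day = c("10", "15", "18", "20", "22", "25"),
  Hour = c("171110", "171138", "174247", "183542", "190156", "190236"),
  Id_Type = c("2", "2", "1", "", "1", ""),
  Code_Intersecction = c("340", "", "", "210", "750", "980"),
  stringsAsFactors = FALSE
)

# Same 0/1 rule as the question; variable names end up as row names
Data_Null <- as.data.frame(vapply(Data, function(x) {
  ifelse(round(sum(x == "") / nrow(Data) * 100, 3) >= 5, 1, 0)
}, numeric(1)))
colnames(Data_Null) <- "Null"

# Copy the row names into a column, put it first, then drop the row names
Data_Null$Variables <- rownames(Data_Null)
Data_Null <- Data_Null[, c("Variables", "Null")]
rownames(Data_Null) <- NULL
print(Data_Null)
```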