
I have a tibble with 27 columns of numeric values. I want to count the number of digits after the decimal point; specifically, I want to know the maximum number of decimal digits that occurs in each column.

I tried converting the numeric values to strings, extracting the part after the decimal point, counting its characters, and then taking the maximum.

afterdecimal_val1 <- data %>%
  mutate(
    across(where(is.numeric), as.character),
  ) %>%
  rowwise() %>%
  mutate(
   init = str_split(value_1,"[.]"),
   init2 = init[2],
   init3 = nchar(init2)
  ) %>%
  ungroup() %>%
  mutate(init4 = max(init3, na.rm=TRUE))

Well, this works, but only for the column "value_1", and I am pretty sure it is not the prettiest way to do it.

Do you know a simpler way? Or can you help me upgrade the code so that it works for all 27 variables and not only "value_1"?

I really like answers using dplyr!

Thanks for your help!

C. Sebastian
  • You could improve your chances of finding help here by adding a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). Adding an MRE and an example of the desired output (in code form, not tables and pictures) makes it much easier for others to find and test an answer to your question. That way you can help others to help you! And do you mean the number of digits after the decimal point *that are not zero*? Also: you could look into `tidyr::pivot_longer`. – dario Oct 13 '21 at 09:58

3 Answers


Does this work?

df1 <- data.frame(c1 = c(1.11, 1.121,1.1212),
                  c2 = c(1.1, 1.121,1.121),
                  c3 = c(1.1111, 1.121,1.12111))
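# Strip everything up to and including the decimal point (whole numbers become ""),
# then take the longest remaining string in each column
apply(apply(df1, 2, function(x) sub('^[^.]*\\.?', '', as.character(x))), 2, function(y) max(nchar(y)))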
c1 c2 c3 
 4  3  5 
Karthik S

dplyr solution, untested due to lack of MRE:

library(dplyr)
library(tidyr)

data %>% 
  pivot_longer(starts_with("value"), names_to = "variable", values_to = "value") %>% 
  mutate(N_digits = nchar(gsub(".*\\.", "", as.character(value)))) %>% 
  group_by(variable) %>% 
  summarise(max_N_digits = max(N_digits)) %>% 
  pivot_wider(names_from = variable, values_from = max_N_digits)

Edit

This should also work for numbers without a decimal point (i.e. give 0):

data1 <- data.frame(value1 = c(1.11, 1.121,1.1212),
                   value2 = c(6666, 5,5),
                   value3 = c(1.1111, 1.121,1.12111))
library(dplyr)
library(tidyr)
data1 %>% 
  pivot_longer(starts_with("value"), names_to = "variable", values_to = "value") %>% 
  mutate(N_digits = nchar(gsub(".*\\.|^[^.]+$", "", as.character(value)))) %>% 
  group_by(variable) %>% 
  summarise(max_N_digits = max(N_digits)) %>% 
  pivot_wider(names_from = variable, values_from = max_N_digits)

Returns:

  value1 value2 value3
   <int>  <int>  <int>
1      4      0      5
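
The same counting step can probably also be applied column by column, without reshaping to long format first. The following is only a sketch under assumptions: `data1` is made-up example data standing in for the real 27-column tibble, and the regex is one possible way to drop everything up to and including the decimal point (so whole numbers give 0).

```r
library(dplyr)

# Made-up example data (an assumption, not the asker's real tibble)
data1 <- data.frame(value1 = c(1.11, 1.121, 1.1212),
                    value2 = c(6666, 5, 5),
                    value3 = c(1.1111, 1.121, 1.12111))

# For every numeric column: convert to character, remove everything up to
# and including the decimal point, count what is left, and take the maximum
res <- data1 %>%
  summarise(across(where(is.numeric),
                   ~ max(nchar(sub("^[^.]*\\.?", "", as.character(.x))),
                         na.rm = TRUE)))

res
```

Note that this relies on how `as.character()` prints the numbers, so values that end up in scientific notation (e.g. `1e-05`) would not be counted correctly.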
dario
  • I really like your answer, thank you! One question: when there are numbers with no decimal point, e.g. the number 1200, your code counts 4 digits. How can the code be adjusted so that it returns 0 when there is no decimal point? And one small remark: in the second line of the pipe, the last "names_to" is one too many. – C. Sebastian Oct 13 '21 at 11:29

Tidyverse solutions:

library(tidyverse)
# Option 1 using `purrr::`:
df1 %>% 
  summarise_if(is.double, ~max(
    str_length(
      map_chr(
        str_split(
          as.character(.),
          "\\."
        ),
        # Assumes every value has a decimal part; x[[2]] fails otherwise
        function(x) x[[2]]
      )
    )
  )
)

# Option 2 using regex:
df1 %>%
  summarise_if(
    is.double,
    function(x){
      max(
        str_length(
          str_replace(
            x,
            ".*\\.(\\d+)",
            "\\1"
          )
        )
      )
    }
  )

Base solution:

# Option 1:
# Resolve the names of the double vectors: double_vecs => character vector
double_vecs <- names(df1)[vapply(df1, is.double, logical(1))]

# Calculate the max number of decimal places in each column: 
# res => named integer vector
res <- setNames(
  vapply(
    lapply(
        data.frame(
          Vectorize(gsub)(
            ".*\\.(\\d+)",
            "\\1",
            df1[,double_vecs]
          )
        ),
      nchar
    ),
    max,
    integer(1),
    USE.NAMES = FALSE
  ),
  double_vecs
)

# Option 2:
# Resolve the index of double vectors: col_idx => logical vector
col_idx <- vapply(df1, is.double, logical(1))

# Matrix holding values representing the number of characters 
# after a decimal point: len_mat => matrix
len_mat <- apply(
  Vectorize(gsub)(
    ".*\\.(\\d+)",
    "\\1",
    df1[,col_idx]
  ),
  seq_len(2),
  nchar
)

# Get the maximum of each column: res => named integer vector
res <- setNames(
  do.call(
    pmax, 
    c(
      as.data.frame(
        t(len_mat)
      )
    )
  ),
  names(df1)[col_idx]
)
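
For comparison, a more compact base R sketch of the same computation. This is an illustration under assumptions rather than a drop-in replacement: `count_decimals` is a hypothetical helper name, and `df1` here is made-up example data that includes a column of whole numbers.

```r
# Made-up example data (an assumption); c2 holds whole numbers only
df1 <- data.frame(c1 = c(1.11, 1.121, 1.1212),
                  c2 = c(6666, 5, 5),
                  c3 = c(1.1111, 1.121, 1.12111))

# Hypothetical helper: number of characters after the decimal point,
# 0 when the printed value has no decimal point
count_decimals <- function(x) {
  nchar(sub("^[^.]*\\.?", "", as.character(x)))
}

# Logical index of the numeric columns
is_num <- vapply(df1, is.numeric, logical(1))

# Maximum per numeric column: res => named integer vector
res <- vapply(df1[is_num],
              function(col) max(count_decimals(col), na.rm = TRUE),
              integer(1))

res
```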
hello_friend