I'm writing a function that subsets a data frame under several different conditions, and I need to return whichever of the resulting data frames has the most rows.

df2 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=200))
df3 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=90))
df4 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=600))
df5 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=70))

max_row_df = ifelse(nrow(df) > nrow(df2) & nrow(df) > nrow(df5), deparse(substitute(df)),
                    ifelse(nrow(df2) > nrow(df3), deparse(substitute(df2)),
                           ifelse(nrow(df3) > nrow(df4), deparse(substitute(df3)),
                                  ifelse(nrow(df4) > nrow(df5),deparse(substitute(df4)),
                                  deparse(substitute(df5))))))
max_row_df

This statement has a flaw in its logic, but it is the only method I have found that returns the name of the data frame, which is what I need in order to return the selected data frame from the function.

row_lengths <- c(nrow(df), nrow(df2), nrow(df3), nrow(df4), nrow(df5))
max_row <- max(row_lengths)

I can't deparse the data frame names with the method above. Is there a better approach, since if and for seem to only return boolean values?

Any insight appreciated.

Magnetar
  • Hello Magnetar, please clarify: why did you choose to use deparse(substitute(df)) in the first place? Must you return the "name" of the df variable, or the df itself? – Ric Nov 14 '22 at 01:09
  • [Put them in a `list`](https://stackoverflow.com/a/24376207/903061): `df_list = mget(paste0("df", c("", as.character(2:5))))`, find the one with the max rows with `i_most = which.max(sapply(df_list, nrow))`, and return it with `df_list[[i_most]]`. – Gregor Thomas Nov 14 '22 at 01:26

2 Answers

Here's a solution. You put your data.frames in a list and use purrr::reduce to compare them and keep the largest one:

library(purrr)

df2 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=200))
df3 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=90))
df4 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=600))
df5 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=70))

reduce(
  list(df2, df3, df4, df5),
  ~ if (nrow(.x) > nrow(.y)) .x else .y
)
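
If purrr is not available, the same pairwise comparison can probably be written with base R's Reduce. This is only a minimal sketch: keep_larger is a hypothetical helper name, and the example data frames are simplified one-column stand-ins with the same row counts as above.

```r
# Example data frames with 200, 90, 600 and 70 rows (one column each)
df2 <- as.data.frame(matrix(runif(200), nrow = 200))
df3 <- as.data.frame(matrix(runif(90), nrow = 90))
df4 <- as.data.frame(matrix(runif(600), nrow = 600))
df5 <- as.data.frame(matrix(runif(70), nrow = 70))

# Hypothetical helper: keep whichever of two data frames has more rows
# (on a tie the second one is kept, as in the purrr version)
keep_larger <- function(x, y) if (nrow(x) > nrow(y)) x else y

# Fold the list pairwise, carrying the larger data frame forward
largest <- Reduce(keep_larger, list(df2, df3, df4, df5))
nrow(largest)
```

Like the purrr version, this returns the data frame itself and not its name.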
Santiago

This worked, thanks Gregor! Don't need to pull df name in this instance.

  # collect the objects named df, df2, ..., df5 into a list
  df_list = mget(paste0("df", c("", as.character(2:5))))
  
  # find the object in the list with max rows and return
  i_most = which.max(sapply(df_list, nrow))
  
  return(df_list[[i_most]])
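
The snippet above is taken from inside my function, so return() will not run at the top level. For anyone who wants something self-contained, here is a rough sketch under a few assumptions: largest_df is a hypothetical helper name, the data frames are passed in as a named list instead of being looked up with mget, and the example data is simplified to one column. It also hands back the name in case that is needed after all.

```r
# Hypothetical helper: takes a *named* list of data frames and returns
# the name and contents of the one with the most rows.
largest_df <- function(df_list) {
  stopifnot(length(df_list) > 0, !is.null(names(df_list)))
  row_counts <- vapply(df_list, nrow, integer(1))
  i_most <- which.max(row_counts)  # first maximum if there is a tie
  list(name = names(df_list)[i_most], data = df_list[[i_most]])
}

# Example data frames with 200, 90, 600 and 70 rows (one column each)
df2 <- as.data.frame(matrix(runif(200), nrow = 200))
df3 <- as.data.frame(matrix(runif(90), nrow = 90))
df4 <- as.data.frame(matrix(runif(600), nrow = 600))
df5 <- as.data.frame(matrix(runif(70), nrow = 70))

result <- largest_df(list(df2 = df2, df3 = df3, df4 = df4, df5 = df5))
result$name        # name of the largest data frame
nrow(result$data)  # its row count
```

Passing the list in explicitly avoids depending on which variables happen to exist in the calling environment.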
Magnetar