
I have applied cross tabulation with `table()` to two columns of my dataframe, and I get a cross tabulation like this one:

Now I would like to add a new column with the row totals, but when I try this...

tablaCruzada$Total <- 0

Warning message: In tablaCruzada$Total <- 0 : Coercing LHS to a list

Any ideas?

Regards

kintela
  • Convert the table to a data frame with `as.data.frame(tablaCruzada)` or `dplyr::as_tibble(tablaCruzada)` and then use `tablaCruzada$Total <- 0`. – nikn8 May 08 '20 at 09:06
  • Does this answer your question? [R: \`ID : Coercing LHS to a list\` in adding an ID column, why?](https://stackoverflow.com/questions/40681628/r-id-coercing-lhs-to-a-list-in-adding-an-id-column-why) – nikn8 May 08 '20 at 09:07
  • If I convert my table object to a data frame, I lose my columns. I get only 3 columns: Var1 with the names, Var2 with the project codes and Freq with the count for each name/project combination. I would like to see all my table columns plus one more with the sum of each row. – kintela May 08 '20 at 10:40
  • Would you mind sharing sample data using `dput(head(tablaCruzada))`? – nikn8 May 08 '20 at 11:44
  • Here is the file with the output of dput: https://www.dropbox.com/s/1zqhjdpum7r1dji/tablacruzada.csv?dl=0 – kintela May 08 '20 at 14:34
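The three-column result described in the comments is expected behaviour: `as.data.frame()` on a `table` returns the long format (one row per cell, with columns `Var1`, `Var2` and `Freq`), while `as.data.frame.matrix()` keeps the wide layout of the printed table. A minimal sketch, assuming `mtcars` as a stand-in for `tablaCruzada` (the real data are only available through the link above):

```r
# Stand-in for tablaCruzada (assumption: the real data are not shown here)
tab <- table(mtcars$cyl, mtcars$am)

# as.data.frame() on a table gives the long format: one row per cell
long <- as.data.frame(tab)
print(long)

# as.data.frame.matrix() keeps the wide layout of the printed table
wide <- as.data.frame.matrix(tab)
wide$Total <- rowSums(wide)  # one extra column holding each row's total
print(wide)
```

With 3 levels of `cyl` and 2 levels of `am`, `long` has 6 rows, while `wide` keeps one row per `cyl` level plus the new `Total` column.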

3 Answers


Convert the table to a data frame (or matrix) first, then add the new column using `rowSums`.

Using a reproducible example from `mtcars`:

temp <- table(mtcars$cyl, mtcars$am)
df <- as.data.frame.matrix(temp)
df$Total <- rowSums(df)
# Or, if you just want to initialize the column:
# df$Total <- 0
df

#   0 1 Total
#4  3 8    11
#6  4 3     7
#8 12 2    14
Ronak Shah
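If a matrix is enough (the answer above mentions "dataframe/matrix"), a sketch of the matrix route is to bind the totals with `cbind()`. Assumption: `mtcars` stands in for the asker's data. The result is a plain numeric matrix rather than a data frame, so columns are accessed with `[ , "Total"]`, not `$`:

```r
# Stand-in cross tabulation (assumption: mtcars instead of tablaCruzada)
tab <- table(mtcars$cyl, mtcars$am)

# cbind() returns a matrix: the original "0"/"1" columns plus the row totals
m <- cbind(tab, Total = rowSums(tab))
print(m)

# `$` works on lists and data frames, not on matrices, so index by name
print(m[, "Total"])
```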

You may want to explore the janitor package. Here's an example using the iris dataset:

library(dplyr)
library(janitor)

iris %>% 
  tabyl(Sepal.Length, Species) %>% 
  adorn_totals(where = "col")

# Sepal.Length setosa versicolor virginica Total
# 4.3      1          0         0     1
# 4.4      3          0         0     3
# 4.5      1          0         0     1
# 4.6      4          0         0     4
# 4.7      2          0         0     2
# 4.8      5          0         0     5
# ....
  • tabyl() provides a 2-way frequency table, where the result is a data.frame
  • adorn_totals(where = "col") adds a Total column containing the sum of each row
HNSKD
  • `tabyl(df$JefeProyecto, df$Proyecto)` gives `Error in show_na && sum(is.na(result[[1]])) > 0 : invalid 'x' type in 'x && y'`. `df$JefeProyecto` is character and `df$Proyecto` is numeric. – kintela May 08 '20 at 14:49
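For comparison, base R has a rough analogue of `adorn_totals(where = "col")` that needs no extra package: `addmargins()` with `margin = 2` sums over the second dimension (the columns) for each row and appends the result as an extra column labelled `Sum`, keeping the table layout. A sketch, assuming `mtcars` as example data:

```r
# Example cross tabulation (assumption: mtcars stands in for the real data)
tab <- table(mtcars$cyl, mtcars$am)

# margin = 2: form the margin over dimension 2, i.e. append row totals as a column
withTotals <- addmargins(tab, margin = 2)
print(withTotals)
```

Using `margin = 1` instead would append a `Sum` row with the column totals.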

Alternatively, using @Ronak's data, you can use `apply`:

df$Total <- apply(df, 1, sum)
df
#    0 1 Total
# 4  3 8    11
# 6  4 3     7
# 8 12 2    14
Chris Ruehlemann
  • Why use `apply(df, 1, sum)` to sum each row instead of `rowSums(df)`? It's inefficient. – Darren Tsai May 08 '20 at 09:27
  • When I post a question to SO I'm always grateful for multiple solutions because I can learn from them even though the answers may differ in efficiency. And a solution that may lag in efficiency in the context of one problem may outshine other solutions in efficiency in other contexts. – Chris Ruehlemann May 08 '20 at 09:37
  • @Darren Tsai I forgot to say a word about your assessment "It's inefficient." How do you measure efficiency? The main criterion, in my mind, is whether code accomplishes what it is supposed to accomplish. That is the case here. So surely you must apply some other criterion to assess efficiency. It might be a good idea to specify that criterion when assessing somebody's solution. Without such a specification, the assessment comes across as rude. – Chris Ruehlemann May 08 '20 at 12:15
  • I meant no offense. When it comes to summing each row of a dataset, `rowSums()` is the "standard" way in R, just as, if you need to sum many numbers, you should use `sum()` instead of writing a for loop. That's why I said "standard". In addition, `apply(df, 1, sum)` is indeed much more time-consuming. That's why I said "inefficient". You can use the following code to test it: `m <- 1000 ; n <- 50 ; df <- matrix(rnorm(m*n), m, n) ; library(microbenchmark) ; bm <- microbenchmark(apply = apply(df, 1, sum),rowSums = rowSums(df)) ; bm` – Darren Tsai May 08 '20 at 12:57
  • Fair point. But there's no indication in the OP that they have such a huge dataframe that the difference in split seconds would make itself felt. And even if size does matter, the qualification of `apply(df, 1, sum)` as "inefficient" is unjust and should be replaced by "less efficient" – Chris Ruehlemann May 08 '20 at 13:36
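Whatever the speed difference discussed above, the two calls return the same totals on this data. A quick check, assuming the same `mtcars` cross tabulation used in the answers; note that `apply()` first converts the data frame to a matrix and then calls `sum()` once per row:

```r
# Same example data as in the answers above
df <- as.data.frame.matrix(table(mtcars$cyl, mtcars$am))

byApply   <- apply(df, 1, sum)  # converts df to a matrix, then sum() on each row
byRowSums <- rowSums(df)        # single vectorised call over all rows

# all.equal() tolerates integer (apply) vs double (rowSums) storage
print(all.equal(byApply, byRowSums))
print(byRowSums)
```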