How can i calculate ratio of values in a dataframe which have been splitted already using lapply inside a for loop?

Question

My data is splitted into a few tables. In each of these tables I find how many times A,B,C,D occur. Here is the output I want:- https://docs.google.com/spreadsheets/d/1lAyaWEDE3WYYEj5pR0iYy1E7JweIO_KIV8OLkxqc1LE/edit#gid=0

I created this function( I have explained below what it does and I have also attached my data using dput(), I have also attached a spreadsheet which shows how the output should look like):-

u<- c("D", "B", "C", "A")

    for (i in u) {
      print(data.frame(sapply(split(final,Q), function(x) sum(x == i, na.rm = TRUE))))
      
    }

I have 2 variables -> 'u' and 'final'. I wanted to see how many times each character in u(A,B,C,D) occurs in my 'final'(name of variable) matrix which has been splitted in a few parts(keep reading for further explanation). So I created a for loop. Inside the loop, I created a function to count how many times each element of 'u'(A,B,C,D) occurs in 'final' matrix. The function coerces the whole dataframe to a logical matrix using == and then uses sum() to add up the TRUE values (i.e., 1's). This gives me number of times 'u'(A,B,C,D) occurs in 'final'.

I also have a function that splits my 'final' matrix based on a column in the final matrix called Q(it's the end column).

The output I get looks like this:-

sapply.split.final..Q...function.x..sum.x....i..na.rm...TRUE..
1                                                            191
2                                                            310
3                                                            127
4                                                            152
  sapply.split.final..Q...function.x..sum.x....i..na.rm...TRUE..
1                                                            217
2                                                            361
3                                                            121
4                                                            168
  sapply.split.final..Q...function.x..sum.x....i..na.rm...TRUE..
1                                                            236
2                                                            373
3                                                            136
4                                                            204
  sapply.split.final..Q...function.x..sum.x....i..na.rm...TRUE..
1                                                            142
2                                                            320
3                                                            136
4                                                            226

This is close to what I wanted. I wanted to divide my data in a few parts according to a column Q and I wanted to count how many times each element 'u' occurs in each splitted table.

However I want to modify this table a bit. I want to see the ratio of each value in each splitted table instead of the values themselves like this

final<-structure(list(X70 = c("D", "A", "D", "D", "D", "D", "D", "D", 
"D", "B", "B", "D", "A", "D", "D", "C", "D", "C", "D", "D", "D", 
"D", "D", "D", "B", "D", "D", "D", "D", "D", "D", "D", "D", NA, 
"D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", 
"D"), X71 = c("A", "B", NA, "A", "B", "C", "A", "B", "B", "B", 
"C", "C", "C", "D", "A", "A", "C", "D", "C", "C", "C", "B", "C", 
"A", "C", "C", NA, "D", "B", "C", "A", "C", "B", "C", "C", "C", 
"C", "A", "D", "C", "A", "B", "B", "D", "C", "C", "C", "C"), 
    X72 = c("B", "B", "C", "D", "C", "D", "C", "A", "C", "A", 
    "A", "C", "C", NA, "A", "C", "B", "B", "A", "C", "D", "C", 
    NA, "C", "C", NA, "C", "C", "B", "A", "B", "C", "C", "C", 
    "D", "C", "D", "B", "C", "A", "D", "C", "B", "B", NA, "B", 
    NA, "A"), X73 = c(NA, NA, "D", "B", "D", "D", "D", "B", "D", 
    "C", "C", "C", "C", "C", "B", "D", "B", "A", "D", "C", "D", 
    "D", "B", "D", "C", NA, "D", "A", "A", "A", "D", "A", "B", 
    "D", "C", "B", NA, "C", "C", "D", "D", "C", "A", "B", NA, 
    "A", "A", "C"), X74 = c("C", NA, "A", "D", "C", "B", "C", 
    "B", "D", "A", "D", "C", "C", "D", "C", "B", "D", "B", "A", 
    "D", "C", "C", "C", "C", "B", "B", "D", NA, NA, "D", "A", 
    "B", "D", "B", "D", "A", NA, "D", "C", "A", "D", "C", "D", 
    "A", NA, "D", "D", "D"), X75 = c("C", NA, "C", "B", "C", 
    "C", "C", "C", "C", "C", "C", "B", "C", "B", NA, "C", "C", 
    "D", "C", "C", "B", "C", "C", "C", "D", "C", "D", "C", "C", 
    "C", "C", "C", "A", "C", "C", "B", "C", "C", "B", "C", "C", 
    "C", "C", "D", "C", "D", "C", "C"), X76 = c("D", NA, "D", 
    "A", "D", "D", "D", "D", "D", "A", "D", "D", "D", "A", NA, 
    "D", "B", "B", "A", "D", "D", "A", "D", "D", "D", "D", "C", 
    "D", "D", "D", "D", "D", "A", "D", "A", "D", "D", "D", "D", 
    "D", "D", "A", "D", "D", "B", "D", "D", "A"), X77 = c(NA, 
    NA, "B", "B", "B", "D", "D", "D", "B", "C", "B", "B", "B", 
    "B", NA, "B", "B", "B", "C", "B", "D", "B", NA, "B", "B", 
    "B", "C", "B", "A", "B", "B", NA, "B", NA, "B", "C", "B", 
    "B", "B", "B", "D", "C", "B", "D", "D", "D", NA, "C"), X78 = c("C", 
    NA, "C", "C", "D", "C", "C", "B", "C", "C", "B", "C", "C", 
    "C", NA, "D", "B", "B", "C", "B", "C", "C", "A", "A", "A", 
    "B", "A", "C", "A", "C", "C", "B", "B", "D", "A", "C", "D", 
    "C", "C", "A", "C", "C", NA, "C", NA, "C", "C", "C"), X79 = c("D", 
    NA, "D", "A", "D", "D", "A", "C", "D", "C", "D", NA, "D", 
    "D", NA, "D", NA, NA, "B", "D", "A", NA, "D", "A", "D", "A", 
    "A", "A", "A", "D", "D", "D", "A", "A", "D", "D", "D", "A", 
    "B", "D", "D", "A", NA, "A", "D", "D", "C", "C"), X80 = c("C", 
    NA, "C", NA, "C", "C", "C", "B", "C", "A", "A", NA, "C", 
    "C", NA, "C", NA, NA, "D", "C", "C", NA, NA, "C", "B", "B", 
    "C", "B", "C", "C", "C", "C", NA, "C", "B", "C", "C", "C", 
    "C", "C", "B", "C", NA, "C", "A", "C", "C", "A"), s = c(17, 
    16, 22, 18, 19, 20, 23, 18, 19, 17, 10, 12, 18, 17, 20, 13, 
    11, 14, 22, 18, 16, 16, 17, 21, 17, 13, 15, 13, 22, 22, 18, 
    7, 13, 21, 23, 19, 18, 20, 14, 24, 19, 13, 13, 18, 11, 13, 
    18, 26), Q = c(2, 2, 4, 2, 3, 3, 4, 2, 3, 2, 1, 1, 2, 2, 
    3, 1, 1, 2, 4, 2, 2, 2, 2, 4, 2, 1, 2, 1, 4, 4, 2, 1, 1, 
    4, 4, 3, 2, 3, 2, 4, 3, 1, 1, 2, 1, 1, 2, 4)), row.names = c(NA, 
-48L), class = "data.frame")

Edit: My output and your output might look a bit different because I have changed the data a little bit.

Can you please add what you want the outcome to be. I'm having a hard time understanding what you are doing. Or the original data since I guess that is the resutl. Can you make a smaller data set ? — Elin, Jun 08 '21 at 20:53
I have attached a spreadsheet which shows the outcome . i will link it again https://docs.google.com/spreadsheets/d/1lAyaWEDE3WYYEj5pR0iYy1E7JweIO_KIV8OLkxqc1LE/edit#gid=0 — ujjwal tyagi, Jun 08 '21 at 20:54
My data is splitted into a few tables. In each of these tables I find how many times A,B,C,D occur. — ujjwal tyagi, Jun 08 '21 at 20:56
Why isn't this a cross tab with marginals? I'm not sure why you are makng it as complex as you are if you just want a bunch of proportions. — Elin, Jun 08 '21 at 20:58
The main reason i am doing this is because i want to eventually make a plot like this https://imgur.com/a/bzrCIJt. Groups are the different splitted data and ABCD are shown as proportions — ujjwal tyagi, Jun 08 '21 at 21:02
Really I think you are making t more complicated than you need it to be? Why do you want a list? — Elin, Jun 08 '21 at 21:18
I am very new to R and I started learning it a month ago seriously. Sorry about that — ujjwal tyagi, Jun 08 '21 at 21:26

akrun · Accepted Answer · 2021-06-08T21:32:54.653

2

If we are splitting by the 'Q', select the columns that startsWith 'X', split by the 'Q' column into a list of data.frame, loop over the list with sapply, get the frequency count on the unlist data with table and convert to proportions

out <- sapply(split(final[startsWith(names(final), "X")], 
          final$Q), function(x) proportions(table(unlist(x))))

-output

out
    1         2          3          4
A 0.1293103 0.1297297 0.08450704 0.22641509
B 0.2586207 0.2108108 0.15492958 0.09433962
C 0.3189655 0.3459459 0.33802817 0.36792453
D 0.2931034 0.3135135 0.42253521 0.31132075

if we want to do a plot, use matplot

matplot(t(out), type = 'l', col = 1:4, xaxt = 'n')
axis(side=1, at=1:4, labels=colnames(out))
legend("topleft", legend = row.names(out), fill = 1:4)

edited Jun 08 '21 at 21:32

answered Jun 08 '21 at 20:57

akrun

874,273
37
540
662

https://docs.google.com/spreadsheets/d/1lAyaWEDE3WYYEj5pR0iYy1E7JweIO_KIV8OLkxqc1LE/edit#gid=0 this spreadsheet explains what output i want – ujjwal tyagi Jun 08 '21 at 20:59
@ujjwaltyagi those numbers are not matching with the values I get from your dput calcuation – akrun Jun 08 '21 at 21:01
Yes i changed the data a little bit sorry. – ujjwal tyagi Jun 08 '21 at 21:03
@ujjwaltyagi just to confirm `proportions(c(191, 310, 127, 152))# [1] 0.2448718 0.3974359 0.1628205 0.1948718` gets the same output as i nthe xcel – akrun Jun 08 '21 at 21:05
1

@ujjwaltyagi can you please test the code in your original data and confirm, thanks – akrun Jun 08 '21 at 21:07
2

It just blows my mind that what I had been working on for 4 hours gets solved here in minutes. this is perfect and much better than I imagined! Thank you! can you also help out in how can I display this data like this https://imgur.com/a/bzrCIJt? – ujjwal tyagi Jun 08 '21 at 21:12
The x axis now has 1.5 and 2.5 and 3.5. How can I remove them? 1 2 3 4 are supposed to be groups of people. Decimal will not make sense – ujjwal tyagi Jun 08 '21 at 21:30
1

@ujjwaltyagi i just upddated. You can use `xaxt = 'n'` and then use `axis` to change – akrun Jun 08 '21 at 21:33
https://stackoverflow.com/questions/67909583/find-proportion-of-times-each-charactera-b-c-d-occurs-in-each-column-of-a-list Can you help me figure this one out? It is very similar and I have spent hours on this with no avail. – ujjwal tyagi Jun 09 '21 at 18:08
@ujjwaltyagi i posted a solution there. is that the expected – akrun Jun 09 '21 at 18:13

score 1 · Answer 2 · answered Jun 08 '21 at 21:33

You could also do this

library(dplyr)
library(tidyr)
final_l <- final %>% 
     pivot_longer(values_to = "Letters_f",
              cols = starts_with("X")) 

 prop.table(  table(final_l$Q, final_l$Letters_f), margin = 1)

           A          B          C          D
  1 0.12931034 0.25862069 0.31896552 0.29310345
  2 0.12972973 0.21081081 0.34594595 0.31351351
  3 0.08450704 0.15492958 0.33802817 0.42253521
  4 0.22641509 0.09433962 0.36792453 0.31132075

You did that in such few lines of code!! I think I need to learn tidy and dyplr — ujjwal tyagi, Jun 08 '21 at 21:35

How can i calculate ratio of values in a dataframe which have been splitted already using lapply inside a for loop?

2 Answers2