2

My data is splitted into a few tables. In each of these tables I find how many times A,B,C,D occur. Here is the output I want:- https://docs.google.com/spreadsheets/d/1lAyaWEDE3WYYEj5pR0iYy1E7JweIO_KIV8OLkxqc1LE/edit#gid=0

I created this function( I have explained below what it does and I have also attached my data using dput(), I have also attached a spreadsheet which shows how the output should look like):-

u<- c("D", "B", "C", "A")

    for (i in u) {
      print(data.frame(sapply(split(final,Q), function(x) sum(x == i, na.rm = TRUE))))
      
    }

I have 2 variables -> 'u' and 'final'. I wanted to see how many times each character in u(A,B,C,D) occurs in my 'final'(name of variable) matrix which has been splitted in a few parts(keep reading for further explanation). So I created a for loop. Inside the loop, I created a function to count how many times each element of 'u'(A,B,C,D) occurs in 'final' matrix. The function coerces the whole dataframe to a logical matrix using == and then uses sum() to add up the TRUE values (i.e., 1's). This gives me number of times 'u'(A,B,C,D) occurs in 'final'.

I also have a function that splits my 'final' matrix based on a column in the final matrix called Q(it's the end column).

The output I get looks like this:-

sapply.split.final..Q...function.x..sum.x....i..na.rm...TRUE..
1                                                            191
2                                                            310
3                                                            127
4                                                            152
  sapply.split.final..Q...function.x..sum.x....i..na.rm...TRUE..
1                                                            217
2                                                            361
3                                                            121
4                                                            168
  sapply.split.final..Q...function.x..sum.x....i..na.rm...TRUE..
1                                                            236
2                                                            373
3                                                            136
4                                                            204
  sapply.split.final..Q...function.x..sum.x....i..na.rm...TRUE..
1                                                            142
2                                                            320
3                                                            136
4                                                            226

This is close to what I wanted. I wanted to divide my data in a few parts according to a column Q and I wanted to count how many times each element 'u' occurs in each splitted table.

However I want to modify this table a bit. I want to see the ratio of each value in each splitted table instead of the values themselves like this

final<-structure(list(X70 = c("D", "A", "D", "D", "D", "D", "D", "D", 
"D", "B", "B", "D", "A", "D", "D", "C", "D", "C", "D", "D", "D", 
"D", "D", "D", "B", "D", "D", "D", "D", "D", "D", "D", "D", NA, 
"D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", 
"D"), X71 = c("A", "B", NA, "A", "B", "C", "A", "B", "B", "B", 
"C", "C", "C", "D", "A", "A", "C", "D", "C", "C", "C", "B", "C", 
"A", "C", "C", NA, "D", "B", "C", "A", "C", "B", "C", "C", "C", 
"C", "A", "D", "C", "A", "B", "B", "D", "C", "C", "C", "C"), 
    X72 = c("B", "B", "C", "D", "C", "D", "C", "A", "C", "A", 
    "A", "C", "C", NA, "A", "C", "B", "B", "A", "C", "D", "C", 
    NA, "C", "C", NA, "C", "C", "B", "A", "B", "C", "C", "C", 
    "D", "C", "D", "B", "C", "A", "D", "C", "B", "B", NA, "B", 
    NA, "A"), X73 = c(NA, NA, "D", "B", "D", "D", "D", "B", "D", 
    "C", "C", "C", "C", "C", "B", "D", "B", "A", "D", "C", "D", 
    "D", "B", "D", "C", NA, "D", "A", "A", "A", "D", "A", "B", 
    "D", "C", "B", NA, "C", "C", "D", "D", "C", "A", "B", NA, 
    "A", "A", "C"), X74 = c("C", NA, "A", "D", "C", "B", "C", 
    "B", "D", "A", "D", "C", "C", "D", "C", "B", "D", "B", "A", 
    "D", "C", "C", "C", "C", "B", "B", "D", NA, NA, "D", "A", 
    "B", "D", "B", "D", "A", NA, "D", "C", "A", "D", "C", "D", 
    "A", NA, "D", "D", "D"), X75 = c("C", NA, "C", "B", "C", 
    "C", "C", "C", "C", "C", "C", "B", "C", "B", NA, "C", "C", 
    "D", "C", "C", "B", "C", "C", "C", "D", "C", "D", "C", "C", 
    "C", "C", "C", "A", "C", "C", "B", "C", "C", "B", "C", "C", 
    "C", "C", "D", "C", "D", "C", "C"), X76 = c("D", NA, "D", 
    "A", "D", "D", "D", "D", "D", "A", "D", "D", "D", "A", NA, 
    "D", "B", "B", "A", "D", "D", "A", "D", "D", "D", "D", "C", 
    "D", "D", "D", "D", "D", "A", "D", "A", "D", "D", "D", "D", 
    "D", "D", "A", "D", "D", "B", "D", "D", "A"), X77 = c(NA, 
    NA, "B", "B", "B", "D", "D", "D", "B", "C", "B", "B", "B", 
    "B", NA, "B", "B", "B", "C", "B", "D", "B", NA, "B", "B", 
    "B", "C", "B", "A", "B", "B", NA, "B", NA, "B", "C", "B", 
    "B", "B", "B", "D", "C", "B", "D", "D", "D", NA, "C"), X78 = c("C", 
    NA, "C", "C", "D", "C", "C", "B", "C", "C", "B", "C", "C", 
    "C", NA, "D", "B", "B", "C", "B", "C", "C", "A", "A", "A", 
    "B", "A", "C", "A", "C", "C", "B", "B", "D", "A", "C", "D", 
    "C", "C", "A", "C", "C", NA, "C", NA, "C", "C", "C"), X79 = c("D", 
    NA, "D", "A", "D", "D", "A", "C", "D", "C", "D", NA, "D", 
    "D", NA, "D", NA, NA, "B", "D", "A", NA, "D", "A", "D", "A", 
    "A", "A", "A", "D", "D", "D", "A", "A", "D", "D", "D", "A", 
    "B", "D", "D", "A", NA, "A", "D", "D", "C", "C"), X80 = c("C", 
    NA, "C", NA, "C", "C", "C", "B", "C", "A", "A", NA, "C", 
    "C", NA, "C", NA, NA, "D", "C", "C", NA, NA, "C", "B", "B", 
    "C", "B", "C", "C", "C", "C", NA, "C", "B", "C", "C", "C", 
    "C", "C", "B", "C", NA, "C", "A", "C", "C", "A"), s = c(17, 
    16, 22, 18, 19, 20, 23, 18, 19, 17, 10, 12, 18, 17, 20, 13, 
    11, 14, 22, 18, 16, 16, 17, 21, 17, 13, 15, 13, 22, 22, 18, 
    7, 13, 21, 23, 19, 18, 20, 14, 24, 19, 13, 13, 18, 11, 13, 
    18, 26), Q = c(2, 2, 4, 2, 3, 3, 4, 2, 3, 2, 1, 1, 2, 2, 
    3, 1, 1, 2, 4, 2, 2, 2, 2, 4, 2, 1, 2, 1, 4, 4, 2, 1, 1, 
    4, 4, 3, 2, 3, 2, 4, 3, 1, 1, 2, 1, 1, 2, 4)), row.names = c(NA, 
-48L), class = "data.frame")

Edit: My output and your output might look a bit different because I have changed the data a little bit.

ujjwal tyagi
  • 493
  • 2
  • 8
  • 1
    Can you please add what you want the outcome to be. I'm having a hard time understanding what you are doing. Or the original data since I guess that is the resutl. Can you make a smaller data set ? – Elin Jun 08 '21 at 20:53
  • I have attached a spreadsheet which shows the outcome . i will link it again https://docs.google.com/spreadsheets/d/1lAyaWEDE3WYYEj5pR0iYy1E7JweIO_KIV8OLkxqc1LE/edit#gid=0 – ujjwal tyagi Jun 08 '21 at 20:54
  • My data is splitted into a few tables. In each of these tables I find how many times A,B,C,D occur. – ujjwal tyagi Jun 08 '21 at 20:56
  • Why isn't this a cross tab with marginals? I'm not sure why you are makng it as complex as you are if you just want a bunch of proportions. – Elin Jun 08 '21 at 20:58
  • The main reason i am doing this is because i want to eventually make a plot like this https://imgur.com/a/bzrCIJt. Groups are the different splitted data and ABCD are shown as proportions – ujjwal tyagi Jun 08 '21 at 21:02
  • Really I think you are making t more complicated than you need it to be? Why do you want a list? – Elin Jun 08 '21 at 21:18
  • I am very new to R and I started learning it a month ago seriously. Sorry about that – ujjwal tyagi Jun 08 '21 at 21:26

2 Answers2

2

If we are splitting by the 'Q', select the columns that startsWith 'X', split by the 'Q' column into a list of data.frame, loop over the list with sapply, get the frequency count on the unlist data with table and convert to proportions

out <- sapply(split(final[startsWith(names(final), "X")], 
          final$Q), function(x) proportions(table(unlist(x))))

-output

out
    1         2          3          4
A 0.1293103 0.1297297 0.08450704 0.22641509
B 0.2586207 0.2108108 0.15492958 0.09433962
C 0.3189655 0.3459459 0.33802817 0.36792453
D 0.2931034 0.3135135 0.42253521 0.31132075

if we want to do a plot, use matplot

matplot(t(out), type = 'l', col = 1:4, xaxt = 'n')
axis(side=1, at=1:4, labels=colnames(out))
legend("topleft", legend = row.names(out), fill = 1:4)
akrun
  • 874,273
  • 37
  • 540
  • 662
  • https://docs.google.com/spreadsheets/d/1lAyaWEDE3WYYEj5pR0iYy1E7JweIO_KIV8OLkxqc1LE/edit#gid=0 this spreadsheet explains what output i want – ujjwal tyagi Jun 08 '21 at 20:59
  • @ujjwaltyagi those numbers are not matching with the values I get from your dput calcuation – akrun Jun 08 '21 at 21:01
  • Yes i changed the data a little bit sorry. – ujjwal tyagi Jun 08 '21 at 21:03
  • @ujjwaltyagi just to confirm `proportions(c(191, 310, 127, 152))# [1] 0.2448718 0.3974359 0.1628205 0.1948718` gets the same output as i nthe xcel – akrun Jun 08 '21 at 21:05
  • 1
    @ujjwaltyagi can you please test the code in your original data and confirm, thanks – akrun Jun 08 '21 at 21:07
  • 2
    It just blows my mind that what I had been working on for 4 hours gets solved here in minutes. this is perfect and much better than I imagined! Thank you! can you also help out in how can I display this data like this https://imgur.com/a/bzrCIJt? – ujjwal tyagi Jun 08 '21 at 21:12
  • The x axis now has 1.5 and 2.5 and 3.5. How can I remove them? 1 2 3 4 are supposed to be groups of people. Decimal will not make sense – ujjwal tyagi Jun 08 '21 at 21:30
  • 1
    @ujjwaltyagi i just upddated. You can use `xaxt = 'n'` and then use `axis` to change – akrun Jun 08 '21 at 21:33
  • https://stackoverflow.com/questions/67909583/find-proportion-of-times-each-charactera-b-c-d-occurs-in-each-column-of-a-list Can you help me figure this one out? It is very similar and I have spent hours on this with no avail. – ujjwal tyagi Jun 09 '21 at 18:08
  • @ujjwaltyagi i posted a solution there. is that the expected – akrun Jun 09 '21 at 18:13
1

You could also do this

library(dplyr)
library(tidyr)
final_l <- final %>% 
     pivot_longer(values_to = "Letters_f",
              cols = starts_with("X")) 

 prop.table(  table(final_l$Q, final_l$Letters_f), margin = 1)
           A          B          C          D
  1 0.12931034 0.25862069 0.31896552 0.29310345
  2 0.12972973 0.21081081 0.34594595 0.31351351
  3 0.08450704 0.15492958 0.33802817 0.42253521
  4 0.22641509 0.09433962 0.36792453 0.31132075

Elin
  • 6,507
  • 3
  • 25
  • 47