-3

My problem is that I have data below(input table). But I want the table like the output table. So far I have not found a solution to get this done.

Inpunt table: Input table

Output table:Output table

Input dataset:

set.seed(1)
Data <- data.frame(
  set = (1:10),
  Topic = sample(1:5),
  Label = sample(c("A", "B", "C"), 10, replace = TRUE),
  Score = sample(1:10)
)
Data
   set Topic Label Score
1    1     1     C     3
2    2     2     B     5
3    3     3     A    10
4    4     4     A     9
5    5     5     A     2
6    6     1     A     8
7    7     2     B     4
8    8     3     B     1
9    9     4     B     6
10  10     5     C     7

Output data:
#In the columns I want the Topic (T).

             T1    T2    T3    T4   T5
Label A       1     0     1     1    1
Label B       0     2     1     1    0
Label C       1     0     0     0    1 
Score (avg)  5.5   4.5   5.5   7.5  4.5  
Set (count)   2     2     2     2    2

I have tried from tidyr the spread function but I get a lot of NA values and I get no numbers.

Data_1 <- spread(Data, key = Topic, value = Label

Guest987
  • 15
  • 4
  • 3
    Have you tried `t(data)`? This will transpose your data (i.e. switch rows with columns). Here's a helpful page https://www.r-statistics.com/tag/transpose/ – TheSciGuy May 09 '19 at 15:15
  • 2
    What exactly did you try with `spread`? There are many, *many* SO posts on transposing data, so if there's some reason none of those posts helped, that would be good to be able to see. And why is there a row with no row label in your output table? Is that supposed to be that you have columns named "Topic 1", etc? – camille May 09 '19 at 15:16
  • How about something like: `require(tidyr)` `spread(df, key = topic, value = label)` – bob1 May 09 '19 at 15:21
  • 1
    Please provide a minimal working example. Most users don't have time to transpose your images into data. See [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1) – sempervent May 09 '19 at 16:17
  • Thank you for your comments. I have made a sample data. I have given an example of what spread will look like with NA value. – Guest987 May 10 '19 at 08:48
  • What is `Num` in expected output? – s_baldur May 10 '19 at 09:06
  • Please try reformulating the question. Judging by the table you provided you're actually trying to create a contingency table, and *not* "convert rows to columns and columns to rows". Your output table example is also incorrect. –  May 10 '19 at 09:46

1 Answers1

0

Your question implies that you want to transpose a dataframe, i.e. t(df), in which case you would turn this…

# A tibble: 7 x 4
  set   topic label score
  <chr> <dbl> <chr> <dbl>
1 X1        1 A         5
2 X2        1 A         5
3 X3        2 B        10
4 X4        2 A        10
5 X5        2 C         7
6 X6        3 A        10
7 X7        3 C        10

into this…

      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
set   "X1" "X2" "X3" "X4" "X5" "X6" "X7"
topic "1"  "1"  "2"  "2"  "2"  "3"  "3" 
label "A"  "A"  "B"  "A"  "C"  "A"  "C" 
score " 5" " 5" "10" "10" " 7" "10" "10"

But your example tables make it obvious that you actually need a contingency table:

# Generate a contingency table.
cont_table <- unclass(table(df$label, df$topic))

# Give the columns appropriate names.
colnames(cont_table) <- paste("Topic", colnames(cont_table))

cont_table

#### OUTPUT ####
    Topic 1 Topic 2 Topic 3
  A       2       1       1
  B       0       1       0
  C       0       1       1

To add the means and the totals do the following:

library(dplyr)

# Get the mean for each topic.
means <- df %>% group_by(topic) %>% summarise(mean(score))


# Bind topic means and column sums to contingency table.
out_mat <- rbind(cont_table,
                "Score (avg)" = means[[2]],
                Num = colSums(cont_table)
                )

out_mat

#### OUTPUT ####

            Topic 1 Topic 2 Topic 3
A                 2       1       1
B                 0       1       0
C                 0       1       1
Score (avg)       5       9      10
Num               2       3       2

The final output looks similar to your output table, but there are some differences. I suspect that your output table is incorrect. Please add some clarification to your original question if that's not the case.