1

I Have a Dataframe that is Mean Tables by PSPP. I would like to reshape it in order to manipulate it easier for plots in calc.

What I want to do?

  1. This Table contains descriptive statistics such as Mean, SD, N.
  2. The levels of Categorical variables are populated vertically.
    V1 V1_levelA, V1_levelB, | V2 V2_levelA, V2_levelB ... etc.
  3. Descriptive statistics are vertically displayed

I would like the first variable to be populated horizontally, and the next one vertically. Please see, the attached image for more information.

The result must take into consideration that Table may missing entire factor levels - cause they may have not "values", and therefore not included in the input table in the form of csv.

I hope this huge editing be clearer now what I am asking.

Dput sample df similar to that of the posted image:

    df <-   structure(list(structure(c(2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 
1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "v1", "v2"), class = "factor"), 
    varA = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 1L, 1L, 
    2L, 3L, 3L, 4L, 4L), .Label = c("k1", "k2", "k3", "k4", "varA"
    ), class = "factor"), Age = structure(c(1L, 2L, 1L, 2L, 1L, 
    2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L), .Label = c("a1", 
    "a2", "Age"), class = "factor"), Mean = structure(1:15, .Label = c("10", 
    "11", "12", "13", "14", "15", "16", "17", "18", "19", "21", 
    "22", "23", "24", "25", "Mean"), class = "factor"), N = structure(c(1L, 
    8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 2L, 3L, 4L, 5L, 6L, 
    7L), .Label = c("1", "10", "12", "13", "14", "15", "16", 
    "2", "3", "4", "5", "6", "7", "8", "9", "N"), class = "factor")), row.names = 2:16, class = "data.frame")

*Update** Check that: My input and the desired Output: https://postimg.cc/N2GTZd09 enter image description here

Estatistics
  • 874
  • 9
  • 24
  • 1
    *"change the format of table from long to wide"* Why do you want to do that? Data in long format is usually easier to manage/manipulate/summarise. Having 22 `data.frame`s with 3 variable columns (`Mean`, `N`, `sd`) would give you a wide `data.frame` with >66 columns. I don't know what your downstream goal is but that sounds like an awkward structure to manage. – Maurits Evers Aug 07 '19 at 23:36
  • 2
    Even if you do ultimately need it wide, combining them all is much easier and faster in long format - it's just `dplyr::bind_rows(a)`. You could reshape afterwards. Add an id to keep track of which df they came from with `dplyr::bind_rows(a, .id = "id")`. – Marius Aug 07 '19 at 23:56
  • Dear @Marius, I have the DF in long format. I have an output by PSPP statistical program. I would like to change the structure of table. Ultimately, I would like to make it like that (it is easier to manipulate it for plots in calc): See that please: https://postimg.cc/K3fCzbRQ – Estatistics Aug 08 '19 at 00:32
  • Please, consider also, that levels of variables may missing completely, therefore, I would like a function to take care of that. I cannot simply put one line under the other. e.g. "k1" may missing in some "df"s – Estatistics Aug 08 '19 at 00:47
  • @EliasEstatisticsEU I'm struggling to make sense of what you're asking. As explained, it's a lot easier to combine data in long format (which is what you already have); then group/summarise as required. I don't see what this has to do with PSPP. Nor do I understand what missing values have to do with combining data. Perhaps you should take a step back and provide a reproducible example including **minimal** sample data and matching expected output. A few rows/columns for a few `data.frame`s should suffice. – Maurits Evers Aug 08 '19 at 02:03
  • @Murits, check my Update in the original post. I hope I am clear what I am asking now. – Estatistics Aug 08 '19 at 02:36
  • Please, tell me how to improve the post in order to be more "understandable". To remove all the others, and leave only the update? – Estatistics Aug 08 '19 at 02:37
  • The whole question changed, as well its title. I hope now it is a lot of clearer. Thanks again for your comments. – Estatistics Aug 08 '19 at 09:18
  • @EliasEstatisticsEU This is still not clear. Let's go about this a different way. Please review what you can (and should) do in order to provide a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) including sample data. You seem to have been around SO for long enough to know what makes a good question. Unfortunately, this is not one of them. Sharing screenshots of data is *never* a good idea as we cannot copy&paste data from an image. So please revise your question according to SO posting guidelines. – Maurits Evers Aug 08 '19 at 11:46
  • added Dput sample df similar to that of the posted image. – Estatistics Aug 08 '19 at 12:04
  • Sample data and expected output don't match. Your sample data has columns `Mean` and `N` while your expected output has `M` and `SD`. Are they supposed to be the same, and this is a typo? Similarly, `Age` in `df` is different from your expected output. – Maurits Evers Aug 08 '19 at 13:13

1 Answers1

1

I am still unclear about your expected output as your input data and expected output don't match.

That aside, perhaps this is what you're after?

library(tidyverse)
df %>%
    rename(group = 1) %>%                # Name first column
    mutate_at(1, na_if, "") %>%          # Replace "" with NA
    fill(group) %>%                      # Fill first column with missing values
    group_by(group) %>%
    nest() %>%                           # Nest data by group
    mutate(data = map(data, ~.x %>%
        gather(k, v, -varA, -Age) %>%    # Wide to long
        unite(k, varA, k) %>%            # Unite varA with variable column
        spread(k, v))) %>%               # Spread from long to wide
    unnest()                             # Unnest
## A tibble: 4 x 10
#  group Age   k1_Mean k1_N  k2_Mean k2_N  k3_Mean k3_N  k4_Mean k4_N
#  <fct> <fct> <chr>   <chr> <chr>   <chr> <chr>   <chr> <chr>   <chr>
#1 v1    a1    10      1     12      3     14      5     16      7
#2 v1    a2    11      2     13      4     15      6     17      8
#3 v2    a1    18      9     NA      NA    22      13    24      15
#4 v2    a2    19      10    21      12    23      14    25      16
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
  • Almost perfect! Thank you! When I saw it, I was scared a little! I am very unfamiliar with that type of syntax. If there is additional vertical variables, I must change gather, unite, and spread, accordingly? gather(k, v, -varB -varA, -Age) %>% unite(k, varB, varA, k) %>% spread(k, v))) %>% – Estatistics Aug 08 '19 at 13:48
  • 1
    @EliasEstatisticsEU Hmm, I think this should work for more variable columns too. `gather(k, v, -varA, -Age)` will turn all wide columns *except* `varA` and `Age` into long format. So it doesn't matter how many additional columns there are. `unite` will then automatically create the new column names, before `spread`ing. – Maurits Evers Aug 08 '19 at 22:25