2

I have a dataframe in R that looks like this:

Word    Base    Number  Type
-       -       -       -
shoe    shoe    4834    singular
shoes   shoe    49955   plural
toy     toy     75465   singular
toys    toy     23556   plural
key     key     39485   singular
keys    key     6546    plural
jazz    jazz    58765   plural

I would like to transform it so that it looks like this:

Word_Sg Word_Pl Base    Num_Singular    Num_Plural
--      --      --      --              --
shoe    shoes   shoe    4834            49955
toy     toys    toy     75465           23556
key     keys    key     39485           6546
NA      jazz    jazz    NA              58765

So rather than having two rows for the values for singular & plural, I want to have two colums, one with the number for singular, and one with the number for plural.

I've tried a few things using dplyr::summarize, but so far, without any success. Here is the code that I've come up with so far:

dataframe1 <- dataframe %>% 
      mutate(Num_Singular = case_when(Type == "singular" ~ Number)) %>%
      mutate(Num_Plural = case_when(Type == "plural" ~ Number)) %>%
      dplyr::select(Word, Base, Num_Singular, Num_Plural) %>%
      group_by(Base) %>%
      dplyr::summarize(Num_Singular = paste(na.omit(Num_Singular)),
                       Num_Plural = paste(na.omit(Num_Plural))

However, it gives me this error:

Error in summarise_impl(.data, dots) : 
  Column `Num_Singular` must be length 1 (a summary value), not 2)

I think the problem might be that there are rows that don't necessarily have singular AND plural, but only either (e.g. "jazz"). Most rows have both though.

So how can I do this in R or dplyr?

CJ Yetman
  • 8,373
  • 2
  • 24
  • 56
rayne
  • 523
  • 1
  • 7
  • 24
  • Related: https://stackoverflow.com/questions/29775461/how-can-i-spread-repeated-measures-of-multiple-variables-into-wide-format – akraf Apr 27 '18 at 08:44
  • 1
    @akraf perhaps, though that solution (`gather`, `unite`, `spread`) relies on the columns having similar classes; in this example, doing so would convert the numbers to `character` when gathering. – r2evans Apr 27 '18 at 08:51

4 Answers4

4

If you first look at the first few columns::

select(dat, Base, Word, Type)[1:2,]
#   Base  Word     Type
# 1 shoe  shoe singular
# 2 shoe shoes   plural

From here, consider it just spreading it into a singular/plural columns, effectively going from "tall" to "wide". (It would be much more apparent if there were more than two categories in Type.)

select(dat, Base, Word, Type) %>%
    spread(Type, Word) %>%
    rename(Word_Pl=plural, Word_Sg=singular)
#   Base Word_Pl Word_Sg
# 1 jazz    jazz    <NA>
# 2  key    keys     key
# 3 shoe   shoes    shoe
# 4  toy    toys     toy

You can easily repeat this for Number as well. From there, it's just a matter of merging/joining them based on the key column, Base:

full_join(
  select(dat, Base, Word, Type) %>%
    spread(Type, Word) %>%
    rename(Word_Pl=plural, Word_Sg=singular),
  select(dat, Base, Number, Type) %>%
    spread(Type, Number) %>%
    rename(Num_Pl=plural, Num_Sg=singular),
  by = "Base"
)
#   Base Word_Pl Word_Sg Num_Pl Num_Sg
# 1 jazz    jazz    <NA>  58765     NA
# 2  key    keys     key   6546  39485
# 3 shoe   shoes    shoe  49955   4834
# 4  toy    toys     toy  23556  75465

Consumable data:

library(dplyr)
library(tidyr)
dat <- read.table(text='Word    Base    Number  Type
shoe    shoe    4834    singular
shoes   shoe    49955   plural
toy     toy     75465   singular
toys    toy     23556   plural
key     key     39485   singular
keys    key     6546    plural
jazz    jazz    58765   plural', header=TRUE, stringsAsFactors=FALSE)
r2evans
  • 141,215
  • 6
  • 77
  • 149
1

tidyr's new pivot_wider() function makes this simple...

library(dplyr)
library(tidyr)

dat <- read.table(header = T, stringsAsFactors = F, text='
Word    Base    Number  Type
shoe    shoe    4834    singular
shoes   shoe    49955   plural
toy     toy     75465   singular
toys    toy     23556   plural
key     key     39485   singular
keys    key     6546    plural
jazz    jazz    58765   plural')

dat %>% 
  pivot_wider(id_cols = Base, names_from = Type, values_from = c(Word, Number))

# # A tibble: 4 x 5
#   Base  Word_singular Word_plural Number_singular Number_plural
#   <chr> <chr>         <chr>                 <int>         <int>
# 1 shoe  shoe          shoes                  4834         49955
# 2 toy   toy           toys                  75465         23556
# 3 key   key           keys                  39485          6546
# 4 jazz  NA            jazz                     NA         58765
CJ Yetman
  • 8,373
  • 2
  • 24
  • 56
0

The core idea is to identify each data point by it's type and whether it's a word or a number... then it's easy to spread to the format you want. (I won't bother renaming the variables or ordering them specifically to match your expected output because that's easy to do and not part of the problem here)

library(dplyr)
library(tidyr)

dat <- read.table(header = T, stringsAsFactors = F, text='
Word    Base    Number  Type
shoe    shoe    4834    singular
shoes   shoe    49955   plural
toy     toy     75465   singular
toys    toy     23556   plural
key     key     39485   singular
keys    key     6546    plural
jazz    jazz    58765   plural')

dat %>% 
  gather(variable, value, Word, Number) %>%
  unite(Type, variable, Type) %>%
  spread(Type, value, convert = T) %>% 
  as_tibble()

# # A tibble: 4 x 5
#   Base  Number_plural Number_singular Word_plural Word_singular
#   <chr>         <int>           <int> <chr>       <chr>        
# 1 jazz          58765              NA jazz        NA           
# 2 key            6546           39485 keys        key          
# 3 shoe          49955            4834 shoes       shoe         
# 4 toy           23556           75465 toys        toy  
CJ Yetman
  • 8,373
  • 2
  • 24
  • 56
  • 1
    If you add `%>% str()` to the end, you will notice that after combining data from a `numeric` column with a `character` column, the numbers are converted to `character`. This may be an acceptable step (reconverting back to `integer`), but with `numeric` (floating point) I suggest that the numbers *will* be different. As such, this method (as suggested in @akraf's first comment) is only applicable when all of the "data" columns (versus "category" columns) are of the same or same-enough type. – r2evans Apr 27 '18 at 15:29
0

You could join the plural and singular subsets of your data by Base, then remove the Type columns and reorder the others...

full_join(filter(dat, Type == "plural"), 
          filter(dat, Type == "singular"),
          by = "Base", 
          suffix = c("_Pl", "_Sg")) %>% 
  select(Word_Sg, Word_Pl, Base, Number_Sg, Number_Pl)

#   Word_Sg Word_Pl Base Number_Sg Number_Pl
# 1    shoe   shoes shoe      4834     49955
# 2     toy    toys  toy     75465     23556
# 3     key    keys  key     39485      6546
# 4    <NA>    jazz jazz        NA     58765
CJ Yetman
  • 8,373
  • 2
  • 24
  • 56