pivot_wider introduces NA values when reshaping from long to wide format

Question

I am encountering a problem when I reshape my data using pivot_wider().

My data looks like this:

df <- data.frame(area = c( "Area1","Area1","Area1","Area2","Area2","Area2","Area3","Area3","Area3"),
                   species = c("species1","species2","species3","species1","species2","species3","species1","species2","species3"),
                   season= c("Season1","Season1","Season1","Season2","Season2","Season2","Season3","Season3","Season3"),
                   value= c(2,3,5,7,9,2,6,9,3))

I am able to change the data frame to the wide format as below.

library(dplyr)
library(tidyr)

df_wide <- df %>%
  mutate(row = row_number()) %>%
  pivot_wider(id_cols = c(row, species),
              names_from = "season",
              values_from = "value") %>%
  select(-row)

This is a figure of the output.

My problem is that it introduces NAs: pivot_wider() keeps each value on its own row, so the other season columns in that row are filled with NA.

Any help would be greatly appreciated.

score 4 · Answer 1 · answered Oct 14 '20 at 15:24

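Try this. You want one row per species, but the row numbers and area act as extra identifiers, so every distinct combination of them gets its own row and the remaining cells are filled with NA. One way to solve the issue is to drop area (and not add row numbers) before pivoting: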

library(dplyr)
library(tidyr)
#Code
newdf <- df %>% select(-area) %>% pivot_wider(names_from = season,values_from=value)

Output:

# A tibble: 3 x 4
  species  Season1 Season2 Season3
  <fct>      <dbl>   <dbl>   <dbl>
1 species1       2       7       6
2 species2       3       9       9
3 species3       5       2       3
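
To see why dropping area matters, here is a minimal sketch using the question's data (rebuilt with rep() for brevity). It relies on the documented default that pivot_wider() treats every column not named in names_from or values_from as an identifier; the object names with_area and by_species are only illustrative.

```r
library(tidyr)

# Same data as in the question, built with rep() for brevity
df <- data.frame(
  area    = rep(c("Area1", "Area2", "Area3"), each = 3),
  species = rep(c("species1", "species2", "species3"), times = 3),
  season  = rep(c("Season1", "Season2", "Season3"), each = 3),
  value   = c(2, 3, 5, 7, 9, 2, 6, 9, 3)
)

# area is kept as an identifier by default, so each area/species
# pair gets its own row and only one season column is filled
with_area <- pivot_wider(df, names_from = season, values_from = value)
nrow(with_area)       # 9
sum(is.na(with_area)) # 18

# Restricting the identifiers to species gives one row per species
by_species <- pivot_wider(df, id_cols = species,
                          names_from = season, values_from = value)
nrow(by_species)      # 3
```

In this data each area coincides with exactly one season, which is why keeping area leaves two of the three season columns empty in every row.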

score 4 · Answer 2 · s_baldur · 2020-10-14T15:35:59.217

library(tidyr)

pivot_wider(df, id_cols = species, names_from = season, values_from = value)

#   species  Season1 Season2 Season3
#   <chr>      <dbl>   <dbl>   <dbl>
# 1 species1       2       7       6
# 2 species2       3       9       9
# 3 species3       5       2       3

Or base R:

cbind(species = unique(df$species), unstack(df, value ~ season))

#    species Season1 Season2 Season3
# 1 species1       2       7       6
# 2 species2       3       9       9
# 3 species3       5       2       3
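
Note that the cbind()/unstack() approach assumes every season lists the species in the same order, because the species column is attached by position. A hedged alternative sketch in base R that matches on the species and season labels instead uses xtabs(); it sums values per cell and fills missing combinations with 0 rather than NA, so it is not a drop-in replacement in every case. The name wide is only illustrative.

```r
# Same data as in the question, built with rep() for brevity
df <- data.frame(
  area    = rep(c("Area1", "Area2", "Area3"), each = 3),
  species = rep(c("species1", "species2", "species3"), times = 3),
  season  = rep(c("Season1", "Season2", "Season3"), each = 3),
  value   = c(2, 3, 5, 7, 9, 2, 6, 9, 3)
)

# Cross-tabulate value by species (rows) and season (columns)
tab <- xtabs(value ~ species + season, data = df)

# Convert the table to a plain data frame with species as a column
wide <- as.data.frame.matrix(tab)
wide <- cbind(species = rownames(wide), wide)
rownames(wide) <- NULL
wide
```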

edited Oct 14 '20 at 15:35

answered Oct 14 '20 at 15:30

s_baldur


When I apply this to my real data, which is too big to upload here, I get this message: Warning message: Values are not uniquely identified; output will contain list-cols. * Use `values_fn = list` to suppress this warning. * Use `values_fn = length` to identify where the duplicates arise * Use `values_fn = {summary_fun}` to summarise duplicates – pomatomus Oct 14 '20 at 15:40
When I use "values_fn = {summary_fun}" to summarise the values for the same species in the same season, I get another error: "Error in pivot_wider_spec(data, spec, !!id_cols, names_repair = names_repair, : object 'summary_fun' not found" – pomatomus Oct 14 '20 at 15:42
@pomatomus It means that the species + season combinations in your data are not all unique, so some combinations have more than one value. – s_baldur Oct 14 '20 at 16:04
@sindri_badur Yes, you are right, because I have other parameters: for example, the same season, same area and same species but different depths. So for species1 there is something like c(2,3,5,6) in the Season1 column. – pomatomus Oct 14 '20 at 16:09
I added the "values_fn = sum" argument and that solved it. – pomatomus Oct 14 '20 at 16:33
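
The fix from the comments can be sketched as follows. The data frame dups and its depth column are hypothetical, made up to mimic the situation described (the same species and season measured at several depths). In the warning text, {summary_fun} is a placeholder for a real function such as sum or mean, not something to type literally. Passing a bare function to values_fn assumes tidyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: species1 was measured at two depths in Season1
dups <- data.frame(
  species = c("species1", "species1", "species2", "species2"),
  season  = c("Season1", "Season1", "Season1", "Season2"),
  depth   = c(10, 20, 10, 10),
  value   = c(2, 3, 5, 6)
)

# Find the species/season pairs that occur more than once
repeated <- dups %>%
  count(species, season) %>%
  filter(n > 1)
repeated

# Sum the duplicated values while pivoting
wide <- pivot_wider(dups, id_cols = species,
                    names_from = season, values_from = value,
                    values_fn = sum)
wide
```

Whether sum is the right summary depends on the data; mean or max may be more appropriate when the duplicates are repeated measurements rather than counts.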
