6

I have a large tibble (here). I created it by using this original dataset and running the following (previous post here):

#this code seemed to work
    library(tidyverse)
    df_tib <- df_full_subset %>%
      pivot_longer(cols = everything(), names_to = c("name", ".value"), names_pattern = "(.*)_(.*)") %>%
      select(-name) %>%
      pivot_wider(names_from = "01", values_from = "02", values_fn = list)

As can be seen in the previous post, there was a final bit of code to unnest that data. That didn't work for me so I tinkered with the tibble and found a few rubbish columns (e.g. a column of NAs), and removed those thinking that might help. However, I keep getting the same error: "Error: Incompatible lengths: 254, 257". This reads to me like dplyr is struggling with NAs in rows 254 and 257, but I've seen other posts where this seems to be easily dealt with (like this one that used filter), and those solutions did not work for this data.

#cleaning the data
df_tib$habitat <- df_tib$habitat_
df_tib$species <- df_tib$species_
df_tib <- janitor::clean_names(df_tib)
df_tib <- df_tib %>% 
  select(-habitat_,-species_, -na)

df_tib <- df_tib %>% 
  unnest(cols = everything()) #does not work

Any help is very appreciated.

E.O.
  • 351
  • 2
  • 14
  • @akrun sorry, I should clarify. In the un-manipulated dataset (the second link) it can be viewed in R to see the number of rows (observations). the 254 and 257 rows are filled with many NAs, which was the only thing I could think of as to why the lengths would be now incompatible, after having been manipulated into the tibble (yet still retaining that information). However, there are quite a lot of rows with NAs, so it doesn't make sense why those rows in particular would be problematic, in the original dataset. – E.O. Sep 22 '20 at 21:16
  • @akrun `names_pattern` seemed to work in that top code; however, are you suggesting that an unnoticed error there could cause efforts to unnest to fail? E.g., if one of the keys was not captured by it? – E.O. Sep 22 '20 at 21:21

1 Answers1

0

I just noticed you have a first column X in your data, that does not need to be included in your pivot_longer statement. To pivot all columns except for X, you can do pivot_longer(cols = -X, ...).

I also added drop_na which you may have intended after pivot_longer. This NA column otherwise had missing values, objectid, and editMode.

Try this instead:

library(tidyverse)

df %>%
  pivot_longer(cols = -X, names_to = c("name", ".value"), names_pattern = "(.*)_(.*)") %>%
  drop_na %>%
  select(-name) %>%
  pivot_wider(names_from = "01", values_from = "02", values_fn = list) %>%
  unnest(cols = everything())

Output

       X bearing_degrees coordinates distance_meters habitat_ how_many_animal… observers species_ was_a_group_of_… was_the_animal_… width_of_the_ro… what_species_of… livestock
   <int> <chr>           <chr>       <chr>           <chr>    <chr>            <chr>     <chr>    <chr>            <chr>            <chr>            <chr>            <chr>    
 1     1 200             N1.20701 E… 100             open_sc… 4                TW2       giraffe  group_           "alive"          single_lane      NA               NA       
 2     2 300             N1.20195 E… 20              commiph… NA               TW2 o     gerenuk  individual       "alive"          single_lane      NA               NA       
 3     3 10              N1.18823 E… 80              open_sc… NA               TW2       bird     individual       "alive"          single_lane      Ostrich          NA       
 4     4 20              N1.15642 E… 180             open_sc… 300              TW2       livesto… group_           "alive"          single_lane      NA               shoat    
 5     5 300             N1.14868 E… 30              open_sc… 7                TW2       livesto… group_           "alive"          single_lane      NA               cattle   
 6     6 70              N1.13874 E… 200             open_sc… 34               TW2       livesto… group_           "alive"          single_lane      NA               cattle   
 7     7 20              N1.11442 E… 40              disturb… 12               TW2       livesto… group_           "alive"          single_lane      NA               cattle   
 8     8 NA              N1.03003 E… NA              commiph… NA               TW2       jackal   NA               "roadkill"       single_lane      NA               NA       
 9     9 40              N1.97961 E… 50              mixed_s… null             TW2       gerenuk  individual       "alive,\n    wh… NA               NA               null     
10    10 20              N0.85822 E… 20              dense_s… 53               TW2       baboon   group_           "alive"          single_lane      NA               NA       
# … with 251 more rows, and 5 more variables: approximate_age_of_individual <chr>, approximate_number_of_days_sinc <chr>, sex_of_animal <chr>, generated_note_survey <chr>,
#   `_date` <chr>
Ben
  • 28,684
  • 5
  • 23
  • 45
  • 1
    This is perfect, thank you @Ben! My stumbling block was trying to get rid of the NAs upstream (in the original dataset) and just before the unnest command... I should have thought to do it after pivot_longer! – E.O. Sep 23 '20 at 14:48