0

I have a dataframe with crop names and their respective FAO codes. Unfortunately, some crop categories, such as 'other cereals', have multiple FAO codes, ranges of FAO codes or even worse - multiple ranges of FAO codes.

Snippet of the dataframe with the different formats for FAO codes.

> FAOCODE_crops
      SPAM_full_name                          FAOCODE
1              wheat                               15
2               rice                               27
8      other cereals 68,71,75,89,92,94,97,101,103,108
27   other oil crops                  260:310,312:339
31 other fibre crops                          773:821

Using the following code successfully breaks down these numbers,

unlist(lapply(unlist(strsplit(FAOCODE_crops$FAOCODE, ",")), function(x) eval(parse(text = x))))
[1]  15  27  56  44  79  79  83  68  71  75  89  92  94  97 101 103 108

... but I fail to merge these numbers back into the dataframe, where every FAOCODE gets its own row.

> FAOCODE_crops$FAOCODE <- unlist(lapply(unlist(strsplit(MAPSPAM_crops$FAOCODE, ",")), function(x) eval(parse(text = x))))
Error in `$<-.data.frame`(`*tmp*`, FAOCODE, value = c(15, 27, 56, 44,  : 
  replacement has 571 rows, data has 42

I fully understand why it doesn't merge successfully, but I can't figure out a way to fill the table with a new row for each FAOCODE as idealized below:

SPAM_full_name                          FAOCODE
1              wheat                               15
2               rice                               27
8      other cereals                               68
8      other cereals                               71
8      other cereals                               75
8      other cereals                               89

And so on...

Any help is greatly appreciated!

www
  • 38,575
  • 12
  • 48
  • 84
prayner
  • 393
  • 1
  • 10
  • See: https://stackoverflow.com/questions/13773770/split-comma-separated-strings-in-a-column-into-separate-rows. Look up the `unnest` function in `dplyr`. – thc Feb 24 '20 at 18:29
  • I managed to use that to separate values with commas separating them, but still unsure how to fix the number range issue. – prayner Feb 24 '20 at 18:42

1 Answers1

0

We can use separate_rows to separate the ,. After that, we can loop through the FAOCODE using map and ~eval(parse(text = .x)) to evaluate the number range. Finnaly, we can use unnest to expand the data frame.

library(tidyverse)

dat2 <- dat %>%
  separate_rows(FAOCODE, sep = ",") %>%
  mutate(FAOCODE = map(FAOCODE, ~eval(parse(text = .x)))) %>%
  unnest(cols = FAOCODE)
dat2
# # A tibble: 140 x 2
#    SPAM_full_name FAOCODE
#    <chr>            <dbl>
#  1 wheat               15
#  2 rice                27
#  3 other cereals       68
#  4 other cereals       71
#  5 other cereals       75
#  6 other cereals       89
#  7 other cereals       92
#  8 other cereals       94
#  9 other cereals       97
# 10 other cereals      101
# # ... with 130 more rows

DATA

dat <- read.table(text = "      SPAM_full_name                          FAOCODE
1              wheat                               15
2               rice                               27
8      'other cereals' '68,71,75,89,92,94,97,101,103,108'
27   'other oil crops'                  '260:310,312:339'
31 'other fibre crops'                          '773:821'",
                  header = TRUE, stringsAsFactors = FALSE)
www
  • 38,575
  • 12
  • 48
  • 84