0

I'm still a novice to R. Can anyone help me figure out how to spread my dataset under different categories?

Here is what I got in R.

enter image description here

Values with "item code" fall in "broad category", others are in the subcategories (actual items). So now, I wanna separate values in subcategories (that is, values with NA code) and make it into the third column ((Like column#1 is "Item Code", column#2 is "Broad category", and column #3 is " Specific items")

More specifically, I wish the final result just look like this:

enter image description here

I'm thinking about using the spread() command, but it seems it doesn't work. Could anyone give me some suggestions on my following steps?

(I'm thinking about specifying "broad categories" as a variable, and "subcategories" as another variable, and then, I might be able to spread the table? Not sure)

SecretAgentMan
  • 2,856
  • 7
  • 21
  • 41
  • 1
    Welcome to Stack Overflow! You may want to check out [How to make a great R reproducible example](https://stackoverflow.com/q/5963269/8386140) to help others help you; in particular, pasting images of data is not very helpful, and we could reproduce your issue and help you fix it faster if your data were provided as text. One way to do that is to [edit] your question to include the output of the R command `dput(head(df))` (where you replace `df` with the actual name of your dataframe) – duckmayr Mar 14 '20 at 14:43

2 Answers2

2

Here is a tidyverse solution you can consider. I will add sample data so that others can provide alternatives.

library(tidyverse)

df %>%
  fill(Item.Code) %>%
  group_by(Item.Code) %>%
  mutate(Category = first(Item)) %>%
  slice(2:n())

Output

# A tibble: 12 x 3
# Groups:   Item.Code [3]
   Item.Code Item                                        Category                        
       <dbl> <fct>                                       <fct>                           
 1       221 Prunus amygdalus                            Almonds, with shell             
 2       221 Almond (Prunus dulcis or Amygdalus communis Almonds, with shell             
 3       711 Pimpinella anisum (aniseed)                 Anise, badian, fennel, coriander
 4       711 Illicium verum (star anise)                 Anise, badian, fennel, coriander
 5       711 Carum carvi                                 Anise, badian, fennel, coriander
 6       711 Coriandrum sativum (coriander               Anise, badian, fennel, coriander
 7       711 Cuminum cyminum (cumin)                     Anise, badian, fennel, coriander
 8       711 Foeniculum vulgare (fennel)                 Anise, badian, fennel, coriander
 9       711 Juniperus communis (common juniper)         Anise, badian, fennel, coriander
10       800 Agave                                       Agave fibres nes                
11       800 Agave fourcroydes (Henequen)                Agave fibres nes                
12       800 Agave americana (century plant)             Agave fibres nes

Data

df <- data.frame(
  Item.Code = c(800, NA, NA, NA, 221, NA, NA, 711, NA, NA, NA, NA, NA, NA, NA),
  Item = c("Agave fibres nes", "Agave", "Agave fourcroydes (Henequen)", "Agave americana (century plant)", "Almonds, with shell",
           "Prunus amygdalus", "Almond (Prunus dulcis or Amygdalus communis", "Anise, badian, fennel, coriander",
           "Pimpinella anisum (aniseed)", "Illicium verum (star anise)", "Carum carvi", "Coriandrum sativum (coriander",
           "Cuminum cyminum (cumin)", "Foeniculum vulgare (fennel)", "Juniperus communis (common juniper)")
)
Ben
  • 28,684
  • 5
  • 23
  • 45
  • It works! Perfect! Thank you so much! Based on your code, I also find an alternative way to make it look nicer. Here is what I got: crop.sorted %>% mutate(Category = ifelse(!is.na(Code), Item, Code)) %>% fill(Code,Category) %>% group_by(Code)%>% slice(2:n())%>% select(Code,Category,Item)->crop.sorted – stjkddzqfazf Mar 14 '20 at 18:23
0

We can also use data.table

library(data.table)
library(zoo)
setDT(df)[,  c(.SD[-1], .(Category = first(Item))),.(Item.Code = na.locf0(Item.Code))]
#    Item.Code                                        Item                         Category
# 1:       800                                       Agave                 Agave fibres nes
# 2:       800                Agave fourcroydes (Henequen)                 Agave fibres nes
# 3:       800             Agave americana (century plant)                 Agave fibres nes
# 4:       221                            Prunus amygdalus              Almonds, with shell
# 5:       221 Almond (Prunus dulcis or Amygdalus communis              Almonds, with shell
# 6:       711                 Pimpinella anisum (aniseed) Anise, badian, fennel, coriander
# 7:       711                 Illicium verum (star anise) Anise, badian, fennel, coriander
# 8:       711                                 Carum carvi Anise, badian, fennel, coriander
# 9:       711               Coriandrum sativum (coriander Anise, badian, fennel, coriander
#10:       711                     Cuminum cyminum (cumin) Anise, badian, fennel, coriander
#11:       711                 Foeniculum vulgare (fennel) Anise, badian, fennel, coriander
#12:       711         Juniperus communis (common juniper) Anise, badian, fennel, coriander

data

df <- data.frame(
  Item.Code = c(800, NA, NA, NA, 221, NA, NA, 711, NA, NA, NA, NA, NA, NA, NA),
  Item = c("Agave fibres nes", "Agave", "Agave fourcroydes (Henequen)", "Agave americana (century plant)", "Almonds, with shell",
           "Prunus amygdalus", "Almond (Prunus dulcis or Amygdalus communis", "Anise, badian, fennel, coriander",
           "Pimpinella anisum (aniseed)", "Illicium verum (star anise)", "Carum carvi", "Coriandrum sativum (coriander",
           "Cuminum cyminum (cumin)", "Foeniculum vulgare (fennel)", "Juniperus communis (common juniper)")
)
akrun
  • 874,273
  • 37
  • 540
  • 662