
Well, I'm working on a data frame that has a column where some elements contain two or more comma-separated words. I'm trying to use this column as a vector to make a pie chart, and I need the words in these elements split apart so that each one is counted individually in the chart.

This is the vector in question:

science$area
 [1] "Matematica"                          "Filosofia"                          
 [3] "Arqueologia"                         "Astronomia"                         
 [5] "Biologia, Paleontologia"             "Biologia"                           
 [7] "Psicologia"                          "Astronomia"                         
 [9] "Fisica"                              "Biologia, Paleontologia"            
[11] "Astronomia"                          "Biologia"                           
[13] "Biologia, Fisica, Matematica, Saude" "Fisica"                             
[15] "Paleontologia"                       "Saude"                              
[17] "Biologia, Saude"                     "Biologia, Saude"                    
[19] "Saude"                               NA                                   
[21] "Biologia"                            "Fisica"                             
[23] "Psicologia"                          "Biologia"                           
[25] "Fisica"                              "Biologia"                           
[27] NA                                    "Historia"                           
[29] "Experiencias"                        "Astronomia"                         
[31] "Geografia"                           "Matematica"                         
[33] "Astronomia"                          "Filosofia, Literatura"              
[35] "Biologia"                            "Psicologia"                         
[37] "Biologia, Saude"                     "Saude"                              
[39] "Fisica"                              "Experiencias, Fisica"               
[41] "Biologia, Saude"                     "Biologia"                           
[43] "Computacao"                          "Biologia"                           
[45] "Fisica"                              "Fisica"                             
[47] "Filosofia, Historia, Literatura"     NA                                   
[49] "Literatura"                          "Astronomia"                         
[51] "Geografia, Meio Ambiente"            "Geografia"                          
[53] "Biologia, Paleontologia"             "Computacao"                         
[55] "Fisica, Literatura"                  "Filosofia"                          
[57] "Geografia, Meio Ambiente"            "Fisica"                             
[59] "Biologia"                            "Geografia, Historia" 

When I summarize it, this returns:

> summary(factor(science$area))
                    Arqueologia                          Astronomia 
                              1                                   6 
                       Biologia Biologia, Fisica, Matematica, Saude 
                              9                                   1 
        Biologia, Paleontologia                     Biologia, Saude 
                              3                                   4 
                     Computacao                        Experiencias 
                              2                                   1 
           Experiencias, Fisica                           Filosofia 
                              1                                   2 
Filosofia, Historia, Literatura               Filosofia, Literatura 
                              1                                   1 
                         Fisica                  Fisica, Literatura 
                              8                                   1 
                      Geografia                 Geografia, Historia 
                              2                                   1 
       Geografia, Meio Ambiente                            Historia 
                              2                                   1 
                     Literatura                          Matematica 
                              1                                   2 
                  Paleontologia                          Psicologia 
                              1                                   3 
                          Saude                                NA's 
                              3                                   3

So, as you can see, "Biologia, Paleontologia", for example, is being treated as a single level, when I need it to count toward both "Biologia" and "Paleontologia" instead. How can I do this? I've already tried, unsuccessfully, to rewrite these elements with c() and separate quotes around each word, and I also tried split(), but it split the vector without looking at the words inside each element...

3 Answers


In base R, we can use strsplit() to split the strings at every occurrence of ", ", then unlist() the result into a single vector. Note, though, that this vector will be longer than your original data frame, so it can't be stored as a column within it.

table(unlist(strsplit(science$area, ", ")))
#> 
#>   Arqueologia    Astronomia      Biologia    Computacao  Experiencias 
#>             1             6            17             2             2 
#>     Filosofia        Fisica     Geografia      Historia    Literatura 
#>             4            11             5             3             4 
#>    Matematica Meio Ambiente Paleontologia    Psicologia         Saude 
#>             3             2             4             3             8

Created on 2022-02-12 by the reprex package (v2.0.1)
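
If you do want the split categories inside a data frame, one option is a long-format table where each original row is repeated once per category it contains. This is a minimal sketch, assuming the data has a character column area; the small science data frame and its id column below are hypothetical stand-ins for the real data.

```r
# Hypothetical stand-in for the real data frame
science <- data.frame(
  id = 1:4,
  area = c("Biologia, Paleontologia", "Fisica", NA, "Biologia, Saude"),
  stringsAsFactors = FALSE
)

# strsplit() returns one character vector per row (an NA row stays NA)
pieces <- strsplit(science$area, ", ")

# Repeat each row's id once per piece so every category keeps its source row
long <- data.frame(
  id = rep(science$id, lengths(pieces)),
  area = unlist(pieces),
  stringsAsFactors = FALSE
)

print(long)
# table() drops NA by default, so the missing row is not counted
print(table(long$area))
```

Because the id column links each category back to its original row, the long table can be joined to other data later, and table(long$area) gives the same per-category counts as the one-liner above.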

Allan Cameron

Using strsplit() and unlist(), I was able to solve my problem.

Like this:

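# Split at a comma plus any following spaces, so " Paleontologia" doesn't become its own level
areas <- unlist(strsplit(science$area, ",\\s*"))
pie(summary(factor(areas)))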
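
Splitting on a bare "," keeps the space that follows each comma, so "Biologia, Paleontologia" yields " Paleontologia", which factor() treats as a level separate from "Paleontologia". A minimal sketch comparing the options, using a hypothetical three-element vector x in place of science$area:

```r
# Hypothetical sample standing in for science$area
x <- c("Biologia, Paleontologia", "Paleontologia", "Biologia")

# Splitting on "," alone leaves a leading space on the second piece
bare <- unlist(strsplit(x, ","))
print(bare)  # " Paleontologia" and "Paleontologia" are now different strings

# Splitting on a comma followed by optional whitespace gives clean names
clean <- unlist(strsplit(x, ",\\s*"))
print(table(clean))

# Alternative: keep the bare separator and strip whitespace afterwards
trimmed <- trimws(bare)
```

Either clean or trimmed can then be passed to pie(table(...)) without duplicate slices.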

With the tidyverse, we can use separate_rows() together with count():

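library(dplyr)
library(tidyr)
library(ggplot2)

# One row per category: split at a comma and any following whitespace
separate_rows(science, area, sep = ",\\s*") %>%
    count(area) %>%                               # tally each category into column n
    ggplot(aes(x = "", y = n, fill = area)) +     # a single stacked bar...
      geom_bar(stat = "identity") +
      coord_polar("y", start = 0)                 # ...wrapped into a pie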
akrun