Split multiple values from a single variable within a data frame

Question

I have the following data frame, which stores several values in a single variable (Problemas.habituales):

> read.csv("http://pastebin.com/raw.php?i=gnWRqJnY")
  Nombre.barrio                             Problemas.habituales
1         Actur Robos con violencia, Agresiones, Otros problemas
2         Actur                                  Ningún problema
3        Centro                  Robos con violencia, Agresiones
4     San Pablo                                  Ningún problema
5     San Pablo                                  Ningún problema
6      Delicias                     Hurtos o robos sin violencia

The reason for this structure is that I created an online questionnaire that accepts multiple answers to the same question. The way the data is stored is a problem, though: there is no way to create a bar plot showing all the common problems in every neighborhood without first manipulating the data frame.

Unfortunately, I do not know how to reshape the data frame so that every row contains a single value for the variable "Problemas.habituales". The result needs to be a data frame, since I use ggplot2 later on, which does not accept data tables.
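For reference, the reshaping can also be sketched in base R, which keeps everything as a plain data.frame throughout. This is a minimal sketch, not one of the answers below: it assumes the six rows shown above are typed in by hand (instead of read from the pastebin URL) and that the answers are separated by a comma followed by a space.

```r
# The six rows shown above, typed in by hand (assumed to match the pastebin file)
df <- data.frame(
  Nombre.barrio = c("Actur", "Actur", "Centro", "San Pablo", "San Pablo", "Delicias"),
  Problemas.habituales = c("Robos con violencia, Agresiones, Otros problemas",
                           "Ningún problema",
                           "Robos con violencia, Agresiones",
                           "Ningún problema",
                           "Ningún problema",
                           "Hurtos o robos sin violencia"),
  stringsAsFactors = FALSE
)

# One character vector of answers per original row
pieces <- strsplit(df$Problemas.habituales, ", ", fixed = TRUE)

# Repeat each neighborhood once per answer and stack all answers in one column
long <- data.frame(
  Nombre.barrio = rep(df$Nombre.barrio, lengths(pieces)),
  Problemas.habituales = unlist(pieces),
  stringsAsFactors = FALSE
)
print(long)
```

If the column comes from read.csv() in R versions before 4.0.0, it is a factor by default, so it would need as.character() before strsplit(), which only accepts character input.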

I see this question has been downvoted (-1) and I wonder why: I searched first on DuckDuckGo and then on Stack Overflow and didn't find any duplicate. (It may be easy to solve if you know how, but I don't think being a newbie is a bad thing.) — ccamara, Jun 16 '15 at 08:49
Check [this question](http://stackoverflow.com/questions/13773770/split-comma-separated-column-into-separate-rows); it should be helpful. — Veerendra Gadekar, Jun 16 '15 at 10:23

score 3 · Answer 1 · answered Jun 16 '15 at 08:31


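library(data.table)
# fread() reads the file into a data.table, keeping the original column names
DF <- fread("http://pastebin.com/raw.php?i=gnWRqJnY")
# Turn "Nombre barrio" into "Nombre.barrio", etc., as read.csv() would
setnames(DF, make.names(names(DF)))
# Split each answer on commas, trim the space after each comma,
# and repeat the neighborhood once per resulting value
DF <- DF[, .(Problemas.habituales = trimws(unlist(strsplit(Problemas.habituales, ",",
                                                          fixed = TRUE)))), by = Nombre.barrio]
# Convert the result back into an ordinary data.frame
setDF(DF)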

(I assume that you don't see encoding problems with your locale.)
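Splitting on "," alone keeps the space that follows each comma, so a value such as " Agresiones" would not match "Agresiones" from another row when counting. A minimal illustration using one answer from the data above:

```r
x <- "Robos con violencia, Agresiones, Otros problemas"

# Splitting on the comma only: the second and third pieces start with a space
parts <- unlist(strsplit(x, ",", fixed = TRUE))
print(parts)

# trimws() (available since R 3.2.0) strips leading and trailing whitespace
clean <- trimws(parts)
print(clean)
```

Splitting on ", " with fixed = TRUE gives the same result for this data; trimws() also copes with answers where the comma is not followed by a space.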


Roland


Hmm... apparently it doesn't work if I use read.csv instead of fread. I'm still wondering why, and how to fix it: if I switch to fread, it will break most of the work I've done elsewhere on the data frame because of the different column names (read.csv puts a . instead of spaces between words). – ccamara Jun 16 '15 at 09:53
I think I found the problem... fread creates a data table, whereas read.csv creates a data frame, which is what I need. Is there any way to make it work with data frames? – ccamara Jun 16 '15 at 10:05
Sure, but why would you want to? – Roland Jun 16 '15 at 10:45
As far as I know, ggplot2 only works with data frames, not data tables, and I need to work with ggplot2. – ccamara Jun 16 '15 at 12:04

1) `data.table` inherits the data.frame class and ggplot2 works with data.tables just fine. 2) The last command turns the data.table into an ordinary data.frame. – Roland Jun 16 '15 at 12:31
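Roland's first point can be checked directly, and it also covers the read.csv() case: a data frame from read.csv() can be converted in place with setDT(), reshaped with the same by-group split, and turned back with setDF(). A minimal sketch, assuming the data.table package is installed; the two-row data frame is a hypothetical stand-in for the read.csv() result, with the same column names read.csv() produces and factor columns as read.csv() returned before R 4.0.0.

```r
library(data.table)

# Hypothetical stand-in for read.csv() output (factor columns, dotted names)
df <- data.frame(Nombre.barrio = c("Actur", "Centro"),
                 Problemas.habituales = c("Robos con violencia, Agresiones, Otros problemas",
                                          "Robos con violencia, Agresiones"),
                 stringsAsFactors = TRUE)

# Convert in place to a data.table; its class vector still includes "data.frame"
setDT(df)
print(class(df))

# Same reshaping as in the answer; as.character() handles the factor column
long <- df[, .(Problemas.habituales = trimws(unlist(strsplit(as.character(Problemas.habituales),
                                                             ",", fixed = TRUE)))),
           by = Nombre.barrio]

# Back to an ordinary data.frame for code that expects one
setDF(long)
print(class(long))
```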

Accepted answer · Veerendra Gadekar · answered Jun 16 '15 at 10:46

You can do this using the splitstackshape package:

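library(splitstackshape)
# DF as read with data.table::fread(); with read.csv() the column is "Problemas.habituales"
DF <- data.table::fread("http://pastebin.com/raw.php?i=gnWRqJnY")
cSplit(DF, "Problemas habituales", ",", direction = "long")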

#   Nombre barrio         Problemas habituales
#1:         Actur          Robos con violencia
#2:         Actur                   Agresiones
#3:         Actur              Otros problemas
#4:         Actur              Ningún problema
#5:        Centro          Robos con violencia
#6:        Centro                   Agresiones
#7:     San Pablo              Ningún problema
#8:     San Pablo              Ningún problema
#9:      Delicias Hurtos o robos sin violencia
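Once every row holds a single answer, the counts behind the bar plot described in the question follow directly. A minimal sketch, assuming the long result uses the read.csv()-style column names Nombre.barrio and Problemas.habituales (the nine rows above are typed in by hand); the ggplot2 part only runs if that package is installed.

```r
# The nine rows shown above, with read.csv()-style column names
long <- data.frame(
  Nombre.barrio = c("Actur", "Actur", "Actur", "Actur", "Centro", "Centro",
                    "San Pablo", "San Pablo", "Delicias"),
  Problemas.habituales = c("Robos con violencia", "Agresiones", "Otros problemas",
                           "Ningún problema", "Robos con violencia", "Agresiones",
                           "Ningún problema", "Ningún problema",
                           "Hurtos o robos sin violencia"),
  stringsAsFactors = FALSE
)

# Number of times each problem was reported in each neighborhood
counts <- as.data.frame(table(Nombre.barrio = long$Nombre.barrio,
                              Problema = long$Problemas.habituales))
counts <- counts[counts$Freq > 0, ]
print(counts)

# geom_bar() counts the rows itself, so the long data can be plotted directly;
# print(p) draws the stacked bar plot
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = Nombre.barrio, fill = Problemas.habituales)) +
    ggplot2::geom_bar()
}
```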
