I have a dataset in which several neighborhoods are strung together in a single cell, for example:

Allendale/Irvington/S. Hilton
Beechfield/Ten Hills/West Hills

Each of these rows has other columns of data associated with it.

I would like to split those neighborhoods so that each one ends up on its own row:

Allendale
Irvington
S. Hilton
Beechfield
Ten Hills

but I also want to copy the data down, so that the column data for Allendale, Irvington and S. Hilton are the same.

Then I'll just sort it back to alphabetical order.

I'm a novice and Google most of what I do, so if you could also explain what you're doing, that would help a great deal!

Brian Tompsett - 汤莱恩
  • Try `strsplit` and specify the `split`. Please format your example and expected output. – akrun Jul 16 '15 at 18:40
  • Please provide a minimal example as in http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610 and specify what your expected output would look like for that example. – mts Jul 16 '15 at 18:44
  • As the commenters above mentioned: specify the expected output. I can venture an approximate answer: do it in two steps per row. In the first step, visit the cell or cells that need splitting and split them (e.g., values <- strsplit(this_cell, "/")). In the next step, for each string that was split into more than one piece, temporarily store the row and delete it from the data frame, then iterate over the pieces, creating one row per piece. – Marcelo Bielsa Jul 16 '15 at 18:49
  • use `dput` on your data which will make it copy-pastable – Dean MacGregor Jul 16 '15 at 19:18
  • Thanks everyone! Jaap got me! – Corinne Wiesner Jul 20 '15 at 16:08

1 Answer

You can do that with the cSplit function of the splitstackshape package:

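# create some dummy data (stringsAsFactors = FALSE keeps area as character in R < 4.0)
df <- data.frame(n = c(12, 15),
                 area = c("Allendale/Irvington/S. Hilton", "Beechfield/Ten Hills/West Hills"),
                 stringsAsFactors = FALSE)

# split the area column on "/" and stack the pieces in long format (one row per neighborhood)
library(splitstackshape)
df.new <- cSplit(df, splitCols = "area", sep = "/", direction = "long", type.convert = TRUE)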

The result:

> df.new
    n       area
1: 12  Allendale
2: 12  Irvington
3: 12  S. Hilton
4: 15 Beechfield
5: 15  Ten Hills
6: 15 West Hills

An alternative is to use the tstrsplit function from the data.table package:

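library(data.table)
# setDT converts df to a data.table by reference; for each value of n,
# every other column (here only area) is split on "/" and flattened into rows
dt.new <- setDT(df)[, lapply(.SD, function(x) unlist(tstrsplit(x, "/", fixed=TRUE))), by=n]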

This gives:

> dt.new
    n       area
1: 12  Allendale
2: 12  Irvington
3: 12  S. Hilton
4: 15 Beechfield
5: 15  Ten Hills
6: 15 West Hills

You can also use:

dt.new <- setDT(df)[, strsplit(area,"/",fixed=TRUE), by=n]

but that does not preserve the variable name: the split column comes back as V1 instead of area.
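For comparison, here is a minimal base R sketch of the same idea, following the `strsplit` suggestion from the comments and needing no extra packages. The names `pieces`, `idx` and `df.long` are only illustrative, and the dummy data is assumed to look like the example above. It splits each string, repeats every row once per piece so the other columns are copied down, and then sorts alphabetically as the question asks.

```r
# Dummy data, same shape as the example above
df <- data.frame(n = c(12, 15),
                 area = c("Allendale/Irvington/S. Hilton",
                          "Beechfield/Ten Hills/West Hills"),
                 stringsAsFactors = FALSE)

# Split each string on "/"; the result is a list with one character vector per row
pieces <- strsplit(df$area, "/", fixed = TRUE)

# Repeat each row index once per piece, so the other columns are copied down
idx <- rep(seq_len(nrow(df)), lengths(pieces))
df.long <- df[idx, , drop = FALSE]

# Replace the combined strings with the individual neighborhoods
df.long$area <- unlist(pieces)

# Sort alphabetically by neighborhood and reset the row names
df.long <- df.long[order(df.long$area), ]
rownames(df.long) <- NULL
```

Because `rep` works on row indices, this copies any number of additional columns, not just `n`.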

Jaap