R - divide into smaller data frames based on information in a column

Question

Let's say I have a tab-delimited file fileA.txt containing several types of information as follows:

X         123       78000    0        romeo 
X         78000     78004    56       juliet    
Y         78004     78005    12       mario
Y         78006     78008    21       mario   
Y         78008     78056    8        luigi 
Z         123       78000    1        peach 
Z         78000     78004    24       peach    
Z         78004     78005    4        peach
A         78006     78008    12       zelda   
A         78008     78056    14       zelda

I have this data frame saved to a variable as follows:

df <- read.table("fileA.txt",sep="\t",colClasses=c("character","numeric","numeric","numeric","character"))
colnames(df) <- c("location","start","end","value","label")

Assume I don't know in advance how many distinct strings the first column df[,1] contains; call this number n. I would like to automatically generate n new data frames, each containing the rows for a single string. How do I go about writing a function for that?

matt_k · Answer 1 · 2014-02-08T14:19:04.633

4

You can do this with split, which returns a named list containing one data.frame per level of the variable you split on.

df <- data.frame(v = rep(1:10, 2), n = rep(letters[1:10], 2))
split(df, df$n)
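Applied to the data from the question, the same call might look like the sketch below. It assumes the table is rebuilt inline instead of being read from fileA.txt, and the name parts is only illustrative.

```r
# Rebuild the question's data inline so the example is self-contained
df <- data.frame(
  location = c("X", "X", "Y", "Y", "Y", "Z", "Z", "Z", "A", "A"),
  start    = c(123, 78000, 78004, 78006, 78008, 123, 78000, 78004, 78006, 78008),
  end      = c(78000, 78004, 78005, 78008, 78056, 78000, 78004, 78005, 78008, 78056),
  value    = c(0, 56, 12, 21, 8, 1, 24, 4, 12, 14),
  label    = c("romeo", "juliet", "mario", "mario", "luigi",
               "peach", "peach", "peach", "zelda", "zelda"),
  stringsAsFactors = FALSE
)

# One data.frame per distinct location; n never has to be known up front
parts <- split(df, df$location)

length(parts)  # n, the number of distinct locations
names(parts)   # "A" "X" "Y" "Z" (sorted, because split() turns the column into a factor)
parts$Y        # the three rows with location == "Y"
```

The list elements come back in sorted order rather than in order of first appearance, which matters only if you index by position instead of by name.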

edited Feb 08 '14 at 14:19

answered Feb 08 '14 at 14:10


Thank you for your reply. I didn't know about `split()` so this is useful information. However, `df[,1]` does not necessarily contain just the letters of the alphabet, and `length(df[,1])` is not necessarily equal to `10`, so I will go with the more general answer from user:redmode. – biohazard Feb 08 '14 at 14:15
Neither of those things has to be true. I was just providing a reproducible example, since you didn't give us one. Just call split on your data.frame, and the second argument should be whatever you want to split on. – matt_k Feb 08 '14 at 14:22
Sure, I just needed to validate one of the answers, that's all. Thank you for your time :) – biohazard Feb 08 '14 at 14:24

redmode · Accepted Answer · 2014-02-08T14:18:39.077

2

You probably need something like this:

library(plyr)
out <- llply(unique(df[,1]), function(x) subset(df, df[,1]==x))
out

This creates a list in which each element is a data.frame holding the rows for one location.

You can then access the individual data.frames by position, e.g. out[[1]].

If you want the list elements to be named after the locations:

names(out) <- unique(df[,1])
out$X # gives data.frame with location=='X'
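If you would rather not depend on plyr, the same idea can be sketched in base R with lapply and setNames. This is an alternative, not what the answer itself uses; the trimmed-down df and the name locs are assumptions made for illustration.

```r
# Trimmed-down stand-in for the question's data (assumption: only three columns kept)
df <- data.frame(
  location = c("X", "X", "Y", "Y", "Y", "Z", "Z", "Z", "A", "A"),
  value    = c(0, 56, 12, 21, 8, 1, 24, 4, 12, 14),
  label    = c("romeo", "juliet", "mario", "mario", "luigi",
               "peach", "peach", "peach", "zelda", "zelda"),
  stringsAsFactors = FALSE
)

# Distinct locations, in order of first appearance
locs <- unique(df$location)

# Subset once per location and name each element in the same step
out <- setNames(lapply(locs, function(x) df[df$location == x, ]), locs)

names(out)  # "X" "Y" "Z" "A"
out$Z       # data.frame with location == "Z"
```

Unlike split, this keeps the elements in the order the locations first appear in the data.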

edited Feb 08 '14 at 14:18

answered Feb 08 '14 at 14:09


Nice trick for keeping the names :) – biohazard Feb 08 '14 at 14:20

score 2 · Answer 3 · answered Feb 08 '14 at 14:29


# Create one variable per location in the current environment: df_X, df_Y, ...
for (x in unique(df[, 1])) {
  assign(paste("df", x, sep = "_"), df[df[, 1] == x, ])
}

or

list2env(split(df, df$location), environment())
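Both variants above create variables directly in the calling environment, named df_X, df_Y, ... in the first case and X, Y, ... in the second. If that feels too cluttered, one possible variation is to send the pieces to a dedicated environment. The sketch below assumes a trimmed-down df, and the environment name pieces is hypothetical.

```r
# Trimmed-down stand-in for the question's data (assumption: only three columns kept)
df <- data.frame(
  location = c("X", "X", "Y", "Y", "Y", "Z", "Z", "Z", "A", "A"),
  value    = c(0, 56, 12, 21, 8, 1, 24, 4, 12, 14),
  label    = c("romeo", "juliet", "mario", "mario", "luigi",
               "peach", "peach", "peach", "zelda", "zelda"),
  stringsAsFactors = FALSE
)

# A separate environment keeps the generated objects out of the workspace
pieces <- new.env()
list2env(split(df, df$location), envir = pieces)

ls(pieces)                # "A" "X" "Y" "Z"
pieces$A                  # the two rows with location == "A"
get("Z", envir = pieces)  # same lookup, with the name given as a string
```

Looking objects up by a string with get is handy when the location names are only known at run time.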

lukeA


This is cool too with `assign()`, I can choose to remove `df[,1]` from the output by changing the last part to `df[df[,1]==x,][-1]` – biohazard Feb 08 '14 at 14:49
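The same column-dropping idea from this comment can also be sketched with split, by splitting df[-1] on the original column. The trimmed-down df and the name parts_nokey are assumptions for illustration.

```r
# Trimmed-down stand-in for the question's data (assumption: only three columns kept)
df <- data.frame(
  location = c("X", "X", "Y", "Y", "Y", "Z", "Z", "Z", "A", "A"),
  value    = c(0, 56, 12, 21, 8, 1, 24, 4, 12, 14),
  label    = c("romeo", "juliet", "mario", "mario", "luigi",
               "peach", "peach", "peach", "zelda", "zelda"),
  stringsAsFactors = FALSE
)

# Split everything except the first column, grouping by that first column
parts_nokey <- split(df[-1], df$location)

names(parts_nokey$X)  # "value" "label" (location is no longer repeated in each piece)
```

The location is still available as the name of each list element, so no information is lost.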
