
I have CSV data as follows:

 code, label, value
 ABC,  len,   10
 ABC,  count, 20
 ABC,  data,  102
 ABC,  data,  212
 ABC,  data,  443
 ...
 XYZ,  len,   11
 XYZ,  count, 25
 XYZ,  data,  782
 ...

The number of data entries is different for each code. (This doesn't matter for my question; I'm just pointing it out.)

I need to analyze the data entries for each code. This would include calculating the median, plotting graphs, etc. This means I should separate out the data for each code and make it numeric?

Is there a better way of doing this than this kind of thing:

 x = read.csv('dataFile.csv', header=T)
 ...
 median(as.numeric(subset(x, x$code=='ABC' & x$label=='data')$value))
 boxplot(median(as.numeric(subset(x, x$code=='ABC' & x$label=='data')$value)))
SabreWolfy
    R has many different "group" or "apply" type functions. See this question to get started: http://stackoverflow.com/questions/3505701/r-grouping-functions-sapply-vs-lapply-vs-apply-vs-tapply-vs-by-vs-aggrega – A5C1D2H2I1M1N2O1R2T1 Oct 30 '13 at 15:21

2 Answers


split and list2env allow you to separate your data.frame x by code, creating one data.frame for each level of code:

list2env(split(x, x$code), envir=.GlobalEnv)

or just

my.list <- split(x, x$code)

if you prefer to work with lists.
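As a rough sketch of how the list approach could be used for the per-code analysis: the sample values below are made up to mirror the layout in the question, and the names d, values.by.code and medians are only illustrative. It assumes that only the rows labelled 'data' should be analysed.

```r
# Hypothetical sample data in the same layout as the question
x <- read.csv(text = "code,label,value
ABC,len,10
ABC,count,20
ABC,data,102
ABC,data,212
ABC,data,443
XYZ,len,11
XYZ,count,25
XYZ,data,782
XYZ,data,800", header = TRUE, strip.white = TRUE)

# Keep only the 'data' rows, then split the values by code
d <- subset(x, label == "data")
values.by.code <- split(d$value, d$code)

# Apply a summary function to every group at once
medians <- sapply(values.by.code, median)
medians
```

The same list can be passed straight to boxplot(values.by.code) to get one box per code.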

Jilber Urbina

I'm not sure I fully understand the final objective of your question. Do you just want some pointers on how you could do it? There are a lot of possible solutions.

When you ask: I need to analyze the data entries for each code. This would include calculating the median, plotting graphs, etc. This means I should separate out the data for each code and make it numeric?

The answer is no, you don't strictly have to. You can use R functions that do this for you, for example:

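# strip.white=TRUE drops the spaces after the commas, so labels read as 'data', not ' data'
x = read.csv('dataFile.csv', header=TRUE, strip.white=TRUE)

# is it numeric?
class(x$value)
# If it is already numeric you don't have to convert it.
# A column that contains only numbers should be read as numeric;
# a stray non-numeric entry makes it character (or factor) instead.

# median of the 'data' rows for each code
aggregate(value ~ code, data=subset(x, label == 'data'), FUN=median)

boxplot(value ~ code, data=subset(x, label == 'data'))
# and you can do ?boxplot to look into its options.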
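
If you only need a named vector of medians rather than a data.frame, tapply is another option. This is a minimal sketch on made-up sample values in the question's layout; csv.text, d and med are hypothetical names, and it assumes the file has spaces after the commas, which is why strip.white=TRUE is used.

```r
# Hypothetical sample in the question's layout, with spaces after the commas
csv.text <- "code, label, value
ABC, len, 10
ABC, data, 102
ABC, data, 212
ABC, data, 443
XYZ, len, 11
XYZ, data, 782"

x <- read.csv(text = csv.text, header = TRUE, strip.white = TRUE)

# Check whether read.csv already produced a numeric column
is.numeric(x$value)

# One median per code, using only the 'data' rows
d <- x[x$label == "data", ]
med <- tapply(d$value, d$code, median)
med
```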
JEquihua