Analyze CSV data in R

Question

I have CSV data as follows:

 code, label, value
 ABC,  len,   10
 ABC,  count, 20
 ABC,  data,  102
 ABC,  data,  212
 ABC,  data,  443
 ...
 XYZ,  len,   11
 XYZ,  count, 25
 XYZ,  data,  782
 ...

The number of data entries is different for each code. (This doesn't matter for my question; I'm just pointing it out.)

I need to analyze the data entries for each code. This would include calculating the median, plotting graphs, etc. This means I should separate out the data for each code and make it numeric?

Is there a better way of doing this than this kind of thing:

 x = read.csv('dataFile.csv', header=T)
 ...
 median(as.numeric(subset(x, x$code=='ABC' & x$label=='data')$value))
 boxplot(as.numeric(subset(x, x$code=='ABC' & x$label=='data')$value))

R has many different "group" or "apply" type functions. See this question to get started: http://stackoverflow.com/questions/3505701/r-grouping-functions-sapply-vs-lapply-vs-apply-vs-tapply-vs-by-vs-aggrega — A5C1D2H2I1M1N2O1R2T1, Oct 30 '13 at 15:21

score 1 · Answer 1 · answered Oct 30 '13 at 15:25

split and list2env allow you to separate your data.frame x by code, generating one data.frame for each level of code:

list2env(split(x, x$code), envir=.GlobalEnv)

or just

my.list <- split(x, x$code)

if you prefer to work with lists.
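
As a minimal sketch of working with the list form, assuming a small made-up data frame in the shape of the question's CSV (the values are illustrative, and the second XYZ value is invented), you could keep only the data rows, split the values by code, and apply a function to each piece:

```r
# Toy data in the same shape as the question's CSV (values are illustrative).
x <- data.frame(
  code  = c("ABC", "ABC", "ABC", "ABC", "ABC", "XYZ", "XYZ", "XYZ", "XYZ"),
  label = c("len", "count", "data", "data", "data", "len", "count", "data", "data"),
  value = c(10, 20, 102, 212, 443, 11, 25, 782, 800),
  stringsAsFactors = FALSE
)

# Keep only the rows labelled 'data', then split the values by code.
d <- x[x$label == "data", ]
values.by.code <- split(d$value, d$code)

# One median per code, returned as a named numeric vector.
medians <- sapply(values.by.code, median)
```

Because boxplot accepts a list of numeric vectors, boxplot(values.by.code) should then draw one box per code.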

answered Oct 30 '13 at 15:25

Jilber Urbina

58,147
10
114
138

JEquihua · Answer 2 · 2013-10-30T22:28:01.580

I'm not sure I totally understand the final objective of your question. Do you just want some pointers on how you could do it? There are a lot of possible solutions.

When you ask: I need to analyze the data entries for each code. This would include calculating the median, plotting graphs, etc. This means I should separate out the data for each code and make it numeric?

The answer would be no, you don't strictly have to. You could use R functions which do this for you, for example:

x = read.csv('dataFile.csv', header=T)

# Is the value column numeric?
class(x$value)
# If it is already numeric you shouldn't have to convert it.
# If it is strictly numeric I don't know of any reason why it
# would be read as strings, but it happens.

# Keep only the 'data' rows, then take the median of value for each code.
d = subset(x, label == 'data')
aggregate(value ~ code, data=d, FUN=median)

boxplot(value ~ code, data=d)
# See ?boxplot for its options.
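
If class(x$value) reports factor or character rather than numeric, one common way to convert is sketched below. The column contents here are made up for illustration. Calling as.numeric directly on a factor returns the internal level codes rather than the printed numbers, so going through as.character first is the safer route:

```r
# Hypothetical column that was read in as a factor instead of numbers.
value <- factor(c("102", "212", "443"))

# as.numeric() on a factor gives the level codes, not the values.
codes <- as.numeric(value)

# Converting via character recovers the actual numbers.
numbers <- as.numeric(as.character(value))

median(numbers)
```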

Yes, I'm looking for suggestions/pointers as to how to easily work with the data structure. — SabreWolfy, Oct 30 '13 at 16:20
