
I have a column of 17,000 values that I would like to classify into 48 groups by their ranges (classifying SIC codes into Fama-French industries).

df$SIC
[1] 5080 4911 7359 2834 3674 6324 2810 4512 4400 6331 3728 3350 2911 2085 7340 6311 6199 6321 2771 3844 2870 3823 2836 3825

The only way I can think of to do this is to write a bunch of if/then statements and put them all in a for loop. However, this will take forever to run.

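df$FF_IND <- NA_character_   # create the output column before filling it
for (i in seq_len(nrow(df))) {
  # SIC 0100-0299; the leading zero is irrelevant for a numeric comparison
  if (df$SIC[i] >= 100 && df$SIC[i] <= 299) { df$FF_IND[i] <- "AGRI" }   # assign row i only
}
## and so on for all groups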
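For comparison, the single rule in the loop above can be applied to the whole column in one vectorized step with logical indexing, which avoids the per-row R-level loop. This is a minimal sketch: the last two SIC values (150 and 250) are hypothetical additions so that something actually falls in the 0100-0299 range, and only the one "AGRI" range from the question is shown.

```r
# Sample data: SIC codes from the question plus two hypothetical codes in 100-299
df <- data.frame(SIC = c(5080, 4911, 7359, 2834, 3674, 6324, 150, 250))

df$FF_IND <- NA_character_                  # start with "unclassified"
in_agri <- df$SIC >= 100 & df$SIC <= 299    # one comparison over the whole column
df$FF_IND[in_agri] <- "AGRI"                # fill only the matching rows

df
```

Each further group would need its own pair of comparisons, so with 48 groups a lookup-table approach scales better.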

Do you know of a less taxing way to perform this task?

Many thanks!

  • If you share a sample of the data and your desired output, you'll get much more valuable and timely suggestions. – A5C1D2H2I1M1N2O1R2T1 Jun 01 '13 at 20:06
  • Oh okay (sorry, I'm new here!) SIC: 5080 4911 7359 2834 3674 6324 2810 4512. Code: for(i in c(1:(dim(financials))[1])){ if(financials$SIC[i] >= 0100 && financials$SIC[i] <= 0299){financials$FF_IND[i] <- "AGRIC"} } And what I would like is something like:
      SIC   FF_IND
      5080  AGRI
      4911  AGRI
      7359  UTIL
      2834  FIN
      3674  UTIL
      6324  CONS
      2810  CONS
      4512  UTIL
    where FF_IND is the group name (sorry, my formatting is terrible). – user2303635 Jun 01 '13 at 20:08
  • That's not really what @AnandaMahto meant, please help us help you by providing us with a reproducible example (i.e. code and example data), see http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for details. – Paul Hiemstra Jun 01 '13 at 20:10
  • The ranges are numeric, yes? And you know them beforehand? Can you edit your post to include them, along with perhaps a bit more of your 17,000 values? You can use dput for this. – Peyton Jun 01 '13 at 20:11
  • And please edit the code and data into your question, that works much better than a comment. – Paul Hiemstra Jun 01 '13 at 20:11
  • If the ranges are just numeric, you can probably use `cut`. – A5C1D2H2I1M1N2O1R2T1 Jun 01 '13 at 20:12

1 Answer


Something like:

cut(df$SIC,breaks=c(100,299,...),labels=c("AGRI",...))
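One detail to watch with `cut`: by default (`right = TRUE`, `include.lowest = FALSE`) the intervals are closed on the right, (a, b], so with breaks starting at 100 the code 100 itself lands outside the first bin and becomes NA. A minimal sketch for inclusive integer ranges, assuming hypothetical bins "OTHER_A" and "GROUP_X" alongside the question's 0100-0299 "AGRI" range: break at each range start and at (end + 1), and set `right = FALSE` so the bins are [a, b). Gaps between industries need their own bins, and codes outside all breaks come back as NA.

```r
sic <- c(100, 299, 300, 1000, 1499, 5080)

# Default right = TRUE: the first bin is (100, 300], so 100 itself is NA
default_bin <- cut(100, breaks = c(100, 300, 1000, 1500))
is.na(default_bin)

# Hypothetical contiguous bins: [100,300) AGRI, [300,1000) OTHER_A, [1000,1500) GROUP_X
grp <- cut(sic,
           breaks = c(100, 300, 1000, 1500),
           labels = c("AGRI", "OTHER_A", "GROUP_X"),
           right  = FALSE)          # intervals closed on the left: [a, b)
labels_out <- as.character(grp)     # 5080 is beyond the last break, so NA
labels_out
```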

A more thorough solution (which I don't have time for right now) would extract the table found via http://boards.fool.com/famafrench-industry-codes-26799316.aspx (by downloading http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Siccodes49.zip and extracting the table) and find the breakpoints programmatically.
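Once such a table has been turned into one row per SIC range (start, end, industry label), the classification itself can be done without `cut`. The sketch below swaps in `findInterval()` on the range starts and then checks each code against that range's end, so gaps between ranges fall through to a default label. It assumes the ranges do not overlap; the helper name `classify_sic`, the "Other" default, and the "GROUP_X"/"GROUP_Y" ranges are hypothetical, and parsing the downloaded file is not shown.

```r
# Hypothetical range table (NOT the real Fama-French 48 definitions)
ranges <- data.frame(
  lo    = c(100, 2800, 1000),
  hi    = c(299, 2899, 1499),
  label = c("AGRI", "GROUP_Y", "GROUP_X"),
  stringsAsFactors = FALSE          # keep labels as character, not factor
)

# Hypothetical helper: map each SIC code to the label of the range containing it
classify_sic <- function(sic, ranges, other = "Other") {
  ranges <- ranges[order(ranges$lo), ]          # findInterval needs sorted starts
  idx <- findInterval(sic, ranges$lo)           # last range whose start <= sic; 0 if none
  out <- rep(other, length(sic))
  hit <- !is.na(idx) & idx > 0
  hit[hit] <- sic[hit] <= ranges$hi[idx[hit]]   # also require sic <= that range's end
  out[hit] <- ranges$label[idx[hit]]
  out
}

classify_sic(c(5080, 4911, 150, 299, 300, 2834), ranges)
```

For a data frame this would be `df$FF_IND <- classify_sic(df$SIC, ranges)`; the work is a single vectorized pass regardless of how many ranges there are.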

Ben Bolker