
I am new to R and coding in general, so please bear with me.

I have a huge .csv file of financial options prices, but some are calls ('c') and some are puts ('p') and they are simply in one continuous list. In the .csv file they alternate, so one row will be the data for a call while the next will be the data for a put, say, of the same security for the same time period. How can I parse out just the data for calls (puts)?

Also, the data are arranged by date, but there are multiple data points per date (intra-day data). Among these intra-day data points, there is volume data at several different prices. I would like to construct a normal distribution of the volume over the different prices for each single day; how would I do that?

symbol  exchange    date    stock_close_price   option_symbol   expiration  strike  call/put
ALSN    NYSE    7/23/12 17.71   ALSN  120818C00015000   8/18/12 15  C
ALSN    NYSE    7/23/12 17.71   ALSN  120818P00015000   8/18/12 15  P
ALSN    NYSE    7/23/12 17.71   ALSN  120818C00017500   8/18/12 17.5    C
ALSN    NYSE    7/23/12 17.71   ALSN  120818P00017500   8/18/12 17.5    P
ALSN    NYSE    7/23/12 17.71   ALSN  120818C00020000   8/18/12 20  C
ALSN    NYSE    7/23/12 17.71   ALSN  120818P00020000   8/18/12 20  P
ALSN    NYSE    7/23/12 17.71   ALSN  120818C00022500   8/18/12 22.5    C
ALSN    NYSE    7/23/12 17.71   ALSN  120818P00022500   8/18/12 22.5    P
ALSN    NYSE    7/23/12 17.71   ALSN  120818C00025000   8/18/12 25  C
pascale
  • 7
    Edit your question to include the first 10 lines of your file. Also, define huge (how many columns and lines or how many GB). And don't expect an email. If you get an answer, you will find it on this site. – Roland Aug 25 '12 at 15:54
  • I added the first 10 lines but when I saved it, the data got scrambled and was indecipherable. – pascale Aug 25 '12 at 16:48
  • 2
    Any basic R tutorial will tell you how to read in a file and how to select rows from a data frame. Go away and do some basic research first. And only ask one question per SO post. – Spacedman Aug 25 '12 at 16:51
  • 3
    I am afraid that if you are not able to find out how formatting on this site is done, you won't be able to master any programming language. Learn to find and read the help. – Roland Aug 25 '12 at 17:05
  • 1
    You can make the code "fixed" by selecting your parsed code and pressing the `{}` icon. – Roman Luštrik Aug 25 '12 at 17:07
  • 2
    I suggest a checkbutton for new users: "Yes, I included a [Reproducible Example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)" ;) OK, sorry, just joking, but do check out that link, it's very useful. – ROLO Aug 25 '12 at 18:18
  • 1
    You said there'd be volume! I can't hear it!!! – GSee Aug 25 '12 at 18:48

1 Answer


If the data are in an R data frame, `dfrm`, with columns `transtype`, `trans_date_time`, and `volume`, and the date-time values are of class POSIXct, then the code below should produce a daily summary of volume for call transactions. I'm not sure what you mean by "construct a normal distribution of said data over different prices per single day", but if you mean displaying the distribution of daily volumes, that can easily be done with the `hist` plotting function.

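# Simulated example data: 100 transactions, randomly labelled call ("c") or put ("p")
set.seed(123)
dfrm <- data.frame(transtype = c("c","p")[sample(1:2, 100, replace = TRUE)], 
                   trans_date_time = as.POSIXct( Sys.Date() - 20) + 
                                         rnorm(100, 24*60*60, 3*24*60*60) , 
                   volume = 100*rpois(100, 5) )
str(dfrm)
# Keep only the calls, then sum volume within each calendar day
dailycallvol <- with( subset( dfrm, transtype == "c"), 
                    aggregate( volume,
                      by = list( day= format(trans_date_time, format="%Y-%m-%d")),
                      FUN=sum) )
# The second column of the result holds the daily totals
hist( dailycallvol[[2]] )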
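
For the first part of the question (separating calls from puts), here is a minimal sketch using a few rows modelled on the sample in the question. The data frame name `opts` and the column name `call_put` are my assumptions, and I left out the `option_symbol` column because it contains embedded spaces. If you read the real file with `read.csv`, a header such as `call/put` would by default be converted to `call.put`, so adjust the column name to match.

```r
# Hypothetical sample mirroring the question's columns (option_symbol omitted);
# with the real file this would come from read.csv() instead.
opts <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
symbol exchange date stock_close_price expiration strike call_put
ALSN NYSE 7/23/12 17.71 8/18/12 15.0 C
ALSN NYSE 7/23/12 17.71 8/18/12 15.0 P
ALSN NYSE 7/23/12 17.71 8/18/12 17.5 C
ALSN NYSE 7/23/12 17.71 8/18/12 17.5 P
ALSN NYSE 7/23/12 17.71 8/18/12 20.0 C
ALSN NYSE 7/23/12 17.71 8/18/12 20.0 P
")

# Convert the month/day/two-digit-year strings to Date objects
opts$date <- as.Date(opts$date, format = "%m/%d/%y")

# Two equivalent ways to keep only one option type
calls <- subset(opts, call_put == "C")
puts  <- opts[opts$call_put == "P", ]

# Or split into a named list holding one data frame per type
by_type <- split(opts, opts$call_put)
str(by_type$C)
```

Selecting on the value of the call/put column, rather than taking every other row, should keep working even if the rows do not alternate perfectly throughout the file.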

[Image: histogram of the daily call volumes produced by the code above]

IRTFM
  • This is like on Family Feud when the contestant buzzes in before the question has been read. – GSee Aug 25 '12 at 18:52
  • thanks for the response. the code is still a bit foreign to me, but I see where you're going with that. – pascale Aug 25 '12 at 18:55
  • 1
    Imagine whirled peas. (We R programmers are a rather concrete bunch. Perhaps adding the example will make this more real. It did enable me to discover several errors in my imagined code.) – IRTFM Aug 25 '12 at 18:59