0

I'm trying to convert a triple nested list into a dataframe. This question has helped, but I can't get the dataframe I'd like.

The list is an options chain obtained from IBrokers, a summary is shown below. I've uploaded the actual chain here which is more detailed.

Chain <- 
  list(
    list(
      list(
        list(version="8",contract=list(symbol="BHP",right="C",expiry="20180621",strike="25")),
        list(version="8",contract=list(symbol="BHP",right="C",expiry="20180621",strike="26"))
      ),
      list(
        list(version="8",contract=list(symbol="BHP",right="C",expiry="20180730",strike="25")),
        list(version="8",contract=list(symbol="BHP",right="C",expiry="20180730",strike="26"))
      )
    ),
    list(
      list(
        list(version="8",contract=list(symbol="CBA",right="C",expiry="20180621",strike="65")),
        list(version="8",contract=list(symbol="CBA",right="C",expiry="20180621",strike="64"))
      ),
      list(
        list(version="8",contract=list(symbol="CBA",right="C",expiry="20180730",strike="65")),
        list(version="8",contract=list(symbol="CBA",right="C",expiry="20180730",strike="64"))
      )
    )
  )

I'd like to convert the list into a dataframe like this:

Contracts <- data.frame(symbol=c("BHP","BHP","BHP","BHP","CBA","CBA","CBA","CBA"),
                        right=c("C","C","C","C","C","C","C","C"),
                        expiry=c("20180621","20180621","20180730","20180730","20180621","20180621","20180730","20180730"),
                        strike=c("25","26","25","26","65","64","65","64"))

I tried this code, but it didn't give me the dataframe I wanted.

X <- lapply(Chain,function(x) as.data.frame.list(lapply(x,as.data.frame.list)))
dfx <- do.call(rbind,X)

Any suggestions please?

Zeus
  • 1,496
  • 2
  • 24
  • 53

2 Answers2

2

How about the following?

df <- as.data.frame(matrix(unlist(Chain, recursive = T), ncol = 5, byrow = T)[, -1]);
colnames(df) <- c("symbol", "right", "expiry", "strike");
#  symbol right   expiry strike
#1    BHP     C 20180621     25
#2    BHP     C 20180621     26
#3    BHP     C 20180730     25
#4    BHP     C 20180730     26
#5    CBA     C 20180621     65
#6    CBA     C 20180621     64
#7    CBA     C 20180730     65
#8    CBA     C 20180730     64

Explanation: Recursively unlist the nested Chain, then recast as matrix, remove column version and convert to data.frame. The only minor down-side is that we have to manually add column names.


Update

Since your actual data is quite different, here is a possibility. Note: I assume the structure from the Gist is stored in tbl.

tbl;
#Source: local data frame [2 x 6]
#Groups: <by row>
#
## A tibble: 2 x 6
#  symbol sectype exch  currency multiplier Chain
#  <fct>  <fct>   <fct> <fct>    <fct>      <list>
#1 BHP    OPT     ASX   AUD      100        <list [1,241]>
#2 CBA    OPT     ASX   AUD      100        <list [1,204]>

The following list contains two data.frames, one for each row from tbl.

lst <- lapply(tbl$Chain, function(x)
    do.call(rbind.data.frame, lapply(x, function(y) as.data.frame(unclass(y$contract)))))
#List of 2
# $ :'data.frame':  1241 obs. of  16 variables:
#  ..$ conId          : Factor w/ 1241 levels "198440202","198440207",..: 1 2 3 4 5 6 7 8 9 10 ...
#  ..$ symbol         : Factor w/ 1 level "BHP": 1 1 1 1 1 1 1 1 1 1 ...
#  ..$ sectype        : Factor w/ 1 level "OPT": 1 1 1 1 1 1 1 1 1 1 ...
#  ..$ exch           : Factor w/ 1 level "ASX": 1 1 1 1 1 1 1 1 1 1 ...
#  ..$ primary        : Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
#  ..$ expiry         : Factor w/ 18 levels "20180628","20181220",..: 1 1 1 1 1 1 1 1 1 1 ...
#  ..$ strike         : Factor w/ 118 levels "25","26","27",..: 1 1 2 2 3 3 4 4 5 5 ...
#  ..$ currency       : Factor w/ 1 level "AUD": 1 1 1 1 1 1 1 1 1 1 ...
#  ..$ right          : Factor w/ 2 levels "C","P": 1 2 1 2 1 2 1 2 1 2 ...
#  ..$ local          : Factor w/ 1241 levels "BHPV78","BHPV88",..: 1 2 3 4 5 6 7 8 9 10 ...
#  ..$ multiplier     : Factor w/ 1 level "100": 1 1 1 1 1 1 1 1 1 1 ...
#  ..$ combo_legs_desc: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
#  ..$ comboleg       : Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
#  ..$ include_expired: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
#  ..$ secIdType      : Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
#  ..$ secId          : Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
# $ :'data.frame':  1204 obs. of  16 variables:
#  ..$ conId          : Factor w/ 1204 levels "198447027","198447030",..: 1 2 3 4 5 6 7 8 9 10 ...
#  ..$ symbol         : Factor w/ 1 level "CBA": 1 1 1 1 1 1 1 1 1 1 ...
#  ..$ sectype        : Factor w/ 1 level "OPT": 1 1 1 1 1 1 1 1 1 1 ...
#  ..$ exch           : Factor w/ 1 level "ASX": 1 1 1 1 1 1 1 1 1 1 ...
#  ..$ primary        : Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
#  ..$ expiry         : Factor w/ 18 levels "20180628","20181220",..: 1 1 1 1 1 1 1 1 1 1 ...
#  ..$ strike         : Factor w/ 179 levels "79.68","81.68",..: 1 1 2 2 3 3 4 4 5 5 ...
#  ..$ currency       : Factor w/ 1 level "AUD": 1 1 1 1 1 1 1 1 1 1 ...
#  ..$ right          : Factor w/ 2 levels "C","P": 1 2 1 2 1 2 1 2 1 2 ...
#  ..$ local          : Factor w/ 1204 levels "CBAKT9","CBAKU9",..: 1 2 3 4 5 6 7 8 9 10 ...
#  ..$ multiplier     : Factor w/ 1 level "100": 1 1 1 1 1 1 1 1 1 1 ...
#  ..$ combo_legs_desc: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
#  ..$ comboleg       : Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
#  ..$ include_expired: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
#  ..$ secIdType      : Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
#  ..$ secId          : Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
  • 1
    Thanks, this also works for an uneven list. Adding column names isn't a problem. – Zeus Apr 29 '18 at 22:49
  • I can't get your answer to work on the actual chain which I've uploaded to the question. Can you try again with the actual chain? – Zeus Apr 30 '18 at 00:53
  • @Zeus Your actual chain has a *very different* structure from your sample data. Obviously my solution will not work, as it is based on the keys from your sample data. TBH, your chain data is **very unwieldy** (it's a large, complex & nested `list` with keys that are *not consistent* between entries), and I'm not even sure which entries you want to extract. Is it possible to extract the relevant bits directly when scraping the data? – Maurits Evers Apr 30 '18 at 01:29
  • Sorry my mistake, I should have uploaded the list in the first place. Unfortunately I cannot control how the data is returned from the broker. I don't want all the information, just the data in the lists ``.. .. ..$contract`` – Zeus Apr 30 '18 at 02:01
  • @Zeus But the data you provide seems to be a 2-row `tibble` with a nested `list` in column `Chain`. Is that afore-mentioned `Chain` from which you want to extract `.. .. ..$contract`? Since you have two rows, you also have two nested lists. Is that correct? – Maurits Evers Apr 30 '18 at 02:19
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/170053/discussion-between-zeus-and-maurits-evers). – Zeus Apr 30 '18 at 02:33
  • Both methods work, @Onyambu and Maurits thanks for your help – Zeus Apr 30 '18 at 02:42
1

You can use unstack

 unstack(data.frame(d<-unlist(Chain),names(d)))
  contract.expiry contract.right contract.strike contract.symbol version
1        20180621              C              25             BHP       8
2        20180621              C              26             BHP       8
3        20180730              C              25             BHP       8
4        20180730              C              26             BHP       8
5        20180621              C              65             CBA       8
6        20180621              C              64             CBA       8
7        20180730              C              65             CBA       8
8        20180730              C              64             CBA       8

If you want you can delete the word contract.

unstack(data.frame(d<-unlist(Chain),sub(".*[.]","",names(d))))
    expiry right strike symbol version
1 20180621     C     25    BHP       8
2 20180621     C     26    BHP       8
3 20180730     C     25    BHP       8
4 20180730     C     26    BHP       8
5 20180621     C     65    CBA       8
6 20180621     C     64    CBA       8
7 20180730     C     65    CBA       8
8 20180730     C     64    CBA       8

This can also be written as unstack(data.frame(d<-unlist(Chain),sub("contract[.]","",names(d)))) Although I would prefer to maintain the name contract in order to know which columns indeed form the contract dataframe needed

Or even you can change the names After unstacking.

With the new data:

a=readLines("https://raw.githubusercontent.com/hughandersen/OptionsTrading/master/Stocks_option_chain")
b=eval(parse(text=paste(a,collapse="")))
s=unstack(data.frame(d<-unlist(b[6]),names(d)))
Onyambu
  • 67,392
  • 3
  • 24
  • 53
  • The answer doesn't work on the actual options chain - I'll try and link the actual chain output as a dput to the question, I think the dummy list is too simplified – Zeus Apr 29 '18 at 23:18
  • really? huh.. maybe try to use `recursive=T` inside the `unlist`.. i cant tell as to why – Onyambu Apr 29 '18 at 23:25
  • I've uploaded the actual chain to the question - let me know if you have problems downloading it – Zeus Apr 30 '18 at 00:52
  • thanks for your help, this works with the full listed option chain. I've never heard of ``unstack``. – Zeus Apr 30 '18 at 02:07
  • Well hope you are using `Rgui` and not `Rstudio` since the file is just enormous. I guess more than 25MB – Onyambu Apr 30 '18 at 02:10
  • you're right, its just over 25mB, I'm using RStudio with lots of RAM – Zeus Apr 30 '18 at 02:12
  • @Onyambu I have a small question here, how to keep the initial column class after changing the list to a data frame. I used this syntax in my data, but all the columns have been changed from chron/number to character. I want to keep the initial column class. – LEE Feb 23 '21 at 12:19