I am working with the gafa_stock dataframe in the tsibbledata package. I want to find the maximum closing price for each of the four stocks in the dataframe, i.e. a table with four rows, each giving the maximum value for one stock. Following the instructions in "Extract the maximum value within each group in a dataframe", I wrote this code:

gafa_stock %>%
   group_by(Symbol) %>%
   summarise(maximum = max(Close))

The gafa_stock dataframe looks like this:

[Screenshot of the gafa_stock dataframe]

Running str(gafa_stock) gives these results:

str(gafa_stock)
tsibble [5,032 x 8] (S3: tbl_ts/tbl_df/tbl/data.frame)
$ Symbol   : chr [1:5032] "AAPL" "AAPL" "AAPL" "AAPL" ...
$ Date     : Date[1:5032], format: "2014-01-02" "2014-01-03" "2014-01-06" ...
$ Open     : num [1:5032] 79.4 79 76.8 77.8 77 ...
$ High     : num [1:5032] 79.6 79.1 78.1 78 77.9 ...
$ Low      : num [1:5032] 78.9 77.2 76.2 76.8 77 ...
$ Close    : num [1:5032] 79 77.3 77.7 77.1 77.6 ...
$ Adj_Close: num [1:5032] 67 65.5 65.9 65.4 65.8 ...
$ Volume   : num [1:5032] 5.87e+07 9.81e+07 1.03e+08 7.93e+07 6.46e+07 ...
- attr(*, "key")= tibble [4 x 2] (S3: tbl_df/tbl/data.frame)
..$ Symbol: chr [1:4] "AAPL" "AMZN" "FB" "GOOG"
..$ .rows : list<int> [1:4] 
.. ..$ : int [1:1258] 1 2 3 4 5 6 7 8 9 10 ...
.. ..$ : int [1:1258] 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 ...
.. ..$ : int [1:1258] 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 ...
.. ..$ : int [1:1258] 3775 3776 3777 3778 3779 3780 3781 3782 3783 3784 ...
.. ..@ ptype: int(0) 
..- attr(*, ".drop")= logi TRUE
- attr(*, "index")= chr "Date"
..- attr(*, "ordered")= logi TRUE
- attr(*, "index2")= chr "Date"
- attr(*, "interval")= interval [1:1] 1D
..@ .regular: logi TRUE

My final result looks like this:

[Screenshot of the result]

This command creates a table with all 5,032 rows and three columns: Symbol, Date, and the closing price labeled maximum. What am I doing wrong? Is this because of some special characteristic of a ts or tsibble dataframe?

Piyush Shah
  • I tried your code with `gafa_stock` data and I cannot reproduce your problem. I get a `data.frame` with four rows and two columns, the last one being the max column. Restart your R session and try again. – Taufi Feb 05 '21 at 23:58
  • I added str(gafa_stock) and also tried restarting the R session. But the problem persists. – Piyush Shah Feb 06 '21 at 00:07
  • I cannot reproduce the issue either; I get only 4 rows of data. Maybe some package is masking other functions. Try using `gafa_stock %>% dplyr::group_by(Symbol) %>% dplyr::summarise(maximum = max(Close))`. What is your `packageVersion('tsibble')`? I have `‘0.9.3’`. – Ronak Shah Feb 06 '21 at 04:04

2 Answers

If the version of tsibble is < 0.9.3, we can convert to a tibble first, because the object carries other class attributes as well (tbl_ts):

gafa_stock %>%
    as_tibble() %>%
    group_by(Symbol) %>%
    summarise(maximum = max(Close), .groups = 'drop')

Output:

# A tibble: 4 x 2
#  Symbol maximum
#  <chr>    <dbl>
#1 AAPL      232.
#2 AMZN     2040.
#3 FB        218.
#4 GOOG     1268.

In the newer version (0.9.3), it works without the conversion:

gafa_stock %>%
    group_by(Symbol) %>%
    summarise(maximum = max(Close), .groups = 'drop')
# A tibble: 4 x 2
#  Symbol maximum
#  <chr>    <dbl>
#1 AAPL      232.
#2 AMZN     2040.
#3 FB        218.
#4 GOOG     1268.

According to the tsibble (0.9.2) documentation:

"Each observation should be uniquely identified by index and key in a valid tsibble."

Here, the index attribute is "Date":

attr(gafa_stock, "index")[1]
#[1] "Date"
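
As a minimal sketch of the conversion approach, the same steps can be tried on a small made-up tsibble. The `toy` data, its symbols `AAA`/`BBB`, and the prices are invented for illustration and are not part of gafa_stock; the example assumes the dplyr and tsibble packages are installed.

```r
library(dplyr)
library(tsibble)

# Made-up data for illustration: two symbols, three consecutive days each
toy <- tibble(
  Symbol = rep(c("AAA", "BBB"), each = 3),
  Date   = rep(as.Date("2021-01-04") + 0:2, times = 2),
  Close  = c(10, 12, 11, 20, 19, 25)
) %>%
  as_tsibble(key = Symbol, index = Date)

# The index column name is stored on the tsibble itself
toy_index <- index_var(toy)

# Drop the tsibble class first, then summarise per Symbol
toy_max <- toy %>%
  as_tibble() %>%
  group_by(Symbol) %>%
  summarise(maximum = max(Close), .groups = "drop")

print(toy_max)
```

After `as_tibble()`, the Date column is treated like any other column, so the summary has one row per Symbol.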
akrun
  • This works, but I am curious why I had to convert it to a tibble. What prevents me from using this code on the tsibble directly? I am asking to try to learn something about tsibble. – Piyush Shah Feb 06 '21 at 00:15
  • Generally operations such as `summarise()` with a tsibble will return a tsibble. If you want to summarise over the time dimension (and so you no longer want a time series), you should drop the time attributes with `as_tibble()`. – Mitchell O'Hara-Wild Feb 08 '21 at 12:30
  • Also note @akrun, you can access the index variable of a tsibble using index_var(): `index_var(gafa_stock)` `#> [1] "Date"` – Mitchell O'Hara-Wild Feb 08 '21 at 12:31

I think this is what you want:

gafa_stock %>%
  group_by(Symbol) %>% 
  filter(Close == max(Close))

Result:

# A tsibble: 4 x 8 [!]
# Key:       Symbol [4]
# Groups:    Symbol [4]
  Symbol Date        Open  High   Low Close Adj_Close   Volume
  <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
1 AAPL   2018-10-03  230.  233.  230.  232.      230. 28654800
2 AMZN   2018-09-04 2026. 2050. 2013  2040.     2040.  5721100
3 FB     2018-07-25  216.  219.  214.  218.      218. 58954200
4 GOOG   2018-07-26 1251  1270. 1249. 1268.     1268.  2405600
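
One thing worth checking with the `filter()` approach is ties: if a group reaches its maximum on more than one day, every such row is kept. A small sketch on made-up data (the `toy` tibble and its values are invented for illustration, and a plain tibble is used rather than a tsibble):

```r
library(dplyr)

# Made-up data: AAA hits its maximum close (12) on two different days
toy <- tibble(
  Symbol = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  Date   = as.Date("2021-01-04") + c(0, 1, 2, 0, 1),
  Close  = c(10, 12, 12, 20, 19)
)

# Keep the full rows where Close equals the per-Symbol maximum
toy_peak <- toy %>%
  group_by(Symbol) %>%
  filter(Close == max(Close)) %>%
  ungroup()

print(toy_peak)
```

Here AAA contributes two rows and BBB one, so the result has three rows rather than one per symbol.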
intheflesh