0

The problem is simple to understand, but I can't do it. I've tried reshape and tidiverse without success. This the dataframe I have:

       X        X.1 close  high   low  open ticker     vol
   A3TV 2005-07-08 10.67 10.67 10.46 10.57   A3TV 2248201
   A3TV 2005-07-11 10.53 10.73 10.50 10.67   A3TV 1372371
   GOOG 2005-07-08 10.47 10.59 10.37 10.59   GOOG 1135093
   GOOG 2005-07-11 10.41 10.59 10.36 10.48   GOOG 722398

And I am trying to get this result:

       X.1    A3TV                                        GOOG
   2005-07-08 [10.67, 10.67, 10.46, 10.57, A3TV, 2248201]      [10.47, 10.59, 10.37, 10.59, GOOG, 1135093]
   2005-07-11 [10.53, 10.73, 10.50, 10.67, A3TV, 1372371]      [10.41, 10.59, 10.36, 10.48, GOOG, 722398]

So, what I want is to group by the X.1 (date), and make a column which contains every other data for every X (ticker) I have.

Manrique
  • 2,083
  • 3
  • 15
  • 38
  • Please post a reproducible version of your data, this will make it easier for people to help you. You can use `dput` to do so. – Lief Esbenshade Nov 19 '19 at 20:22
  • 3
    @LiefEsbenshade you copy past the table in `read.table(text = "the table here", header = T)` when the table is copied the way Manrique did (which is a reproducible version of his data). – denis Nov 19 '19 at 20:29
  • Thanks, I was messing with `readr::read_table` and the variables weren't parsing correctly. As a meta question, is the pasted data version above now the accepted standard for sharing examples as opposed to something that is immediately executable? – Lief Esbenshade Nov 19 '19 at 20:34
  • 2
    @LiefEsbenshade I sure hope it doesn't become the standard. Pasting a printout doesn't ensure that another user will read in the columns as the same data types. For example, when I read this with `readr::read_table` I get 2 columns, when I read with `readr::read_table2` I get 8 columns including a date for `X.1` and a number for `vol`, and when I read with `read.table` I get 8 columns with `X.1` as a factor. Really preferable if the OP can make this more accurately [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – camille Nov 19 '19 at 20:39
  • What do you mean by "array column"? It would be helpful to see what you've tried that hasn't worked so we know where you're trying to get exactly – camille Nov 19 '19 at 21:03
  • Thanks for the comments. I made it reproducible with the read.table @LiefEsbenshade says, but camille is right, I didn't think about that possibility. Also, I've edited my question to made it clearer. – Manrique Nov 25 '19 at 12:06

2 Answers2

1

I have a solution using dcast from data.table

library(data.table)
setDT(df)

catcol = function(x){paste0(x,collapse = " ")}
df[,valcol := apply(df[,-c("X","X.1")],1,catcol)]
dcast(df,X.1~X,value.var = "valcol")

          X.1                                 A3TV                                 GOOG
1: 2005-07-08 10.67 10.67 10.46 10.57 A3TV 2248201 10.47 10.59 10.37 10.59 GOOG 1135093
2: 2005-07-11 10.53 10.73 10.50 10.67 A3TV 1372371 10.41 10.59 10.36 10.48 GOOG  722398

The idea is to create the column your are going to use to fill when you will pass from long to wide:

> apply(df[,-c("X","X.1")],1,catcol)
[1] "10.67 10.67 10.46 10.57 A3TV 2248201" "10.53 10.73 10.50 10.67 A3TV 1372371"
[3] "10.47 10.59 10.37 10.59 GOOG 1135093" "10.41 10.59 10.36 10.48 GOOG  722398"

the data :

df <- read.table(text = "X        X.1 close  high   low  open ticker     vol
A3TV 2005-07-08 10.67 10.67 10.46 10.57   A3TV 2248201
A3TV 2005-07-11 10.53 10.73 10.50 10.67   A3TV 1372371
GOOG 2005-07-08 10.47 10.59 10.37 10.59   GOOG 1135093
GOOG 2005-07-11 10.41 10.59 10.36 10.48   GOOG 722398",header = T)
denis
  • 5,580
  • 1
  • 13
  • 40
  • Another option is to set the `fun.aggreagte` argument in `dcast` to `list`: `dcast(data = melt(df, id.vars = c("X.1", "ticker")), formula = X.1 ~ ticker, value.var = "value", fun.aggregate = list)`–but I am unsure if this is desired output. – markus Nov 19 '19 at 20:32
  • @markus That was my first guess, using `paste0(x,collapse = " ")` as aggregating function, but didn't managed to make it work. `list` is a nice try but outpu is not exactly what is expected – denis Nov 19 '19 at 20:42
1

Here is a tidyverse solution.

First we combine the desired variables into a single column, then we widen the data.

# data read taken from @denis answer
df <- read.table(text = "X        X.1 close  high   low  open ticker     vol
A3TV 2005-07-08 10.67 10.67 10.46 10.57   A3TV 2248201
A3TV 2005-07-11 10.53 10.73 10.50 10.67   A3TV 1372371
GOOG 2005-07-08 10.47 10.59 10.37 10.59   GOOG 1135093
GOOG 2005-07-11 10.41 10.59 10.36 10.48   GOOG 722398",header = T)

library(tidyverse)

df2 <- df %>% 
  mutate(data = paste(close, high, low, open, ticker, vol, sep = " ")) %>% 
  select(X, X.1, data) %>% 
  pivot_wider(names_from = X, values_from = data)
Lief Esbenshade
  • 793
  • 4
  • 13