
I have a large data set (~150,000 rows × 41 columns) that looks like this:

    head(Data)

    Time      A1    A2    A3   ...  A40
    12:00:00  0     0     0.1  ...  0.65
    12:00:30  0.15  0.32  0.2  ...  0.54   
    12:01:00  0     0.43  0.14 ...  0
        .
        .
        .

I used ggplot() to plot the data, following the approach in a related question:

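    library(dplyr)
    library(tidyr)
    library(ggplot2)

    # Assumes Data also has a `data` column identifying the source data set
    Data <- Data %>%
      mutate(data = paste0('Data', data)) %>%
      pivot_longer(-c(data, Time))

    p <- ggplot(Data, aes(x = factor(Time), y = value, group = name, color = name)) +
      geom_line() +
      facet_wrap(. ~ data, scales = 'free', ncol = 1) +
      xlab('Time')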
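To make the setup reproducible, here is a minimal sketch with hypothetical toy data (two source data sets, three features, one missing value). The `data` column is an assumption: the snippet above relies on it even though `head(Data)` does not show it. The missing value sits at the start of one line, which is the kind of row ggplot() reports as removed.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy data standing in for the real data set: two source
# data sets (column `data`), three features A1..A3, and one missing value
# at the first time point of A2 in data set 1.
Data <- tibble(
  data = rep(1:2, each = 3),
  Time = rep(c("12:00:00", "12:00:30", "12:01:00"), times = 2),
  A1 = c(0, 0.15, 0, 0.2, 0.1, 0.3),
  A2 = c(NA, 0.32, 0.43, 0.3, 0.5, 0.4),
  A3 = c(0.1, 0.2, 0.14, 0.3, 0.2, 0.1)
)

# Same reshaping as in the question, stored under a new name so the wide
# table is kept for comparison
long <- Data %>%
  mutate(data = paste0("Data", data)) %>%
  pivot_longer(-c(data, Time))

p <- ggplot(long, aes(x = factor(Time), y = value, group = name, color = name)) +
  geom_line() +
  facet_wrap(~data, scales = "free", ncol = 1) +
  xlab("Time")

# Drawing the plot is the step that emits the "Removed ... rows" warning
print(p)
```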

ggplot() processes the data before plotting and drops rows it cannot draw (missing or out-of-range values), reporting them in a warning. Let us call the processed data the "output data"; ggplot() plots the output data, not the original data. In my case the data frame has 150,000 rows, and when plotting, ggplot() removes 33 rows, so the output data has (150000 - 33) rows.
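One bookkeeping detail worth checking: because the data are pivoted to long format before plotting, the rows ggplot() reports as removed are rows of the long table (one per time point and feature), not rows of the wide table. The following sketch uses hypothetical toy data to show the difference in row counts.

```r
library(tibble)
library(tidyr)

# Hypothetical wide data: 3 time points x 2 features, one missing value
wide <- tibble(
  Time = c("12:00:00", "12:00:30", "12:01:00"),
  A1 = c(0, 0.15, 0),
  A2 = c(NA, 0.32, 0.43)
)

# ggplot() only ever sees the long form: one row per (Time, feature) pair
long <- pivot_longer(wide, -Time)

nrow(wide)              # 3 rows in the wide table
nrow(long)              # 3 * 2 = 6 rows passed to ggplot()
sum(is.na(long$value))  # 1 long row that ggplot() may drop
```

By the same arithmetic, a 150,000-row table with 40 feature columns becomes roughly 6,000,000 long rows, and the 33 removed rows are counted among those.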

After plotting, I would like to obtain a new data frame containing the output data, i.e. the original data minus the deleted rows. In my previous question, zx8754 suggested reproducing the output data manually with filter(). Now I would like to know how to get that data frame directly from ggplot(). This question asks for the same thing, but the answers return a list rather than a data frame or matrix, using:

    Output_data <- ggplot_build(p)

I have been trying for several days and have read a lot of documentation, but I can't find a solution, especially since I am piping the data through mutate().
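For reference, the object returned by ggplot_build() does contain data frames: its `data` element holds one per layer, and ggplot2's `layer_data(p, 1)` extracts the first one directly. The sketch below uses hypothetical toy data; note that the missing value is still present in the built layer data, since the row removal reported in the warning happens when the plot is drawn.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy data (same shape as in the question, much smaller)
Data <- tibble(
  data = rep(1:2, each = 3),
  Time = rep(c("12:00:00", "12:00:30", "12:01:00"), times = 2),
  A1 = c(0, 0.15, 0, 0.2, 0.1, 0.3),
  A2 = c(NA, 0.32, 0.43, 0.3, 0.5, 0.4)
)

long <- Data %>%
  mutate(data = paste0("Data", data)) %>%
  pivot_longer(-c(data, Time))

p <- ggplot(long, aes(x = factor(Time), y = value, group = name, color = name)) +
  geom_line() +
  facet_wrap(~data, scales = "free", ncol = 1)

# Data frame for the first (and only) layer, with aesthetic columns
# such as x, y, colour, group and PANEL
built_layer <- layer_data(p, 1)

is.data.frame(built_layer)  # TRUE
nrow(built_layer)           # 12, the same as nrow(long)
sum(is.na(built_layer$y))   # 1: the missing value has not been dropped yet
```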

EDIT: jzadra's answer to the similar question comes close to a solution, using

    ggplot_build(p)$plot$data

but it does not return data with the same dimensions as the original. It stacks all the features into a single column:

   data  Time     name   value
   <chr> <chr>    <chr>  <dbl>
 1 Data1 12:00:00 A1         0
 2 Data1 12:00:00 A2         0
 3 Data1 12:00:00 A3         0.1
 4 Data1 12:00:00 A4         0
 5 Data1 12:00:00 A5         0
 6 Data1 12:00:00 A6         0
 7 Data1 12:00:00 A7         0
 8 Data1 12:00:00 A8         0
 9 Data1 12:00:00 A9         0
10 Data1 12:00:00 A10        0
# … with ... more rows

whereas I want the output data in the original wide shape:

    Time      A1    A2    A3   ...  A40
    12:00:00  0     0     0.1  ...  0.65
    12:00:30  0.15  0.32  0.2  ...  0.54   
    12:01:00  0     0.43  0.14 ...  0
        .
        .
        .

1 Answer


Since you pivoted longer before plotting, you have to use pivot_wider() to get the data back into its original shape.

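    library(ggplot2)
    library(dplyr)
    library(tidyr)

    # The data stored in the plot object, i.e. what was passed to ggplot()
    Data <- ggplot_build(p)$plot$data

    Data %>%
      # Spread the name/value pairs back out into one column per feature
      pivot_wider(names_from = name, values_from = value) %>%
      # Drop the data-set identifier added before plotting
      select(-data)
    #> # A tibble: 1 x 11
    #>   Time      A1    A2    A3    A4    A5    A6    A7    A8    A9   A10
    #>   <time> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    #> 1 12:00      0     0   0.1     0     0     0     0     0     0     0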
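As a quick sanity check of the pivot approach, the sketch below (hypothetical toy data, no plotting involved) pivots a small wide table longer and then wider again, and confirms that the round trip reproduces the original table, missing value included.

```r
library(tibble)
library(tidyr)

# Hypothetical toy wide data with a `data` column identifying the data set
Data <- tibble(
  data = rep(1:2, each = 3),
  Time = rep(c("12:00:00", "12:00:30", "12:01:00"), times = 2),
  A1 = c(0, 0.15, 0, 0.2, 0.1, 0.3),
  A2 = c(NA, 0.32, 0.43, 0.3, 0.5, 0.4)
)

# Wide -> long, as done before plotting
long <- pivot_longer(Data, -c(data, Time))

# Long -> wide again: `data` and `Time` identify each row, `name` supplies
# the new column names and `value` fills them
wide_again <- pivot_wider(long, names_from = name, values_from = value)

identical(as.data.frame(wide_again), as.data.frame(Data))  # TRUE
```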
Chuck P
  • Thanks for your answer. Are you sure this returns the output data? Isn't this the input data? I.e. don't you think this is exactly the original data, before ggplot() processed it and deleted the outliers? – Sophie Allan Oct 09 '20 at 13:09
  • According to the documentation: **"ggplot_build() takes the plot object, and performs all steps necessary to produce an object that can be rendered. This function outputs two pieces: a list of data frames (one for each layer), and a panel object, which contain all information about axis limits, breaks etc."** Your example has one layer, but you can certainly test it and find out. Run the code: do you get the right number of rows? – Chuck P Oct 09 '20 at 13:25
  • It seems to me that it returned the original input data without any processing! The returned data has exactly the same number of rows as the input, whereas it should have fewer, since I got a warning that ggplot() deleted 33 rows when plotting. – Sophie Allan Oct 09 '20 at 13:32
  • Hmm, well, `layer_data(p)` will give you back the x and y values that were plotted, but since they are no longer in the original format, it just leads back to a filter operation. – Chuck P Oct 09 '20 at 13:52
  • So, as a result, it is impossible to get what I am looking for as a data frame. Thanks a lot :) – Sophie Allan Oct 09 '20 at 14:05
  • Well, I don't consider myself the world's greatest expert on `ggplot2`; perhaps someone else knows a way. I'll pull my answer, which was mainly about pivoting back to the same dimensions, not about the internal workings of what ggplot does or does not keep. `layer_data(p)` does show, via an `NA` in the `y` column, all the points that did not get plotted. – Chuck P Oct 09 '20 at 14:21
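To illustrate the "leads back to a filter operation" point, here is a sketch on hypothetical toy data. It assumes the rows ggplot() drops are exactly the long rows with a missing value (a simplification; the exact rule depends on the geom). Once the filtered long data is widened again, the removed (Time, feature) pair comes back as an `NA` cell, so in wide form the "output data" differs from the input only in which cells are `NA`, not in the number of rows.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Hypothetical wide data: 3 time points, 2 features, one missing value
Data <- tibble(
  Time = c("12:00:00", "12:00:30", "12:01:00"),
  A1 = c(0, 0.15, 0),
  A2 = c(NA, 0.32, 0.43)
)

long <- pivot_longer(Data, -Time)

# Assumed stand-in for "the rows ggplot() kept": drop missing values
kept <- filter(long, !is.na(value))
nrow(kept)  # 5 of the 6 long rows

# Widening again: the dropped (Time, name) pair has no value, so
# pivot_wider() fills that cell with NA and all 3 time points remain
wide_kept <- pivot_wider(kept, names_from = name, values_from = value)
nrow(wide_kept)  # 3
wide_kept$A2[1]  # NA
```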