What is the best way to convert a column of character strings, each holding several space-separated numbers, to numeric? The goal is to apply this to the column 'Times' and then generate a third column with a summary statistic (such as the mean). It is important to keep the rows that contain no data (rows 1-9 below).

  df   <- with(df, tapply(df$X1, df$X2, FUN))

  library(tidyverse)

  a = map(df, paste0, collapse = " ") %>% bind_rows() %>% gather(Position, Times)

 > a
  # A tibble: 1,619 x 2
     Position Times                                                                               
    <chr>     <chr>                                                                             
  1 1         ""                                                                                
  2 2         ""                                                                                
  3 3         ""                                                                                
  4 4         ""                                                                                
  5 5         ""                                                                                
  6 6         ""                                                                                
  7 7         ""                                                                                
  8 8         ""                                                                                
  9 9         ""                                                                                
 10 10        0.823611571913153 0.00954654673930752 0.0388007144639849 0.0171526506226838 0.219~
 # ... with 1,609 more rows

The dput(a) for rows 1:10:

structure(list(Position = c("1", "2", "3", "4", "5", "6", "7", 
"8", "9", "10"), Times = c("", "", "", "", "", "", "", "", 
"", "0.823611571913153 0.00954654673930752 0.0388007144639849 0.0171526506226838 0.219081251336788 0.2907764363945 0.378645574334759 0.253864150829567 0.235011993879071 0.0573025939576098 0.292383292892179 0.0752965287131889 0.180009058773757 0.587338807217526 0.240031940583238 0.660037942910535 0.234009418566699"
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
StupidWolf
Eric K
  • Can you show the initial subset of data with `dput` and the expected output? Is `a` the expected output? – akrun Nov 19 '19 at 18:50
  • It would be great if you update your post with the dput output. Otherwise, without the initial data, it is difficult for anybody to help – akrun Nov 19 '19 at 19:01
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Nov 19 '19 at 19:03
  • I'm not yet sure how to generate a df that is reproducible for this question. I'll keep trying. I have also added a summary dput, and an output would result in a similar tibble but with a third summary column and be for columns 2 and 3. – Eric K Nov 19 '19 at 19:41
  • Hi @EricK, I hope I got your dput(a) correct. I cannot use what you pasted just now – StupidWolf Nov 19 '19 at 22:07

1 Answer

Here's one approach: split each record into one row per value using separate_rows, and then collapse back to a single row per Position using group_by and summarize.

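library(tidyverse)

a %>%
  # one row per individual value in Times
  separate_rows(Times, sep = " ") %>%
  group_by(Position) %>%
  # as.numeric("") is NA, so an empty Times value gives a mean of NA
  summarize(MeanTimes = mean(as.numeric(Times))) %>%
  # Position is character, so convert it to sort in numeric order
  arrange(as.numeric(Position))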

I gather from your question that you might actually be looking for something other than the mean; if so, you can replace mean with whatever function you need.
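As a rough illustration of swapping in a different function, here is a minimal base R sketch that adds the summary as new columns and keeps the empty rows as NA. The data frame `a_toy`, its values, and the helper `summarise_times` are made up for this example and are not from the question; only the column names Position and Times follow the original.

```r
# Toy data in the same shape as `a` (hypothetical values):
# one string of space-separated numbers per row, "" where there is no data.
a_toy <- data.frame(
  Position = c("1", "2", "3"),
  Times    = c("", "0.5 1.5", "2 4 6"),
  stringsAsFactors = FALSE
)

# Split each string on spaces and convert the pieces to numeric.
# An empty string splits to character(0), which becomes numeric(0).
times_num <- lapply(strsplit(a_toy$Times, " ", fixed = TRUE), as.numeric)

# Hypothetical helper: apply a summary function to one row's values,
# returning NA for rows that have no values at all.
summarise_times <- function(x, stat = mean) {
  if (length(x) == 0) NA_real_ else stat(x)
}

# One summary value per row, so every row of the input is preserved.
a_toy$MeanTimes   <- vapply(times_num, summarise_times, numeric(1))
a_toy$MedianTimes <- vapply(times_num, summarise_times, numeric(1), stat = median)

print(a_toy)
```

Because the result has exactly one value per input row, it can be assigned straight back as a column, and any function that returns a single number can be passed as `stat`.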

A. S. K.