-1

I have a two data table (csv) which contain information about a MOOC course.

The first table contains information about mouse movments (distance). Like this:

1-2163.058../2-20903.66351.../3-25428.5415..

The first number means the day (1- first day, 2- second day, etc.) when it happens, the second number means the distance in pixel. (2163.058, 20903.66351, etc.).

The second table contains the same information but instead of distance, there is the time was recorded. Like this:

1-4662.0/2-43738.0/3-248349.0....

The first number means the day (1- first day, 2- second day, etc.) when it happens, the second number means the time in milliseconds.

In the table, every column records a data from the specific web page, and every row records a user behaviour on this page.

I want to create a new table with the same formation, where I want to count the speed by pixel. Divide the distance table with time table which gives new table with same order, shape.

Here are two links for the two tables goo.gl/AVQW7D goo.gl/zqzgaQ

How can I do this with raw csv?

> dput(distancestream[1:3,1:3]) 

structure(list(id = c(2L, 9L, 10L),
               `http//tanul.sed.hu/mod/szte/frontpage.php` = structure(c(2L,  1L, 1L), 
                                                                       .Label = c("1-0", "1-42522.28760403924"), 
                                                                       class = "factor"), 
               `http//tanul.sed.hu/mod/szte/register.php` = c(0L, 0L, 0L)), 
          .Names = c("id", "http//tanul.sed.hu/mod/szte/frontpage.php",  
                     "http//tanul.sed.hu/mod/szte/register.php"), 
          class = c("data.table", 0x0000000002640788))


> dput(timestream[1:3,1:3]) 

structure(list(id = c(2L, 9L, 10L), 
              `http//tanul.sed.hu/mod/szte/frontpage.php` = structure(c(2L,  1L, 1L), 
                                                                      .Label = c("0", "1-189044.0"), 
                                                                      class = "factor"),
              `http//tanul.sed.hu/mod/szte/register.php` = c(0L,  0L, 0L)), 
         .Names = c("id", 
                    "http//tanul.sed.hu/mod/szte/frontpage.php",  
                    "http//tanul.sed.hu/mod/szte/register.php"), 
         class = c("data.table", 0x0000000002640788))

Picture of two rows from first and second table

gung - Reinstate Monica
  • 11,583
  • 7
  • 60
  • 79
  • 1
    Welcome to SO. Please use `dput` to provide sample data and precise expected results – HubertL Jul 19 '17 at 20:11
  • "Please use dput to provide sample data" ??? – Gábor Kőrösi Jul 19 '17 at 20:14
  • https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – HubertL Jul 19 '17 at 20:17
  • the header is too big to make dup "picture"...[link](goo.gl/AVQW7D) [link](goo.gl/zqzgaQ) – Gábor Kőrösi Jul 19 '17 at 20:21
  • only include columns / rows needed for the question – HubertL Jul 19 '17 at 20:24
  • I understand what you want, but the string is too long (in cell) to make a preview. Here is two links for two table goo.gl/AVQW7D goo.gl/zqzgaQ – Gábor Kőrösi Jul 19 '17 at 20:32
  • You can try with `dput(df1[1:3,1:3])` – HubertL Jul 19 '17 at 20:41
  • dput(distancestream[1:3,1:3]) structure(list(id = c(2L, 9L, 10L), `http://tanul.sed.hu/mod/szte/frontpage.php` = structure(c(2L, 1L, 1L), .Label = c("1-0", "1-42522.28760403924"), class = "factor"), `http://tanul.sed.hu/mod/szte/register.php` = c(0L, 0L, 0L )), .Names = c("id", "http://tanul.sed.hu/mod/szte/frontpage.php", "http://tanul.sed.hu/mod/szte/register.php"), class = c("data.table", "data.frame"), row.names = c(NA, -3L), .internal.selfref = ) – Gábor Kőrösi Jul 19 '17 at 20:51
  • Please edit your question, insert this code for both tables, and precisely describe the values you want to get – HubertL Jul 19 '17 at 20:53

1 Answers1

0

This may not be the most efficient method, but I believe it should yield the result you are looking for.

# Set file paths
  dist.file <- # C:/Path/To/Distance/File.csv
  time.file <- # C:/Path/To/Time/File.csv

# Read data files
  dist <- read.csv(dist.file, stringsAsFactors = FALSE)
  time <- read.csv(time.file, stringsAsFactors = FALSE)

# Create dataframe for speed values
  speed <- dist
  speed[,2:ncol(speed)] <- NA

# Create progress bar
  pb <- txtProgressBar(min = 0, max = ncol(dist) * nrow(dist), initial = 0, style = 3, width = 20)
  item <- 0

# Loop through all columns and rows of distance data
  for(col in 2:ncol(dist)){
    for(r in 1:nrow(dist)){
      # Check that current item has data to be calculated
      if(dist[r,col] != 0 & dist[r,col] != "1-0" & !is.na(time[r,col])){
        # Split the data into it's separate day values
        dists <- lapply(strsplit(strsplit(dist[r,col], "/")[[1]], "-"), as.numeric)
        times <- lapply(strsplit(strsplit(time[r,col], "/")[[1]], "-"), as.numeric)

        # Calculate the speeds for each day
        speeds <- sapply(dists, "[[", 2) / sapply(times, "[[", 2)

        # Paste together the day values and assign to the current item in speed dataframe
        speed[r,col] <- paste(sapply(dists, "[[", 1), format(speeds, digits = 20), sep = "-", collapse = "/")
      } else{
        # No data to calculate, assign 0 to current item in speed dataframe
        speed[r,col] <- 0
      }

      # Increase progress bar counter
      item <- item + 1
      setTxtProgressBar(pb,item)
    }
  }

# Create a csv for speed data
  write.csv(speed, "speed.csv")
Matt Jewett
  • 3,249
  • 1
  • 14
  • 21