
We created a MOOC in which everything (clicking, attitudes, video viewing, etc.) was recorded by a logging system. Between 100 and 150 students signed up for the course.

As a result, we got a log file (JSON). I loaded it into a data frame in R:

log_data <- ndjson::stream_in("log-export-20160721_1030.json")
dplyr::glimpse(log_data)

Observations: 1,443,817
 Variables: 22
 $ _id.$oid          <chr> "5707a89dcbbb4d92129ee44c", "5707a89...
 $ data              <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ page              <chr> "http://elearning.szte.hu/mod/szte/f...
 $ pid               <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2,...
 $ time              <chr> "2016.04.08. 14:48:24.691", "2016.04...
 $ type              <chr> "load", "mousemove", "mousemove", "m...
 $ user              <chr> "3", "3", "3", "3", "3", "3", "3", "...
 $ data.realDistance <dbl> NA, 0.00000, 366.87055, 241.45600, N...
 $ data.x            <dbl> NA, 139, 176, 261, NA, 245, 1905, 21...
 $ data.xDistance    <dbl> NA, 0, 37, 85, NA, 16, NA, 111, NA, ...
 $ data.y            <dbl> NA, 29, 394, 620, NA, 761, 553, 451,...
 $ data.yDistance    <dbl> NA, 0, 365, 226, NA, 141, NA, 310, N...
 $ data.text         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.top          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.target       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.filename     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.length       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.actualTime   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.src          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.totalTime    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.videoId      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.seekTime     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...

My questions are:

How can I count the number of log entries per user?

  • Example: user 352 produced 1000 log entries, but user 152 produced just 2.

How can I group, split or separate the data table by user?

  • If you are interested in using `dplyr`, try looking at `group_by` and its examples. For example, `log_data %>% group_by(user) %>% summarise(N = n())` will give you the count of the number of rows for each `user` in the `log_data` data frame. – aichao Jan 18 '17 at 16:11
  • Welcome to StackOverflow. Please take a look at these tips on how to produce a [minimum, complete, and verifiable example](http://stackoverflow.com/help/mcve), as well as this post on [creating a great example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). In base R something like `split(myDataset, myDataset$users)` will create a list of data.frames. To count the number of logs, take a look at [this post](http://stackoverflow.com/questions/9809166/is-there-an-aggregate-fun-option-to-count-occurrences). – lmo Jan 18 '17 at 16:16
  • Thanks. – Gábor Kőrösi Jan 18 '17 at 19:22
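
The two suggestions in the comments can be sketched in base R as follows. This is only a minimal illustration on a made-up data frame: `toy_log` and its values are hypothetical stand-ins for `log_data`, and only the `user` and `type` columns from the question are reproduced.

```r
# Hypothetical stand-in for log_data; the real data frame has 22 columns.
toy_log <- data.frame(
  user = c("3", "3", "3", "152", "152", "352"),
  type = c("load", "mousemove", "mousemove", "load", "click", "load"),
  stringsAsFactors = FALSE
)

# Count the log rows per user and return the result as a data frame.
counts <- as.data.frame(
  table(user = toy_log$user),
  responseName = "n",
  stringsAsFactors = FALSE
)

# Sort so that the most active users come first.
counts <- counts[order(-counts$n), ]

# Split the table into a named list with one data frame per user.
by_user <- split(toy_log, toy_log$user)

# Access the rows of a single user by name.
user_3 <- by_user[["3"]]
```

With `dplyr` loaded, `toy_log %>% count(user)` should give the same per-user counts. On the real data, the same calls would presumably be applied to `log_data` and `log_data$user`.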
