We created a MOOC in which everything (clicking, attitudes, video viewing, etc.) was recorded by a logging system. Between 100 and 150 students signed up for the course.
As a result, we have a log file in JSON format. I loaded it into a data frame in R:
log_data <- ndjson::stream_in("log-export-20160721_1030.json")
dplyr::glimpse(log_data)
Observations: 1,443,817
Variables: 22
$ _id.$oid <chr> "5707a89dcbbb4d92129ee44c", "5707a89...
$ data <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ page <chr> "http://elearning.szte.hu/mod/szte/f...
$ pid <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2,...
$ time <chr> "2016.04.08. 14:48:24.691", "2016.04...
$ type <chr> "load", "mousemove", "mousemove", "m...
$ user <chr> "3", "3", "3", "3", "3", "3", "3", "...
$ data.realDistance <dbl> NA, 0.00000, 366.87055, 241.45600, N...
$ data.x <dbl> NA, 139, 176, 261, NA, 245, 1905, 21...
$ data.xDistance <dbl> NA, 0, 37, 85, NA, 16, NA, 111, NA, ...
$ data.y <dbl> NA, 29, 394, 620, NA, 761, 553, 451,...
$ data.yDistance <dbl> NA, 0, 365, 226, NA, 141, NA, 310, N...
$ data.text <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ data.top <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ data.target <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ data.filename <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ data.length <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ data.actualTime <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ data.src <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ data.totalTime <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ data.videoId <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ data.seekTime <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
My questions are:
How can I count the number of log entries per user?
- Example: user 352 produced 1000 log entries, but user 152 produced just 2.
How can I group, split, or separate the data frame by user?
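To make the desired result concrete, here is a minimal sketch on a toy data frame. The `toy_log` data and its values are made up for illustration; only the `user` and `type` column names come from my real data. I am assuming base R's `table()` and `split()` are enough here, but I do not know whether this is the best approach for 1.4 million rows.

```r
# Toy stand-in for log_data; the values are invented for illustration
toy_log <- data.frame(
  user = c("3", "3", "3", "152", "152", "352"),
  type = c("load", "mousemove", "mousemove", "load", "click", "load"),
  stringsAsFactors = FALSE
)

# Question 1: number of log entries per user
logs_per_user <- table(toy_log$user)

# Question 2: one data frame per user, stored in a list named by user id
by_user <- split(toy_log, toy_log$user)
```

With this, `by_user[["152"]]` would hold only the rows of user 152, and `logs_per_user` would hold one count per user.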