I have a file that looks like this:
type created_at repository_name
1 IssuesEvent 2012-03-11 06:48:31 bootstrap
2 IssuesEvent 2012-03-11 06:48:31 bootstrap
3 IssuesEvent 2012-03-11 06:48:31 bootstrap
4 IssuesEvent 2012-03-11 06:52:50 bootstrap
5 IssuesEvent 2012-03-11 06:52:50 bootstrap
6 IssuesEvent 2012-03-11 06:52:50 bootstrap
7 IssueCommentEvent 2012-03-11 07:03:57 bootstrap
8 IssueCommentEvent 2012-03-11 07:03:57 bootstrap
9 IssueCommentEvent 2012-03-11 07:03:57 bootstrap
10 IssuesEvent 2012-03-11 07:03:58 bootstrap
11 IssuesEvent 2012-03-11 07:03:58 bootstrap
12 IssuesEvent 2012-03-11 07:03:58 bootstrap
13 WatchEvent 2012-03-11 07:15:44 bootstrap
14 WatchEvent 2012-03-11 07:15:44 bootstrap
15 WatchEvent 2012-03-11 07:15:44 bootstrap
16 WatchEvent 2012-03-11 07:18:45 hogan.js
17 WatchEvent 2012-03-11 07:18:45 hogan.js
18 WatchEvent 2012-03-11 07:18:45 hogan.js
The dataset that I'm working with can be accessed on https://github.com/aronlindberg/VOSS-Sequencing-Toolkit/blob/master/twitter_exploratory_analysis/twitter_events_mini.csv.
I want to create a table that has a column for each entry in the "repository_name" column (e.g. bootstrap, hogan.js). In that column I need to have the data from the "type" column that corresponds to that entry (i.e. only rows form the current "type" column that also has the value "bootstrap" in the current "repository_name" column should fall under the new "bootstrap" column). Hence:
- Time stamps is just for ordering and do not need to by synchronized across the row (in fact they can be deleted, as the data is already sorted according to timestamps)
- Even if "IssuesEvent" is repeated 10x I need to retain all of these, since I will be doing sequence analysis using the R package TraMineR
- Columns can be of unequal length
- There is no relationship between the columns for different repos ("repository_name")
In other words, I would want a table that looks something like this:
bootstrap hogan.js
1 IssuesEvent PushEvent
2 IssuesEvent IssuesEvent
3 OssueCommentEvent WatchEvent
How can I accomplish this in R?
Some of my failed attempts using the reshape package can be found on https://github.com/aronlindberg/VOSS-Sequencing-Toolkit/blob/master/twitter_exploratory_analysis/reshaping_bigqueries.R.