I have the following the following data frame (‘mydf’):
ID Type Count Mark
U1 A 50 R1
U1 A 50 R3
U1 A 50 R4
U1 A 50 R3
U2 B 24 R2
U2 B 24 R5
U3 A 88 R1
Each row contains 4 variables:
ID: the user’s ID.
Type: the user’s type, which is categorical and consistent for all rows with that ID.
Count: the user’s count, which is continuous and consistent for all rows with that ID.
Mark: one of 15 categorical symbols, which may vary between different rows with the same ID.
Presently, there is one row for each mark which appears, so that there are multiple rows per ID.
The number of marks is different for different IDs (anything from 4-50).
What I a need to do:
Consolidate rows with the same ID into a single row, which contains all the consistent information (Type/Count), as well as a listing (or count) of all the marks which appear for that ID. This listing needs to reflect how often each mark appears for that ID (for example, if the same mark appears twice for a single ID, this needs to be reflected, as happens for U1 in rows 2 and 4).
Get a count of the number of time each mark appears for each ID.
Get a count of the total number of marks for each ID.
Listing of the marks for each ID is not necessary as long as the counts described in 2 and 3 exist (see for example the expected output below).
Expected output (approximately):
ID Type Count Mark_R1 Mark_R2 Mark_R3 Mark_R4 Mark_R5 Mark_Total
U1 A 50 1 0 2 1 0 3
U2 B 24 0 1 0 0 1 2
U3 A 88 1 0 0 0 0 1
Thank you for your help!