I'm sorry if this is a duplicate question, but I have looked around at similar problems and haven't been able to find a real solution. Anyway, here goes:
I've read a .csv file into a table with 3 columns: "ID" (author's ID), "num_pub" (number of articles published), and "year" (spanning 1930 to 2017).
I would like to get a final table that holds "num_pub" for each "year", for every "ID". So rows would be "ID"s, columns would be "year"s, and underneath each year there would be the corresponding "num_pub", or 0 if the author hasn't published anything that year.
I have tried creating two new tables and merging them in a few different ways, as described here, but to no avail.
So first I read my file into a table:
tab <- read.table("mytable.csv", sep = ",", header = TRUE, colClasses = c("character", "numeric", "factor"))
head(tab,10)
      ID num_pub year
1  00002       1 1977
2  00002       2 1978
3  00002       1 1983
4  00002       4 1984
5  00002       3 1990
6  00002       1 1994
7  00002       2 1996
8  00004       3 1957
9  00004       1 1958
10 00004       1 1959
With that, I was able to create a table that has every single "year" for each "ID". If the author published in that year the value is 1, otherwise it is 0:
a<-table(tab[,1], tab[,3])
Calling head(a,1) returns the following table: pic
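To make the 1/0 behaviour concrete, here is a minimal sketch on a hand-built stand-in for the data. The five rows are copied from the head(tab,10) output; the data frame is rebuilt by hand rather than read from mytable.csv, which is an assumption made only so the snippet runs on its own. As far as I can tell, table() only counts how many rows fall into each ID/year pair and never uses "num_pub":

```r
# Hand-built stand-in for a few rows of the real data (values from head(tab, 10))
tab <- data.frame(
  ID      = c("00002", "00002", "00002", "00004", "00004"),
  num_pub = c(1, 2, 1, 3, 1),
  year    = factor(c(1977, 1978, 1983, 1957, 1958)),
  stringsAsFactors = FALSE
)

# table() cross-tabulates row counts per ID/year pair;
# the num_pub column plays no part in the result
a <- table(tab[, 1], tab[, 3])

a["00002", "1978"]  # 1, even though num_pub is 2 for that row
a["00004", "1978"]  # 0, no row for that ID/year pair
```

This would explain why every cell is 1 or 0: each ID/year pair appears at most once in the data.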
I would like to know how to achieve the desired result described above: a table where the rows are "ID"s, the columns are "year"s (from 1930 to 2017), and underneath each year there is the actual "num_pub" value, or 0 where there is none. The structure of the table would be just like the one shown in the pic.
Thank you for your time and help. I'm very new to R, and kind of stuck in the mud with this.
Edit: the reshape approach as explained here does not solve my problem. I need zeros in place of "NA"s, and I want my year to start with 1930 instead of the first year that the author has published.
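One possible way to satisfy both requirements, sketched under assumptions: give "year" an explicit set of factor levels covering 1930 to 2017, then sum "num_pub" per ID/year with xtabs(). This is a different technique from the reshape approach mentioned above, the three-row data frame is a hand-built stand-in for the real file, and the name `wide` is made up for illustration:

```r
# Hand-built stand-in for the real data (values from head(tab, 10))
tab <- data.frame(
  ID      = c("00002", "00002", "00004"),
  num_pub = c(1, 2, 3),
  year    = c("1977", "1978", "1957"),
  stringsAsFactors = FALSE
)

# Give year every level from 1930 to 2017, so unused years still get a column.
# as.character() also covers the case where year was read in as a factor.
tab$year <- factor(as.character(tab$year), levels = as.character(1930:2017))

# xtabs() sums num_pub within each ID/year cell; empty cells come out as 0
wide <- xtabs(num_pub ~ ID + year, data = tab)

dim(wide)               # 2 IDs by 88 years
wide["00002", "1978"]   # 2
wide["00002", "1930"]   # 0
```

If a plain data frame is needed rather than an xtabs object, as.data.frame.matrix(wide) should convert it while keeping the ID rows and year columns.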