That is a great article to learn about pagerank. I implemented a Perl version from it here to use with Textrank. However, if you want to just learn about pagerank and how the various aspects discussed in the article affect the results (dampening factor, direct or undirected graph, etc.), I would recommend running experiments in R or Octave. If you want to learn how to implement it efficiently, then programming it up from scratch, as you are doing, is best.
Most web graphs (or networks) are very sparse, which means most of the entries in the matrix representation of the graph are zero. A common data structure used to represent a sparse matrix is a hash-map, where the zero values are not stored. For example, if the matrix was
1, 0, 0
0, 0, 2,
0, 3, 0
a two dimension hash-map would store only the values for hm(0,0)=1, hm(1,2)=2, and hm(2,1)=3. So in a 1,000,000 by 1,000,000 matrix of a web graph, I would expect only a few million values to be non-zero. If each row averages only 5 non-zero values, a hash-map will use about 5*(8+8+8)10^6 bytes ~ 115mb to store it (8 for the left int index, 8 for the right int index, and 8 for the double value). The square matrix will use 810^6*10^6 ~ 7 terabytes.
Implementing an efficient sparse matrix-vector multiply in Java is not trivial, and there are some already implemented if you don't want to devote time to that aspect of the algorithm. The sparse-matrix multiply is the most difficult aspect to implement of the pagerank algorithm, so after that it gets easier (and more interesting).