Octree's are really help you focus on keeping your cube world chunks divided into equal portions. The trick that they play is that as you keep feeding in cubes into the Octree branches you are automatically subdividing these into pieces that you can later extract on a proximity basis (ie Chunk01(128,0,0) Chunk02(256,0,0) etc..)
As you store the cubes into the octree you also define their "type" (material or some sort of enum that describes them). By doing this you also allow the Octree to group these cubes together if they become neighbours (ie grass, stone, sand etc). This then enables the octree algorithm to handle some of the intelligence for you in around how to collapse / condense these into 1x larger square/rectangle primitive enabling you to reduce your cube count.
At the end of the day you have two approaches to the cube world, the first is you want the end user to believe that you made the world 1x cube at a time but in reality you're actually merging your mesh/vertex count to give that illusion as a chunk of a surface is really just 1x big mesh and not 100's of cubes. It's only when the user interacts with the cube (ie remove/add) that you recalculate your mesh and bring the illusion of multiple cubes into reality ..but then you immediately swap it back out for a giant consolidated mesh.
Once you have got your octree running and you have your chunks consolidating down into single mesh forms (by feeding an array of vertex's into the GPU buffer) you then also have to do some additional work outside the octree around culling. There's a whole bunch of ways you can do this, volume culling (ray tracing) is typically what i've found most recommend but i think this could be done better though.
In my example here: https://vimeo.com/71330826 i basically approach it differently, in that given I know where the camera centre point is and I also track which direction the camera is facing i then basically use this as a proximity focused culling whereby if the chunk in question is behind the camera i destroy them reducing my GPU load as I go.
Also you'll note that each "Chunk" still has its face still visible on the outside, in that I cull the inside of each chunk from being created but given each Chunk is handling its own workload i still render the actual chunk "walls" beneath the camera... that I've later fixed by also keeping track of the Z depth of the user and the camera bounds