
I have been reading up on Java 8 Streams and how data is streamed from a data source, rather than requiring the entire collection up front to extract data from.

This quote in particular is from an article I read about streams in Java 8:

No storage. Streams don't have storage for values; they carry values from a source (which could be a data structure, a generating function, an I/O channel, etc) through a pipeline of computational steps.

I understand the concept of streaming data in from a source piece by piece. What I don't understand is how there can be no storage if you are streaming from a collection. The collection already exists on the heap and you are just streaming the data from it, so the data already exists in "storage".

What's the difference in memory footprint compared with just looping through the collection with a standard for loop?

Jason Law
user3587411

5 Answers


The statement about streams and storage means that a stream doesn't have any storage of its own. If the stream's source is a collection, then obviously that collection has storage to hold the elements.

Let's take one of the examples from that article:

int sum = shapes.stream()
                .filter(s -> s.getColor() == BLUE)
                .mapToInt(s -> s.getWeight())
                .sum();

Assume that shapes is a Collection that has millions of elements. One might imagine that the filter operation would iterate over the elements from the source and create a temporary collection of results, which might also have millions of elements. The mapToInt operation might then iterate over that temporary collection and generate its results to be summed.

That's not how it works. There is no temporary, intermediate collection. The stream operations are pipelined, so elements emerging from filter are passed through mapToInt and thence to sum without being stored into and read from a collection.
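One way to see the pipelining for yourself is to log from inside each stage. This is only a rough sketch: `Shape`, `Color`, and `sample()` are made-up stand-ins for whatever the article uses, and the logging lambdas are there purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PipelineOrderDemo {
    enum Color { RED, BLUE }

    // Hypothetical stand-in for the article's shape type.
    static final class Shape {
        private final Color color;
        private final int weight;

        Shape(Color color, int weight) {
            this.color = color;
            this.weight = weight;
        }

        Color getColor() { return color; }
        int getWeight() { return weight; }
    }

    // Small made-up data set: blue(1), red(2), blue(3).
    static List<Shape> sample() {
        return Arrays.asList(
                new Shape(Color.BLUE, 1),
                new Shape(Color.RED, 2),
                new Shape(Color.BLUE, 3));
    }

    // Runs the pipeline and records which stage touched which element, in order.
    static List<String> trace(List<Shape> shapes) {
        List<String> log = new ArrayList<>();
        int sum = shapes.stream()
                .filter(s -> {
                    log.add("filter " + s.getWeight());
                    return s.getColor() == Color.BLUE;
                })
                .mapToInt(s -> {
                    log.add("map " + s.getWeight());
                    return s.getWeight();
                })
                .sum();
        log.add("sum " + sum);
        return log;
    }

    public static void main(String[] args) {
        trace(sample()).forEach(System.out::println);
    }
}
```

On a sequential stream the log reads filter 1, map 1, filter 2, filter 3, map 3, sum 4. Each element goes through the whole pipeline before the next one is looked at, so there is never a point where a filtered list exists.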

If the stream source weren't a collection (say, elements were being read from a network connection), there needn't be any storage at all. A pipeline like the following:

int sum = streamShapesFromNetwork()
                .filter(s -> s.getColor() == BLUE)
                .mapToInt(s -> s.getWeight())
                .sum();

might process millions of elements, but it wouldn't need to store millions of elements anywhere.
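Since `streamShapesFromNetwork()` is hypothetical, here is a sketch of the same idea that you can actually run. It assumes a generating function as the source (`IntStream.iterate`) instead of a network; the class and method names are made up for the example.

```java
import java.util.stream.IntStream;

class GeneratedSourceDemo {
    // Sums the even numbers among the first `count` values of 1, 2, 3, ...
    static long sumOfEvens(int count) {
        return IntStream.iterate(1, i -> i + 1)  // generating function, no backing collection
                .limit(count)                    // cut the infinite source off after `count` values
                .filter(i -> i % 2 == 0)
                .asLongStream()                  // widen so the sum cannot overflow an int
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfEvens(1_000_000));
    }
}
```

A million values pass through the pipeline, but at no point is there an array or list holding them. Only the current value and the running sum exist.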

Stuart Marks
  • But in case I create a stream object like this `Stream stream = Stream.of("test", "test", "test", "test");` I can access this object later. Doesn't it store the elements? – Coffemanz Feb 13 '20 at 13:40
  • @Coffemanz Sure, but `Stream.of` is a special case. It's a varargs method, so the arguments passed to it are collected into an array, which is then used as the source of the stream. (All varargs methods work by collecting the arguments into an array; this isn't special to streams in any way.) Now consider something like `Stream s2 = Stream.of("test", "test", "test", "test").map(String::toUpperCase);`. There is no collection or array anywhere that contains four instances of `"TEST"`. The uppercased values are not stored but are generated lazily. – Stuart Marks Feb 13 '20 at 22:17

Think of the stream as a nozzle connected to the water tank that is your data structure. The nozzle doesn't have its own storage. Sure, the water (data) the stream provides is coming from a source that has storage, but the stream itself has no storage. Connecting another nozzle (stream) to your tank (data structure) won't require storage for a whole new copy of the data.

user2357112
  • Thanks for the response! I understand the data streaming part, but what is the memory (storage) overhead if I were to just use the collection directly as opposed to streaming it? If I use the collection directly I access the collection on the heap, if I stream it, I am still streaming the data from the collection residing on the heap. Either way the memory footprint remains the same in both methods. Please correct me if I am wrong / missing something here. – user3587411 Apr 22 '15 at 04:26
  • @user3587411: There is essentially no difference. One way might use a few more stack frames and temporary objects, but that's just a few bytes. – user2357112 Apr 22 '15 at 04:27
  • *"Either way the memory footprint remains the same in both methods."* The purpose of a stream isn't to save memory. – Radiodef Apr 22 '15 at 04:29
  • @Radiodef I see. Is there a purpose or advantage to using a stream over a regular loop for collections, besides lambda support and ensuring the data is read-only, as @alfasin mentioned? – user3587411 Apr 22 '15 at 04:37
  1. A Collection is a data structure. Based on the problem, you decide which collection to use, such as ArrayList or LinkedList (considering time and space complexity). A Stream, on the other hand, is just a processing tool that makes your life easier.

  2. Another difference is that you can consider a Collection an in-memory data structure to which you can add and remove elements. With a Stream you can perform two kinds of operations:

    a. Intermediate operations: filter, map, sorted, limit on the result set
    b. Terminal operations: forEach, or collect the result set into a collection

    But notice that with a stream you can't add or remove elements.

  3. A Stream is a kind of iterator: you can traverse a collection through a stream. Note that you can traverse a stream only once. Here is an example to make that clearer:

Example 1:

List<String> employeeNameList = Arrays.asList("John", "Peter", "Sachin");
Stream<String> s = employeeNameList.stream();

// iterate through the list
s.forEach(System.out::println);  // this works perfectly fine
s.forEach(System.out::println);  // throws IllegalStateException: the stream has already been operated upon

So what you can infer is that you can iterate over a collection as many times as you want. A stream, once traversed, is consumed and won't remember what it was supposed to do, so you need to create it and set up its operations again.

I hope that is clear.
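If you do need to traverse the same data more than once, one common workaround is to keep a `Supplier` that builds a fresh stream each time. A minimal sketch, assuming the same employee names as above (the class and method names are made up):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamReuseDemo {
    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondTraversalFails(List<String> names) {
        Stream<String> s = names.stream();
        s.forEach(n -> { });          // first traversal consumes the stream
        try {
            s.forEach(n -> { });      // second traversal on the same stream object
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Asks the supplier for a new stream before every traversal.
    static String joinTwice(List<String> names) {
        Supplier<Stream<String>> fresh = names::stream;
        String first = fresh.get().collect(Collectors.joining(","));
        String second = fresh.get().map(String::toUpperCase).collect(Collectors.joining(","));
        return first + "|" + second;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("John", "Peter", "Sachin");
        System.out.println(secondTraversalFails(names));
        System.out.println(joinTwice(names));
    }
}
```

The list is still the only thing holding the names. Each call to `fresh.get()` just creates another lightweight stream object on top of it.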

Benoit
Rakesh

A stream is just a view of the data. It has no storage of its own, and you can't modify the underlying collection (assuming the stream was built on top of a collection) through the stream. It's like "read only" access.

If you have any RDBMS experience, it's the exact same idea as a "view".
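As a small illustration of the "view" idea, the sketch below runs a pipeline over a list and then checks that the list itself is untouched. The names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SourceUnchangedDemo {
    // Produces a new list of upper-cased names; the source list is only read.
    static List<String> upperCased(List<String> source) {
        return source.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("John", "Peter"));
        List<String> result = upperCased(names);
        System.out.println(names);   // the original elements, unchanged
        System.out.println(result);  // a separate list built by collect()
    }
}
```

Note that "read only" here means the stream API has no add or remove operations. A lambda could still mutate the objects it is handed, so this is a property of how you use the pipeline rather than something the stream enforces.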

Nir Alfasi

The previous answers are mostly correct. Still, here is a more intuitive explanation (for people landing here from Google):

Think of streams as UNIX pipelines of text: cat input.file | sed ... | grep ... > output.file

In general, those UNIX text utilities consume a small amount of RAM compared to the input data they process.

That's not always the case. Think of "sort": this algorithm needs to keep intermediate data in memory. The same is true for streams. Sometimes temporary data is needed; most of the time it is not.
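The "sort" case is easy to observe in Java too. In this rough sketch (names made up), `peek` logs each element as it enters `sorted()`, and `forEach` logs each element as it comes out:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortedBufferDemo {
    // Records when elements enter sorted() and when they leave it.
    static List<String> trace(List<Integer> input) {
        List<String> log = new ArrayList<>();
        input.stream()
                .peek(i -> log.add("in " + i))        // runs before the sort stage
                .sorted()                             // stateful: must see every element first
                .forEach(i -> log.add("out " + i));   // runs after the sort stage
        return log;
    }

    public static void main(String[] args) {
        trace(Arrays.asList(3, 1, 2)).forEach(System.out::println);
    }
}
```

For the input 3, 1, 2 on a sequential stream, all three "in" lines are logged before the first "out" line. That is the temporary buffering in action: `sorted()` cannot emit anything until it has seen the whole input.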

As an extra simile, "cloud serverless APIs" to some extent follow this same UNIX pipeline or Java stream design. They do not exist in memory until they have some input data to process. The cloud OS launches them and injects the input data. The output is sent gradually somewhere else, so the serverless API does not consume many resources (most of the time).

There are no absolute truths here.

earizon