52

In Unix shell programming, the pipe operator is an extremely powerful tool. With a small set of core utilities, a systems language (like C), and a scripting language (like Python), you can construct compact and powerful shell scripts whose stages are automatically run in parallel by the operating system.

Obviously this is a very powerful programming paradigm, but I haven't seen pipes as first class abstractions in any language other than a shell script. The code needed to replicate the functionality of scripts using pipes seems to always be quite complex.

So my question is why don't I see something similar to Unix pipes in modern high-level languages like C#, Java, etc.? Are there languages (other than shell scripts) which do support first class pipes? Isn't it a convenient and safe way to express concurrent algorithms?

Just in case someone brings it up, I looked at the F# pipe-forward operator (forward pipe operator), and it looks more like a function application operator. It applies a function to data, rather than connecting two streams together, as far as I can tell, but I am open to corrections.

Postscript: While doing some research on implementing coroutines, I realized that there are certain parallels. In a blog post, Martin Wolf describes a problem similar to mine, but in terms of coroutines instead of pipes.

cdiggins
  • 6
    I've really wondered the same thing, but never thought to actually ask. – dsimcha Oct 19 '09 at 03:52
  • FWIW you might be interested in http://en.wikipedia.org/wiki/Hartmann_pipeline – Mark Oct 19 '09 at 03:55
  • Keep in mind that the "data" that the F# forward pipe operator applies the function to can itself be a function, or a sequence of functions. – brianary Oct 19 '09 at 05:01
  • @Mark, the Hartmann pipeline was very much related to what I wanted. @brianary: I don't know how to apply data to a sequence of functions. Are they composed together? I'm assuming the data isn't applied to them all concurrently. – cdiggins Oct 19 '09 at 23:59
  • 1
    It also blows my mind that so few languages do this well. Chaining iterators in java is a PITA. – Adam Gent May 18 '11 at 18:29
  • I don't know what kind of an answer you're expecting to this question, other than 'because they didn't put it in' or 'because they didn't think of it' or whatever, all of which is just guesswork anyway. – user207421 Jan 27 '15 at 16:03
  • 1
    There's `|>` in F# and `%>%` in R – Neil McGuigan Apr 07 '17 at 23:48

16 Answers

13

Haha! Thanks to my Google-fu, I have found an SO answer that may interest you. Basically, the answer is going against the "don't overload operators unless you really have to" argument by overloading the bitwise-OR operator to provide shell-like piping, resulting in Python code like this:

for i in xrange(2,100) | sieve(2) | sieve(3) | sieve(5) | sieve(7):
    print i

What it does, conceptually, is pipe the list of numbers from 2 to 99 (xrange(2, 100)) through a sieve function that removes multiples of a given number (first 2, then 3, then 5, then 7). This is the start of a prime-number generator, though generating prime numbers this way is a rather bad idea. But we can do more:

for i in xrange(2,100) | strify() | startswith(5):
    print i

This generates the range, then converts all of them from numbers to strings, and then filters out anything that doesn't start with 5.

The post shows a basic parent class that allows you to override two methods, map and filter, to describe the behavior of your pipe. So strify() uses the map method to convert everything to a string, while sieve() uses the filter method to weed out things that are multiples of the number.

It's quite clever, though perhaps that means it's not very Pythonic, but it demonstrates what you are after and a technique to get it that can probably be applied easily to other languages.
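
As a rough illustration of the technique described above, here is a minimal Python 3 sketch. It is not the linked post's actual code: the class names `Pipe`, `Sieve`, `Strify`, and `StartsWith` are assumptions made for this example, and `Sieve` keeps the number itself so that the result is a list of primes. The key piece is `__ror__`, which Python calls for `iterable | stage` when the left operand does not handle `|` itself.

```python
class Pipe:
    """Base class for a pipeline stage; subclasses override map and/or filter."""

    def map(self, item):
        # Default: pass the item through unchanged.
        return item

    def filter(self, item):
        # Default: keep every item.
        return True

    def __ror__(self, source):
        # Called for `source | self` when `source` has no suitable __or__.
        # Returns a lazy generator, so further stages can be chained with `|`.
        return (self.map(item) for item in source if self.filter(item))


class Sieve(Pipe):
    def __init__(self, n):
        self.n = n

    def filter(self, item):
        # Drop multiples of n, but keep n itself (an assumption of this sketch).
        return item == self.n or item % self.n != 0


class Strify(Pipe):
    def map(self, item):
        return str(item)


class StartsWith(Pipe):
    def __init__(self, prefix):
        self.prefix = str(prefix)

    def filter(self, item):
        return item.startswith(self.prefix)


primes = list(range(2, 30) | Sieve(2) | Sieve(3) | Sieve(5))
fifties = list(range(2, 100) | Strify() | StartsWith(5))
print(primes)
print(fifties)
```

Each stage here is lazy but runs in the same thread, so this shows the syntax of piping rather than any parallelism.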

Community
Chris Lutz
  • Not bad. I voted up of course, but my qualms are none of the functions are executed in parallel. What I really want is something more like the Erlang and Java answers. – cdiggins Oct 19 '09 at 23:56
  • 1
    @cdiggins - Your qualms are an implementation detail. This same functionality could have easily been implemented in C (or even assembly?) to take advantage of parallelism, and because all the brute work is done in the base class, derived classes written in pure Python would still get that power (the two methods that you have to overload - `map` and `filter` - only work on one element at a time). Of course, if we're willing to implement it, we could do the same for most regular functional constructs, with or without the Unix-pipe interface. – Chris Lutz Oct 20 '09 at 01:15
  • I'm actually not surprised that you can't use the pipe character itself, but I'm a little disappointed that there's no easy "concurrent data pipes" available in some way. Something like `concat_file(myfile) |> filter("mypattern") |> write_file(myoutfile)` (or something of the sorts) could work, right? Basically it would create 3 threads and send data from one to the other in parallel. – diogovk Jun 17 '15 at 20:45
7

You can do pipelining type parallelism quite easily in Erlang. Below is a shameless copy/paste from my blogpost of Jan 2008.

Also, Glasgow Parallel Haskell allows for parallel function composition, which amounts to the same thing, giving you implicit parallelisation.

You already think in terms of pipelines - how about "gzcat foo.tar.gz | tar xf -"? You may not have known it, but the shell is running the unzip and untar in parallel - the stdin read in tar just blocks until data is sent to stdout by gzcat.

Well, a lot of tasks can be expressed in terms of pipelines, and if you can do that, then getting some level of parallelisation is simple with David King's helper code (even across Erlang nodes, i.e. machines):

pipeline:run([pipeline:generator(BigList),
          {filter,fun some_filter/1},
          {map,fun some_map/1},
          {generic,fun some_complex_function/2},
          fun some_more_complicated_function/1,
          fun pipeline:collect/1]).

So basically what he's doing here is making a list of the steps - each step being implemented in a fun that accepts as input whatever the previous step outputs (the funs can even be defined inline of course). Go check out David's blog entry for the code and more detailed explanation.
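
To make the idea of a list of steps concrete, here is a rough Python sketch. It is not a port of David King's Erlang module: the `run_pipeline` function, the stage tags, and the `running_total` helper are all hypothetical, and unlike the Erlang version the stages here are evaluated lazily in a single thread rather than in parallel processes.

```python
def run_pipeline(source, stages):
    """Feed `source` through each stage in order and collect the results."""
    stream = iter(source)
    for stage in stages:
        if callable(stage):
            # A bare function receives the whole upstream iterator.
            stream = stage(stream)
        else:
            kind, func = stage
            if kind == "filter":
                stream = filter(func, stream)
            elif kind == "map":
                stream = map(func, stream)
            else:
                raise ValueError(f"unknown stage kind: {kind!r}")
    return list(stream)


def running_total(items):
    # Example of a stage that keeps state across items.
    total = 0
    for item in items:
        total += item
        yield total


result = run_pipeline(
    range(10),
    [("filter", lambda n: n % 2 == 0),   # 0, 2, 4, 6, 8
     ("map", lambda n: n * n),           # 0, 4, 16, 36, 64
     running_total],                     # 0, 4, 20, 56, 120
)
print(result)
```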

Mark
  • This is very much along the lines of what I am thinking about. I need to study it a bit more tomorrow after some coffee! – cdiggins Oct 19 '09 at 04:20
  • 1
    I accepted this answer because it is the most elegant solution I saw, and Mark pointed me to Hartmann pipelines in the above comment, which led me to flow-based programming. – cdiggins Oct 20 '09 at 00:06
  • Is map-reduce an example of a pipeline? I really like JavaScript's built-in functions for that. But I too am disappointed that even Go has no special syntax like bash does – Sridhar Sarnobat Jun 06 '21 at 07:40
7

The magrittr package provides something similar to F#'s pipe-forward operator in R:

rnorm(100) %>% abs %>% mean

Combined with the dplyr package, it makes a neat data manipulation tool:

iris %>%
  filter(Species == "virginica") %>%
  select(-Species) %>%
  colMeans
gkcn
5

You can find something like pipes in C# and Java, for example, where you take a connection stream and put it inside the constructor of another connection stream.

So, you have in Java:

new BufferedReader(new InputStreamReader(System.in));

You may want to look up chaining input streams or output streams.

James Black
  • 1
    I was thinking more about two separate processes running in parallel one spitting out data, and the other consuming data. – cdiggins Oct 19 '09 at 04:18
  • 2
    Then you just use a PipedInputStream (or PipedOutputStream): http://java.sun.com/javase/6/docs/api/java/io/PipedInputStream.html. This is not parallel by itself, but you can convert data as it goes through the streams. – James Black Oct 19 '09 at 04:38
  • @James, PipedInputStream and PipedOutputStream are very close to what I am looking for. I was wondering about first-class support for this kind of thing (with automatic parallelism) in languages. It turns out that what I was looking for is called "flow programming". Thank you very much, though, for showing me some cool Java stuff. – cdiggins Oct 20 '09 at 00:02
  • 1
    The problem with this approach is that you have to think "backwards": when reading the code, you first read the last function and end with the first applied function. I.e. the order is reversed as compared to shell's pipe, and to the order of operations. I would prefer something like `System.in | (new InputStreamReader) | (new BufferedReader)` – jrouquie Jun 26 '12 at 13:49
  • @jrouquie - In functional programming type languages this is now easier to do, using F# or Scala, for example, to get the behavior you want. – James Black Jun 26 '12 at 18:54
5

Thanks to all of the great answers and comments, here is a summary of what I learned:

It turns out that there is an entire paradigm related to what I am interested in, called flow-based programming. A good example of a language designed specifically for flow-based programming is Hartmann pipelines. Hartmann pipelines generalize the idea of streams and pipes used in Unix and other operating systems to allow for multiple input and output streams (rather than just a single input stream and two output streams). Erlang contains powerful abstractions that make it easy to express concurrent processes in a manner that resembles pipes. Java provides PipedInputStream and PipedOutputStream, which can be used with threads to achieve the same kind of abstraction in a more verbose manner.
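
For comparison, the "piped streams plus threads" arrangement can be sketched in Python rather than Java. This is only an analogue under stated assumptions: it uses `os.pipe()` and one extra thread in place of PipedOutputStream and PipedInputStream, and the `producer` and `consume` functions are hypothetical names chosen for the example.

```python
import os
import threading


def producer(write_fd):
    # Write a few lines into the pipe, then close it to signal end of stream.
    with os.fdopen(write_fd, "w") as out:
        for i in range(5):
            out.write(f"line {i}\n")


def consume(read_fd):
    # Read lines until the writer closes its end of the pipe.
    with os.fdopen(read_fd) as inp:
        return [line.strip().upper() for line in inp]


read_fd, write_fd = os.pipe()
writer = threading.Thread(target=producer, args=(write_fd,))
writer.start()
lines = consume(read_fd)   # blocks while waiting for data, as a shell pipe does
writer.join()
print(lines)
```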

cdiggins
  • And thank you for posting the question. I'd been wondering the same thing for a while but was discouraged from asking due to the hostile environment Stack Overflow is – Sridhar Sarnobat Jun 06 '21 at 07:45
3

I think the most fundamental reason is that C# and Java tend to be used to build more monolithic systems. Culturally, it's just not common to even want to do pipe-like things -- you just make your application implement the necessary functionality. The notion of building a multitude of simple tools and then gluing them together in arbitrary ways just isn't common in those contexts.

If you look at some of the scripting languages, like Python and Ruby, there are some pretty good tools for doing pipe-like things from within those scripts. Check out the Python subprocess module, for example, which allows you to do things like:

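import subprocess

# Start `cat -`, which copies its stdin to its stdout unchanged.
proc = subprocess.Popen('cat -',
                        shell=True,
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        text=True)  # exchange str instead of bytes (Python 3.7+)
# communicate() writes to stdin, closes it, and returns (stdout, stderr).
stdout_value = proc.communicate('through stdin to stdout')[0]
print('\tpass through:', stdout_value)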
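
The subprocess module can also connect two child processes directly, which is closer to what a shell pipe does. The following is a sketch, assuming Python 3.7 or later; the two child programs are small `python -c` one-liners (launched through `sys.executable`) standing in for real utilities, so that the example does not depend on any particular Unix tool being installed.

```python
import subprocess
import sys

# Stage 1: a child process that prints three lines.
producer = subprocess.Popen(
    [sys.executable, "-c", "print('pear'); print('apple'); print('fig')"],
    stdout=subprocess.PIPE,
)
# Stage 2: a child process that sorts whatever arrives on its stdin.
consumer = subprocess.Popen(
    [sys.executable, "-c",
     "import sys; sys.stdout.write(''.join(sorted(sys.stdin)))"],
    stdin=producer.stdout,
    stdout=subprocess.PIPE,
    text=True,
)
producer.stdout.close()  # the parent no longer needs its copy of the pipe end
output, _ = consumer.communicate()
producer.wait()
print(output)
```

Both children run concurrently, and the operating system pipe between them provides the buffering and blocking.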
divegeek
  • Isn't this just driving the shell's pipe from Python? As I understand it, the OP's question is about having a pipe-like construct inside the language. – jrouquie Jun 26 '12 at 13:52
  • Interesting, I thought all pipes just wrapped the OS pipes, just like threads, but now that you mention it I am wondering when they do and when they are their own implementation of the paradigm. I'd like to think we shouldn't reinvent the wheel, but this is hardcore stuff I may not have a deep awareness of. All I know is I like pipelines – Sridhar Sarnobat Jun 06 '21 at 07:49
3

Usually you just don't need it and programs run faster without it.

Basically, piping is the producer/consumer pattern. And it's not that hard to write those producers and consumers, because they don't share much data.
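
A minimal sketch of that pattern in Python, assuming one thread per stage and bounded `queue.Queue` objects between them. The `stage` and `run` helpers and the `_DONE` sentinel are hypothetical names for this example, not part of any library mentioned here.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stage(func, inbox, outbox):
    """Run `func` on every item from `inbox`, pushing results to `outbox`."""
    def worker():
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)   # pass the end-of-stream marker downstream
                return
            outbox.put(func(item))
    thread = threading.Thread(target=worker)
    thread.start()
    return thread


def run(items, funcs, buffer_size=4):
    """Push `items` through one thread per function, linked by bounded queues."""
    queues = [queue.Queue(maxsize=buffer_size) for _ in range(len(funcs) + 1)]
    threads = [stage(f, queues[i], queues[i + 1]) for i, f in enumerate(funcs)]

    def feed():
        for item in items:
            queues[0].put(item)
        queues[0].put(_DONE)
    feeder = threading.Thread(target=feed)
    feeder.start()

    results = []
    while (item := queues[-1].get()) is not _DONE:
        results.append(item)
    for t in [feeder, *threads]:
        t.join()
    return results


squares_plus_one = run(range(10), [lambda n: n * n, lambda n: n + 1])
print(squares_plus_one)
```

The bounded queues give the same back-pressure a shell pipe has: a fast producer blocks once the buffer is full. In standard CPython the global interpreter lock limits how much CPU-bound work actually overlaps, so this mainly helps when stages wait on I/O.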

  • Piping for Python : pypes
  • Mozart-OZ can do pipes using ports and threads.
ohaal
Egon
3

Are you looking at the F# |> operator? I think you actually want the >> operator.
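
The difference between the two operators can be sketched in Python: F#'s `|>` applies functions to a value right away, while `>>` composes functions into a new function without applying it. The helper names `pipe` and `compose` are assumptions for this illustration, not standard library functions.

```python
from functools import reduce


def pipe(value, *funcs):
    """Like F#'s `x |> f |> g`: apply the functions to a value right away."""
    return reduce(lambda acc, f: f(acc), funcs, value)


def compose(*funcs):
    """Like F#'s `f >> g`: build a new function without applying it yet."""
    return lambda value: pipe(value, *funcs)


inc_then_double = compose(lambda n: n + 1, lambda n: n * 2)

print(pipe(3, lambda n: n + 1, lambda n: n * 2))   # applies immediately
print(inc_then_double(3))                          # applies when called
```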

2

Objective-C has the NSPipe class. I use it quite frequently.

Dave DeLong
2

Since R added a pipe operator today, it's worth mentioning that Julia has had a pipe operator all along:

help?> |>
search: |>

  |>(x, f)


  Applies a function to the preceding argument. This allows for easy function chaining.

  Examples
  ≡≡≡≡≡≡≡≡≡≡

  julia> [1:5;] |> x->x.^2 |> sum |> inv
  0.01818181818181818
jling
1

I've had a lot of fun building pipeline functions in Python. I have a library I wrote; I put the contents and a sample run here. The best fit for me has been XML processing, described in this Wikipedia article.

Joel Bender
1

Streaming libraries based on coroutines have existed in Haskell for quite some time now. Two popular examples are conduit and pipes.

Both libraries are well-written and well-documented, and are relatively mature. The Yesod web framework is based on conduit, and it's pretty damn fast. Yesod is competitive with Node on performance, even beating it in a few places.

Interestingly, all of these libraries are single-threaded by default. This is because the single motivating use case for pipelines is servers, which are I/O bound.
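
To illustrate how coroutine-style streaming can interleave stages without any threads, here is a small sketch using Python generators. It is only an analogy and does not reflect the conduit or pipes APIs; the `numbers`, `doubled`, and `collect` functions and the `log` list are invented for the example.

```python
log = []


def numbers(limit):
    for n in range(limit):
        log.append(f"produce {n}")
        yield n


def doubled(items):
    for item in items:
        log.append(f"double {item}")
        yield item * 2


def collect(items):
    out = []
    for item in items:
        log.append(f"consume {item}")
        out.append(item)
    return out


result = collect(doubled(numbers(3)))
print(result)
print(log[:3])
```

The log shows each item travelling through the whole pipeline before the next one is produced, so memory use stays constant even though everything runs in one thread.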

Lambda Fairy
1

You can do pipe-like operations in Java by chaining/filtering/transforming iterators. You can use Google's Guava Iterators.

I will say that even with the very helpful Guava library and static imports, it still ends up being lots of Java code.

In Scala it's quite easy to make your own pipe operator.

Adam Gent
0

if you're still interested in an answer...

you can look at factor, or the older joy and forth for the concatenative paradigm. in arguments and out arguments are implicit, dumped to a stack. then the next word (function) takes that data and does something with it.

the syntax is postfix.

"123" print

where print takes one argument, whatever is in the stack.
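
a tiny sketch of that idea in Python, as a toy only: it is not how Factor, Joy, or Forth are implemented, and the `run` function and the `words` table are made up for this example. literals are pushed onto the stack, and each word takes its arguments from the stack and leaves its results there.

```python
def run(program, words):
    """Evaluate a postfix program: literals are pushed, words act on the stack."""
    stack = []
    for token in program:
        if token in words:
            words[token](stack)
        else:
            stack.append(token)
    return stack


output = []
words = {
    "dup": lambda s: s.append(s[-1]),
    "*": lambda s: s.append(s.pop() * s.pop()),
    "print": lambda s: output.append(str(s.pop())),
}

run(["123", "print"], words)           # the example from the answer
run([7, "dup", "*", "print"], words)   # square a number, then print it
print(output)
```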

kobi7
0

You can use my library in python: github.com/sspipe/sspipe

mhsekhavat
0

In Mathematica, you can use //

for example

f[g[h[x,parm1],parm2]]

quite a mess.

could be written as

x // h[#, parm1]& // g[#, parm2]& // f

the # and & together form a lambda (a pure function) in Mathematica


In JavaScript, it seems there may soon be a pipe operator |> as well.

https://github.com/tc39/proposal-pipeline-operator

AsukaMinato