
Is there any possible way to achieve a non-lazy left to right invocation of operations on a list in Python?

E.g. Scala:

 val a = ((1 to 50)
  .map(_ * 4)
  .filter( _ <= 170)
  .filter(_.toString.length == 2)
  .filter (_ % 20 == 0)
  .zipWithIndex
  .map{ case(x,n) => s"Result[$n]=$x"}
  .mkString("  .. "))

  a: String = Result[0]=20  .. Result[1]=40  .. Result[2]=60  .. Result[3]=80

While I realize many folks will not prefer the above syntax, I like the ability to move left to right and add arbitrary operations as we go.

The Python comprehension is, IMO, not easy to read when there are three or more operations; the result seems to be that we're required to break everything up into chunks:

[f(a) for a in g(b) for b in h(c) for ..]

Is there any chance for the approach mentioned?

Note: I tried out a few libraries including toolz.functoolz. That one is complicated by Python 3 lazy evaluation: each level returns a map object. In addition, it is not apparent that it can operate on an input list.
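What I'm after would look something like this stdlib-only sketch (`pipe` here is my own helper, not toolz's — eager and left to right as long as each stage returns a concrete list):

```python
from functools import reduce

def pipe(data, *funcs):
    # Thread `data` left to right through each function.
    # Eager as long as each stage returns a concrete list.
    return reduce(lambda acc, f: f(acc), funcs, data)

result = pipe(
    range(1, 51),
    lambda xs: [x * 4 for x in xs],
    lambda xs: [x for x in xs if x <= 170],
)
print(result[:3], result[-1])
```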

Georgy
WestCoastProjects
  • Related: [How to multiply functions in python?](https://stackoverflow.com/q/30195045/674039) <-- you can decorate functions to make them composable, and then multiply them like `f*g*h`. Then you would use `(f*g*h)(data)`. – wim Feb 27 '18 at 21:40

4 Answers


The answer from @JohanL does a nice job of showing the closest equivalent in the standard Python library.

I ended up adapting a gist from Matt Hagy (November 2019) that is now on PyPI:

https://pypi.org/project/infixpy/

from infixpy import *

a = (Seq(range(1, 51))
     .map(lambda x: x * 4)
     .filter(lambda x: x <= 170)
     .filter(lambda x: len(str(x)) == 2)
     .filter(lambda x: x % 20 == 0)
     .enumerate()
     .map(lambda x: 'Result[%d]=%s' % (x[0], x[1]))
     .mkstring(' .. '))
print(a)

  # Result[0]=20  .. Result[1]=40  .. Result[2]=60  .. Result[3]=80
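For readers curious how such a fluent wrapper works, here is a toy, stdlib-only sketch (an assumption-level illustration of the idea, not infixpy's actual implementation):

```python
class Seq:
    """Toy fluent wrapper over an iterable (illustrative only)."""
    def __init__(self, it):
        self._it = it
    def map(self, f):
        return Seq(map(f, self._it))
    def filter(self, f):
        return Seq(filter(f, self._it))
    def enumerate(self):
        return Seq(enumerate(self._it))
    def mkstring(self, sep):
        return sep.join(str(x) for x in self._it)

out = (Seq(range(1, 51))
       .map(lambda x: x * 4)
       .filter(lambda x: x <= 170)
       .filter(lambda x: len(str(x)) == 2)
       .filter(lambda x: x % 20 == 0)
       .enumerate()
       .map(lambda nx: 'Result[%d]=%s' % nx)
       .mkstring('  .. '))
print(out)
```

Each method wraps a lazy builtin (`map`, `filter`, `enumerate`) and returns a new `Seq`, so nothing runs until `mkstring` consumes the chain.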

Other approaches are described in the other answers.

Older approaches

I found a more appealing toolkit in Fall 2018

https://github.com/dwt/fluent


After a fairly thorough review of the available third-party libraries, Pipe (https://github.com/JulienPalard/Pipe) seemed to best suit the needs.

You can create your own pipeline functions. I put it to work wrangling some text, shown below. The `@Pipe` functions only have to be coded once and can then be reused.
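For context, the `|` chaining works by overloading `__ror__`. A minimal toy version of the decorator (my own sketch, not the Pipe library's actual code) looks roughly like:

```python
class Pipe:
    """Toy version of the decorator: `x | f` calls f(x).
    (My own sketch; the real library is more featureful.)"""
    def __init__(self, fn):
        self.fn = fn
    def __ror__(self, other):             # triggered by: value | pipe
        return self.fn(other)
    def __call__(self, *args, **kwargs):  # pre-bind extra arguments
        return Pipe(lambda x: self.fn(x, *args, **kwargs))

@Pipe
def double(iterable):
    return [x * 2 for x in iterable]

@Pipe
def take(iterable, n):
    return list(iterable)[:n]

result = range(10) | double | take(3)
```

Because `range` and `list` don't define `|` against a `Pipe`, Python falls back to `Pipe.__ror__`, which feeds the left-hand value into the wrapped function.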

The task here is to associate the abbreviation in the first text:

rawLabels="""Country: Name of country
Agr: Percentage employed in agriculture
Min: Percentage employed in mining
Man: Percentage employed in manufacturing
PS: Percentage employed in power supply industries
Con: Percentage employed in construction
SI: Percentage employed in service industries
Fin: Percentage employed in finance
SPS: Percentage employed in social and personal services
TC: Percentage employed in transport and communications"""

With an associated tag in this second text:

mylabs = "Country Agriculture Mining Manufacturing Power Construction Service Finance Social Transport"

Here's the one-time coding for the functional operations (reuse in subsequent pipelines):

@Pipe
def split(iterable, delim=' '):
    for s in iterable: yield s.split(delim)

@Pipe
def trim(iterable):
    for s in iterable: yield s.strip()

@Pipe
def pzip(iterable, coll):
    for s in zip(list(iterable), coll): yield s

@Pipe
def slice(iterable, dim):  # note: shadows the builtin slice
    if len(dim) == 1:
        for x in iterable:
            yield x[dim[0]]
    elif len(dim) == 2:
        for x in iterable:
            for y in x[dim[0]]:
                yield y[dim[1]]

@Pipe
def toMap(iterable):
    return dict(iterable)

And here's the big finale: all in one pipeline:

labels = (rawLabels.split('\n') 
     | trim 
     | split(':')
     | slice([0])
     | pzip(mylabs.split(' '))
     | toMap )

And the result:

print('labels=%s' % repr(labels))

labels={'PS': 'Power', 'Min': 'Mining', 'Country': 'Country', 'SPS': 'Social', 'TC': 'Transport', 'SI': 'Service', 'Con': 'Construction', 'Fin': 'Finance', 'Agr': 'Agriculture', 'Man': 'Manufacturing'}
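For comparison, the same association can be done with plain stdlib `zip` and a generator expression (shown here on abridged data):

```python
rawLabels = """Country: Name of country
Agr: Percentage employed in agriculture
Min: Percentage employed in mining"""
mylabs = "Country Agriculture Mining"

# pair each abbreviation (the text before ':') with the matching tag
labels = dict(zip((line.split(':')[0].strip() for line in rawLabels.split('\n')),
                  mylabs.split(' ')))
```

It is shorter, but the pipeline version keeps each transformation as a named, reusable step.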
WestCoastProjects

Here is another solution using the SSPipe library.

Note that all functions used here (map, filter, str, len, enumerate, str.format, str.join) except p and px are builtin Python functions, so you don't need to learn new function names or a new API. The only things you need are the p wrapper and the px placeholder:

from sspipe import p, px
a = (
    range(1, 50+1)
    | p(map, px * 4)
    | p(filter, px <= 170)
    | p(filter, p(str) | p(len) | (px == 2))
    | p(filter, px % 20 == 0)
    | p(enumerate)
    | p(map, p('Result[{0[0]}]={0[1]}'.format)) 
    | p('  ..  '.join)
)
print(a)
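(For the curious: px works via operator overloading, where each operator builds a function of the eventual element. A toy stand-in, not sspipe's actual code:)

```python
class Px:
    """Toy placeholder: each operator returns a one-argument function.
    (Illustrative only; sspipe's real px object is far more general.)"""
    def __mul__(self, k):
        return lambda x: x * k
    def __le__(self, k):
        return lambda x: x <= k

px = Px()
times4 = px * 4     # a function: x -> x * 4
small = px <= 170   # a predicate: x -> x <= 170
```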
mhsekhavat
  • Looks great! Wish I could have found these libraries 16 months ago: almost all are younger than that. – WestCoastProjects Jun 07 '19 at 11:59
  • oh and the placeholders are nice instead of reading `lambda, lambda, lambda` – WestCoastProjects Jun 07 '19 at 12:00
  • This is also a bit more concise than the (already v good!) `scalaps` since do not need the `ScSeq` out front. I am interested to see if your library also supports `dict`s: looking now.. – WestCoastProjects Jun 07 '19 at 12:02
  • This is v good. I was a contributor to `julien/pipe` 16 months back but did find usability a bit daunting. This is much simpler to use and also you added support for `numpy` and `pandas`. _Thank you_. btw is it possible to simplify the `pandas` expressions to not require repeating `px[]` inside the expression - e.g. `px[px[fielda]==aval] and px[fieldb]==bval]` ? – WestCoastProjects Jun 07 '19 at 12:15
  • @javadba I'm glad you liked the library! It is new-born and you can help it grow by contributing/starring/sharing it! In order to reduce the `px` hassle, you can use the pandas API `my_dataframe | px.query('fielda == aval and fieldb == bval')` which calls https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html however, I myself hate strings and prefer the `px` hassle! – mhsekhavat Jun 07 '19 at 15:24
  • what do you mean "p and px are builtin functions" ? – WestCoastProjects Aug 03 '20 at 10:57
  • I think you missed the "except" word in my sentence. I told functions like map, filter, str, len, enumerate, str.format, str.join are builtin functions. – mhsekhavat Aug 05 '20 at 01:01

Even though it is not considered Pythonic, Python still contains map and filter, and reduce can be imported from functools. Using these functions it is possible to build the same pipeline as the one you have in Scala, albeit written in the opposite direction (from right to left, rather than left to right):

from functools import reduce
a = reduce(lambda f,s: f'{f} .. {s}',
    map(lambda nx: f'Result[{nx[0]}]: {nx[1]}',
    enumerate(
    filter(lambda n: n%20 == 0,
    filter(lambda n: len(str(n)) == 2,
    filter(lambda n: n <= 170,
    map(lambda n: n*4,
    range(1,51))))))))

Now, this is lazy, in the sense that each value is transported through the whole pipe before the next one is evaluated. However, since all values are consumed by the final reduce call, this is not seen.
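The laziness can be checked with a small side-effect probe (stdlib only):

```python
seen = []

def tag(n):
    seen.append(n)  # record when each element is actually evaluated
    return n * 4

m = map(tag, [1, 2, 3])  # building the map object evaluates nothing
assert seen == []
out = list(m)            # consuming it drives the evaluation
assert seen == [1, 2, 3]
```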

It is possible to generate a list from each map or filter object in each step:

a = reduce(lambda f,s: f'{f} .. {s}',
    list(map(lambda nx: f'Result[{nx[0]}]: {nx[1]}',
    list(enumerate(
    list(filter(lambda n: n%20 == 0,
    list(filter(lambda n: len(str(n)) == 2,
    list(filter(lambda n: n <= 170,
    list(map(lambda n: n*4,
    list(range(1,51)))))))))))))))

Both of these expressions, especially the second one, are quite verbose, so I don't know if I would recommend them. I would instead recommend using list/generator comprehensions and a few intermediate variables:

n4 = [n*4 for n in range(1,51)]
fn4 = [n for n in n4 if n <= 170 if len(str(n))==2 if n%20 == 0]
rfn4 = [f'Result[{n}]: {x}' for n, x in enumerate(fn4)]
a = ' .. '.join(rfn4)

Another benefit of this approach (for you, at least) is that it keeps the order of operations found in Scala. It will also, as long as we use list comprehensions (as shown), be evaluated non-lazily. If we want lazy evaluation, it is possible to use generator comprehensions instead:

n4 = (n*4 for n in range(1,51))
fn4 = (n for n in n4 if n <= 170 if len(str(n))==2 if n%20 == 0)
rfn4 = (f'Result[{n}]: {x}' for n, x in enumerate(fn4))
a = ' .. '.join(rfn4)

Thus, the only difference is that we use parentheses instead of brackets. But, as stated before, since all data is consumed, the difference in this example is rather minimal.

JohanL
  • This is awful. If you want code like this, better not to write in Python (where you get all the disadvantages, and none of the advantages). Use a functional language in the first place. – wim Feb 27 '18 at 21:42
  • @wim Yes, but that is not what is asked for, is it? Also, what is wrong with the latter part of the answer? The way I propose to write it. – JohanL Feb 27 '18 at 21:44
  • Yes, I like the generator pipelines and would upvote this answer if you deleted all the lambdas. – wim Feb 27 '18 at 21:45
  • I need to use `numpy` and `scientific python` . Those are *great* stuff. But I'm not going to do 8-way nested `for x in y ..` to do data pipelines. – WestCoastProjects Feb 27 '18 at 21:45
  • @wim But that would not answer the OP question, though. – JohanL Feb 27 '18 at 21:46
  • I disagree with the downvoter: this is a nice treatise on the most similar built-in approach to answering the OP – WestCoastProjects Feb 27 '18 at 21:51

There's a library that already does exactly what you are looking for: fluent syntax, lazy evaluation, and an order of operations that matches how it's written, plus other good stuff like multiprocess or multithreaded Map/Reduce. It's named pyxtension; it's production-ready and maintained on PyPi. Your code would be rewritten in this form:

from pyxtension.streams import stream
a = stream(range(1, 51)) \
    .map(lambda _: _ * 4) \
    .filter(lambda _: _ <= 170) \
    .filter(lambda _: len(str(_)) == 2) \
    .filter(lambda _: _ % 20 == 0) \
    .enumerate() \
    .map(lambda n_x: f"Result[{n_x[0]}]={n_x[1]}") \
    .mkString("  .. ")
>  a: 'Result[0]=20  .. Result[1]=40  .. Result[2]=60  .. Result[3]=80'

Replace map with mpmap for multiprocessed map, or fastmap for multithreaded map.
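For comparison, the stdlib offers a similar parallel map through concurrent.futures; a rough sketch of the multithreaded case (not pyxtension's API):

```python
from concurrent.futures import ThreadPoolExecutor

def quadruple(x):
    return x * 4

# Executor.map preserves input order, like the sequential map
with ThreadPoolExecutor(max_workers=4) as ex:
    out = list(ex.map(quadruple, range(1, 6)))
```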

asu
  • v nice i'll take a look. I don't think there's any way to get around that lambdas in python can only be expressions - and can not have any statements: that really puts python behind other languages. But we do what we can. – WestCoastProjects Jun 25 '20 at 23:59
  • I am getting this error `AttributeError: 'int' object has no attribute 'toString' ` – WestCoastProjects Jun 26 '20 at 00:29
  • Any ideas about that error I mentioned? `---> 33 return cls(lambda *args, **kwargs: f(g(*args, **kwargs))) AttributeError: 'int' object has no attribute 'toString'` – WestCoastProjects Aug 01 '20 at 14:20
  • @javadba yes, you are right, the code wasn't converted 100% from Scala, as toString method doesn't exist in Py, as well as lambda doesn't automatically unpack tuple to arguments. Now I've fixed, tested and is perfectly working. Check now pls. – asu Aug 03 '20 at 08:54
  • actually the results are transposed `Result[20]=0 .. Result[40]=1 .. Result[60]=2 .. Result[80]=3` – WestCoastProjects Aug 03 '20 at 11:04
  • It is correct now: and I'll add a comment about this library in my comprehensive answer. btw I had already upvoted in June so can't upvote again ;) – WestCoastProjects Aug 03 '20 at 11:35