Questions tagged [pipeline]

A pipeline is a sequence of functions (or the equivalent thereof), composed so that the output of one is the input of the next, in order to create a compound transformation. Famously, a shell pipeline looks like "command | command2 | command3" (but use the tag "pipe" for this). The term is also used in computer architecture to describe a sequence of serial stages that execute in parallel over the elements being fed into a pipe, in order to increase overall throughput.
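
As a rough illustration of the definition above, here is a minimal Python sketch of composing functions so that each one's output feeds the next. The helper name `pipeline` and the chosen stages are hypothetical and picked only for the example; they are not part of any particular library.

```python
from functools import reduce
from typing import Any, Callable


def pipeline(*stages: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Compose stages left to right: each stage receives the previous stage's output."""
    def run(value: Any) -> Any:
        # Thread the value through every stage in order.
        return reduce(lambda acc, stage: stage(acc), stages, value)
    return run


# Hypothetical stages, chosen only for illustration.
normalize = pipeline(str.strip, str.lower, str.split)
result = normalize("  Hello Pipeline World  ")
print(result)  # ['hello', 'pipeline', 'world']
```

With no stages at all, the composed function simply returns its input unchanged, which makes an empty pipeline the identity transformation.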

In a command-line interface or shell, a pipeline uses the pipe operator ("|") to pass the output of one function or command as input to another. This is done in a series like "command1 | function1 | command2". For questions related to the pipe operator, use the "pipe" tag.

In computer architecture, a pipeline is a process consisting of a sequence of stages that must be performed in serial order on each element passing through the pipe, but that may execute in parallel across the elements inside it, so that the overall throughput does not depend on the length of the pipe. Most CPUs use this technique in hardware to process instructions.
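
The throughput claim can be checked with a small idealized model. The sketch below assumes every stage takes exactly one cycle and that there are no stalls or hazards, which real CPUs do not guarantee; the function names are hypothetical.

```python
def serial_cycles(stages: int, elements: int) -> int:
    """Cycles needed when each element finishes all stages before the next starts."""
    return stages * elements


def pipelined_cycles(stages: int, elements: int) -> int:
    """Cycles needed when a new element enters the pipe every cycle (idealized)."""
    if elements == 0:
        return 0
    # The first element needs `stages` cycles to fill the pipe;
    # each later element completes one cycle after the previous one.
    return stages + elements - 1


print(serial_cycles(5, 1000))     # 5000
print(pipelined_cycles(5, 1000))  # 1004
```

Under these assumptions the pipe length only adds a one-time fill cost, so for long input streams the throughput approaches one element per cycle whatever the number of stages.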

A similar technique is also applied in software (software pipelining) to increase the parallelism of a loop by reordering it so that its data dependencies are arranged in a pipelined manner.

More broadly, "pipeline" is synonymous with "workflow."

5444 questions
160
votes
3 answers

How can you diff two pipelines in Bash?

How can you diff two pipelines without using temporary files in Bash? Say you have two command pipelines: foo | bar baz | quux And you want to find the diff in their outputs. One solution would obviously be to: foo | bar > /tmp/a baz | quux >…
Adam Rosenfield
  • 390,455
  • 97
  • 512
  • 589
130
votes
16 answers

Functional pipes in python like %>% from R's magrittr

In R (thanks to magrittr) you can now perform operations with a more functional piping syntax via %>%. This means that instead of coding this: > as.Date("2014-01-01") > as.character(sqrt(12)^2) You could also do this: > "2014-01-01" %>% as.Date >…
cantdutchthis
  • 31,949
  • 17
  • 74
  • 114
85
votes
5 answers

Sklearn Pipeline: Get feature names after OneHotEncode In ColumnTransformer

I want to get feature names after I fit the pipeline. categorical_features = ['brand', 'category_name', 'sub_category'] categorical_transformer = Pipeline(steps=[ ('imputer', SimpleImputer(strategy='constant', fill_value='missing')), …
ResidentSleeper
  • 2,385
  • 2
  • 10
  • 20
85
votes
2 answers

What is the difference between pipeline and make_pipeline in scikit-learn?

I got this from the sklearn webpage: Pipeline: Pipeline of transforms with a final estimator Make_pipeline: Construct a Pipeline from the given estimators. This is a shorthand for the Pipeline constructor. But I still do not understand when I…
Aizzaac
  • 3,146
  • 8
  • 29
  • 61
73
votes
16 answers

How can I fix "kex_exchange_identification: read: Connection reset by peer"?

I want to copy data with scp in a GitLab pipeline using PRIVATE_KEY. The error is: kex_exchange_identification: read: Connection reset by peer Connection reset by x.x.x.x port 22 lost connection Pipeline log: $ mkdir -p ~/.ssh $ echo…
Mohammad Reza Mousavi
  • 894
  • 1
  • 10
  • 18
63
votes
1 answer

How to extract tar archive from stdin?

I have a large tar file I split. Is it possible to cat and untar the file using a pipeline? Something like: cat largefile.tgz.aa largefile.tgz.ab | tar -xz instead of: cat largefile.tgz.aa largefile.tgz.ab > largefile.tgz tar -xzf largefile.tgz I…
Charlie
  • 931
  • 1
  • 6
  • 10
61
votes
2 answers

Getting model attributes from pipeline

I typically get PCA loadings like this: pca = PCA(n_components=2) X_t = pca.fit(X).transform(X) loadings = pca.components_ If I run PCA using a scikit-learn pipeline: from sklearn.pipeline import Pipeline pipeline = Pipeline(steps=[ …
lmart999
  • 6,671
  • 10
  • 29
  • 37
56
votes
2 answers

Sklearn How to Save a Model Created From a Pipeline and GridSearchCV Using Joblib or Pickle?

After identifying the best parameters using a pipeline and GridSearchCV, how do I pickle/joblib this process to re-use later? I see how to do this when it's a single classifier... import joblib joblib.dump(clf, 'filename.pkl') But how do I save…
Jarad
  • 17,409
  • 19
  • 95
  • 154
53
votes
4 answers

GitLab CI Pipeline on specific branch only

I'm trying to implement GitLab CI Pipelines to build and deploy an Angular app. In our project we have two general branches: master (for production only) and develop. For development we create feature/some-feature branches from develop branch. When…
ProximaCygni
  • 887
  • 1
  • 6
  • 9
43
votes
2 answers

How to insert Keras model into scikit-learn pipeline?

I'm using a scikit-learn custom pipeline (sklearn.pipeline.Pipeline) in conjunction with RandomizedSearchCV for hyper-parameter optimization. This works great. Now I would like to insert a Keras model as a first step into the pipeline. The…
machinery
  • 5,972
  • 12
  • 67
  • 118
42
votes
2 answers

Gitlab pipeline - reports config contains unknown keys: cobertura

I'm not able to run the GitLab pipeline due to this error: Invalid CI config YAML file jobs:run tests:artifacts:reports config contains unknown keys: cobertura
Shashikumar KL
  • 1,007
  • 1
  • 10
  • 25
40
votes
6 answers

Pipe complete array-objects instead of array items one at a time?

How do you send the output from one CmdLet to the next one in a pipeline as a complete array-object instead of the individual items in the array one at a time? The problem - Generic description As can be seen in help for about_pipelines (help…
NoOneSpecial
  • 695
  • 1
  • 6
  • 16
37
votes
7 answers

Share gitlab-ci.yml between projects

We are thinking of moving our CI from Jenkins to GitLab. We have several projects that have the same build workflow. Right now we use a shared library where the pipelines are defined and the Jenkinsfile inside the project only calls a method defined…
36
votes
1 answer

What exactly is a dual-issue processor?

I came across several references to the concept of a dual issue processor (I hope this even makes sense in a sentence). I can't find any explanation of what exactly dual issue is. Google gives me links to micro-controller specification, but the…
Phonon
  • 12,549
  • 13
  • 64
  • 114
36
votes
8 answers

How to properly pickle sklearn pipeline when using custom transformer

I am trying to pickle a sklearn machine-learning model and load it in another project. The model is wrapped in a pipeline that does feature encoding, scaling, etc. The problem starts when I want to use self-written transformers in the pipeline for…
spiral
  • 381
  • 1
  • 3
  • 6