2

Does anyone know how to stream the output of a shell command (a chain of csvkit tool invocations) into a Jupyter notebook cell, and specifically into a Pandas DataFrame? From the cell's contents, it would look something like this:

 output = !find /path -name "*.csv" | csvstack ... | csvgrep ... 
 df = DataFrame.read_csv(output)

Only the above doesn't really work. The shell output is very large (millions of rows), which Pandas can handle just fine, but I don't want the whole output loaded into memory as a single string.

I'm looking for a piping/streaming solution that allows Pandas to read the output as it comes.

Dmitry B.
@Dmitry Read the CSV into the clipboard, then use pd.read_clipboard? As far as I know, pandas does not handle streams. – Merlin Jun 12 '16 at 20:09

3 Answers

4

I figured out a workaround. It isn't actual piping, but it saves some disk I/O:

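import io
import pandas as pd
# IPython only: "!" captures the command's stdout as a list of lines (an SList)
output = !(your Unix command)
# output.n joins the captured lines with newlines; read_table expects tab-separated
# input by default, so pass sep=',' for comma-separated data
df = pd.read_table(io.StringIO(output.n))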
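
The workaround above still holds the full output as a string first. If that is the concern, here is a minimal sketch of an alternative that is not from the answer: run the command with `subprocess.Popen` and let `pd.read_csv` read from the pipe in chunks. The helper name `stream_csv_command`, the default chunk size, and the `printf` demo command are all made up for illustration, and it assumes a POSIX shell.

```python
import subprocess

import pandas as pd


def stream_csv_command(cmd, chunksize=100_000, **read_csv_kwargs):
    """Yield DataFrame chunks parsed from the stdout of a shell command.

    `cmd` is run through the shell, so pipelines such as
    `find ... | csvstack ... | csvgrep ...` can be passed as one string.
    """
    proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, text=True)
    try:
        # With chunksize set, read_csv returns an iterator of DataFrames
        # instead of parsing the whole stream at once.
        for chunk in pd.read_csv(proc.stdout, chunksize=chunksize, **read_csv_kwargs):
            yield chunk
    finally:
        # Close our end of the pipe and reap the child process.
        proc.stdout.close()
        proc.wait()


# Hypothetical stand-in for the real csvkit pipeline.
demo_cmd = r"printf 'a,b\n1,2\n3,4\n5,6\n'"
total_rows = sum(len(chunk) for chunk in stream_csv_command(demo_cmd, chunksize=2))
print(total_rows)
```

If the whole result should end up in one DataFrame anyway, `pd.concat(stream_csv_command(cmd), ignore_index=True)` should do it. The data is then in memory as a DataFrame, but never as one big string.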
Qiyun Zhu
For those needing to distinguish stdout and stderr, the `%%capture` cell magic offers a nice option. See [here](http://ipython.readthedocs.io/en/stable/interactive/magics.html#cellmagic-capture) and [here](http://ipython.readthedocs.io/en/stable/api/generated/IPython.utils.capture.html). It looks akin to oLas's answer [here](https://stackoverflow.com/a/24776049/8508004) using the `%%bash` cell magic, but offers another route. – Wayne May 02 '18 at 14:44
0

IIUC you can do it by letting pandas read from STDIN:

Python script:

import sys
import pandas as pd
df = pd.read_csv(sys.stdin)
print(df)

Shell command line:

!find /path -name "*.csv" | csvstack ... | csvgrep ... | python our_pyscript.py

Note the last part: | python our_pyscript.py

You may also want to check this
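
The script above still builds one DataFrame from everything on STDIN. If the script only needs to reduce the data, a chunked variant might look like the sketch below. The function name `count_rows` and the row-counting task are hypothetical, chosen only to show the pattern; the demo feeds it an in-memory stream instead of real STDIN.

```python
import io

import pandas as pd


def count_rows(stream, chunksize=100_000):
    """Count CSV data rows in a text stream, one chunk at a time."""
    total = 0
    # Each chunk is a DataFrame of at most `chunksize` rows; earlier chunks
    # can be garbage-collected once they have been processed.
    for chunk in pd.read_csv(stream, chunksize=chunksize):
        total += len(chunk)
    return total


# In our_pyscript.py this would be called as count_rows(sys.stdin).
# Here an in-memory stream stands in for the piped shell output.
demo_stream = io.StringIO("a,b\n1,2\n3,4\n5,6\n")
print(count_rows(demo_stream, chunksize=2))
```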

MaxU - stand with Ukraine
0

Perhaps "named pipes" would be useful in your case.

In shell:

mkfifo MYFIFO
head myfile.txt > MYFIFO

In notebook:

with open('MYFIFO', 'rt') as f:
    print(f.readline())

A few good internet searches should give you the information you need to use named pipes safely and effectively. Good luck!
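
To connect the named-pipe idea to pandas, here is a rough sketch under some assumptions: a POSIX system (for `os.mkfifo`), and a hypothetical helper `read_csv_via_fifo` that creates a temporary FIFO, starts the shell command writing into it in the background, and reads the other end with `pd.read_csv`. The `printf` command is only a stand-in for the real pipeline.

```python
import os
import shlex
import subprocess
import tempfile

import pandas as pd


def read_csv_via_fifo(cmd, **read_csv_kwargs):
    """Run `cmd` with stdout redirected into a temporary FIFO and parse it."""
    with tempfile.TemporaryDirectory() as tmpdir:
        fifo_path = os.path.join(tmpdir, "csv_fifo")
        os.mkfifo(fifo_path)
        # The writer runs in the background. Its shell blocks on opening the
        # FIFO until a reader opens the other end, so it must not be waited
        # on before the read below has started.
        writer = subprocess.Popen(
            f"( {cmd} ) > {shlex.quote(fifo_path)}", shell=True
        )
        try:
            with open(fifo_path, "rt") as f:
                return pd.read_csv(f, **read_csv_kwargs)
        finally:
            writer.wait()


# Hypothetical stand-in for the real shell pipeline.
df = read_csv_via_fifo(r"printf 'x,y\n1,2\n3,4\n'")
print(df)
```

Starting the writer before opening the FIFO for reading matters here: opening either end of a FIFO blocks until the other end is opened, so doing both steps sequentially in one process would hang.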

Gordon Bean