
In the following program, I want to pass (pipe) the data returned by one function into a downstream function.

The Python code looks something like this:

import pandas as pd

def main():
    data1, data2, data3 = read_file()
    do_calc(data1, data2, data3)

def read_file():
    data1 = ""
    data2 = ""
    data3 = ""

    file1 = open('file1.txt', 'r+').read()
    for line in file1:
        # do something....
        data1 += calculated_values

    file2 = open('file2.txt', 'r+').read()
    for line in file2:
        # do something...
        data2 += calculated_values

    file1 = open('file1.txt', 'r+').read()
    for line in file1:
        # do something...
        data3 += calculated_values

    return data1, data2, data3

def do_calc(data1, data2, data3):
    # data1, data2 and data3 are plain strings of tab-separated data here
    d1_frame = pd.read_table(data1, sep='\t')
    d2_frame = pd.read_table(data2, sep='\t')
    d3_frame = pd.read_table(data3, sep='\t')

    all_data = [d1_frame, d2_frame, d3_frame]

main()

What is wrong with this code? It looks like pandas isn't able to read the input properly; instead, the values of data1, data2 and data3 are printed to the screen.

read_hdf seems to read the file, but not properly. Is there a way to read the data returned from a function directly into pandas (without writing it to a file and reading it back)?

Error message:

Traceback (most recent call last):
  File "calc.py", line 757, in <module>
    main()
  File "calc.py", line 137, in main
    merge_tables(pop1_freq_table, pop2_freq_table, f1_freq_table)
  File "calc.py", line 373, in merge_tables
    df1 = pd.read_table(pop1_freq_table, sep='\t')
  File "/home/everestial007/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 645, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/everestial007/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 388, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/everestial007/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 729, in __init__
    self._make_engine(self.engine)
  File "/home/everestial007/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 922, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/everestial007/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 1389, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4019)
  File "pandas/parser.pyx", line 665, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:7967)
FileNotFoundError: File b'0.667,0.333\n2\t15800126\tT\tT,A\t0.667,0.333\n2\t15800193\tC\tC,T\t0.667,0.333\n2\t15800244\tT\tT,C\......

I would appreciate any explanation.

everestial007
  • For piping data returned from one function to another, I would suggest using decorators. Take a look at this: http://stackoverflow.com/questions/739654/how-to-make-a-chain-of-function-decorators-in-python?rq=1 – Chenna V Dec 14 '16 at 19:38

2 Answers


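read_table expects a file path (or a file-like object) as input, but you are passing a string containing the data itself instead of a string with the file's location. That is why the FileNotFoundError shows your data where a filename should be. You could write your data to a file and then read from that file. Assuming the string is already properly formatted: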

filename = 'tab_separated_file_1.dat'
with open(filename, 'w') as f:
    f.write(data1)

df1 = pd.read_table(filename, sep='\t')
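
If the intermediate file is only needed for this one read, the same idea can be sketched with a temporary directory so that nothing is left on disk afterwards. This is a minimal sketch, not the asker's actual pipeline: the contents of `data1` and the column names are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the tab-separated string returned by read_file().
data1 = "contig\tpos\tref\n2\t15800126\tT\n2\t15800193\tC\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = os.path.join(tmp_dir, "tab_separated_file_1.dat")
    with open(filename, "w") as f:
        f.write(data1)
    # read_table now receives a real path, so it can open the file.
    df1 = pd.read_table(filename, sep="\t")

# The temporary directory is gone here, but the DataFrame is still in memory.
print(df1)
```

The DataFrame is fully loaded before the `with` block exits, so deleting the file afterwards does not affect it.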
Alex
  • I am able to write the data to a file and read it back for the downstream function, but this is not what I want. I wanted to read the data from `stdout`; when that turned out not to be possible for me, I resorted to piping data from one function to another. Any other suggestions for using stdout? Thanks – everestial007 Dec 14 '16 at 17:18
  • It looks like nrlakin provided a solution using StringIO. – Alex Dec 14 '16 at 19:20
  • Yes, that's what I had been looking for. – everestial007 Dec 14 '16 at 19:43

As other answers have said, read_table expects a file for input--or, more accurately, a "file-like object". You can use a StringIO object to wrap the data1, data2, and data3 strings in an object that will "behave" like a file when fed to pandas with a few tweaks to your code:

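# Import StringIO (its location differs between Python 2 and Python 3)
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3

import pandas as pd

def main():
    data1, data2, data3 = read_file()
    do_calc(data1, data2, data3)

def read_file():
    # use StringIO objects instead of strings...
    data1 = StringIO()
    data2 = StringIO()
    data3 = StringIO()

    with open('file1.txt') as file1:
        for line in file1:
            # do something.... (placeholder: pass the line through unchanged)
            calculated_values = line
            # note that " += " became ".write()"
            data1.write(calculated_values)

    with open('file2.txt') as file2:
        for line in file2:
            # do something... (placeholder)
            calculated_values = line
            data2.write(calculated_values)

    with open('file1.txt') as file1:
        for line in file1:
            # do something... (placeholder)
            calculated_values = line
            data3.write(calculated_values)

    # rewind each buffer; after writing, its position is at the end,
    # and pandas starts reading from the current position
    for buf in (data1, data2, data3):
        buf.seek(0)

    return data1, data2, data3

def do_calc(data1, data2, data3):
    # each argument is a file-like object, which read_table accepts
    d1_frame = pd.read_table(data1, sep='\t')
    d2_frame = pd.read_table(data2, sep='\t')
    d3_frame = pd.read_table(data3, sep='\t')

    all_data = [d1_frame, d2_frame, d3_frame]
    return all_data

main()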
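
For a version that runs without any input files, here is a minimal self-contained sketch of the same StringIO technique. The helper `build_table`, the column names, and the sample rows are hypothetical and only illustrate the pattern. One detail matters whenever a buffer is filled with `.write()`: the position is left at the end of the buffer, so it has to be rewound with `seek(0)` before pandas reads it.

```python
from io import StringIO

import pandas as pd


def build_table(rows):
    """Write rows into an in-memory buffer (hypothetical stand-in for read_file())."""
    buffer = StringIO()
    buffer.write("contig\tpos\tref\n")  # header line
    for row in rows:
        buffer.write("\t".join(str(value) for value in row) + "\n")
    # Rewind so that the next reader starts at the beginning of the buffer.
    buffer.seek(0)
    return buffer


rows = [(2, 15800126, "T"), (2, 15800193, "C")]
frame = pd.read_table(build_table(rows), sep="\t")
print(frame)
```

No file is created on disk at any point; the buffer lives only in memory.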
nrlakin
  • You are awesome. Thanks very much! – everestial007 Dec 14 '16 at 19:23
  • Also, you can wrap the data in `StringIO` when reading the table. I just added `StringIO` at the point where the data is read instead of when creating the empty variable. The reason is that I also want to write `data1`, `data2` and `data3` to a file and be able to access them in another format elsewhere in the code. – everestial007 Dec 14 '16 at 19:24
  • @everestial007 no problem--I've had the same issue. – nrlakin Dec 14 '16 at 19:25
  • @everestial007 that's right--you can initialize from the file contents, or do some line-by-line processing where it says "do something..." – nrlakin Dec 14 '16 at 19:26
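
The variant described in the comments, where the data stays an ordinary string and is wrapped in `StringIO` only at the moment it is handed to pandas, might look like the sketch below. The sample contents of `data1` are invented for illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical tab-separated string, as returned by the original read_file().
data1 = "contig\tpos\tref\n2\t15800126\tT\n2\t15800193\tC\n"

# Wrap the string only when reading; a freshly created StringIO starts at
# position 0, so no seek(0) is needed here.
d1_frame = pd.read_table(StringIO(data1), sep="\t")

# data1 is still a plain str and can be written to a file or reused elsewhere.
print(type(data1).__name__, d1_frame.shape)
```

This keeps the function's return values as plain strings, which is convenient when the same data also needs to be written out or processed in another format.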