
Trying to load a dataframe from a stream using requests.get(stream=True) and iter_lines, but it's not working right.

I also tried the sample solution from here, but it doesn't seem to work for me.

Here is the sample solution, which populates the dataframe from a generator of strings (a CSV representation):

import pandas as pd

print(pd.__version__)

def gen():
    lines = [
        'col1,col2\n',
        'foo,bar\n',
        'foo,baz\n',
        'bar,baz\n'
    ]
    for line in lines:
        yield line

class Reader(object):
    def __init__(self, g):
        self.g = g
    def read(self, n=0):
        try:
            return next(self.g)
        except StopIteration:
            return ''

df = pd.read_csv(Reader(gen()))

My output is:

1.0.5
Traceback (most recent call last):
  ...
    df = pd.read_csv(Reader(gen()))
  File "\.virtualenvs\TEST-gzvLffvD\lib\site-packages\pandas\io\parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "\.virtualenvs\TEST-gzvLffvD\lib\site-packages\pandas\io\parsers.py", line 431, in _read
    filepath_or_buffer, encoding, compression
  File "\.virtualenvs\TEST-gzvLffvD\lib\site-packages\pandas\io\common.py", line 200, in get_filepath_or_buffer
    raise ValueError(msg)
ValueError: Invalid file path or buffer object type: <class 'Reader'>

If I just pass the generator as is:

df = pd.read_csv(gen())

I get the error:

ValueError: Invalid file path or buffer object type: <class 'generator'>
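Both messages appear to come from the same pre-parse check. In pandas 1.0.x, read_csv passes anything that isn't a path string to pandas.api.types.is_file_like, which (as far as I know) accepts an object only if it has a read() or write() method and is also iterable (has __iter__). A plain generator is iterable but has no read(); the Reader class has read() but no __iter__. A quick sketch to confirm this, using trimmed-down copies of the question's gen and Reader:

```python
from pandas.api.types import is_file_like


def gen():
    # Trimmed-down version of the question's generator
    yield 'col1,col2\n'
    yield 'foo,bar\n'


class Reader(object):
    # Same as the question's Reader: it has read(), but no __iter__
    def __init__(self, g):
        self.g = g

    def read(self, n=0):
        try:
            return next(self.g)
        except StopIteration:
            return ''


print(is_file_like(gen()))          # False: iterable, but no read()
print(is_file_like(Reader(gen())))  # False: has read(), but not iterable
```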

How do I get this working? Basically, I want to stream data into a dataframe from a generator of CSV string representations.

user1179317

1 Answer


I think you're looking for the pd.DataFrame constructor, which accepts a generator of tuples:

import pandas as pd

print(pd.__version__)

def gen():
    lines = [
        'col1,col2\n',
        'foo,bar\n',
        'foo,baz\n',
        'bar,baz\n'
    ]
    for line in lines:
        yield tuple(line.strip().split(","))

df = pd.DataFrame(gen())
print(df)
      0     1
0  col1  col2
1   foo   bar
2   foo   baz
3   bar   baz

This isn't exactly the output you want: the header row ends up as the first data row and the columns are labeled 0 and 1, so you'll have to modify the solution to get the right column labels.
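One way to get proper column labels, sketched here with Python's csv module in place of the answer's split(",") (a swap on my part, not part of the original answer): csv.reader accepts any iterable of strings, so it can consume the generator directly, and it also handles quoted fields that contain commas. The first row is taken as the header:

```python
import csv

import pandas as pd


def gen():
    lines = [
        'col1,col2\n',
        'foo,bar\n',
        'foo,baz\n',
        'bar,baz\n'
    ]
    for line in lines:
        yield line


rows = csv.reader(gen())   # csv.reader works on any iterable of text lines
header = next(rows)        # first row -> column labels
df = pd.DataFrame(list(rows), columns=header)
print(df)
```

Unlike read_csv, this leaves every value as a string and collects all rows before the DataFrame is built, so any type conversion (for example with pd.to_numeric) is up to you.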

Jay Mody
  • Yeah, the documentation for pd.read_csv says it should accept an object that has a read() method, so I'm not sure why it's not working. Did you get the same error with the sample code? – user1179317 Sep 11 '20 at 16:39
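On the read() question: a read() method alone seems not to be enough, because read_csv first runs pandas' is_file_like check, which also requires __iter__. A minimal sketch, assuming that check is the only obstacle, adds an __iter__ method to the question's Reader. The csv_lines helper and the fake_iter_lines list are hypothetical additions for the original requests use case: iter_lines() yields bytes with the line terminator stripped, so each line is decoded and given its newline back before pandas sees it. The hard-coded byte lines stand in for a real response.

```python
import pandas as pd


def csv_lines(byte_lines, encoding='utf-8'):
    # Hypothetical helper: requests' iter_lines() yields bytes without the
    # trailing newline, so decode each line and restore the terminator.
    for raw in byte_lines:
        yield raw.decode(encoding) + '\n'


class Reader(object):
    """Minimal file-like wrapper around a generator of CSV text lines."""

    def __init__(self, g):
        self.g = g

    def read(self, n=0):
        # pandas calls read(n) repeatedly; returning one line per call is
        # fine, and an empty string tells the parser the input is exhausted.
        try:
            return next(self.g)
        except StopIteration:
            return ''

    def __iter__(self):
        # Present so pandas' is_file_like() check accepts this object;
        # iterating yields the remaining lines, as a text file would.
        return self.g


# Stand-in for requests.get(url, stream=True).iter_lines() (url is hypothetical)
fake_iter_lines = [b'col1,col2', b'foo,bar', b'foo,baz', b'bar,baz']

df = pd.read_csv(Reader(csv_lines(fake_iter_lines)))
print(df)
```

With a real response you would pass resp.iter_lines() instead of fake_iter_lines. The parser pulls text through read() as it goes, so the generator is consumed lazily, while read_csv still infers column types as usual.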