The following is the value of the StringIO object:

DebugPoint csv_log_stream type

csv_log_stream.getvalue()

Raw Output

'"2022-06-04 12:02:40,248",azure_functions_worker,INFO,"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 5af2b92f-7e82-4515-89c5-846737ba3e60,function Name: ProcessWebSaleExportFilesInRSBlobStorage"\n"2022-06-04 12:02:40,252",azure_functions_worker,INFO,"Received FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 0d487a72-2ded-487e-b269-2ac913e3fcebfunction Name: ReadIntegrationInterfaceConfiguration"\n"2022-06-04 12:02:40,259",azure_functions_worker,INFO,"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 0d487a72-2ded-487e-b269-2ac913e3fceb,function Name: ReadIntegrationInterfaceConfiguration"\n"2022-06-04 12:02:40,261",azure_functions_worker,INFO,"Received FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: c42740c8-21fd-4435-a5cb-7b9f74dc7225function Name: SaveLogsToRSBlobStorage"\n"2022-06-04 12:02:40,265",azure_functions_worker,INFO,"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: c42740c8-21fd-4435-a5cb-7b9f74dc7225,function Name: SaveLogsToRSBlobStorage"\n"2022-06-04 12:02:43,000",azure_functions_worker,INFO,"Received FunctionInvocationRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 5af2b92f-7e82-4515-89c5-846737ba3e60, function name: ProcessWebSaleExportFilesInRSBlobStorage, invocation ID: c42bf678-d155-4859-a71a-b0108645080d, function type: sync, sync threadpool max workers: 1000"\n"2022-06-04 12:02:43,007",root,INFO,Python HTTP trigger :: ProcessWebSaleExportFilesInRSBlobStorage function processed a request.\n"2022-06-04 12:02:43,008",root,INFO,Processing Request object started for the desired parameters.\n"2022-06-04 12:02:43,009",root,INFO,Processing Request object completed for the desired parameters.\n"2022-06-04 12:02:43,010",root,INFO,Processing Request object started for the desired 
parameters.\n"2022-06-04 12:02:43,011",root,INFO,Processing Request object completed for the desired parameters.\n"2022-06-04 12:02:43,041",azure.core.pipeline.policies.http_logging_policy,INFO,"Request URL: \'https://koxdsrssa.blob.core.windows.net/koxds-export?restype=REDACTED&comp=REDACTED&prefix=REDACTED&st=REDACTED&se=REDACTED&sp=REDACTED&sv=REDACTED&sr=REDACTED&sig=REDACTED\'\nRequest method: \'GET\'\nRequest headers:\n    \'x-ms-version\': \'REDACTED\'\n    \'Accept\': \'application/xml\'\n    \'User-Agent\': \'azsdk-python-storage-blob/12.12.0 Python/3.8.12 (Windows-10-10.0.19044-SP0)\'\n    \'x-ms-date\': \'REDACTED\'\n    \'x-ms-client-request-id\': \'79b647c5-e3ed-11ec-8c08-48a4728e3a8b\'\nNo body was attached to the request"\n"2022-06-04 12:02:43,564",azure.core.pipeline.policies.http_logging_policy,INFO,"Response status: 200\nResponse headers:\n    \'Transfer-Encoding\': \'chunked\'\n    \'Content-Type\': \'application/xml\'\n    \'Server\': \'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0\'\n    \'x-ms-request-id\': \'4a6cab6b-e01e-002d-5ffa-77769c000000\'\n    \'x-ms-client-request-id\': \'79b647c5-e3ed-11ec-8c08-48a4728e3a8b\'\n    \'x-ms-version\': \'REDACTED\'\n    \'Access-Control-Expose-Headers\': \'REDACTED\'\n    \'Access-Control-Allow-Origin\': \'REDACTED\'\n    \'Date\': \'Sat, 04 Jun 2022 10:02:43 GMT\'"\n"2022-06-04 12:02:44,070",azure.core.pipeline.policies.http_logging_policy,INFO,"Request URL: \'https://koxdsrssa.blob.core.windows.net/koxds-export/WebSale/Test/2022_06_03_20_13_23_782-0500_c841f873-9a12-4402-a164-5819cbcddc3e_Test_0.json?st=REDACTED&se=REDACTED&sp=REDACTED&sv=REDACTED&sr=REDACTED&sig=REDACTED\'\nRequest method: \'GET\'\nRequest headers:\n    \'x-ms-range\': \'REDACTED\'\n    \'x-ms-version\': \'REDACTED\'\n    \'Accept\': \'application/xml\'\n    \'User-Agent\': \'azsdk-python-storage-blob/12.12.0 Python/3.8.12 (Windows-10-10.0.19044-SP0)\'\n    \'x-ms-date\': \'REDACTED\'\n    \'x-ms-client-request-id\': 
\'7a5398fe-e3ed-11ec-a414-48a4728e3a8b\'\nNo body was attached to the request"\n"2022-06-04 12:02:44,226",azure.core.pipeline.policies.http_logging_policy,INFO,"Response status: 206\nResponse headers:\n    \'Content-Length\': \'8337358\'\n    \'Content-Type\': \'application/json\'\n    \'Content-Range\': \'REDACTED\'\n    \'Last-Modified\': \'Sat, 04 Jun 2022 01:14:56 GMT\'\n    \'Accept-Ranges\': \'REDACTED\'\n    \'ETag\': \'""0x8DA45C7A2F73E96""\'\n    \'Server\': \'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0\'\n    \'x-ms-request-id\': \'4a6cad87-e01e-002d-3cfa-77769c000000\'\n    \'x-ms-client-request-id\': \'7a5398fe-e3ed-11ec-a414-48a4728e3a8b\'\n    \'x-ms-version\': \'REDACTED\'\n    \'x-ms-creation-time\': \'REDACTED\'\n    \'x-ms-blob-content-md5\': \'REDACTED\'\n    \'x-ms-lease-status\': \'REDACTED\'\n    \'x-ms-lease-state\': \'REDACTED\'\n    \'x-ms-blob-type\': \'REDACTED\'\n    \'Content-Disposition\': \'REDACTED\'\n    \'x-ms-server-encrypted\': \'REDACTED\'\n    \'Access-Control-Expose-Headers\': \'REDACTED\'\n    \'Access-Control-Allow-Origin\': \'REDACTED\'\n    \'Date\': \'Sat, 04 Jun 2022 10:02:44 GMT\'"\n"2022-06-04 12:09:07,090",root,INFO,Total time taken: 6 minutes and 24 seconds\n'

Reading from StringIO to pandas.DataFrame:

df_logs = pd.read_csv(csv_log_stream, header=None)

Output:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "ProjectDir\\.venv\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "ProjectDir\\.venv\lib\site-packages\pandas\io\parsers\readers.py", line 680, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "ProjectDir\\.venv\lib\site-packages\pandas\io\parsers\readers.py", line 575, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "ProjectDir\\.venv\lib\site-packages\pandas\io\parsers\readers.py", line 933, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "ProjectDir\\.venv\lib\site-packages\pandas\io\parsers\readers.py", line 1235, in _make_engine
    return mapping[engine](f, **self.options)
  File "ProjectDir\\.venv\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 75, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas\_libs\parsers.pyx", line 551, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

The above attempt to read the StringIO into a DataFrame raises an error. So I tried the following instead, and I get an empty DataFrame:

df_logs = pd.read_csv(csv_log_stream, names=["Timestamp", "LogName", "LogLevel", "LogMessage"])
print(df_logs)

Output:

Empty DataFrame
Columns: [Timestamp, LogName, LogLevel, LogMessage]
Index: []

I cannot work out what I am doing wrong. The value of my input StringIO seems to be correct. What am I missing?

Benison Sam

1 Answer

It might be that you are calling pd.read_csv on the string which StringIO.getvalue() outputs instead of the StringIO object itself:

import pandas as pd
from io import StringIO

file = StringIO(
    "\"2022-06-04 12:02:40,248\",azure_functions_worker,INFO,\"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 5af2b92f-7e82-4515-89c5-846737ba3e60,function Name: ProcessWebSaleExportFilesInRSBlobStorage\"\n\"2022-06-04 12:02:40,252\",azure_functions_worker,INFO,\"Received FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 0d487a72-2ded-487e-b269-2ac913e3fcebfunction Name: ReadIntegrationInterfaceConfiguration\"\n\"2022-06-04 12:02:40,259\",azure_functions_worker,INFO,\"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 0d487a72-2ded-487e-b269-2ac913e3fceb,function Name: ReadIntegrationInterfaceConfiguration\"\n\"2022-06-04 12:02:40,261\",azure_functions_worker,INFO,\"Received FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: c42740c8-21fd-4435-a5cb-7b9f74dc7225function Name: SaveLogsToRSBlobStorage\"\n\"2022-06-04 12:02:40,265\",azure_functions_worker,INFO,\"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: c42740c8-21fd-4435-a5cb-7b9f74dc7225,function Name: SaveLogsToRSBlobStorage\"\n\"2022-06-04 12:02:43,000\",azure_functions_worker,INFO,\"Received FunctionInvocationRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 5af2b92f-7e82-4515-89c5-846737ba3e60, function name: ProcessWebSaleExportFilesInRSBlobStorage, invocation ID: c42bf678-d155-4859-a71a-b0108645080d, function type: sync, sync threadpool max workers: 1000\"\n\"2022-06-04 12:02:43,007\",root,INFO,Python HTTP trigger :: ProcessWebSaleExportFilesInRSBlobStorage function processed a request.\n\"2022-06-04 12:02:43,008\",root,INFO,Processing Request object started for the desired parameters.\n\"2022-06-04 12:02:43,009\",root,INFO,Processing Request object completed for the desired parameters.\n\"2022-06-04 12:02:43,010\",root,INFO,Processing Request 
object started for the desired parameters.\n\"2022-06-04 12:02:43,011\",root,INFO,Processing Request object completed for the desired parameters.\n\"2022-06-04 12:02:43,041\",azure.core.pipeline.policies.http_logging_policy,INFO,\"Request URL: 'https://koxdsrssa.blob.core.windows.net/koxds-export?restype=REDACTED&comp=REDACTED&prefix=REDACTED&st=REDACTED&se=REDACTED&sp=REDACTED&sv=REDACTED&sr=REDACTED&sig=REDACTED'\nRequest method: 'GET'\nRequest headers:\n    'x-ms-version': 'REDACTED'\n    'Accept': 'application/xml'\n    'User-Agent': 'azsdk-python-storage-blob/12.12.0 Python/3.8.12 (Windows-10-10.0.19044-SP0)'\n    'x-ms-date': 'REDACTED'\n    'x-ms-client-request-id': '79b647c5-e3ed-11ec-8c08-48a4728e3a8b'\nNo body was attached to the request\"\n\"2022-06-04 12:02:43,564\",azure.core.pipeline.policies.http_logging_policy,INFO,\"Response status: 200\nResponse headers:\n    'Transfer-Encoding': 'chunked'\n    'Content-Type': 'application/xml'\n    'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0'\n    'x-ms-request-id': '4a6cab6b-e01e-002d-5ffa-77769c000000'\n    'x-ms-client-request-id': '79b647c5-e3ed-11ec-8c08-48a4728e3a8b'\n    'x-ms-version': 'REDACTED'\n    'Access-Control-Expose-Headers': 'REDACTED'\n    'Access-Control-Allow-Origin': 'REDACTED'\n    'Date': 'Sat, 04 Jun 2022 10:02:43 GMT'\"\n\"2022-06-04 12:02:44,070\",azure.core.pipeline.policies.http_logging_policy,INFO,\"Request URL: 'https://koxdsrssa.blob.core.windows.net/koxds-export/WebSale/Test/2022_06_03_20_13_23_782-0500_c841f873-9a12-4402-a164-5819cbcddc3e_Test_0.json?st=REDACTED&se=REDACTED&sp=REDACTED&sv=REDACTED&sr=REDACTED&sig=REDACTED'\nRequest method: 'GET'\nRequest headers:\n    'x-ms-range': 'REDACTED'\n    'x-ms-version': 'REDACTED'\n    'Accept': 'application/xml'\n    'User-Agent': 'azsdk-python-storage-blob/12.12.0 Python/3.8.12 (Windows-10-10.0.19044-SP0)'\n    'x-ms-date': 'REDACTED'\n    'x-ms-client-request-id': '7a5398fe-e3ed-11ec-a414-48a4728e3a8b'\nNo body was 
attached to the request\"\n\"2022-06-04 12:02:44,226\",azure.core.pipeline.policies.http_logging_policy,INFO,\"Response status: 206\nResponse headers:\n    'Content-Length': '8337358'\n    'Content-Type': 'application/json'\n    'Content-Range': 'REDACTED'\n    'Last-Modified': 'Sat, 04 Jun 2022 01:14:56 GMT'\n    'Accept-Ranges': 'REDACTED'\n    'ETag': '\"\"0x8DA45C7A2F73E96\"\"'\n    'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0'\n    'x-ms-request-id': '4a6cad87-e01e-002d-3cfa-77769c000000'\n    'x-ms-client-request-id': '7a5398fe-e3ed-11ec-a414-48a4728e3a8b'\n    'x-ms-version': 'REDACTED'\n    'x-ms-creation-time': 'REDACTED'\n    'x-ms-blob-content-md5': 'REDACTED'\n    'x-ms-lease-status': 'REDACTED'\n    'x-ms-lease-state': 'REDACTED'\n    'x-ms-blob-type': 'REDACTED'\n    'Content-Disposition': 'REDACTED'\n    'x-ms-server-encrypted': 'REDACTED'\n    'Access-Control-Expose-Headers': 'REDACTED'\n    'Access-Control-Allow-Origin': 'REDACTED'\n    'Date': 'Sat, 04 Jun 2022 10:02:44 GMT'\"\n\"2022-06-04 12:09:07,090\",root,INFO,Total time taken: 6 minutes and 24 seconds\n"
)


df = pd.read_csv(
    file,
    names=[
        "Timestamp",
        "LogName",
        "LogLevel",
        "LogMessage",
    ],
)
print(df)
# Output
                  Timestamp                                           LogName  \
0   2022-06-04 12:02:40,248                            azure_functions_worker
1   2022-06-04 12:02:40,252                            azure_functions_worker
2   2022-06-04 12:02:40,259                            azure_functions_worker
3   2022-06-04 12:02:40,261                            azure_functions_worker
4   2022-06-04 12:02:40,265                            azure_functions_worker
5   2022-06-04 12:02:43,000                            azure_functions_worker
6   2022-06-04 12:02:43,007                                              root
7   2022-06-04 12:02:43,008                                              root
8   2022-06-04 12:02:43,009                                              root
9   2022-06-04 12:02:43,010                                              root
10  2022-06-04 12:02:43,011                                              root
11  2022-06-04 12:02:43,041  azure.core.pipeline.policies.http_logging_policy
12  2022-06-04 12:02:43,564  azure.core.pipeline.policies.http_logging_policy
13  2022-06-04 12:02:44,070  azure.core.pipeline.policies.http_logging_policy
14  2022-06-04 12:02:44,226  azure.core.pipeline.policies.http_logging_policy
15  2022-06-04 12:09:07,090                                              root

   LogLevel                                         LogMessage
0      INFO  Successfully processed FunctionLoadRequest, re...
1      INFO  Received FunctionLoadRequest, request ID: 5bc6...
2      INFO  Successfully processed FunctionLoadRequest, re...
3      INFO  Received FunctionLoadRequest, request ID: 5bc6...
4      INFO  Successfully processed FunctionLoadRequest, re...
5      INFO  Received FunctionInvocationRequest, request ID...
6      INFO  Python HTTP trigger :: ProcessWebSaleExportFil...
7      INFO  Processing Request object started for the desi...
8      INFO  Processing Request object completed for the de...
9      INFO  Processing Request object started for the desi...
10     INFO  Processing Request object completed for the de...
11     INFO  Request URL: 'https://koxdsrssa.blob.core.wind...
12     INFO  Response status: 200\nResponse headers:\n    '...
13     INFO  Request URL: 'https://koxdsrssa.blob.core.wind...
14     INFO  Response status: 206\nResponse headers:\n    '...
15     INFO         Total time taken: 6 minutes and 24 seconds
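
The reason this works is that a StringIO created with an initial value starts at position 0. A minimal sketch of the difference, using only the standard library and a tiny made-up CSV string (the names `prefilled` and `written` are illustrative, not from the question), assuming the stream in the main program was filled through `write()` calls:

```python
from io import StringIO

# A stream created with an initial value starts at position 0.
prefilled = StringIO("a,b\n1,2\n")

# A stream filled through write() is left positioned at the end
# of what was written.
written = StringIO()
written.write("a,b\n1,2\n")

print(prefilled.tell())  # 0
print(written.tell())    # 8

# getvalue() ignores the position and always returns the whole buffer ...
same_content = prefilled.getvalue() == written.getvalue()

# ... whereas read() (and any reader, such as pd.read_csv) starts at the
# current position, so the written stream looks empty until it is rewound.
before_rewind = written.read()
written.seek(0)
after_rewind = written.read()

print(same_content)         # True
print(repr(before_rewind))  # ''
print(repr(after_rewind))   # 'a,b\n1,2\n'
```

This would explain why `getvalue()` can show the full log in a debugger while a reader handed the same object finds nothing left to parse.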

Be careful with StringIO objects: reading from one depends on its current stream position, which is not the case when working with a raw string.

Here is an example with the same "file" object:

file.seek(20000)  # Move the stream position to offset 20000, beyond the end of the content here

df = pd.read_csv(
    file,
    names=[
        "Timestamp",
        "LogName",
        "LogLevel",
        "LogMessage",
    ],
)

print(df)
# Output
Empty DataFrame
Columns: [Timestamp, LogName, LogLevel, LogMessage]
Index: []

Whereas:

file.seek(0)  # Change the stream position to the beginning of the file

df = pd.read_csv(
    file,
    names=[
        "Timestamp",
        "LogName",
        "LogLevel",
        "LogMessage",
    ],
)

print(df)
# Output
   LogLevel                                         LogMessage  
0      INFO  Successfully processed FunctionLoadRequest, re...  
1      INFO  Received FunctionLoadRequest, request ID: 5bc6...  
2      INFO  Successfully processed FunctionLoadRequest, re...  
3      INFO  Received FunctionLoadRequest, request ID: 5bc6...  
4      INFO  Successfully processed FunctionLoadRequest, re...  
5      INFO  Received FunctionInvocationRequest, request ID...  
6      INFO  Python HTTP trigger :: ProcessWebSaleExportFil...  
7      INFO  Processing Request object started for the desi...  
8      INFO  Processing Request object completed for the de...  
9      INFO  Processing Request object started for the desi...  
10     INFO  Processing Request object completed for the de...  
11     INFO  Request URL: 'https://koxdsrssa.blob.core.wind...  
12     INFO  Response status: 200\nResponse headers:\n    '...  
13     INFO  Request URL: 'https://koxdsrssa.blob.core.wind...  
14     INFO  Response status: 206\nResponse headers:\n    '...  
15     INFO         Total time taken: 6 minutes and 24 seconds  
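
If the stream is still being written to elsewhere in the program, rewinding it may not be desirable. One possible alternative, sketched below under that assumption, is to parse a copy of the buffer instead. The helper `read_log_stream`, the constant `LOG_COLUMNS`, and the two sample rows are hypothetical and only stand in for the real `csv_log_stream`:

```python
from io import StringIO

import pandas as pd

LOG_COLUMNS = ["Timestamp", "LogName", "LogLevel", "LogMessage"]


def read_log_stream(stream: StringIO) -> pd.DataFrame:
    """Parse the whole buffer, wherever the stream position currently is."""
    # Copy the full contents into a fresh stream (which starts at position 0),
    # so the original stream's position is left untouched for later writes.
    return pd.read_csv(StringIO(stream.getvalue()), names=LOG_COLUMNS)


# Hypothetical stand-in for csv_log_stream: filled with write(),
# so its position is at the end of the content.
csv_log_stream = StringIO()
csv_log_stream.write('"2022-06-04 12:02:43,007",root,INFO,First message\n')
csv_log_stream.write('"2022-06-04 12:09:07,090",root,INFO,"Second, with a comma"\n')

df_logs = read_log_stream(csv_log_stream)
print(df_logs.shape)  # (2, 4)
```

The trade-off is one extra copy of the log text in memory, in exchange for not having to remember the stream position before and after the read.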
Laurent
  • I am using `csv_log_stream` `StringIO` object itself instead of the string value. But in your implementation I see that you have added three additional columns while reading it from `StringIO` i.e. "requestID", "functionID", "functionName". Any particular reason why you did that? Especially when you are initializing the `StringIO` object with only four columns. – Benison Sam Jun 04 '22 at 09:46
  • Unfortunately, your solution doesn't work for me. I have added the raw string value to my question. Could you please try your solution again by initializing with the raw string value? – Benison Sam Jun 04 '22 at 10:19
  • Seems like it's working for you with the raw string. It works for me also with the raw string but when executed as part of the main program, it doesn't. To ensure if I am actually working with `StringIO` object, I checked its type during a debug session and I have posted the screenshot from my IDE as well. Is there anything else I could try? – Benison Sam Jun 04 '22 at 13:03
  • Just to be sure, does `isinstance(csv_log_stream, StringIO)` return True? Also, could you remove the "Printed Output" and "What did try?" of your post, which are not useful anymore, and replace them with what your main program now outputs and/or the error message it raises? – Laurent Jun 04 '22 at 13:15
  • No error message. Just that `pd.read_csv` returns an empty data frame. Maybe I can add some more detail to the question. – Benison Sam Jun 04 '22 at 13:17
  • Getting back an empty dataframe from pd.read_csv and no error message can only mean that the StringIO object you passed to it is empty (StringIO("")), so it seems that the issue resides elsewhere in your code, where csv_log_stream is assigned. – Laurent Jun 04 '22 at 13:27
  • Not really because the same place where I have the debug point in the screenshot, `csv_log_stream.getvalue()` returns the raw output. Which means it is not empty. – Benison Sam Jun 04 '22 at 13:31
  • I have updated my screenshot, better proof – Benison Sam Jun 04 '22 at 13:40
  • You are assuming that the object (csv_log_stream) you pass to pd.read_csv in your main code is not empty because calling `getvalue` on it returns a string, but `getvalue` returns the entire buffer regardless of the current stream position, whereas pd.read_csv, when called with the StringIO object itself, reads only from the current position onward. Before creating the dataframe, you should call csv_log_stream.seek(0), see my updated answer. – Laurent Jun 04 '22 at 13:53
  • Thank you for adding the detailed answer :) – Benison Sam Jun 04 '22 at 13:57