I would like to load a csv into a pandas dataframe and the data is available on a remote server.

The goal is to read it directly to a dataframe without writing the data to disk.

url = "https://dev.azure.com/tankerkoenig/tankerkoenig-data/_git/tankerkoenig-data?path=/prices/2022/07/2022-07-01-prices.csv"

I am thinking about an approach like this:

import io

import pandas as pd
import requests

r = requests.get(url)
# read_csv needs a path, URL, or file-like object, so wrap the raw bytes
df = pd.read_csv(io.BytesIO(r.content))

Does anyone know if something like this is possible?

Daniel

1 Answer

You can pass the URL directly to read_csv:

import pandas as pd

url = "https://dev.azure.com/tankerkoenig/tankerkoenig-data/_git/tankerkoenig-data?path=/prices/2022/07/2022-07-01-prices.csv"
df = pd.read_csv(url)

Or, if you want to fetch the text as a string first and then parse it:

from io import StringIO

import pandas as pd
import requests

url = "https://dev.azure.com/tankerkoenig/tankerkoenig-data/_git/tankerkoenig-data?path=/prices/2022/07/2022-07-01-prices.csv"
s = requests.get(url).text  # decoded response body as a str

df = pd.read_csv(StringIO(s))
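
Both variants assume the URL serves raw CSV. If the server responds with an HTML page instead (as a repository web-view URL may), read_csv will typically fail with a tokenizing error. A minimal sketch of a guard for that case follows; the names `looks_like_html` and `read_remote_csv` are hypothetical helpers, and the 30-second timeout is an arbitrary assumption, not something from the original code.

```python
import io


def looks_like_html(content_type: str, body: str) -> bool:
    """Heuristic: True if the response appears to be an HTML page, not CSV."""
    if "html" in content_type.lower():
        return True
    start = body.lstrip()[:100].lower()
    return start.startswith("<!doctype html") or start.startswith("<html")


def read_remote_csv(url: str):
    """Fetch url and parse it as CSV in memory (nothing is written to disk)."""
    # Imported lazily so looks_like_html stays usable without these packages.
    import pandas as pd
    import requests

    response = requests.get(url, timeout=30)  # timeout value is an assumption
    response.raise_for_status()  # surface 4xx/5xx instead of parsing an error page
    content_type = response.headers.get("Content-Type", "")
    if looks_like_html(content_type, response.text):
        raise ValueError(
            f"Expected CSV but got HTML (Content-Type: {content_type!r})"
        )
    return pd.read_csv(io.StringIO(response.text))
```

Calling `read_remote_csv(url)` then either returns a dataframe or raises a ValueError that points at the response type rather than at the parser.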
Anon
  • Solution 1: I get ```ParserError: Error tokenizing data. C error: Expected 1 fields in line 8, saw 2```. Solution 2: I get ```AttributeError: 'str' object has no attribute 'decode'```. – Daniel Jul 18 '22 at 10:05
  • @Daniel, can you try now? – Anon Jul 18 '22 at 10:07
  • Solution 2 now gives me the same error as solution 1. It seems the data is received, but the schema cannot be parsed properly. – Daniel Jul 18 '22 at 10:08
  • OK, then if you can receive the data, the issue is not with authentication. What fields do you need from this data? – Anon Jul 18 '22 at 10:10
  • date, station_uuid, diesel, e5, e10, dieselchange, e5change and e10change. – Daniel Jul 18 '22 at 10:11
  • OK, so all fields. I suspect date is causing some issue. Let me check. – Anon Jul 18 '22 at 10:12
  • When using ```sep='delimiter'``` I can see that only HTML data is received... – Daniel Jul 18 '22 at 10:17
  • I got a connection refused error, so I suspect the problem is with the URL. I checked the data and it seems to be OK, so the issue is with getting the data. Is it possible to download the file locally and then process it? raise ConnectionError(e, request=request) requests.exceptions.ConnectionError: HTTPSConnectionPool(host='dev.azure.com', port=443): Max retries exceeded with url: /tankerkoenig/tankerkoenig-data/_git/tankerkoenig-data?path=/prices/2022/07/2022-07-01-prices.csv ( – Anon Jul 18 '22 at 10:23
  • I think the fact that you are receiving HTML is proof that the CSV is not accessible at this URL. – Anon Jul 18 '22 at 10:25