I have this piece of code:

csvData = str(request.GET.get('csvData'))

print("TYPE csvData", type(csvData))

print("CSV", csvData)

arr = np.array(csvData.splitlines())

print("ARR", arr)

print("TYPE arr", type(arr))

csvData is a string.

The code prints the following output:

TYPE csvData <class 'str'>

CSV "NUM,AIRLINE_ARR_ICAO,WAKE,SIBT,SOBT,PLANNED_TURNAROUND,DISTANCE_FROM_ORIGIN,DISTANCE_TO_TARGET\n1,AEA,H,2016-01-01 04:05:00,2016-01-01 14:10:00,605,9920.67,5776.89\n2,AEA,H,2016-01-01 04:25:00,2016-01-01 06:30:00,125.0,10060.80,483.93\n3,AVA,H,2016-01-01 05:05:00,2016-01-01 07:05:00,120.0,8033.86,8033.86\n4,IBE,H,2016-01-01 05:20:00,2016-01-01 10:40:00,320.0,0.00,8507.73\n5,IBE,H,2016-01-01 05:25:00,2016-01-01 10:50:00,325.0,6698.42,6698.42\n6,IBE,H,2016-01-01 05:30:00,2016-01-01 08:10:00,160.0,10699.06,1246.30\n7,IBE,H,2016-01-01 05:30:00,2016-01-01 11:00:00,330.0,9081.35,8033.86\n8,IBE,H,2016-01-01 05:40:00,2016-01-01 11:35:00,355.0,5776.89,8749.87\n9,ANE,M,2016-01-01 05:50:00,2016-01-01 14:50:00,540.0,284.73,284.73\n10,ETD,H,2016-01-01 06:35:00,2016-01-01 08:00:00,85.0,5647.10,5647.10\n11,IBS,M,2016-01-01 06:50:00,2016-01-01 08:00:00,70.0,547.36,1460.92\n12,IBE,H,2016-01-01 06:50:00,2016-01-01 10:35:00,225.0,6763.16,6763.16\n13,IBE,H,2016-01-01 06:50:00,2016-01-01 10:50:00,240.0,7120.40,7120.40\n14,IBE,H,2016-01-01 06:50:00,2016-01-01 10:55:00,245.0,7010.08,0.00\n15,QTR,H,2016-01-01 06:55:00,2016-01-01 08:30:00,95.0,5338.52,5338.52\n16,IBS,M,2016-01-01 07:00:00,2016-01-01 07:45:00,45.0,485.52,1721.09\n17,IBS,M,2016-01-01 07:00:00,2016-01-01 07:45:00,45.0,394.98,429.37\n18,ELY,M,2016-01-01 07:05:00,2016-01-01 08:30:00,85.0,3550.48,3550.48\n19,AAL,H,2016-01-01 07:05:00,2016-01-01 12:05:00,300.0,5925.61,5925.61\n20,TVF,M,2016-01-01 07:30:00,2016-01-01 08:10:00,40.0,1030.31,1030.31\n"

ARR ['"NUM,AIRLINE_ARR_ICAO,WAKE,SIBT,SOBT,PLANNED_TURNAROUND,DISTANCE_FROM_ORIGIN,DISTANCE_TO_TARGET\n1,AEA,H,2016-01-01 04:05:00,2016-01-01 14:10:00,605,9920.67,5776.89\n2,AEA,H,2016-01-01 04:25:00,2016-01-01 06:30:00,125.0,10060.80,483.93\n3,AVA,H,2016-01-01 05:05:00,2016-01-01 07:05:00,120.0,8033.86,8033.86\n4,IBE,H,2016-01-01 05:20:00,2016-01-01 10:40:00,320.0,0.00,8507.73\n5,IBE,H,2016-01-01 05:25:00,2016-01-01 10:50:00,325.0,6698.42,6698.42\n6,IBE,H,2016-01-01 05:30:00,2016-01-01 08:10:00,160.0,10699.06,1246.30\n7,IBE,H,2016-01-01 05:30:00,2016-01-01 11:00:00,330.0,9081.35,8033.86\n8,IBE,H,2016-01-01 05:40:00,2016-01-01 11:35:00,355.0,5776.89,8749.87\n9,ANE,M,2016-01-01 05:50:00,2016-01-01 14:50:00,540.0,284.73,284.73\n10,ETD,H,2016-01-01 06:35:00,2016-01-01 08:00:00,85.0,5647.10,5647.10\n11,IBS,M,2016-01-01 06:50:00,2016-01-01 08:00:00,70.0,547.36,1460.92\n12,IBE,H,2016-01-01 06:50:00,2016-01-01 10:35:00,225.0,6763.16,6763.16\n13,IBE,H,2016-01-01 06:50:00,2016-01-01 10:50:00,240.0,7120.40,7120.40\n14,IBE,H,2016-01-01 06:50:00,2016-01-01 10:55:00,245.0,7010.08,0.00\n15,QTR,H,2016-01-01 06:55:00,2016-01-01 08:30:00,95.0,5338.52,5338.52\n16,IBS,M,2016-01-01 07:00:00,2016-01-01 07:45:00,45.0,485.52,1721.09\n17,IBS,M,2016-01-01 07:00:00,2016-01-01 07:45:00,45.0,394.98,429.37\n18,ELY,M,2016-01-01 07:05:00,2016-01-01 08:30:00,85.0,3550.48,3550.48\n19,AAL,H,2016-01-01 07:05:00,2016-01-01 12:05:00,300.0,5925.61,5925.61\n20,TVF,M,2016-01-01 07:30:00,2016-01-01 08:10:00,40.0,1030.31,1030.31\n"']

TYPE arr <class 'numpy.ndarray'>

I need to convert arr to a pandas DataFrame. I wrote this code:

new = []
for i in range(0,len(arr)):
    line = arr[i].split(",")
    new.append(line)

X = pd.DataFrame(new[1:],columns=new[0])

print("X",X.head())

But it does not work properly. I assume this is because arr holds a single quoted string, ['".."'], instead of a list of separate lines, [..].

Any help is highly appreciated.

UPDATE:

csvData = pd.read_csv(io.StringIO((request.GET.get('csvData'))))

print("TYPE csvData", type(csvData))

print("CSV", csvData.head())

TYPE csvData <class 'pandas.core.frame.DataFrame'>
CSV Empty DataFrame
Columns:
[NUM,AIRLINE_ARR_ICAO,WAKE,SIBT,SOBT,PLANNED_TURNAROUND,DISTANCE_FROM_ORIGIN,DISTANCE_TO_TARGET\n1,AEA,H,2016-01-01
04:05:00,2016-01-01 14:10:00,605,9920.67,5776.89\n2,AEA,H,2016-01-01
04:25:00,2016-01-01 06:30:00,125.0,10060.80,483.93\n3,AVA,H,2016-01-01
05:05:00,2016-01-01 07:05:00,120.0,8033.86,8033.86\n4,IBE,H,2016-01-01
05:20:00,2016-01-01 10:40:00,320.0,0.00,8507.73\n5,IBE,H,2016-01-01
05:25:00,2016-01-01 10:50:00,325.0,6698.42,6698.42\n6,IBE,H,2016-01-01
05:30:00,2016-01-01
08:10:00,160.0,10699.06,1246.30\n7,IBE,H,2016-01-01
05:30:00,2016-01-01 11:00:00,330.0,9081.35,8033.86\n8,IBE,H,2016-01-01
05:40:00,2016-01-01 11:35:00,355.0,5776.89,8749.87\n9,ANE,M,2016-01-01
05:50:00,2016-01-01 14:50:00,540.0,284.73,284.73\n10,ETD,H,2016-01-01
06:35:00,2016-01-01 08:00:00,85.0,5647.10,5647.10\n11,IBS,M,2016-01-01
06:50:00,2016-01-01 08:00:00,70.0,547.36,1460.92\n12,IBE,H,2016-01-01
06:50:00,2016-01-01
10:35:00,225.0,6763.16,6763.16\n13,IBE,H,2016-01-01
06:50:00,2016-01-01
10:50:00,240.0,7120.40,7120.40\n14,IBE,H,2016-01-01
06:50:00,2016-01-01 10:55:00,245.0,7010.08,0.00\n15,QTR,H,2016-01-01
06:55:00,2016-01-01 08:30:00,95.0,5338.52,5338.52\n16,IBS,M,2016-01-01
07:00:00,2016-01-01 07:45:00,45.0,485.52,1721.09\n17,IBS,M,2016-01-01
07:00:00,2016-01-01 07:45:00,45.0,394.98,429.37\n18,ELY,M,2016-01-01
07:05:00,2016-01-01 08:30:00,85.0,3550.48,3550.48\n19,AAL,H,2016-01-01
07:05:00,2016-01-01
12:05:00,300.0,5925.61,5925.61\n20,TVF,M,2016-01-01
07:30:00,2016-01-01 08:10:00,40.0,1030.31,1030.31\n] 

Index: []

Update 2:

csvData = pd.read_csv(io.StringIO((request.GET.get('csvData').replace('\\n', '\n'))))

print("TYPE csvData", type(csvData))

print("CSV", csvData.head())

TYPE csvData <class 'pandas.core.frame.DataFrame'>
CSV Empty DataFrame

Columns: [NUM,AIRLINE_ARR_ICAO,WAKE,SIBT,SOBT,PLANNED_TURNAROUND,DISTANCE_FROM_ORIGIN,DISTANCE_TO_TARGET
1,AEA,H,2016-01-01 04:05:00,2016-01-01 14:10:00,605,9920.67,5776.89
2,AEA,H,2016-01-01 04:25:00,2016-01-01 06:30:00,125.0,10060.80,483.93
3,AVA,H,2016-01-01 05:05:00,2016-01-01 07:05:00,120.0,8033.86,8033.86
4,IBE,H,2016-01-01 05:20:00,2016-01-01 10:40:00,320.0,0.00,8507.73
5,IBE,H,2016-01-01 05:25:00,2016-01-01 10:50:00,325.0,6698.42,6698.42
6,IBE,H,2016-01-01 05:30:00,2016-01-01 08:10:00,160.0,10699.06,1246.30
7,IBE,H,2016-01-01 05:30:00,2016-01-01 11:00:00,330.0,9081.35,8033.86
8,IBE,H,2016-01-01 05:40:00,2016-01-01 11:35:00,355.0,5776.89,8749.87
9,ANE,M,2016-01-01 05:50:00,2016-01-01 14:50:00,540.0,284.73,284.73
10,ETD,H,2016-01-01 06:35:00,2016-01-01 08:00:00,85.0,5647.10,5647.10
11,IBS,M,2016-01-01 06:50:00,2016-01-01 08:00:00,70.0,547.36,1460.92
12,IBE,H,2016-01-01 06:50:00,2016-01-01 10:35:00,225.0,6763.16,6763.16
13,IBE,H,2016-01-01 06:50:00,2016-01-01 10:50:00,240.0,7120.40,7120.40
14,IBE,H,2016-01-01 06:50:00,2016-01-01 10:55:00,245.0,7010.08,0.00
15,QTR,H,2016-01-01 06:55:00,2016-01-01 08:30:00,95.0,5338.52,5338.52
16,IBS,M,2016-01-01 07:00:00,2016-01-01 07:45:00,45.0,485.52,1721.09
17,IBS,M,2016-01-01 07:00:00,2016-01-01 07:45:00,45.0,394.98,429.37
18,ELY,M,2016-01-01 07:05:00,2016-01-01 08:30:00,85.0,3550.48,3550.48
19,AAL,H,2016-01-01 07:05:00,2016-01-01 12:05:00,300.0,5925.61,5925.61
20,TVF,M,2016-01-01 07:30:00,2016-01-01 08:10:00,40.0,1030.31,1030.31
]

Index: []

Update 3:

This is how csvData is generated:

    var reader = new FileReader();
    reader.onload =  (e) => {
      // Use reader.result
      this.setState({
        csvData: reader.result
      })
      this.props.setCsvData(reader.result)
    }
    reader.readAsText(files[0])

Then I send it to the backend like this:

'&csvData='+JSON.stringify(this.state.csvData)
ScalaBoy
  • What is the website you are trying to access? pandas has built-in functions to extract tables from websites. – Bhanu Tez Feb 20 '19 at 10:07
  • @BhanuTez: It is my platform with the front-end and back-end. I send these data from the front-end. – ScalaBoy Feb 20 '19 at 10:09
  • Try looking into the character encoding of the data. I guess the problem is with the \n. The \n that is echoed to the screen is not a real newline, so try cleaning the string before you call read_csv(). – arjun Feb 20 '19 at 10:25
  • @MallikarjunM: You are right. But how can I substitute \n with real line breaks? With string.replace("\n","\n")? – ScalaBoy Feb 20 '19 at 10:53
  • replace('\\n', '\n'). Reference: https://stackoverflow.com/questions/42965689/replacing-a-text-with-n-in-it-with-a-real-n-output – arjun Feb 20 '19 at 10:55
  • @MallikarjunM: This seems to have partly solved the problem. But the DataFrame is still empty and all the content is inside columns, as shown in Update 2. – ScalaBoy Feb 20 '19 at 11:30
  • @ScalaBoy: You have garbage input that's not a real CSV; you should fix your front-end code first so that it sends an actual CSV. I suspect you have a CSV that has somehow been re-encoded as a JSON string, in which case you can use `json.loads()` first to decode the data. However, unless you really must have this smuggled as JSON, it's better to fix the problem at the source of the data, so it doesn't JSON-encode what you're sending. – Lie Ryan Feb 20 '19 at 12:38
  • @LieRyan: Please see my Update #3, where I explained how `csvData` is created in the front-end. – ScalaBoy Feb 20 '19 at 16:18
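
The comments above point at two things: the `\n` sequences are literal backslash-n pairs rather than line breaks, and the value looks JSON-encoded. A minimal sketch of both observations, assuming the backend receives exactly what `JSON.stringify` produced; the two-row `received` sample is a shortened, hypothetical stand-in for the real payload:

```python
import json

# Hypothetical shortened payload in the shape the question shows: wrapped in
# double quotes, with backslash-n stored as two literal characters.
received = '"NUM,WAKE\\n1,H\\n2,M\\n"'

# There are no real line breaks yet, so splitlines() returns a single element.
print(len(received.splitlines()))  # 1

# Decoding the JSON string removes the quotes and turns \n into real newlines.
decoded = json.loads(received)
lines = decoded.splitlines()
print(lines)  # ['NUM,WAKE', '1,H', '2,M']
```

This would explain why `arr` in the question is a one-element array containing the whole quoted text.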

1 Answer

Use the pandas read_csv function to load your data.

import io

import pandas

data = pandas.read_csv(io.StringIO(request.GET.get('csvData')))
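
If the parameter is JSON-encoded, as Update 3 in the question suggests, `read_csv` on its own sees a single quoted field. One possible adjustment, sketched under that assumption, is to decode with `json.loads` first. The helper name `frame_from_json_param` and the sample string are illustrative and not part of the original code:

```python
import io
import json

import pandas as pd


def frame_from_json_param(raw):
    """Decode a JSON-encoded CSV string, then parse it into a DataFrame."""
    text = json.loads(raw)  # undoes JSON.stringify: strips quotes, unescapes \n
    return pd.read_csv(io.StringIO(text))


# Hypothetical shortened sample with literal backslash-n sequences.
sample = '"NUM,AIRLINE_ARR_ICAO,WAKE\\n1,AEA,H\\n2,AEA,H\\n3,AVA,H\\n"'
df = frame_from_json_param(sample)
print(df.shape)  # (3, 3)
```

In the view this would presumably be called as `frame_from_json_param(request.GET.get('csvData'))`. Sending the raw text from the front end (for example with `encodeURIComponent` instead of `JSON.stringify`) would make the decode step unnecessary.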
Lie Ryan
  • Thanks. Please see my update where I show the result of running this code. All content goes to the columns for some reason. – ScalaBoy Feb 20 '19 at 10:10
  • I think you should look into the documentation of `read_csv` and see params for delimiter, newline characters and headers. – Addy Feb 20 '19 at 10:16
  • I came up with this code: `csvData = pd.read_csv(io.StringIO((request.GET.get('csvData'))), sep=',', escapechar='\n')`, but all the data still ends up in `columns`. – ScalaBoy Feb 20 '19 at 16:32
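
A likely reason that `sep` and `escapechar` make no difference: even after the literal `\n` sequences are replaced, the text is still wrapped in double quotes, so the CSV parser appears to read the whole payload as one quoted header field. A small sketch with a hypothetical two-row sample (not the real data) reproduces the empty DataFrame and shows the effect of removing the quotes:

```python
import io

import pandas as pd

# Hypothetical shortened payload: quoted, with literal backslash-n sequences.
received = '"NUM,WAKE\\n1,H\\n2,M\\n"'

# Same cleanup as in Update 2: real line breaks now, but the quotes remain.
text = received.replace('\\n', '\n')

quoted = pd.read_csv(io.StringIO(text))
print(quoted.shape)  # (0, 1): one column name holding everything, no rows

# Dropping the surrounding quotes lets the parser see separate lines.
unquoted = pd.read_csv(io.StringIO(text.strip('"')))
print(unquoted.shape)  # (2, 2)
```

Stripping quotes by hand is fragile if the data itself contains escaped characters; decoding with `json.loads`, as suggested in the comments on the question, handles those cases.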