1

Working in Python 3.7.

I'm currently pulling data from an API (Qualys's API, fetching a report) to be specific. It returns a string with all the report data in a CSV format with each new line designated with a '\r\n' escape.

(i.e. 'foo,bar,stuff\r\n,more stuff,data,report\r\n,etc,etc,etc\r\n')

The problem I'm having is writing this string properly to a CSV file. Every iteration of code I've tried writes the data cell by cell when viewed in Excel with the \r\n appended to where ever it was in the string all on one row, rather than on a new line.

(i.e |foo|bar|stuff\r\n|more stuff|data|report\r\n|etc|etc|etc\r\n|)

I'm just making the switch from 2 to 3 so I'm almost positive it's a syntactical error or an error with my understanding of how python 3 handles new line delimiters or something along those lines, but even after reviewing documentation, here and blog posts I just cant either cant get my head around it, or I'm consistently missing something.

current code:

def dl_report(id, title):
    data = {'action': 'fetch', 'id': id}
    res = a.request('/api/2.0/fo/report/', data=data)
    print(type(res)) #returns string

    #input('pause')
    f_csv = open(title,'w', newline='\r\n')
    f_csv.write(res)
    f_csv.close

but i've also tried:

with open(title, 'w', newline='\r\n') as f:
    writer = csv.writer(f,<tried encoding here, no luck>)
    writer.writerows(res)

#anyone else looking at this, this didn't work because of the difference 
#between writerow() and writerows()

and I've also tried various ways to declare newline, such as:

newline=''
newline='\n'
etc...

and various other iterations along these lines. Any suggestions or guidance or... anything at this point would be awesome.

edit:

Ok, I've continued to work on it, and this kinda works:

def dl_report(id, title):
data = {'action': 'fetch', 'id': id}
res = a.request('/api/2.0/fo/report/', data=data)
print(type(res)) #returns string

reader = csv.reader(res.split(r'\r\n'), delimiter=',')

with open(title, 'w') as outfile:
    writer = csv.writer(outfile, delimiter= '\n')
    writer.writerow(reader)

But its ugly, and does create errors in the output CSV (some rows (less than 1%) don't parse as a CSV row, probably a formatting error somewhere..), but more concerning is that it works wonky when a "\" is presented in data.

I would really be interested in a solution that works... better? More pythonic? more consistently would be nice...

Any ideas?

ke0crj
  • 13
  • 5
  • If it's already literal CSV, you can just write it directly. It sorta sounds like you're saying the text includes literal backslashes followed by `r` and `n` though, is that the case? That is, if you `print(repr(thestring))`, do you see one backslash or two? – ShadowRanger Sep 25 '18 at 23:36
  • @ShadowRanger two backslashes. I think one of issues is the api output is CSV -ish. It includes alot of header information and then multiple CSV's with a table header above each one. – ke0crj Sep 25 '18 at 23:44

4 Answers4

3

Based on your comments, the data you're being served doesn't actually include carriage returns or newlines, it includes the text representing the escapes for carriage returns and newlines (so it really has a backslash, r, backslash, n in the data). It's otherwise already in the form you want, so you don't need to involve the csv module at all, just interpret the escapes to their correct value, then write the data directly.

This is relatively simple using the unicode-escape codec (which also handles ASCII escapes):

import codecs  # Needed for text->text decoding

# ... retrieve data here, store to res ...

# Converts backslash followed by r to carriage return, by n to newline,
# and so on for other escapes
decoded = codecs.decode(res, 'unicode-escape')

# newline='' means don't perform line ending conversions, so you keep \r\n
# on all systems, no adding, no removing characters
# You may want to explicitly specify an encoding like UTF-8, rather than
# relying on the system default, so your code is portable across locales
with open(title, 'w', newline='') as f:
    f.write(decoded)

If the strings you receive are actually wrapped in quotes (so print(repr(s)) includes quotes on either end), it's possible they're intended to be interpreted as JSON strings. In that case, just replace the import and creation of decoded with:

import json


decoded = json.loads(res)
ShadowRanger
  • 143,180
  • 12
  • 188
  • 271
  • This worked beautifully! Thank you so much! I was getting crazy frustrated trying to force it to work as strings and never would have thought to go down this route. I had a tried a few different encoding functions based on previous comments here and other places I was looking at, but not decoding. Simple, clean solution. Thank you again! – ke0crj Sep 26 '18 at 15:24
0

If I understand your question correctly, can't you just replace the string? with open(title, 'w') as f: f.write(res.replace("¥r¥n","¥n"))

  • This still doesn't work. It seems like the write function is not interpreting the new line delimiters whether they're '\r\n' or '\n', which is why I think its something wrong with the way I'm handling the string or the newline function – ke0crj Sep 25 '18 at 20:24
0

Check out this answer:

Python csv string to array

According to CSVReader's documentation, it expects \r\n as the line delimiter by default. Your string should work fine with it. If you load the string into the CSVReader object, then you should be able to check for the standard way to export it.

Dougyfresh
  • 586
  • 3
  • 15
0

Python strings use the single \n newline character. Normally, a \r\n is converted to \n when a file is read and the newline is converted \n or \r\n depending on your system default and the newline= parameter on write.

In your case, \r wasn't removed when you read it from the web interface. When you opened the file with newline='\r\n', python expanded the \n as it was supposed to, but the \r passed through and now your neline is \r\r\n. You can see that by rereading the text file in binary mode:

>>> res = 'foo,bar,stuff\r\n,more stuff,data,report\r\n,etc,etc,etc\r\n'
>>> open('test', 'w', newline='\r\n').write(res)
54
>>> open('test', 'rb').read()
b'foo,bar,stuff\r\r\n,more stuff,data,report\r\r\n,etc,etc,etc\r\r\n'

Since you already have the line endings you want, just write in binary mode and skip the conversions:

>>> open('test', 'wb').write(res.encode())
54
>>> open('test', 'rb').read()
b'foo,bar,stuff\r\n,more stuff,data,report\r\n,etc,etc,etc\r\n'

Notice I used the system default encoding, but you likely want to standardize on an encoding.

tdelaney
  • 73,364
  • 6
  • 83
  • 116