1

The code below works, but it writes each line of a multiline string to a new row instead of keeping the whole string in one cell, which is the desired state.

While researching possible solutions, I read:

  • That I should try enclosing the string in double quotes
  • That replacing newline with carriage return helps
  • That passing it as a list/tuple via a [] declaration helps

Having tried all the above, I'm thoroughly confused as to why this isn't working. If I remove newline it assigns everything to a single cell in the row but then it's hard to read.

Is there no way to preserve format when outputting a string with csv.writer?

import csv
import os
import re
import subprocess

# ntDict is defined elsewhere; popitem(last=False) suggests it is a
# collections.OrderedDict keyed by RegRipper plugin name.

def run_rip():
    os.chdir('/REGRIPPERDIR/RegRipper2.8')
    for k in ntDict:
        # Run one RegRipper plugin against the hive and capture its output.
        run_command = "".join(["./rip.pl", " -r /APATH/NTUSER.DAT -p ", str(k)])
        process = subprocess.Popen(run_command,
                                   shell=True,
                                   stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE)
        out, err = process.communicate()
        pattern = re.compile(r'lastwrite|(\d{2}:\d{2}:\d{2})|alert|trust|Value')
        # first line, second line, blank line, then everything else
        grouping = re.compile(r'(?P<first>.+?)(\n)(?P<second>.+?)([\n]{2})(?P<rest>.+[\n])', re.MULTILINE | re.DOTALL)
        if pattern.findall(out):
            match = re.search(grouping, out)
            first = match.group('first')
            second = match.group('second')
            # Drop empty lines from the remainder and rejoin it as one string.
            rest = ('"%s' % os.linesep.join([s for s in match.group('rest').splitlines() if s]))
            rest = rest.replace('\n', '\r\n')
            headers = ['Name', 'Description', 'Found']
            # 'ab' is the Python 2 way to open a file for csv.writer.
            f = csv.writer(open('/APATH/ntuser.csv', 'ab'),
                           dialect=csv.excel,
                           delimiter='\t')
            f.writerow(headers)
            f.writerow((first, second, rest))
            print(out)
        ntDict.popitem(last=False)

run_rip()

Sample Output: Three Column Output

Edit: I was asked in the comments below to post a sample of the third string, rest, as it is collected. The text below is what would be passed to csv.writer.

Baseline\n #First string as defined by the regex
(All) scans a hive\n #Second String as defined by the regex

Total values checked    : 64\n   #This and below is the rest string
Number of binary value lengths : 11\n
...\n
Value -5: 176 bytes [# times: 1]\n
Value -4: 712 bytes [# times: 5]\n

Desired State: [image not available]

ImNotLeet
  • Can I suggest that you post some csv text (i.e. a string) that gives you the desired layout? The underlying problem is that csv isn't really a standard. What you're asking looks to be about whatever format the app that's producing your pasted images accepts. – demented hedgehog Feb 20 '16 at 06:04
  • @dementedhedgehog The app I'm interfacing with is Registry Ripper which outputs non-standard text and creates a lot of excessive results that aren't needed. So my approach has been to call the app, read stdout, chuck the data into three variables as assigned above then output into a .csv. However the third variable has many newlines, which unfortunately are being read by csv.writer and output to separate cells as depicted in `sample output`. Is there a way to preserve newline/carriage return in output to .csv so that I can obtain the `desired state`? – ImNotLeet Feb 21 '16 at 16:35
  • Are there other .csv modules that would do the above? Am I not being clear in my question, is it lack of bounty? The above issue has been nagging me since I posted it. I've tried multiple hours of solutions and it's just out of my grasp (likely because I'm new to programming). – ImNotLeet Feb 22 '16 at 14:43
  • You're not posting information that anyone can reproduce the problem with. So we can only guess at solutions. Here are some: i) the csv writer has a newline argument .. try some values for that: e.g. "\n", "\r\n", "\r", "" and see what happens, and read the docs https://docs.python.org/2/library/csv.html, ii) think about changing the delimiter. iii) Maybe you need to escape your newlines. See http://stackoverflow.com/questions/15392730/in-python-is-it-possible-to-escape-newline-characters-when-printing-a-string iv) Have you considered just reading the registry yourself in python? – demented hedgehog Feb 22 '16 at 21:20
  • What regex are you using? – demented hedgehog Feb 22 '16 at 21:24
  • If you're writing to excel consider writing the excel directly? – demented hedgehog Feb 22 '16 at 21:28
  • The problem is how do we know at the time we get to the rest string you posted that ...\n is part of the previous line. It's a valid line of csv – demented hedgehog Feb 22 '16 at 21:29
  • @dementedhedgehog put full function in edit above includes the regex. Thanks again for your help. – ImNotLeet Feb 22 '16 at 21:30
  • @dementedhedgehog while excel is one of the possible inputs for the csv/tsv or however it needs to be separated it's not the only output. It might be my newness, but it doesn't feel like it should be that hard to wordwrap within a row to display multiple lines of code as defined by the string I'm passing. I will try a different delimiter at the end of the string, perhaps | but it seems based off observation that csv.writer is interpreting `\n` as a tab, or there is text I'm unable to see in the print that is causing it to write to the next cell as displayed in sample output. Very Frustrating:/ – ImNotLeet Feb 22 '16 at 21:31
  • The underlying problems are: csv is not a standardized format. It's defined by implementations. So you might want to look into newlines in csv for excel. Secondly, a much nicer way is to work at a high level. Read directly from the registry, write directly to an excel file. The newline arg seems to only be for python3 :( dunno what it does exactly. Otherwise I agree.. This stuff can be painful sometimes – demented hedgehog Feb 22 '16 at 21:37
  • @dementedhedgehog can you elaborate on the registry part of your statement, I'm unsure what you mean? Forgive my innocence on this matter I am still really new to programming and python, but have the programming itch and am writing this to assist in consolidating my forensic tools for work. – ImNotLeet Feb 22 '16 at 21:40
  • Sure, if all you want to do is grab a few well-known registry key values, you can use the builtin _winreg lib (winreg in python3) to do that. It lets you grab/set values and walk the registry tree. – demented hedgehog Feb 22 '16 at 21:43
  • Added links to libs to writing excel docs directly in my "answer" (using the term loosely). Wanting to learn is 9/10ths of the job. – demented hedgehog Feb 22 '16 at 21:46
  • Check this out http://stackoverflow.com/questions/2668678/importing-csv-with-line-breaks-in-excel-2007 – demented hedgehog Feb 22 '16 at 21:54
  • You can also write your own csv writer. It's pretty trivial (if you need to prepend a space to values that look like this: "2\n3"). I don't have excel so I can't test what formats excel imports successfully (and it's likely to vary by excel version anyway). – demented hedgehog Feb 22 '16 at 21:56

4 Answers

2

Not an answer... but I want code formatting.

import csv
x = ("1", "2\n3", "4")
f = csv.writer(open('foo', 'w'),
               dialect=csv.excel,
               delimiter='\t')
f.writerow(x)

Produces this:

$ less foo
1       "2
3"      4

And that's "valid" tab-separated csv; it's just that excel doesn't handle it "properly". How fields in quotes get treated is really an implementation issue, since csv is not a standardized format. The extra double quotes around the "2\n3" field are kind of annoying.
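To see that the quoted field really is one cell as far as Python's own csv module is concerned, the row can be written and read back. This is a minimal Python 3 sketch using an in-memory buffer (the `buffer`, `raw_text`, and `rows` names are just for illustration); it says nothing about how any particular Excel version imports the same text.

```python
import csv
import io

# Write one row whose middle field contains a newline, as in the example above.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, dialect=csv.excel, delimiter="\t")
writer.writerow(("1", "2\n3", "4"))
raw_text = buffer.getvalue()

# Read the text back with the same dialect settings.
reader = csv.reader(io.StringIO(raw_text, newline=""),
                    dialect=csv.excel,
                    delimiter="\t")
rows = list(reader)

print(repr(raw_text))  # the field with the newline is wrapped in double quotes
print(rows)            # one row, three fields; the newline stays inside field two
```

The writer adds the quotes because the field contains a character from the line terminator, and the reader uses those quotes to keep the line break inside the field.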

Check out https://pypi.python.org/pypi/xlwt (pre xml excel formats) or http://xlsxwriter.readthedocs.org/ (xml excel formats) for 3rd party libraries to write excel directly.

demented hedgehog
0

With the guidance of the comments I found my answer. Put simply, Excel mangles the formatting (for reasons covered in the comments). When the same file is opened in LibreOffice, the formatting is retained.

The thread suggested in the comments (Importing CSV with line breaks in Excel 2007) has a proposed workaround, which involves quoting the actual newlines rather than quoting the entire string, which is what I was doing.

ImNotLeet
0

This is how I solved this problem.

Input Data:

('firstName.lastName@gmail.com', 'firstName', 'lastName', 'Address1', 'Address1
Address2', 'IP1
IP2
IP3', 'PhoneNumber')

Desired CSV Format: [image not available]

Here is the Python code that produces it:

try:
    cs.execute(sql)
    row = cs.fetchone()
    while row is not None:
        email = row[0]
        filename = '{0}.csv'.format(email)
        with open(filename, "w", encoding='utf-8-sig', newline='') as fp:
            writer = csv.writer(fp)
            writer.writerow(('REQUESTER EMAIL', 'FIRST NAME', 'LAST NAME', 'BILLING ADDRESSES', 'SHIPPING ADDRESSES', 'IP ADDRESSES', 'PHONE NUMBER'))
            writer.writerow((
                row[0],
                row[1],
                row[2],
                row[3],
                row[4],
                row[5],
                row[6],
            ))
        row = cs.fetchone()
finally:
    cs.close()
ctx.close()

See this line (#7):

open(filename, "w", encoding='utf-8-sig', newline='')

Setting the encoding to 'utf-8-sig' does the trick.
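A small self-contained sketch of what those two `open` arguments do, assuming Python 3. The row is made-up placeholder data in the shape of the input above, and the temporary file path is hypothetical. `newline=''` is what the csv module documentation asks for so that the writer controls line endings itself, and `'utf-8-sig'` prepends a byte order mark, which Excel is generally reported to use to detect UTF-8.

```python
import csv
import os
import tempfile

# Placeholder row with multiline cells, shaped like the input data above.
row = ("firstName.lastName@gmail.com", "Address1\nAddress2", "IP1\nIP2\nIP3")

# Hypothetical output location for this demonstration only.
path = os.path.join(tempfile.mkdtemp(), "example.csv")

with open(path, "w", encoding="utf-8-sig", newline="") as fp:
    csv.writer(fp).writerow(row)

# Inspect the raw bytes: BOM first, "\n" kept inside quoted cells,
# and a single "\r\n" terminating the row.
with open(path, "rb") as fp:
    raw = fp.read()

# Reading back with the same settings returns the original cells.
with open(path, "r", encoding="utf-8-sig", newline="") as fp:
    rows = list(csv.reader(fp))

print(raw)
print(rows)
```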

Thanks, Hussain Bohra

Hussain Bohra
0

You can simply use \n to separate the items in the multiline cell. Just make sure the whole cell content appears between double quotes:

with open(filename, "w", encoding='utf-8') as f:
    f.write('a,b,"c\nd",e\n')

This example writes a row in a csv file where the third cell has 2 lines c and d.
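If the cell text is not known in advance, hand-written quoting also has to cope with commas and double quotes inside the value. The sketch below is one way to do that, assuming the usual convention of doubling embedded quotes; `quote_field` and `format_row` are hypothetical helper names, not part of any library, and the result is checked by parsing it back with `csv.reader`.

```python
import csv
import io


def quote_field(value):
    """Wrap a value in double quotes if it contains a comma, quote, or line break."""
    text = str(value)
    if any(ch in text for ch in ',"\r\n'):
        # Embedded double quotes are escaped by doubling them.
        return '"' + text.replace('"', '""') + '"'
    return text


def format_row(fields):
    """Join already-quoted fields into one line of comma-separated text."""
    return ",".join(quote_field(field) for field in fields) + "\n"


line = format_row(["a", "b", "c\nd", 'say "hi"'])

# Parse the hand-built line back to confirm it is still a single four-cell row.
parsed = next(csv.reader(io.StringIO(line, newline="")))

print(repr(line))
print(parsed)
```

For anything beyond a quick script, `csv.writer` already applies these rules, so the helper is mainly useful for understanding what ends up in the file.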

tagny