Python formatting CSV from webscraped data

Question

I managed to finish a script to automate repetitive tasks. My first one in Python! I am now in the process of automating the part where I have to retrieve the data and format it for the script to use.

Here are the relevant parts of my code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
import csv

ie = 'C:\\Users\\dd\\Desktop\\IEDriverServer32.exe'
print(ie)
Iebrowswer = webdriver.Ie(ie)
Iebrowswer.get('https://ww3.example.com/')

Iebrowswer.find_element_by_class_name('gridrowselect').click()

print(len(Iebrowswer.find_elements_by_class_name('gridrow'))) 

Gridcells = Iebrowswer.find_elements_by_class_name('gridcell')
Gridinfo = [i.text for i in Gridcells]
print(Gridinfo)

csvfile = 'C:\\Users\\dd\\Desktop\\CSV1.csv'
with open(csvfile, "w") as output:
    writer = csv.writer(output, lineterminator='\n')
    for val in Gridinfo:
        writer.writerow(['val'])

I managed to get all of the information that I wanted. Right now, my biggest issue is what happens to the data when I create my CSV: it comes out all wrong. This is what I get when I print to the shell (a small example):

['5555', '1', 'Verified', '', '6666', '2', 'Verified', '']

My Excel/CSV file is displayed vertically like this:

Column1
[5555]
[1]
[Verified]
[ ]
[6666]
[2] 
[Verified]
[ ]

What I want is for my data to be displayed horizontally, breaking after the empty string, like this:

Column1 Column2 Column3 Column4
5555    1       Verified 
6666    2       Verified

How do I achieve this?

I've looked over the documentation and a bunch of other questions on here, but I'm no closer to understanding the csv library and its arguments. It always seems that I get stuck on these really simple things. The only thing I succeeded in was adding even more columns of vertically displayed data, taunting myself.

`writer.writerow(['val'])` would not give the output you have given, it would just give rows containing the string `"val"`. Please make sure your code is an accurate representation of what you're using (you're close to cracking your problem btw) — roganjosh, Mar 15 '18 at 20:23
I spent about 30 minutes editing the code to keep the relevant parts. Maybe I messed up somewhere. Forgive me if I did. However, I did spend about 30 minutes writing this question so as not to waste someone's time looking over the rest of the code. — Noctsol, Mar 15 '18 at 21:08

roganjosh · Accepted Answer · 2018-03-15T21:41:06.577

I'm not sure why you get all of your rows back as a single list. The writerow() method of the csv module expects a single list to represent a row.

for val in Gridinfo:
    writer.writerow(['val'])

Would therefore give each datapoint its own row (note however that 'val' is a string literal, so your output from this code would just be rows of the string "val" and not your actual data).
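To make the difference between the string literal and the variable concrete, here is a minimal sketch. It writes to in-memory buffers (io.StringIO is my assumption here, standing in for the real file), and the names grid_info, literal_out and variable_out are hypothetical.

```python
import csv
import io

grid_info = ['5555', '1', 'Verified']

# Quoted 'val' is a string literal: every row contains the text "val".
literal_out = io.StringIO()
writer = csv.writer(literal_out, lineterminator='\n')
for val in grid_info:
    writer.writerow(['val'])

# Unquoted val is the loop variable: every row contains one real data point.
variable_out = io.StringIO()
writer = csv.writer(variable_out, lineterminator='\n')
for val in grid_info:
    writer.writerow([val])

print(literal_out.getvalue())   # three rows, each just: val
print(variable_out.getvalue())  # 5555, 1 and Verified, each on its own row
```

Either way, each call to writerow() produces exactly one row, which is why the data ends up in a single column.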

The first thing to do is to chunk your single list into multiple lists of length 4. I've borrowed the chunking function from here; you can see other methods in the answers there depending on your exact case.

This will give you a nested list. That's perfect for the writerows() method (note, plural).

Try:

def chunks(l, n):
    n = max(1, n)
    return [l[i:i+n] for i in range(0, len(l), n)]

with open(csvfile, "w") as output:
    writer = csv.writer(output, lineterminator='\n')
    writer.writerows(chunks(Gridinfo, 4))
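As a quick check, here is a sketch that applies the same chunking to the sample list from the question. It writes to an in-memory buffer instead of a file (io.StringIO is an assumption made for the demonstration), and grid_info, rows and buffer are hypothetical names.

```python
import csv
import io


def chunks(items, n):
    """Split items into consecutive sublists of length n (the last may be shorter)."""
    n = max(1, n)
    return [items[i:i + n] for i in range(0, len(items), n)]


grid_info = ['5555', '1', 'Verified', '', '6666', '2', 'Verified', '']
rows = chunks(grid_info, 4)
print(rows)
# [['5555', '1', 'Verified', ''], ['6666', '2', 'Verified', '']]

buffer = io.StringIO()
csv.writer(buffer, lineterminator='\n').writerows(rows)
print(buffer.getvalue())
# 5555,1,Verified,
# 6666,2,Verified,
```

Each inner list becomes one CSV row; the trailing comma is the empty fourth field.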

EDIT:

The chunks() function:

Uses a list comprehension, with list slicing to build the sublists.
n = max(1, n) is defensive programming. It stops you from specifying a chunk length of 0 or less, which doesn't make sense: a step of 0 raises ValueError: range() arg 3 must not be zero, and a negative step silently produces an empty list. For all intents and purposes you can remove it and the function will work fine; there's no harm in keeping it to avoid such problems.
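The effect of the guard can be seen in a small sketch. chunks_unguarded is a hypothetical copy of the function with the max(1, n) line removed, added only for comparison.

```python
def chunks(items, n):
    n = max(1, n)  # guard: chunk sizes below 1 are treated as 1
    return [items[i:i + n] for i in range(0, len(items), n)]


def chunks_unguarded(items, n):
    # Same comprehension without the guard (hypothetical, for comparison only).
    return [items[i:i + n] for i in range(0, len(items), n)]


data = [1, 2, 3, 4, 5]

print(chunks(data, 0))             # [[1], [2], [3], [4], [5]]
print(chunks_unguarded(data, -1))  # [] because range(0, 5, -1) is empty

try:
    chunks_unguarded(data, 0)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)           # range() arg 3 must not be zero
```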

For any chunk size of 1 or more, the list comprehension in chunks() is equivalent to this expanded version:

def chunks(my_list, chunk_size):
    new_list = [] # What we will return
    chunk = []    # Individual sublist chunk
    for item in my_list:
        if len(chunk) < chunk_size:  # Compare against the requested size, not a hard-coded 3
            chunk.append(item)
        else:
            new_list.append(chunk) # Add the chunk to the output
            chunk = []             # Reset for the next chunk
            chunk.append(item)     # Make sure the current "item" gets added to the new chunk

    if len(chunk) >= 1:            # Catch any stragglers that don't make a complete chunk
        new_list.append(chunk)

    return new_list


SUBLIST_LENGTH = 3
list_to_be_chunked = [1, 2, 3, 4, 5, 6, 7]

result = chunks(list_to_be_chunked, SUBLIST_LENGTH)
print(result)
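One way to convince yourself that the two versions agree is to run both on the same input. This is only a sketch: chunks_slicing and chunks_loop are hypothetical names for the two variants, and the loop version assumes the comparison is made against chunk_size rather than a fixed number.

```python
def chunks_slicing(items, n):
    n = max(1, n)
    return [items[i:i + n] for i in range(0, len(items), n)]


def chunks_loop(items, chunk_size):
    new_list = []  # what we will return
    chunk = []     # the sublist currently being filled
    for item in items:
        if len(chunk) < chunk_size:
            chunk.append(item)
        else:
            new_list.append(chunk)  # chunk is full, store it
            chunk = [item]          # start the next chunk with the current item
    if chunk:                       # keep a final, possibly shorter chunk
        new_list.append(chunk)
    return new_list


sample = [1, 2, 3, 4, 5, 6, 7]
for size in (1, 2, 3, 4, 7, 10):
    assert chunks_slicing(sample, size) == chunks_loop(sample, size)

print(chunks_slicing(sample, 3))  # [[1, 2, 3], [4, 5, 6], [7]]
```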

IT WORKS. From someone who only learned to use Python (or to code at all) about 3 weeks ago and miraculously came up with scripts to automate various tasks, spending every single moment on the edge of breaking, I really appreciate this. Though, forgive me for resenting you for leaving me the rest of the day to think about how that function works. — Noctsol, Mar 15 '18 at 21:06
@Noctsol You're welcome. Give me a min to edit out one complexity and I'll give you pointers on the rest for understanding it — roganjosh, Mar 15 '18 at 21:08
@Noctsol ok, so from the answer now, what is returned from the function is a list built by a _list comprehension_ that uses _list slicing_. The original function I linked to had a _generator expression_ but it was both pointless for this application and an extra thing to cause confusion. When I get a chance, I'll expand out the list comprehension to regular `for` loops. — roganjosh, Mar 15 '18 at 21:12
So from what I understand, it has to count the items in the list itself and what constitutes a word/value. After that, the function can actually break/slice the list on an integer of your choosing. I still have to understand the specifics, but that's pretty good for now. — Noctsol, Mar 15 '18 at 21:20
@Noctsol edited in an explanation and an expanded version of the function. Hope it helps. — roganjosh, Mar 15 '18 at 21:41

PythonCA · Answer 2 · 2018-03-15T21:43:32.577

import numpy as np
import csv

csvfile = r'C:\temp\test.csv'

Gridinfo = ['5555', '1', 'Verified', '', '6666', '2', 'Verified', '']

arr = np.resize(Gridinfo,(len(Gridinfo)/4,4))

with open(csvfile, "w") as output:
    writer = csv.writer(output, lineterminator='\n')
    writer.writerows(arr) 



#Output
5555    1   Verified
6666    2   Verified

edited Mar 15 '18 at 21:43

answered Mar 15 '18 at 20:42

I'm attempting this right now and getting an error. TypeError: 'float' object cannot be interpreted as an integer. I'm trying to figure out what's being considered a float here so I can convert it. – Noctsol Mar 15 '18 at 21:14
The edited answer still suffers from floating point errors. You need `arr = np.resize(Gridinfo,(int(len(Gridinfo)/4),4))` – roganjosh Mar 15 '18 at 21:45
I figured it out. It worked. The actual list and the actual items per row that I need are much longer. So the original statement you had did not become an integer. This is what I had to change. arr = numpy.resize(Gridinfo,((int(len(Gridinfo)/11)),11)) – Noctsol Mar 15 '18 at 21:45
Ah, well I was almost right. Thank you very much! I wonder how the division works now though. There are 8 items in the original list, it should have been fine. But that's assuming that's how it is being done. – Noctsol Mar 15 '18 at 21:48
@Noctsol [see this](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) – roganjosh Mar 15 '18 at 21:49
@roganjosh that explains so much. I figured it out, but didn't realize why. I'm pretty sure this has some kind of greater implication in computing, math, and philosophy, but I don't think I'm smart enough. I might be obsessively curious enough though. – Noctsol Mar 16 '18 at 03:33
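
One detail about the TypeError discussed above: in Python 3 the `/` operator always returns a float, even when the division is exact, so `len(Gridinfo)/4` is `2.0` rather than `2`. The sketch below uses only the standard library; `range` stands in for any call that requires an integer (that the NumPy shape argument behaves the same way is an assumption based on the error reported above), and the variable names are hypothetical.

```python
grid_info = ['5555', '1', 'Verified', '', '6666', '2', 'Verified', '']

true_division = len(grid_info) / 4    # 2.0, always a float in Python 3
floor_division = len(grid_info) // 4  # 2, an int

print(type(true_division).__name__, type(floor_division).__name__)  # float int

try:
    range(true_division)  # anything that needs an integer rejects 2.0
except TypeError as exc:
    message = str(exc)
    print(message)        # 'float' object cannot be interpreted as an integer

print(list(range(floor_division)))  # [0, 1]
```

Writing `len(Gridinfo) // 4` is therefore an alternative to wrapping the division in `int(...)`.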
