1

I saved a sheet from excel into csv format. And after importing the data in python with the code:

import csv
with open('45deg_marbles.csv', 'r') as f:
    reader = csv.reader(f,dialect='excel')
    basis = []
    for row in reader:
        print(row)

Output:

['1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16']
['0.001;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363']
['0.002;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363']
['0.003;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283']

Basically it has 16 columns and 1399 rows. I realized each row consists of one long string, I then replaced all the ';' with ',' which will hopefully help to convert the column of strings to a matrix which I can manipulate the data with. Now I end up with a matrix or rather a list of one row with all the strings. This is what I have so far in terms of code and output respectively:

import csv
with open('45deg_marbles.csv', 'r') as f:
    reader = csv.reader(f,dialect='excel')
    basis = []
    for row in reader:
        #print(row)

        for i in range(len(row)):
            new_row = (row[i].replace(';', ','))
            basis.append(new_row)

print(basis)


>> ['1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16', '0.001,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363', '0.002,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363', '0.003,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', '0.004,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', '0.005,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', '0.006,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', '0.007,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', '0.008,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', '0.009,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', '0.01,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', ... , '1.396,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0', '1.397,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0', '1.398,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0']

But this is the form I want it in a matrix equal to:

[[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16],[0.001,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363],[0.002,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363], [0.003,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283]]

In order to do manipulation on the data

I would greatly appreciate any help. Thank you in advance.

2 Answers2

2

change separator to a semicolon (default is comma, which does not work here since your input data has semicolons in it) (I think you could omit the dialect='excel' part)

import csv

with open('45deg_marbles.csv', 'r') as f:
    reader = csv.reader(f,dialect='excel',delimiter=";")
    basis = list(reader)

now basis is a list of rows containing the data as text.

But you want them as integers / float. So you have to do some more postprocessing: list comprehension converting to integer if it is an integer (negative integers work too), else converting to float (of course another test needs to be added if there are alphanumerical rows, but not the case here)

import csv,re
intre = re.compile(r"-?\d+$")

with open('45deg_marbles.csv', 'r') as f:
    reader = csv.reader(f,dialect='excel',delimiter=";")
    basis = []
    for row in reader:
        basis.append([int(x) if intre.match(x) else float(x) for x in row])

print(basis)

result

[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], [0.001, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363], [0.002, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363], [0.003, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283]]

Note that there's a variant if integers are guaranteed to be positive. Saves a regex evaluation:

basis.append([int(x) if x.isdigit() else float(x) for x in row])
Jean-François Fabre
  • 137,073
  • 23
  • 153
  • 219
  • Note that `int(x) if x.isdigit() else float(x)` converts negative integers to `float`, but that may be ok for the OP's purposes. – PM 2Ring Sep 28 '16 at 14:22
  • @PM2Ring agreed (did not think of that). We would need a regex then. Edited. – Jean-François Fabre Sep 28 '16 at 14:31
  • There's a way to do it without regex: http://stackoverflow.com/a/379966/4014959 But that approach will probably be slower unless most of the input strings represent integers: Python exceptions are very fast and efficient when the exception isn't raised, but they're quite slow when the exception is raised. OTOH, regex isn't exactly fast, either... – PM 2Ring Sep 28 '16 at 14:40
  • 1
    and would be difficult to handle exceptions in a listcomp. I have added the alternative to avoid regexes if numbers cannot be negative (seems to be the case here, seem to be just row indexes) – Jean-François Fabre Sep 28 '16 at 14:52
-2

what you need to do is

for row in reader:
    basis.append(row.split(';'))

What you are doing wrong is that you replace ';' with comma ',' this does not make a list from a string, just replaces a symbol in this string. While you should split string to elements.

Alex
  • 1,141
  • 8
  • 13
  • @Johan Sestiger: I don't understand why you de-accepted my other answer for that one that doesn't even work since it doesn't convert the values as integer/floats and uses csv with the wrong separator, to split it again with another one! Can you at least explain? – Jean-François Fabre Sep 29 '16 at 19:13
  • @ Jean-François Fabre: Sorry, I am new to stack exchange. I don't know how to accept and/or de-accept the answers. Is the green tick a toggle? Oky, I see. My appologies. –  Nov 08 '16 at 21:10