I am generating data sets from experiments. I end up with CSV data sets that are typically n x 4 dimensional (n rows, with n > 1000, and 4 columns). However, due to an artifact of the data-collection process, the first couple of rows and the last couple of rows typically have only 2 or 3 columns. So a data set looks like:
8,0,4091
8,0,
8,0,4091,14454
10,0,4099,14454
2,0,4094,14454
8,-3,4104,14455
3,0,4100,14455
....
....
14,-1,4094,14723
0,3,4105,14723
7,0,4123,14723
7,
6,-2,4096,
3,2,
As you can see, the first two rows and the last three don't have the 4 columns that I expect. When I try importing this file using np.loadtxt(filename, delimiter=','), I get an error. Once I remove the rows that have fewer than 4 columns (the first 2 rows and the last 3 rows, in this case), the import works fine.
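For reference, a minimal reproduction of the failure, using a small inline stand-in for the real file (the exact wording of the error varies between numpy versions):

    import io
    import numpy as np

    # Two rows with different column counts, mimicking the file above.
    ragged = io.StringIO("8,0,4091\n8,0,4091,14454\n")

    try:
        np.loadtxt(ragged, delimiter=',')
    except ValueError as e:
        # loadtxt builds a rectangular array, so it rejects any row
        # whose field count differs from the rest.
        print(e)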
Two questions:
Why doesn't the usual import work? I am not sure what the exact error means here. In other words, why is it a problem that not all rows have the same number of columns?
As a workaround, I know how to ignore the first two rows while importing the file with numpy, using np.loadtxt(filename, skiprows=2), but is there a simple way to also ignore a fixed number of rows at the bottom?
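To make that concrete, here is a sketch of what I am after. As far as I can tell, np.genfromtxt accepts a skip_footer argument alongside skip_header, which np.loadtxt lacks; the filtering variant below is my guess at a fallback in case the number of incomplete rows varies from file to file:

    import numpy as np

    # filename is the path to the CSV described above.
    # Drop a fixed number of rows at the top and bottom; the counts
    # (2 and 3) match this particular example file.
    data = np.genfromtxt(filename, delimiter=',', skip_header=2, skip_footer=3)

    # Fallback: keep only lines that split into exactly 4 non-empty
    # fields, then pass them to loadtxt (which accepts any iterable
    # of strings).
    def complete(line):
        fields = line.strip().split(',')
        return len(fields) == 4 and all(fields)

    with open(filename) as f:
        data = np.loadtxt((line for line in f if complete(line)), delimiter=',')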
Note: This is NOT about finding unique rows in a numpy array. It's more about importing CSV data that is non-uniform in the number of columns each row contains.