I am a newbie and trying to perform some calculations and analyse some data from a huge number of CSV files using Python 3. Each CSV file has data stored in about 30000 rows seperated by a semicolon. As these are system generated files, the first row has the names of Parameters in a different order for each file. So I want to extract these Parameter names, compare them and find out the positions of required parameter values for every file.
But before I even do that, when I read the first row of the CSV file, the list has a single string seperated by the ';' while the rest of the lines are properly extracted. I have tried splitting the text and other means but none of them have any effect on the output. Can someone kindly help solve this issue.
This is the piece of code I have written:
import os
import sys
import csv
import math
# Open a file
full_path = "C:\\Documents and Settings\\My Documents\\CSV files"
dirs = os.listdir( full_path )
filename = list()
"""This will find the file in the directory and print the first row of the
CSV file - the names of the parameters."""
for file in dirs:
path = full_path+'\\'+file
with open(path, 'rt') as csvfile:
#For printing the location name
Turbine_name = (os.path.splitext(file)[0])
#Reading the files into an object
csvreader = csv.reader(csvfile, delimiter=';')
#Obtaining the first line of the CSV file - the names of the parameters
Parameters = next(csvreader)
print (Parameters)
#From this line, the values of different Parameters are extracted
for row in csvreader:
Parameters = next(csvreader)
print (Parameters)
#To eliminate rows with insufficient data
if len(Parameters)>11:
Created_time = Parameters[0]
Grid_frequency = float(Parameters[3])
Grid_KW = float(Parameters[4])
Rotor_RPM = float(Parameters[10])
The positions of the parameters are taken for example here as I am yet to write the code for comparing strings. A sample of the input file looks like this:
createTime;"Grid CosPhi";"Grid Current";"Grid Frequency";"Grid kW";"Grid VAr";"Grid Voltage";"Pitch angle 1";"Pitch angle 2";"Pitch angle 3";"Rotor RPM";"Temp. 5 214";"Temp. 6 217";"Temp. 9 227";"Winddirection";"Windspeed"
9/21/14 11:30:01 PM;N/A;N/A;49.963;211688.734;N/A;N/A;-1.06;-1.039;-1.119;19.379;47.167;36;64;N/A;6.319
9/21/14 11:40:01 PM;N/A;N/A;50.002;170096.297;N/A;N/A;-1.003;-0.96;-1.058;19.446;47.182;36.084;63.772;N/A;5.628
9/21/14 11:50:01 PM;N/A;N/A;50.021;175038.734;N/A;N/A;-0.976;-0.961;-1.082;18.805;47;36.223;63.153;N/A;5.577
9/22/14 12:00:01 AM;N/A;N/A;49.964;229942.016;N/A;N/A;-1.047;-1.018;-1.066;18.957;47.125;36.293;63.766;N/A;6.431
9/22/14 12:10:01 AM;N/A;N/A;49.908;200873.844;N/A;N/A;-0.997;-0.985;-1.06;19.229;47.028;36.334;63.962;N/A;6.076
9/22/14 12:20:01 AM;N/A;N/A;49.934;234467.609;N/A;N/A;-1.028;-0.986;-1.001;18.995;47.056;36.401;63.732;N/A;6.067
9/22/14 12:30:01 AM;N/A;N/A;49.96;270969.25;N/A;N/A;-1.138;-1.103;-1.122;18.983;47.274;36.499;64.014;N/A;6.487
9/23/14 12:30:01 PM;N/A;N/A;50.073;54458.719;N/A;N/A;-0.492;-0.52;-0.472;13;42.02;33.624;46.426;N/A;3.757
And when I get the output it looks like this:
>>>['createTime;"Grid CosPhi";"Grid Current";"Grid Frequency";"Grid kW";"Grid VAr";"Grid Voltage";"Pitch angle 1";"Pitch angle 2";"Pitch angle 3";"Rotor RPM";"Temp. 5 214";"Temp. 6 217";"Temp. 9 227";"Winddirection";"Windspeed"']
['9/21/14 11:40:01 PM', 'N/A', 'N/A', '50.002', '170096.297', 'N/A', 'N/A', '-1.003', '-0.96', '-1.058', '19.446', '47.182', '36.084', '63.772', 'N/A', '5.628']
['9/22/14 12:00:01 AM', 'N/A', 'N/A', '49.964', '229942.016', 'N/A', 'N/A', '-1.047', '-1.018', '-1.066', '18.957', '47.125', '36.293', '63.766', 'N/A', '6.431']
['9/22/14 12:20:01 AM', 'N/A', 'N/A', '49.934', '234467.609', 'N/A', 'N/A', '-1.028', '-0.986', '-1.001', '18.995', '47.056', '36.401', '63.732', 'N/A', '6.067']
As observed in some of the files, some of the Parameters are completely missing. This is why I need to find out positions of individual parameters in each CSV file. Any ideas how it can be done best are also appreciated. Thanks in advance.
Edit: Unfortunately I cannot use Pandas as it has to be seperately installed in Python34 and my office system (Windows XP, P4) does not support it. I want to do this with CSV module if possible.