
I'm working on a project where, to start with, we have to filter the data so that invalid data is removed. Among other things, this means that if one of the rows in the data we load contains letters or words, it has to be deleted. Is the following code enough to do so?

import numpy as np

def dataLoad(filename):
    # Load the data from the given file and define the variables:
    rawData = np.loadtxt(filename)
    rawTemperature, rawGrowthrate, rawBacteria = rawData.T
    print("You have chosen to work with the file {:s}".format(filename))
    # Removing invalid data:
    # Empty list to save the invalid-data messages in:
    InvalidData = []
    # Vector of ones (1 = keep the row, 0 = erase it):
    Erase = np.ones(len(rawData))

    # The loop looks through every row in the matrix:
    for i in range(len(rawData)):
        # A message is stored in InvalidData for each row that contains invalid data,
        # and the one in the i'th place of Erase is switched to a zero.
        if rawTemperature[i]<10 or rawTemperature[i]>60 or rawTemperature[i]==(""):
            InvalidData.append('In line %d invalid Temperature' % (i+1))
            Erase[i]=0
        if rawGrowthrate[i]<0 or rawGrowthrate[i]==(""):
            InvalidData.append('In line %d invalid Growth rate' % (i+1))
            Erase[i]=0
        if rawBacteria[i]<0 or rawBacteria[i]>4 or rawBacteria[i]==(""):
            InvalidData.append('In line %d invalid Bacteria' % (i+1))
            Erase[i]=0
Tonechas

1 Answer


It is not clear to me whether you want to delete the entire row or only the characters that are letters rather than digits.

To check whether a row contains letters or words, you can use the regex [a-zA-Z] (see "Regex to match only letters").

https://docs.python.org/2/library/re.html
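The check described above could be sketched roughly as follows. This is only an illustration: the helper name contains_letters and the sample rows are made up here and stand in for lines read from the data file.

```python
import re

# Pattern matching any ASCII letter, as suggested above.
LETTER = re.compile(r"[a-zA-Z]")

def contains_letters(line):
    """Return True if the text line holds at least one ASCII letter."""
    return LETTER.search(line) is not None

# Hypothetical sample rows, standing in for lines read from the data file.
rows = ["25.0 0.5 1", "abc 0.5 1", "30.0 0.7 two"]

# Keep only the rows without any letters.
valid_rows = [row for row in rows if not contains_letters(row)]
print(valid_rows)  # ['25.0 0.5 1']
```

Note that this pattern would also reject numbers written in scientific notation such as 1e-3, because of the letter e, so it only fits data where numbers are written as plain decimals.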

If you simply want to delete the characters, you can use re.sub and substitute each letter with an empty string '':

import re
s = "ExampleString123"
# Remove every ASCII letter, leaving only the digits "123"
replaced = re.sub('[a-zA-Z]', '', s)
print(replaced)

For a NumPy example, see "Numpy array Regex sub".
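One possible way to apply the same substitution to a NumPy array of strings is sketched below. The array contents are made up for illustration, and the list comprehension is just one simple option, not necessarily what the linked question uses.

```python
import re
import numpy as np

# Hypothetical array of strings in which some entries contain stray letters.
arr = np.array(["12a", "b34", "56"])

# Apply re.sub to each element and rebuild the array from the results.
cleaned = np.array([re.sub(r"[a-zA-Z]", "", s) for s in arr])

# Once the letters are gone, the strings can be converted to numbers.
numbers = cleaned.astype(float)
print(numbers)
```

If an entry consists only of letters, the result is an empty string and the conversion to float fails, so such entries would have to be handled separately.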

If you want to delete the entire row in the NumPy array, you can select it using the regex [a-zA-Z] (see "Selecting elements in numpy array using regular expressions") and then delete it (see "deleting rows in numpy array").
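A minimal sketch of selecting and deleting whole rows, assuming the data has first been read as strings (the array below is made up and uses the three columns from the question: temperature, growth rate, bacteria):

```python
import re
import numpy as np

# Hypothetical data held as strings: temperature, growth rate, bacteria.
raw = np.array([
    ["25.0", "0.5", "1"],
    ["abc", "0.5", "2"],
    ["30.0", "0.7", "x"],
    ["40.0", "0.9", "3"],
])

letter = re.compile(r"[a-zA-Z]")

# True for each row in which at least one field contains a letter.
has_letters = np.array(
    [any(letter.search(field) for field in row) for row in raw]
)

# np.delete removes the flagged rows along axis 0; the rest converts to float.
clean = np.delete(raw, np.where(has_letters)[0], axis=0).astype(float)
print(clean)
```

Boolean indexing with raw[~has_letters] would give the same rows without calling np.delete.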

ralf htp