Skip lines with strange characters when I read a file

Question

I am trying to read some '.txt' data files. Some of them contain strange random characters and even extra columns in random rows, as in the following example, where the first row is corrupted and the second row is a correct one:

CTD 10/07/30 05:17:14.41 CTD 24.7813, 0.15752, 1.168, 0.7954, 1497.¸ 23.4848, 0.63042, 1.047, 3.5468, 1496.542

CTD 10/07/30 05:17:14.47 CTD 23.4846, 0.62156, 1.063, 3.4935, 1496.482

I have read the documentation for np.loadtxt and have not found a solution to my problem. Is there a systematic way to skip rows like these?

The code that I use to read the files is:

#Function to read a datafile
import numpy as np
from io import StringIO

def Read(filename):
    #Change delimiters for spaces
    s = open(filename).read().replace(':',' ')
    s = s.replace(',',' ')
    s = s.replace('/',' ')
    #Take the columns that we need
    data=np.loadtxt(StringIO(s),usecols=(4,5,6,8,9,10,11,12))
    return data

Can you provide a sample of your code? – YamiOmar88 Aug 30 '19 at 09:05

score 0 · Answer 1 · answered Aug 30 '19 at 09:50

You could use the csv module to read the file one line at a time and apply your desired filter.

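import csv

# Fields per valid row when split on commas (5 in the sample above); adjust to your data
expected_length = 5

def isascii(s):
    # UTF-8 needs more than one byte for any non-ASCII character
    return len(s) == len(s.encode())

rows = []
with open('file.csv') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        if len(row) == expected_length and all(isascii(x) for x in row):
            rows.append(row)  # keep the row; build the numpy array from rows afterwards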

I got the ASCII check from this thread: "How to check if a string in Python is in ASCII?"
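
To show why the length comparison works, here is a minimal sketch. It assumes the stray character in the sample row is U+00B8 (a cedilla), which is how it appears in the question; the variable names are only for illustration. UTF-8 stores every ASCII character in one byte and every other character in two or more, so the encoded length only matches the string length for pure-ASCII text.

```python
def isascii(s):
    # str.encode() defaults to UTF-8, where non-ASCII characters take 2+ bytes
    return len(s) == len(s.encode())

good = "1496.482"
bad = "1497.\u00b8"  # assumed to be the stray character from the sample row

print(isascii(good), isascii(bad))   # True False
print(len(bad), len(bad.encode()))   # 6 7

# On Python 3.7+ the built-in str.isascii() gives the same answer
print(good.isascii(), bad.isascii())  # True False
```

If you are on Python 3.7 or later, the built-in method is probably the simpler choice.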

Simon Notley

Instead of using csv, could you not just read the file line by line and do the isascii check? `with open("file.txt", "r") as ins: array = [] for line in ins:` – Shadesfear Aug 30 '19 at 10:25
Absolutely, but the OP said they had an issue with unexpected columns, and their data appears to be CSV, so this was the most standard way I could think of to check the number of columns in a row in addition to the ASCII check. – Simon Notley Aug 30 '19 at 10:56

score 0 · Accepted Answer · answered Aug 30 '19 at 10:39

This works without the csv module used in the other answer: it reads the file line by line and keeps only the lines that are entirely ASCII.

data = []

def isascii(s):
    return len(s) == len(s.encode())

with open("test.txt", "r") as fil:
    for line in fil:
        res = map(isascii, line)
        if all(res):
            data.append(line)

print(data)
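
The ASCII check alone does not catch rows that only have extra columns, so here is a rough sketch that combines it with a field count. It assumes the delimiter handling from the question (':', ',' and '/' turned into spaces) and that a valid row then has 13 fields, as the second sample row does. The names normalize, clean_lines and n_fields are made up for this example, and str.isascii() needs Python 3.7+.

```python
def normalize(line):
    # Same delimiter replacement as in the question's Read function
    for sep in ":,/":
        line = line.replace(sep, " ")
    return line

def clean_lines(lines, n_fields=13):
    """Keep only pure-ASCII lines that split into exactly n_fields fields."""
    kept = []
    for line in lines:
        if not line.isascii():
            continue  # stray non-ASCII character somewhere in the line
        fields = normalize(line).split()
        if len(fields) == n_fields:
            kept.append(" ".join(fields))
    return kept

# The two rows from the question (stray character assumed to be U+00B8)
sample = [
    "CTD 10/07/30 05:17:14.41 CTD 24.7813, 0.15752, 1.168, 0.7954, "
    "1497.\u00b8 23.4848, 0.63042, 1.047, 3.5468, 1496.542\n",
    "CTD 10/07/30 05:17:14.47 CTD 23.4846, 0.62156, 1.063, 3.4935, 1496.482\n",
]

cleaned = clean_lines(sample)
print(cleaned)
```

Since np.loadtxt also accepts a list of strings, the result could be passed to it directly, e.g. np.loadtxt(cleaned, usecols=(4,5,6,8,9,10,11,12)). If opening the real file raises a UnicodeDecodeError, open(filename, errors="replace") is one option: undecodable bytes become U+FFFD, which is non-ASCII, so those lines are still dropped.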
