Good evening, I'm having a problem with some code I'm writing, and I would love to get advice. I want to do the following:

  1. Remove rows in a .csv file that contain a specific value (-3.4028*10^38)
  2. Write a new .csv

The file I'm working with is large (12.2 GB, 87 million rows) and has 6 columns: the first 5 columns contain numerical values, and the last column contains text.

Here is my code:

import csv

directory = "/media/gman/Folder1/processed/test_removal1.csv"
with open('run1.csv', 'r') as fin, open(directory, 'w', newline='') as fout:

    # define reader and writer objects
    reader = csv.reader(fin, skipinitialspace=False)
    writer = csv.writer(fout, delimiter=',')

    # write headers
    writer.writerow(next(reader))

    # iterate and write rows based on condition
    for i in reader:
        if (i[-1]) == -3.4028E38:
            writer.writerow(i)

When I run this I get the following error message:

Error: line contains NUL

File "/media/gman/Aerospace_Classes/Programs/csv_remove.py", line 19, in <module>
for i in reader: Error: line contains NUL 

I'm not sure how to proceed. If anyone has any suggestions, please let me know. Thank you.

gman2020
  • Please provide the entire error message, as well as a [mcve]. – AMC Mar 07 '20 at 03:27
  • Could it be an encoding issue? Check this: https://stackoverflow.com/a/9882004/1293690 – geo909 Mar 07 '20 at 03:30
  • I think you're using the wrong tool for this job. If you just want to create a new CSV from the existing, minus the rows that contain a certain substring, just do something like this: `grep -v '-3.4028*10^38' existing_file.csv > new_file.csv` – Z4-tier Mar 07 '20 at 03:37
  • Here is the full error message: File "/media/gman/Aerospace_Classes/Programs/csv_remove.py", line 19, in <module> for i in reader: Error: line contains NUL – gman2020 Mar 07 '20 at 03:58
  • Please fix the indentation. – wwii Mar 07 '20 at 05:14
  • Does this answer your question? [Python CSV error: line contains NULL byte](https://stackoverflow.com/questions/4166070/python-csv-error-line-contains-null-byte). Many more results turn up when searching for `Error: line contains NUL`. – wwii Mar 07 '20 at 05:15
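
One common way to get past `line contains NUL` with the `csv` module is to strip NUL characters from each line before the reader sees it. The sketch below does that and also compares the field numerically, since `csv.reader` yields strings and a string never equals a float. The names `filter_csv` and `is_sentinel`, the tolerance, and the choice of column index 2 (taken from the `iloc[:, 2]` used in the answer) are assumptions for illustration, not something confirmed by the question. If the NULs come from a UTF-16 file, as the encoding comment suggests may be the case, opening the file with the right `encoding` would be the better fix.

```python
import csv

SENTINEL = -3.4028e38  # approximate no-data value quoted in the question


def is_sentinel(field, tol=1e34):
    """Return True if the field parses as a float close to SENTINEL."""
    try:
        return abs(float(field) - SENTINEL) < tol
    except ValueError:
        return False  # text or empty fields are never the sentinel


def filter_csv(src_path, dst_path, column=2):
    """Copy src_path to dst_path, skipping rows whose `column` holds the sentinel.

    `column=2` is an assumption; adjust it to the column that holds the value.
    """
    with open(src_path, "r", newline="") as fin, \
            open(dst_path, "w", newline="") as fout:
        # Strip NUL characters lazily, one line at a time, so the whole
        # file is never held in memory and csv.reader never sees a NUL.
        cleaned = (line.replace("\0", "") for line in fin)
        reader = csv.reader(cleaned)
        writer = csv.writer(fout)

        writer.writerow(next(reader))  # copy the header row
        for row in reader:
            if len(row) > column and is_sentinel(row[column]):
                continue  # drop rows with the no-data value
            writer.writerow(row)
```

Because both the generator and the reader work line by line, memory use stays flat regardless of file size, which matters for a 12.2 GB input.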

1 Answer

I figured it out. Here is what I ended up doing:

#IMPORT LIBRARIES
import pandas as pd

#IMPORT FILE PATH
directory = '/media/gman/Grant/Maps/processed_maps/csv_combined.csv'

#CREATE DATAFRAME FROM IMPORTED CSV
data = pd.read_csv(directory)
data.head()  # preview the first rows (only displayed in an interactive session)

#REMOVE ROWS WITH ALTITUDE VALUES LESS THAN -100,000 METERS
# this is to remove the -3.402823E038 meter altitude values that keep coming up.
data.drop(data[data.iloc[:, 2] < -100000].index, inplace=True)

#CONVERT PROCESSED DATAFRAME INTO NEW CSV FILE
# to_csv returns None when given a path, so there is nothing to assign.
data.to_csv(r'/media/gman/Grant/Maps/processed_maps/corrected_altitude_data.csv')  # export good data to this file.

I went with pandas: I loaded the CSV into a DataFrame, dropped the rows that matched a logical condition, and then exported the DataFrame to a new CSV file.
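
For a file as large as the one in the question (12.2 GB), reading everything into one DataFrame may not fit in memory. A minimal sketch of the same filter applied in chunks follows, using the `chunksize` parameter of `pd.read_csv`. The function name `drop_bad_altitudes`, the chunk size, and the `index=False` choice (which leaves out the extra index column that `to_csv` writes by default) are assumptions for illustration; the column index and threshold mirror the code above.

```python
import pandas as pd


def drop_bad_altitudes(src_path, dst_path, column=2, threshold=-100000,
                       chunksize=1_000_000):
    """Stream src_path in chunks and write rows whose `column` is not below threshold."""
    first = True
    for chunk in pd.read_csv(src_path, chunksize=chunksize):
        # Keep every row that is NOT below the threshold (NaN rows are kept,
        # matching the behaviour of the drop() call in the answer).
        kept = chunk[~(chunk.iloc[:, column] < threshold)]
        kept.to_csv(
            dst_path,
            mode="w" if first else "a",  # overwrite once, then append
            header=first,                # write the header only once
            index=False,                 # do not add an index column
        )
        first = False
```

Usage would look like `drop_bad_altitudes('csv_combined.csv', 'corrected_altitude_data.csv')`, with the paths replaced by the real ones.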

gman2020