I want to remove rows from a CSV file if they contain specific strings.

I'd like to write the result to a new output file rather than overwrite the original.

I need to remove any rows that contain "py-board" or "coffee".

Example Input:

173.20.1.1,2-base
174.28.2.2,2-game
174.27.3.109,xyz-b13-coffee-2
174.28.32.8,2-play
175.31.4.4,xyz-102-o1-py-board
176.32.3.129,xyz-b2-coffee-1
177.18.2.8,six-jump-walk

Expected Output:

173.20.1.1,2-base
174.28.2.2,2-game
174.28.32.8,2-play
177.18.2.8,six-jump-walk

I tried this, from "Deleting rows with Python in a CSV file":

import csv
with open('input_csv_file.csv', 'rb') as inp, open('purged_csv_file', 'wb') as out:
    writer = csv.writer(out)
    for row in csv.reader(inp):
        if row[1] != "py-board" or if row[1] != "coffee":
            writer.writerow(row)

and I tried this

import csv
with open('input_csv_file.csv', 'rb') as inp, open('purged_csv_file', 'wb') as out:
    writer = csv.writer(out)
    for row in csv.reader(inp):
        if row[1] != "py-board":
            if row[1] != "coffee":
                writer.writerow(row)

and this

        if row[1][-8:] != "py-board":
            if row[1][-8:] != "coffee-1":
                if row[1][-8:] != "coffee-2":

but got this error

  File "C:\testing\syslogyamlclean.py", line 6, in <module>
    for row in csv.reader(inp):
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
Daniel Widdis
PythonDawg
  • This is wrong: `if row[1] != "py-board" or if row[1] != "coffee" ` What about this: `if (row[1] != "py-board" or row[1] != "coffee"): ` ? – vojtam Apr 28 '21 at 13:09
  • `"xyz-102-o1-py-board" != "py-board"` However, `"py-board" in "xyz-102-o1-py-board"` is `True`. – 001 Apr 28 '21 at 13:11
  • @JohnnyMopp - I also tried if row[1][-8:] != "py-board": – PythonDawg Apr 28 '21 at 13:16
  • Also, you are opening the file in binary mode (the "b" in "rb") but the csv library wants text. Change "rb" to "r". – 001 Apr 28 '21 at 13:22
  • @JohnnyMopp - changed rb to r and still got this: TypeError: a bytes-like object is required, not 'str' – PythonDawg Apr 28 '21 at 13:30
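
For reference, a minimal sketch of how the csv-based attempts above might be made to work on Python 3. It assumes the two-column layout shown in the question; the function name `purge_rows` and the `unwanted` parameter are hypothetical, introduced only for illustration. Both files are opened in text mode (changing only "rb" while leaving "wb" in place would explain the second TypeError), and the check is a substring test with `in` rather than an equality test.

```python
import csv


def purge_rows(src_path, dst_path, unwanted=("py-board", "coffee")):
    """Copy src_path to dst_path, skipping rows whose second column
    contains any of the unwanted substrings (hypothetical helper)."""
    # Text mode with newline="" is what the csv module expects on Python 3.
    with open(src_path, "r", newline="") as inp, \
         open(dst_path, "w", newline="") as out:
        writer = csv.writer(out)
        for row in csv.reader(inp):
            # Rows with fewer than two columns are passed through unchanged.
            if len(row) > 1 and any(s in row[1] for s in unwanted):
                continue
            writer.writerow(row)


# Example call, using the file names from the question:
# purge_rows("input_csv_file.csv", "purged_csv_file.csv")
```

Because the output goes to a separate path, the original file is left untouched.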

2 Answers

I would actually not use the csv package for this goal. This can be achieved easily using standard file reading and writing.

Try this code (I have written some comments to make it self-explanatory):

# We open the source file and get its lines
with open('input_csv_file.csv', 'r') as inp:
    lines = inp.readlines()

# We open the target file in write-mode
with open('purged_csv_file.csv', 'w') as out:
    # We go line by line writing in the target file
    # if the original line does not include the
    # strings 'py-board' or 'coffee'
    for line in lines:
        if 'py-board' not in line and 'coffee' not in line:
            out.write(line)
Sherlock Bourne
  • I tried this, but it did not remove the rows with those strings, they are still in the purged file. – PythonDawg Apr 28 '21 at 13:25
  • That's strange, I executed it and the output file contains only the rows you wanted. Did you literally copy/paste and test it, @PythonDawg? – Sherlock Bourne Apr 28 '21 at 13:32
  • actually, yes, that worked, thank you so much! sorry, when I copied, I forgot to change the sample strings to actually match the ones I needed to remove. Thank you! – PythonDawg Apr 28 '21 at 13:45
  • What if I constantly need to remove strings? Writing with the for loop is very time-consuming. – myworldbox Dec 24 '21 at 17:16
  • @myworldbox, I do not really get your point. How would you do it in a less time-consuming way, then? – Sherlock Bourne Dec 24 '21 at 17:27
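
On the speed concern raised in the comments above, one possible variation is sketched here. It assumes the cost comes from loading the whole file with `readlines()` before writing; whether it is actually faster was not measured and will depend on file size. The names `filter_lines` and `unwanted` are hypothetical.

```python
def filter_lines(src_path, dst_path, unwanted=("py-board", "coffee")):
    """Write to dst_path every line of src_path that contains none of
    the unwanted substrings. Returns the number of lines removed."""
    removed = 0
    with open(src_path, "r") as inp, open(dst_path, "w") as out:
        # Iterating over the file object reads one line at a time,
        # so the whole file never has to sit in memory.
        for line in inp:
            if any(s in line for s in unwanted):
                removed += 1
            else:
                out.write(line)
    return removed
```

Keeping the substrings in a tuple means more can be added later without changing the loop.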
# pandas helps to read and manipulate .csv files
import numpy as np  # needed for np.logical_not below
import pandas as pd

# read .csv file
df = pd.read_csv('input_csv_file.csv', sep=',', header=None)
df
              0                    1
0    173.20.1.1               2-base
1    174.28.2.2               2-game
2  174.27.3.109     xyz-b13-coffee-2
3   174.28.32.8               2-play
4    175.31.4.4  xyz-102-o1-py-board
5  176.32.3.129      xyz-b2-coffee-1
6    177.18.2.8        six-jump-walk

# filter rows
result = df[np.logical_not(df[1].str.contains('py-board') | df[1].str.contains('coffee'))]
print(result)
             0              1
0   173.20.1.1         2-base
1   174.28.2.2         2-game
3  174.28.32.8         2-play
6   177.18.2.8  six-jump-walk

# save to result.csv file
result.to_csv('result.csv', index=False, header=False)
imdevskp
  • thanks. I was trying to solve without pandas if I could and also, I want the output to exclude those items versus include. Hard to tell from your example if that's happening. – PythonDawg Apr 28 '21 at 13:20