Remove trailing and leading char using csv.reader

Question

How can I remove a character from the value in the second column of my CSV when it starts with "(" or ends with ")"? I'm very new to Python, so any help is appreciated.

Example:

0023632fa4a860be8bc85ddf39fc19c3c4c2e6fe,(Java Archive (JAR) 4049-0),Not Supported,
005c41fc0f8580f51644493fcbaa0d2d468312c3,(WIN32 EXE 7-2),Ransom.Win32.TRX.XXPE50FFF027,

to

0023632fa4a860be8bc85ddf39fc19c3c4c2e6fe,Java Archive (JAR) 4049-0,Not Supported,
005c41fc0f8580f51644493fcbaa0d2d468312c3,WIN32 EXE 7-2,Ransom.Win32.TRX.XXPE50FFF027,

I have this code using LOAD DATA INFILE:

TRIM(TRAILING ')' FROM TRIM(LEADING '('

How can I apply it here in my code:

with open(fullPath, 'rb') as file:
     csv_data = csv.reader(file)
     next(csv_data)

Because I only need to delete the ( and ) at the start and end of the string — Godshand, Nov 15 '18 at 07:52

b-fg · Answer 1 · 2018-11-15T09:27:51.467

2

A solution using lstrip() and rstrip()

import csv

new_rows = []
# newline='' lets the csv module handle line endings itself
with open('test.csv', 'rt', newline='') as file:
    csv_data = csv.reader(file, delimiter=',')
    for row in csv_data:
        # strip '(' from the start and ')' from the end of the second column only
        new_rows.append([row[0], row[1].lstrip('(').rstrip(')'), row[2]])

print(new_rows)
# Outputs [['0023632fa4a860be8bc85ddf39fc19c3c4c2e6fe', 'Java Archive (JAR) 4049-0', 'Not Supported'], ['005c41fc0f8580f51644493fcbaa0d2d468312c3', 'WIN32 EXE 7-2', 'Ransom.Win32.TRX.XXPE50FFF027']]
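One caveat worth illustrating: lstrip('(') and rstrip(')') strip every repeated parenthesis at that end of the string, not just one. If only a single wrapping pair should be removed, a small helper could look like the sketch below. The name strip_one_pair is hypothetical and not part of the original answer.

```python
def strip_one_pair(value, open_char="(", close_char=")"):
    """Remove at most one leading open_char and one trailing close_char."""
    if value.startswith(open_char):
        value = value[len(open_char):]
    if value.endswith(close_char):
        value = value[:-len(close_char)]
    return value


print(strip_one_pair("(Java Archive (JAR) 4049-0)"))  # Java Archive (JAR) 4049-0
print("((nested))".lstrip("(").rstrip(")"))           # nested
print(strip_one_pair("((nested))"))                   # (nested)
```

For the sample rows in the question both approaches give the same result; they only differ when a value begins or ends with several parentheses in a row.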

Edit

To save the edited rows to a new .csv file, just add:

# newline='' prevents the extra blank line after every row on Windows
with open('test2.csv', 'wt', newline='') as file:
    writer = csv.writer(file)
    for row in new_rows:
        writer.writerow(row)
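As a rough sketch, reading and writing can also be done in one pass without building a list first. The function name strip_parens_in_column and its parameters are assumptions made for illustration, and the demo uses in-memory files so that it runs without test.csv.

```python
import csv
import io


def strip_parens_in_column(src, dst, column=1, skip_header=False):
    """Copy CSV rows from src to dst, stripping '(' and ')' from one column."""
    reader = csv.reader(src)
    # lineterminator is set to "\n" only to keep the demo output simple.
    writer = csv.writer(dst, lineterminator="\n")
    if skip_header:
        next(reader, None)  # drop the header row, like next(csv_data) in the question
    for row in reader:
        if len(row) > column:  # leave short or empty rows untouched
            row[column] = row[column].lstrip("(").rstrip(")")
        writer.writerow(row)


# In-memory demo; with real files, open both with newline="".
sample = (
    "0023632fa4a860be8bc85ddf39fc19c3c4c2e6fe,(Java Archive (JAR) 4049-0),Not Supported,\n"
    "005c41fc0f8580f51644493fcbaa0d2d468312c3,(WIN32 EXE 7-2),Ransom.Win32.TRX.XXPE50FFF027,\n"
)
out = io.StringIO()
strip_parens_in_column(io.StringIO(sample), out)
print(out.getvalue())
```

Because every field of each row is written back, the trailing empty column from the sample data is preserved, which keeps the file shape unchanged for a later database import.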

edited Nov 15 '18 at 09:27

answered Nov 15 '18 at 08:14


How can I add line breaks? I tried new_rows.append(row[0]+','+row[1].lstrip('(').rstrip(')')+','+row[2] + "\n") so the array would look like the CSV, but it doesn't work – Godshand Nov 15 '18 at 08:29
I need to break it using "\n" so that printing shows one row per line – Godshand Nov 15 '18 at 08:34
To get it line by line, just use: `for line in new_rows: print(line)` – b-fg Nov 15 '18 at 08:35
Is there any way to save it to an updated CSV? I'm trying to import my CSV into a database – Godshand Nov 15 '18 at 08:45
I have included the code to write the `new_rows` to a `test2.csv` file. If this has helped you, please consider upvoting and accepting the answer. Thanks. – b-fg Nov 15 '18 at 08:56
It saves, but when I open it there is an empty line after each row and all the values are in the first column – Godshand Nov 15 '18 at 09:13
This will of course depend on the way you import it in your .csv reader. – b-fg Nov 15 '18 at 09:24
I have edited the code to save you the trouble of this. Let me know if it works. – b-fg Nov 15 '18 at 09:27
Thanks, I will figure out how to fix this; once I do, I will accept your answer – Godshand Nov 15 '18 at 09:47

score 0 · Answer 2 · answered Nov 15 '18 at 08:00

Here's one way of doing it: I remove the first occurrence of '(' and the last occurrence of ')' from each line. Hope it helps.

s = '''0023632fa4a860be8bc85ddf39fc19c3c4c2e6fe,(Java Archive (JAR) 4049-0),Not Supported,
005c41fc0f8580f51644493fcbaa0d2d468312c3,(WIN32 EXE 7-2),Ransom.Win32.TRX.XXPE50FFF027,'''

def last_replace(s, old, new, occurrence):
    '''Replaces the last occurrence of the character'''
    li = s.rsplit(old, occurrence)
    return new.join(li)

new_string = [last_replace(line, ')', '', 1).replace('(', '', 1) for line in s.split('\n')]
print(new_string)

Output:

['0023632fa4a860be8bc85ddf39fc19c3c4c2e6fe,Java Archive (JAR) 4049-0,Not Supported,',
'005c41fc0f8580f51644493fcbaa0d2d468312c3,WIN32 EXE 7-2,Ransom.Win32.TRX.XXPE50FFF027,']
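To make the rsplit trick inside last_replace easier to follow, here is a small step-by-step sketch. The shortened line value is made up for illustration.

```python
line = "id,(Java Archive (JAR) 4049-0),Not Supported,"

# rsplit with maxsplit=1 splits once, starting from the right-hand end.
parts = line.rsplit(")", 1)
print(parts)  # ['id,(Java Archive (JAR) 4049-0', ',Not Supported,']

# Joining with an empty string drops only that last ')'.
without_last = "".join(parts)

# replace(..., 1) then drops only the first '('.
print(without_last.replace("(", "", 1))  # id,Java Archive (JAR) 4049-0,Not Supported,
```

Note that this works on the whole line rather than on the second column, so a parenthesis in the first or third column would be the one removed instead.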

PS : I stole the last_replace function from here

Yeah, that would work too. I went with this approach so it will be easy to replace multiple `(` and `)` later on. — Vineeth Sai, Nov 15 '18 at 08:04

score 0 · Answer 3 · answered Nov 15 '18 at 08:04

This is a great opportunity to learn about regular expressions! Regular expressions are a method for recognising and dealing with patterns in text. Python has a regular expressions package as part of its standard library. I'm going to assume you're using Python 3 for the rest of this answer, where the package is named re.

The TLDR answer to your question is:

import re

string_without_parens = re.sub(r'(^\()|(\)$)', '', string_maybe_has_parens)

What's going on here, though? The re.sub() function takes three parameters: a regular expression pattern (written here as a raw string, denoted by the leading r, so the backslashes reach the regex engine unchanged), the string that replaces each match, and the string to perform the substitution on. The regular expression here is (^\()|(\)$). So what does that mean? Let's take it step by step:

A set of parentheses () represents a capture group. These can be used to get the matches out, but I've used them to group together the characters we're looking for. There are two capture groups in this regular expression: (^\() and (\)$).
Between these is a | character, which represents OR in regular expression language, so it's looking for something that matches either (^\() or (\)$).
The first capture group, (^\(), has two things inside it (well, three really, but we'll get to that). The first is ^, which is called an anchor; this one in particular says "only look at the start of the string". The second (and third) characters are \(, which says "I want to look for an opening parenthesis". Because parentheses are used in regular expression syntax, we have to use the backslash character to "escape" it.
The second capture group, (\)$), contains an escaped closing parenthesis \) and another anchor, $. This anchor represents the end of the string, in the same way ^ represented the start.
Together this says: "match an opening parenthesis at the beginning or a closing parenthesis at the end", and the re.sub() function says to replace anything that matches this pattern with '' (i.e. nothing).
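Applied to the CSV from the question, the pattern could be used on the second column roughly as follows. This is a sketch: the names EDGE_PARENS and cleaned are made up for illustration, and the rows are read from an in-memory string rather than a real file.

```python
import csv
import io
import re

# Compiled once; matches "(" at the very start or ")" at the very end of a string.
EDGE_PARENS = re.compile(r'(^\()|(\)$)')

sample = (
    "0023632fa4a860be8bc85ddf39fc19c3c4c2e6fe,(Java Archive (JAR) 4049-0),Not Supported,\n"
    "005c41fc0f8580f51644493fcbaa0d2d468312c3,(WIN32 EXE 7-2),Ransom.Win32.TRX.XXPE50FFF027,\n"
)

cleaned = []
for row in csv.reader(io.StringIO(sample)):
    row[1] = EDGE_PARENS.sub('', row[1])  # only the second column is touched
    cleaned.append(row)

print(cleaned[0][1])  # Java Archive (JAR) 4049-0
print(cleaned[1][1])  # WIN32 EXE 7-2
```

The inner (JAR) survives because neither of its parentheses sits at the start or end of the field. The pattern also removes at most one parenthesis per side, so "((x))" would become "(x)".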

Hope that helps! If you want to play more with regular expressions, you can try out regexr, which helped me wrap my head around them.

This will be inefficient if you only want to replace `(` and `)` — Vineeth Sai, Nov 15 '18 at 08:05
"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." - Jamie Zawinski — Ahmad Khan, Nov 15 '18 at 08:07
@VineethSai Regular expressions can be compiled to a DFA that operates in O(n), and I suspect the use of the anchors would optimise this to O(1) under the hood. Your use of the `replace` method makes your solution O(n). Blanket statements of "regex is slow" don't really help anyone. I'd suggest giving this a read if you want to learn more about regex speed: https://swtch.com/~rsc/regexp/regexp1.html. Regex also provides compact, comprehensible syntax (in this case a single line). — nicklambourne, Nov 15 '18 at 08:23
