
Problem

I have a table made of 380 rows and 20 columns. I want to remove rows from this table following a certain condition.

To clarify things, let's say I have the list:

names = ['John', 'Amy', 'Daniel']

I want to remove the data of all the people whose name is found in the list names.

For example, suppose my data looks like this:

John    82    3.12    boy
Katy    12    1.12    girl
Amy     42    2.45    girl
Robert  32    1.56    boy
Daniel  47    2.10    boy

I want to remove the data of John, Amy, and Daniel. So the output should be:

Katy    12    1.12    girl
Robert  32    1.56    boy

Attempt to solve it

import csv
import numpy as np

# loading data
data = np.genfromtxt('file.txt', dtype = None)

csvfile = "home/paula/Desktop/test.txt"
with open(csvfile, 'w') as output:
    writer = csv.writer(output, delimiter = '\t')

    for row in range(len(data)):
        if data[row][0] == (i for i in names):
            print 'removing the data of', i, '...'
        else:
            writer.writerow([data[row][0], data[row][1], 
                             data[row][2], data[row][3]])

My code runs, but the rows are not removed: when I open the new test.txt file, the data I wanted to remove is still there.

I am certain that the bug is in the line if data[row][0] == (i for i in names):. How can I fix this?

aloha

3 Answers


The condition should be written:

if data[row][0] in names:

In your current code, (i for i in names) creates a generator, and you are then testing whether the string is equal to the generator object, which is always False:

>>> (i for i in names)
<generator object <genexpr> at 0x1060564b0>
>>> 'John' == (i for i in names)
False
>>>

Instead, you can test whether an item is in a list as follows:

>>> names = ['John', 'Amy', 'Daniel']
>>> 'John' in names
True
>>> 'Bob' in names
False
>>>

As mentioned in the comments, you can make this check more efficient by converting names to a set before iterating over the rows. Ideally, though, you would use the Pandas library to manipulate CSV/table data. See this answer for a similar example. There, rows are selected with df[df.Name.isin(...)]; to keep the rows whose name is not in the list, negate the condition: df[~df.Name.isin(...)].
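The set-based check mentioned above might look like the following minimal sketch. It assumes the rows have already been split into fields; `drop_named_rows` and the sample `rows` are hypothetical names used only for illustration, with the data taken from the question.

```python
def drop_named_rows(rows, names):
    """Return the rows whose first field is not in names."""
    # Build the set once so each membership test is a hash lookup
    # instead of a scan through the list.
    blocked = set(names)
    return [row for row in rows if row[0] not in blocked]


names = ['John', 'Amy', 'Daniel']
rows = [
    ['John', '82', '3.12', 'boy'],
    ['Katy', '12', '1.12', 'girl'],
    ['Amy', '42', '2.45', 'girl'],
    ['Robert', '32', '1.56', 'boy'],
    ['Daniel', '47', '2.10', 'boy'],
]

kept = drop_named_rows(rows, names)
for row in kept:
    print('\t'.join(row))
```

With only three names the speed difference is negligible; the set mainly pays off when the list of names grows.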

YXD
  • making names a set would be a lot more efficient – Padraic Cunningham Apr 18 '15 at 20:36
  • It would, but I wanted to explain the issues with the current method as concisely as possible. I added a link to Pandas, which will be more efficient than any hand-crafted code using `set`. I'll add a few words about it. – YXD Apr 18 '15 at 20:38
  • I would appreciate it if there is a better way to write the code. Especially that I am not satisfied by the last line: `writer.writerow([ ... ])`. As I've said above, my data is made up of 20 columns, so in the `writer.writerow` I had to write 20 columns!! Thanks a lot! – aloha Apr 18 '15 at 20:52
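One way to avoid listing every column, sketched under the assumption that the file is whitespace-separated text like the sample in the question: split each line into fields and pass the whole list to `writer.writerow`. The helper name `filter_rows` is hypothetical, and in-memory `io.StringIO` buffers stand in for the real files so the example runs on its own (Python 3).

```python
import csv
import io


def filter_rows(src, dst, names):
    """Copy whitespace-separated rows from src to dst, skipping listed names."""
    blocked = set(names)
    writer = csv.writer(dst, delimiter='\t', lineterminator='\n')
    for line in src:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] in blocked:
            continue  # this row belongs to a name we want to drop
        # writerow takes the whole list, however many columns it has.
        writer.writerow(fields)


sample = io.StringIO(
    "John    82    3.12    boy\n"
    "Katy    12    1.12    girl\n"
    "Amy     42    2.45    girl\n"
)
out = io.StringIO()
filter_rows(sample, out, ['John', 'Amy', 'Daniel'])
result = out.getvalue()
print(result)
```

With real files, the same function could be given file objects from `open(path)` and `open(path, 'w', newline='')` in place of the buffers.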

You're checking whether data[row][0] is the same as (i for i in names). What you want to do is check whether it's the same as one of the elements of (i for i in names). You could do that this way:

any([data[row][0]==i for i in names])

You could also do it the non-ridiculous way, with the in operator:

data[row][0] in names

This checks whether any of the elements of names is the same as data[row][0].
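As a quick check that the two forms agree, the sketch below compares them on the names from the question. `matches_any` and `matches_in` are hypothetical helper names; passing a generator to `any` instead of a list avoids building the intermediate list.

```python
names = ['John', 'Amy', 'Daniel']


def matches_any(value, names):
    # any() stops at the first element that compares equal.
    return any(value == name for name in names)


def matches_in(value, names):
    # The in operator performs the same element-by-element comparison.
    return value in names


for value in ['John', 'Katy']:
    print(value, matches_any(value, names), matches_in(value, names))
```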

KSFT

if data[row][0] == (i for i in names):
    print 'removing the data of', i, '...'

In that portion, i is used inside (i for i in names) as a variable local to the generator expression, so it is not available on the following print line.

For the check, use if data[row][0] in names: instead. You can try:

if data[row][0] in names:
    print 'removing the data of', data[row][0], '...'
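
The scoping point can be seen in a small sketch: the loop variable of a generator expression is not visible outside it, so a later reference to it fails. The names `consumed` and `gen_var_visible` are hypothetical and exist only for this illustration.

```python
names = ['John', 'Amy', 'Daniel']

# Consume a generator expression; its loop variable i stays inside it.
consumed = list(i for i in names)

try:
    i  # not defined in this scope
    gen_var_visible = True
except NameError:
    gen_var_visible = False

print(gen_var_visible)
```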
Sakib Ahammed