I am working with the Enron email dataset and am trying to remove email addresses that don't contain "@enron.com" (i.e., I would like to keep Enron addresses only). When I tried to delete the addresses without @enron.com, some of them were skipped for some reason. A small graph is shown below, where the vertices are email addresses. It is in GML format:

Creator "igraph version 0.7 Sun Mar 29 20:15:45 2015"
Version 1
graph
[
  directed 1
  node
  [
    id 0
    label "csutter@enron.com"
  ]
  node
  [
    id 1
    label "steve_williams@eogresources.com"
  ]
  node
  [
    id 2
    label "kutner.stephen@enron.com"
  ]
  node
  [
    id 3
    label "igsinc@ix.netcom"
  ]
  node
  [
    id 4
    label "dbn@felesky.com"
  ]
  node
  [
    id 5
    label "cheryltd@tbardranch.com"
  ]
  node
  [
    id 6
    label "slover.eric@enron.com"
  ]
  node
  [
    id 7
    label "alkeister@yahoo.com"
  ]
  node
  [
    id 8
    label "econnors@mail.mainland.cc.tx.us"
  ]
  node
  [
    id 9
    label "jafry@hotmail.com"
  ]
  edge
  [
    source 5
    target 5
    weight 1
  ]
]

My code is:

import igraph as ig

G = ig.read("enron_email_filtered.gml")
for v in G.vs:
    print(v['label'])
    if '@enron.com' not in v['label']:
        G.delete_vertices(v.index)
        print('Deleted')

In this dataset, 7 email addresses should be deleted. However, with the code above, only 5 are removed.

Brian Tompsett - 汤莱恩
user1894963
  • I don't think you are allowed to remove vertices when you iterate over `G.vs`. Try collecting them and then remove them all at once. – Jan Katins Mar 29 '15 at 22:28
  • That's correct - modifying the vertex set while iterating over `G.vs` yields unpredictable results. – Tamás Mar 30 '15 at 15:36
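
The skipping described in the comments above can be reproduced with an ordinary Python list. This is only an analogy, and it assumes (without confirmation from the igraph documentation) that iterating over `G.vs` advances by index in a similar way; the helper names `delete_while_iterating` and `collect_then_delete` are made up for illustration. With the ten labels from the question, deleting during iteration removes 5 entries, while collecting the indices first finds all 7:

```python
# Plain-list analogy (not igraph internals): removing an item while iterating
# shifts the later items left, so the item right after each removal is skipped.
labels = [
    "csutter@enron.com",
    "steve_williams@eogresources.com",
    "kutner.stephen@enron.com",
    "igsinc@ix.netcom",
    "dbn@felesky.com",
    "cheryltd@tbardranch.com",
    "slover.eric@enron.com",
    "alkeister@yahoo.com",
    "econnors@mail.mainland.cc.tx.us",
    "jafry@hotmail.com",
]


def delete_while_iterating(items):
    """Hypothetical helper: mimic the loop from the question on a list copy."""
    items = list(items)
    deleted = []
    for i, label in enumerate(items):
        if "@enron.com" not in label:
            del items[i]  # shrinks the list that is being iterated
            deleted.append(label)
    return items, deleted


def collect_then_delete(items):
    """Hypothetical helper: gather the indices first, then drop them in one pass."""
    to_delete = [i for i, label in enumerate(items) if "@enron.com" not in label]
    doomed = set(to_delete)
    kept = [label for i, label in enumerate(items) if i not in doomed]
    return kept, to_delete


buggy_kept, buggy_deleted = delete_while_iterating(labels)
fixed_kept, fixed_ids = collect_then_delete(labels)

print(len(buggy_deleted), len(fixed_ids))  # 5 7
print(buggy_kept)  # still contains two non-Enron addresses
```

In the list version, the two survivors are the addresses that sat directly after a deleted entry, which is why collecting everything first and deleting once is the safer pattern.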

1 Answer

Following the tutorial here, you can select all the vertices with a specific property and then delete them in bulk, instead of deleting them one by one inside the loop:

to_delete_ids = [v.index for v in G.vs if '@enron.com' not in v['label']]
G.delete_vertices(to_delete_ids)

Here is the output I got:

to delete ids: [1, 3, 4, 5, 7, 8, 9]
Before deletion: IGRAPH D-W- 10 1 --
+ attr: id (v), label (v), weight (e)
+ edges:
5->5
After deletion: IGRAPH D-W- 3 0 --
+ attr: id (v), label (v), weight (e)
label: csutter@enron.com
label: kutner.stephen@enron.com
label: slover.eric@enron.com
Jey
  • I have a similar problem, but I want to delete the vertices that have no edges. Something like `to_delete_ids = [v.index for v in g_groups_all.vs if v HAS NO EDGES]`. Any ideas? @Jey @Brian Tompsett? – B Furtado May 06 '16 at 20:26
  • @B_Furtado use g.vs.find(_degree=0). – M.M Jul 21 '16 at 14:13
  • To get a list of isolates, you can do something like: [v.index for v in g.vs if v.degree() == 0] – Mark Graph Jul 23 '20 at 01:20
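
The isolate case from the comments follows the same "collect first, delete once" pattern. As far as I know, `find` returns only the first matching vertex, whereas `g.vs.select(_degree=0)` or the list comprehension above returns all of them, so one of the latter is probably what is wanted here. The sketch below deliberately avoids igraph and works on a plain edge list so the degree logic is visible; `isolated_vertex_ids` is a hypothetical helper, and the edge list is invented for the example:

```python
def isolated_vertex_ids(n_vertices, edges):
    """Hypothetical helper: ids of vertices that appear in no edge at all."""
    degree = [0] * n_vertices
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1  # a self-loop bumps the same vertex twice
    return [v for v in range(n_vertices) if degree[v] == 0]


# Six vertices (ids 0..5); vertex 5 only has a self-loop, like in the question.
edges = [(0, 1), (1, 2), (5, 5)]
isolated = isolated_vertex_ids(6, edges)
print(isolated)  # [3, 4]

# With igraph, the collected ids would then be removed in a single call,
# e.g. g.delete_vertices(isolated), rather than one by one inside a loop.
```

Note that in this sketch a vertex with only a self-loop is not treated as isolated, because the loop still contributes to its degree.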