I'm new to NLTK and am trying to extract the PERSON, ORGANIZATION, and GPE entities from the output of the following code:

import nltk

# tokcomp is a list of sentence strings
for i in tokcomp:
    words = nltk.word_tokenize(i)
    tagged = nltk.pos_tag(words)
    namedEnt = nltk.ne_chunk(tagged, binary=False)
    print(namedEnt)

The output I got is:

(S
  Our/PRP$
  direct/JJ
  competitors/NNS
  include/VBP
  ,/,
  among/IN
  others/NNS
  ,/,
  (PERSON Accenture/NNP)
  ,/,
  (GPE Capgemini/NNP)
  ,/,
  (ORGANIZATION Computer/NNP Sciences/NNPS Corporation/NNP)
  ,/,
  (GPE Genpact/NNP)
  ,/,
  (ORGANIZATION HCL/NNP Technologies/NNPS)
  ,/,
  (ORGANIZATION HP/NNP Enterprise/NNP)
  ,/,
  (ORGANIZATION IBM/NNP Global/NNP Services/NNPS)
  ,/,
  (ORGANIZATION Infosys/NNP Technologies/NNPS)
  ,/,
  (PERSON Tata/NNP Consultancy/NNP Services/NNPS)
  and/CC
  (PERSON Wipro/NNP)
  ./.)
(S
  These/DT
  markets/NNS
  also/RB
  include/VBP
  numerous/JJ
  smaller/JJR
  local/JJ
  competitors/NNS
  in/IN
  the/DT
  various/JJ
  geographic/JJ
  markets/NNS
  in/IN
  which/WDT
  we/PRP
  operate/VBP
  which/WDT
  may/MD
  be/VB
  able/JJ
  to/TO
  provide/VB
  services/NNS
  and/CC
  solutions/NNS
  at/IN
  lower/JJR
  costs/NNS
  or/CC
  on/IN
  terms/NNS
  more/RBR
  attractive/JJ
  to/TO
  clients/NNS
  than/IN
  we/PRP
  can/MD
  ./.)
(S
  Our/PRP$
  direct/JJ
  competitors/NNS
  include/VBP
  ,/,
  among/IN
  others/NNS
  ,/,
  (PERSON Accenture/NNP)
  ,/,
  (GPE Capgemini/NNP)
  ,/,
  (ORGANIZATION Computer/NNP Sciences/NNPS Corporation/NNP)
  ,/,
  (GPE Genpact/NNP)
  ,/,
  (ORGANIZATION HCL/NNP Technologies/NNPS)
  ,/,
  (ORGANIZATION HP/NNP Enterprise/NNP)
  ,/,
  (ORGANIZATION IBM/NNP Global/NNP Services/NNPS)
  ,/,
  (ORGANIZATION Infosys/NNP Technologies/NNPS)
  ,/,
  (PERSON Tata/NNP Consultancy/NNP Services/NNPS)
  and/CC
  (PERSON Wipro/NNP)
  ./.)
(S
  The/DT
  rates/NNS
  we/PRP
  are/VBP
  able/JJ
  to/TO
  recover/VB
  for/IN
  our/PRP$
  services/NNS
  are/VBP
  affected/VBN
  by/IN
  a/DT
  number/NN
  of/IN
  factors/NNS
  ,/,
  including/VBG
  :/:
  •/VB
  our/PRP$
  clients’/JJ
  perceptions/NNS
  of/IN
  our/PRP$
  ability/NN
  to/TO
  add/VB
  value/NN
  through/IN
  our/PRP$
  services/NNS
  ;/:
  •/NNP
  introduction/NN
  of/IN
  new/JJ
  services/NNS
  or/CC
  products/NNS
  by/IN
  us/PRP
  or/CC
  our/PRP$
  competitors/NNS
  ;/:
  •/VB
  our/PRP$
  competitors’/NN
  pricing/NN
  policies/NNS
  ;/:
  •/VB
  our/PRP$
  ability/NN
  to/TO
  accurately/RB
  estimate/VB
  ,/,
  attain/NN
  and/CC
  sustain/NN
  contract/NN
  revenues/NNS
  ,/,
  margins/NNS
  and/CC
  cash/NN
  flows/NNS
  over/IN
  increasingly/RB
  longer/JJR
  contract/NN
  periods/NNS
  ;/:
  •/NNP
  bid/NN
  practices/NNS
  of/IN
  clients/NNS
  and/CC
  their/PRP$
  use/NN
  of/IN
  third-party/JJ
  advisors/NNS
  ;/:
  •/VB
  the/DT
  use/NN
  by/IN
  our/PRP$
  competitors/NNS
  and/CC
  our/PRP$
  clients/NNS
  of/IN
  offshore/JJ
  resources/NNS
  to/TO
  provide/VB
  lower-cost/JJ
  service/NN
  delivery/NN
  capabilities/NNS
  ;/:
  •/VB
  our/PRP$
  ability/NN
  to/TO
  charge/VB
  premium/NN
  prices/NNS
  when/WRB
  justified/VBN
  by/IN
  market/NN
  demand/NN
  or/CC
  the/DT
  type/NN
  of/IN
  service/NN
  ;/:
  and/CC
  •/VB
  general/JJ
  economic/JJ
  and/CC
  political/JJ
  conditions/NNS
  ./.)
(S
  For/IN
  our/PRP$
  internal/JJ
  management/NN
  reporting/NN
  and/CC
  budgeting/NN
  purposes/NNS
  ,/,
  we/PRP
  use/VBP
  non-GAAP/JJ
  financial/JJ
  information/NN
  that/WDT
  does/VBZ
  not/RB
  include/VB
  stock-based/JJ
  compensation/NN
  expense/NN
  ,/,
  acquisition-related/JJ
  charges/NNS
  and/CC
  net/JJ
  non-operating/JJ
  foreign/JJ
  currency/NN
  exchange/NN
  gains/NNS
  or/CC
  losses/NNS
  for/IN
  financial/JJ
  and/CC
  operational/JJ
  decision/NN
  making/NN
  ,/,
  to/TO
  evaluate/VB
  period-to-period/JJ
  comparisons/NNS
  and/CC
  for/IN
  making/VBG
  comparisons/NNS
  of/IN
  our/PRP$
  operating/NN
  results/NNS
  to/TO
  those/DT
  of/IN
  our/PRP$
  competitors/NNS
  ./.)

I went through many links but didn't find an approach that fits my purpose: extracting the companies that are tagged as PERSON, ORGANIZATION, and GPE.

I would be very thankful for any links, other than the NLTK website, for learning more about extracting named entities.

Dima Vasiluk

1 Answer


I applied the code from this link and was able to get the named entities from the results above. I used the nltk.ne_chunk_sents() function instead of nltk.ne_chunk().
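
For anyone landing here, the following is a minimal sketch of that approach, not the exact code from the link. The helper names `extract_entities` and `entities_from_sentences` and the `WANTED` set are my own, chosen for illustration. It assumes the NLTK tokenizer, POS tagger, and NE chunker models have already been fetched with `nltk.download()` (the resource names differ between NLTK versions).

```python
import nltk
from nltk.tree import Tree

# Labels to keep. The chunker mislabels some companies as PERSON or GPE
# (see the output in the question), so all three are collected.
WANTED = {"PERSON", "ORGANIZATION", "GPE"}


def extract_entities(tree, wanted=WANTED):
    """Return (label, text) pairs for the entity chunks of one chunked sentence."""
    entities = []
    for node in tree:
        # Plain tokens are (word, tag) tuples; entity chunks are Tree objects.
        if isinstance(node, Tree) and node.label() in wanted:
            text = " ".join(word for word, _tag in node.leaves())
            entities.append((node.label(), text))
    return entities


def entities_from_sentences(sentences):
    """Tokenize, tag, and chunk a list of sentence strings, then collect entities."""
    tokenized = [nltk.word_tokenize(s) for s in sentences]
    tagged = nltk.pos_tag_sents(tokenized)
    chunked = nltk.ne_chunk_sents(tagged, binary=False)
    return [entity for tree in chunked for entity in extract_entities(tree)]
```

With the question's data, `entities_from_sentences(tokcomp)` would return a flat list of pairs such as `("ORGANIZATION", "HCL Technologies")`. The labels themselves are only as reliable as the chunker, so treat PERSON and GPE hits as candidate company names rather than confirmed ones.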