I am trying to scrape some data from a website, and below is a long string that I have managed to get.

var playerlist=["Roger Federer", "Rainer Schuettler", "Dominik Hrbaty", "Thomas Muster", "Andy Roddick", "Nikolay Davydenko", "Tommy Haas", "Jarkko Nieminen", "Arnaud Clement", "Ivan Ljubicic", "David Ferrer", "Nicolas Massu", "Tommy Robredo", "Lleyton Hewitt", "Filippo Volandri", "Olivier Rochus", "Kevin Kim", "Juan Ignacio Chela", "Juan Carlos Ferrero", "Jimmy Connors", "Mikhail Youzhny", "Ruben Ramirez Hidalgo", "Rafael Nadal"]

The above is not a JavaScript list; it is a string.

I want to create a list of all the player names from this string, so I need to extract every substring between double quotes and add it to a list. Alternatively, it would be great if I could convert this string, as it is, into a list or an array.

Can someone suggest how to do this in Python?

Bhargav Rao
user2709885
  • 2
    @ZdaR They want to extract the names from the string, ie "Roger Federer" etc. – Loocid Jun 05 '15 at 02:15
  • 1
    possible duplicate of [Extract string from between quotations](http://stackoverflow.com/questions/2076343/extract-string-from-between-quotations) – Andy Jun 05 '15 at 02:17

1 Answer

You can use `ast.literal_eval` on the part of the string that starts at the opening bracket:

>>> s = 'var playerlist=["Roger Federer", "Rainer Schuettler", "Dominik Hrbaty", "Thomas Muster", "Andy Roddick", "Nikolay Davydenko", "Tommy Haas", "Jarkko Nieminen", "Arnaud Clement", "Ivan Ljubicic", "David Ferrer", "Nicolas Massu", "Tommy Robredo", "Lleyton Hewitt", "Filippo Volandri", "Olivier Rochus", "Kevin Kim", "Juan Ignacio Chela", "Juan Carlos Ferrero", "Jimmy Connors", "Mikhail Youzhny", "Ruben Ramirez Hidalgo", "Rafael Nadal"]'
>>> import ast
>>> start = s.index('[')
>>> ast.literal_eval(s[start:])
['Roger Federer', 'Rainer Schuettler', 'Dominik Hrbaty', 'Thomas Muster', 'Andy Roddick', 'Nikolay Davydenko', 'Tommy Haas', 'Jarkko Nieminen', 'Arnaud Clement', 'Ivan Ljubicic', 'David Ferrer', 'Nicolas Massu', 'Tommy Robredo', 'Lleyton Hewitt', 'Filippo Volandri', 'Olivier Rochus', 'Kevin Kim', 'Juan Ignacio Chela', 'Juan Carlos Ferrero', 'Jimmy Connors', 'Mikhail Youzhny', 'Ruben Ramirez Hidalgo', 'Rafael Nadal']
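If the scraped line ends with a semicolon or other trailing text, slicing only from the `[` would make `ast.literal_eval` fail. A minimal sketch of a more tolerant variant, assuming the string contains exactly one array literal (the helper name `extract_list` is hypothetical, not from the original answer):

```python
import ast


def extract_list(source):
    """Evaluate the text between the first '[' and the last ']' as a Python literal."""
    start = source.index('[')       # position of the opening bracket
    end = source.rindex(']') + 1    # one past the closing bracket
    return ast.literal_eval(source[start:end])


# Shortened sample with a trailing semicolon, as often seen in scraped JavaScript.
s = 'var playerlist=["Roger Federer", "Rafael Nadal"];'
print(extract_list(s))
```

Both `str.index` and `str.rindex` raise `ValueError` when no bracket is found, so a missing array shows up as an exception rather than a silently empty result.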

As Steve mentions in the comments below, it is better to use `json.loads`, since the string was scraped out of JavaScript:

>>> import json
>>> json.loads(s[start:])
[u'Roger Federer', u'Rainer Schuettler', u'Dominik Hrbaty', u'Thomas Muster', u'Andy Roddick', u'Nikolay Davydenko', u'Tommy Haas', u'Jarkko Nieminen', u'Arnaud Clement', u'Ivan Ljubicic', u'David Ferrer', u'Nicolas Massu', u'Tommy Robredo', u'Lleyton Hewitt', u'Filippo Volandri', u'Olivier Rochus', u'Kevin Kim', u'Juan Ignacio Chela', u'Juan Carlos Ferrero', u'Jimmy Connors', u'Mikhail Youzhny', u'Ruben Ramirez Hidalgo', u'Rafael Nadal']
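The question also asks about pulling out the substrings between double quotes directly. A rough sketch of that approach with `re.findall`, compared against `json.loads`, on a shortened sample string. It assumes Python 3 (where the results are plain `str` without the `u` prefix) and that no name contains an escaped quote, which the simple pattern below would not handle:

```python
import json
import re

# Shortened sample of the scraped string from the question.
s = 'var playerlist=["Roger Federer", "Rainer Schuettler", "Rafael Nadal"]'

# Capture everything between each pair of double quotes.
names_from_regex = re.findall(r'"([^"]*)"', s)

# Parse the array literal as JSON, starting at the opening bracket.
names_from_json = json.loads(s[s.index('['):])

print(names_from_regex)
print(names_from_regex == names_from_json)
```

For input like this the two approaches give the same list. `json.loads` is probably the safer choice in general, because it understands escape sequences inside the strings.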
Bhargav Rao
  • 1
    Oh, and since it's scraped out of some Javascript, `json.loads` might be a better approximation to a parser than `ast.literal_eval`. Ofc they both work for the example given. – Steve Jessop Jun 05 '15 at 02:29
  • @Steve Job 2 done Sir! Please report other impending operations. ;) – Bhargav Rao Jun 05 '15 at 02:41