I have a file such that each line consists of two strings separated by variable space, like below:

"Doe, Mary" "W 135"

How can this be parsed into pairs of strings, ["Doe, Mary", "W 135"]?

Pippi
  • possible duplicate of [Split string into a list in Python](http://stackoverflow.com/questions/743806/split-string-into-a-list-in-python) – sashkello Sep 25 '13 at 01:23
  • @sashkello: That won't help for this problem, because `str.split` won't distinguish between spaces within the quotes and spaces between the quoted strings. – abarnert Sep 25 '13 at 01:27
  • It will if split by (" "). – sashkello Sep 25 '13 at 01:28
  • Where did the data in this format come from? It's a lot easier to write the parsing code when you have the generating code—or, even better, when you can _change_ the generating code to produce something trivial to parse (like a standard CSV file, or JSON, or whatever). – abarnert Sep 25 '13 at 01:28
  • @sashkello: No it won't. Try it. That will strip off the quotes, except for the very first and last ones, and it won't work for variable space, only a single space, and… – abarnert Sep 25 '13 at 01:29
  • @abarnert This does almost solve the problem - stripping quotes is trivial after that. As well as dealing with single-double space (which is not a part of a problem anyway). I'm not claiming it's the best solution, I just think it's the most obvious way. – sashkello Sep 25 '13 at 01:32
  • @abarnert I agree that the `csv` module is a better solution, but stripping off the quotes is not a problem. In fact, `[s for s in '"Doe, Mary" "W 135"'.split('"') if s.strip()]` ought to work unless one of the lines is like `"foo" " "\n` – kojiro Sep 25 '13 at 01:33
  • @sashkello: Who says the single-double space isn't part of the problem? The OP explicitly says "separated by variable space". – abarnert Sep 25 '13 at 01:36
  • @kojiro: That just gives you a list of all the non-space characters. Clearly you wanted a `split` in there somewhere, but I'm not sure where. – abarnert Sep 25 '13 at 01:37
  • @abarnert oops, that's what I get for using illegible variable names in my prototypes. – kojiro Sep 25 '13 at 01:38
  • @abarnert Sorry, didn't see it. Can be dealt with easily though. csv reader is of course the solution here which doesn't mean it can't be done with split. – sashkello Sep 25 '13 at 01:38
  • @sashkello: No, it can't be done easily. `str.split` can't split on variable-length patterns (except for the special case of "any range of whitespace"). And there's no way it can distinguish quotes unless they're part of the split pattern. So, the only way you could _possibly_ do it with `str.split` is to first split on words, then group by quotes, then re-join each group, which is far from easy. (@sashkello: Of course `re.split` is another story: it can split on variable-length patterns, which solves the problem with `str.split` immediately.) – abarnert Sep 25 '13 at 01:41
  • Also, the fact that two pretty clever people went through a number of different attempts that they were confident would work, but they didn't, kind of proves that it's not easy… – abarnert Sep 25 '13 at 01:44
  • @abarnert Again, done with split (and join): http://stackoverflow.com/questions/1546226/the-shortest-way-to-remove-multiple-spaces-in-a-string-in-python It is easy not because it is quick - there are three steps to go through here (remove multiple spaces, split, remove quotes), each of these steps is easy, but yes altogether it is a pretty long one-liner. – sashkello Sep 25 '13 at 01:54
  • @sashkello: Again, not done. Try it on `"Doe Two-Space Mary" "W 135"` and see what you get. Are you deliberately trying to prove my point here? – abarnert Sep 25 '13 at 02:02
  • @abarnert I hope you're not counting me in there. A) I wasn't all that confident. B) My mistake was not in the code itself, but in how I copied it from terminal to browser. C) You forgot a comma between "pretty" and "clever". – kojiro Sep 25 '13 at 03:13
  • I like this discussion, hahaha. Made my day. – justhalf Sep 25 '13 at 03:36
  • @kojiro: Well, if your avatar is a photo, you certainly have symmetrical features, but I don't know if that's enough on its own to call you pretty. – abarnert Sep 25 '13 at 17:37
  • @sashkello: It looks like SO removes multiple spaces in comments even inside back ticks. There's supposed to be two spaces on each side of `Two-Space`. But that brings up another way to remove spaces in Python: use the SO API to post the string as a comment. :) – abarnert Sep 25 '13 at 17:38
  • @abarnert Yeah, I get it :) Seems easy - post comment through python and then parse the webpage to get it back. Piece of cake. Don't be weak! – sashkello Sep 25 '13 at 22:48
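
The disagreement in the comments can be checked directly. Below is a rough sketch, assuming a test line of my own (not from the question) with two spaces inside the first quoted string and four spaces between the fields. The `re.split` pattern is just one possible way to apply the idea abarnert mentions, not necessarily what he had in mind.

```python
import re

# Hypothetical test line: double spaces inside the quotes, four between fields.
line = '"Doe  Two-Space  Mary"    "W 135"'

# Splitting on the literal '" "' only matches exactly one space between
# fields, so with a wider gap nothing is split at all.
naive = line.split('" "')

# Collapsing all whitespace first makes the split work, but it also
# collapses the spaces inside the quoted strings, changing the data.
collapsed = ' '.join(line.split())
collapsed_fields = [s.strip('"') for s in collapsed.split('" "')]

# re.split can match a variable-length separator that includes the quotes,
# so the spaces inside the quoted strings are left alone.
fields = re.split(r'"\s+"', line.strip().strip('"'))

print(naive)
print(collapsed_fields)
print(fields)
```

Only the last variant keeps both double spaces around `Two-Space`, which is the case the `str.split` attempts in the thread trip over.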

1 Answer

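import csv
with open('file.txt', newline='') as f:
    # skipinitialspace=True skips the extra spaces between the quoted fields
    pairs = list(csv.reader(f, delimiter=' ', skipinitialspace=True))

The rows are read into a list inside the with block because csv.reader is lazy and the file is closed once the block ends. Now you can iterate over pairs in a for loop, whatever.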
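
To try the `csv.reader` approach without a file on disk, here is a minimal sketch that reads from an in-memory buffer instead. The sample lines and the names `sample`, `name`, and `room` are made up for illustration; the second line uses a wider gap to exercise the "variable space" requirement.

```python
import csv
import io

# Hypothetical sample data standing in for file.txt.
sample = io.StringIO(
    '"Doe, Mary" "W 135"\n'
    '"Smith, John"     "E 20"\n'
)

# delimiter=' ' splits on spaces outside the quotes; skipinitialspace=True
# ignores the additional spaces that precede the next field.
pairs = list(csv.reader(sample, delimiter=' ', skipinitialspace=True))

for name, room in pairs:
    print(name, '|', room)
```

Each row comes back as a two-element list with the quotes already removed, so the comma and the space inside the quoted strings survive intact.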

Caleb Hattingh
abarnert