I need to capture words separated by tabs as illustrated in the image below.

enter image description here

The expression `(.*?)[\t|\n]` works well, except on the last line, which has no trailing line feed. Can anyone suggest a modification of the regular expression that also matches the last word, i.e. Cheyenne? Link to code example

John
  • I can't use the CSV module. The data shown in the question is constructed just to illustrate the problem; the real data set I'm working with is very different and needs a lot of cleaning. – John Apr 13 '19 at 21:09

1 Answer

Replace `[\t|\n]` with `(\t|$)`.

BTW, `[\t|\n]` is a character class, so the pipe `|` is literal here. You probably meant `[\t\n]`.
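
To make the difference concrete, here is a minimal sketch in Python. It assumes Python's `re` module, since the question does not show which engine is used. The sample string is made up for illustration (only Cheyenne comes from the question), and `re.MULTILINE` is an assumption: without it, `$` only matches at the very end of the input (or just before a final newline), so words at the end of earlier lines would be skipped.

```python
import re

# Hypothetical sample: tab-separated words, no newline after the last one.
text = "Denver\tBoulder\nAspen\tCheyenne"

# Inside [...] the pipe is an ordinary character, not alternation.
pipe_hits = re.findall(r"[\t|\n]", "a|b")
print(pipe_hits)

# Original pattern: needs a tab, "|" or newline after each word,
# so the last word (no trailing newline) is lost.
first_try = [m.group(1) for m in re.finditer(r"(.*?)[\t|\n]", text)]
print(first_try)

# Suggested pattern; re.MULTILINE (an assumption here) lets $ match at every line end.
# The filter drops the empty matches that $ can produce on its own.
words = [
    m.group(1)
    for m in re.finditer(r"(.*?)(\t|$)", text, re.MULTILINE)
    if m.group(1)
]
print(words)
```

The `if m.group(1)` filter is a design choice for this sketch: it also discards genuinely empty fields, which may or may not be what the real data needs.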

wjandrea
  • OP does not want `[\t\n]`, but `[^\n\t]+` – Wiktor Stribiżew Apr 13 '19 at 20:36
  • See https://regex101.com/r/K8L9rw/1 – Wiktor Stribiżew Apr 13 '19 at 20:38
  • @Wiktor Yeah, that's a better solution. Though just to be clear, I'm not saying to use `[\t\n]`, just mentioning the common mistake of putting a pipe in a character class. I'm actually recommending `(\t|$)`. – wjandrea Apr 13 '19 at 20:44
  • Changing to `(\t|$)` solves the problem as outlined in the question. But let's say I want to match a repeating pattern of cities, as shown here (https://regex101.com/r/D8DIkB/3). Is that possible? – John Apr 14 '19 at 06:06
  • @John That's a different question, so you could [ask a new question](https://stackoverflow.com/questions/ask) for that. – wjandrea Apr 14 '19 at 17:42
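
For comparison, the negated character class `[^\n\t]+` suggested in the comments above can be tried as follows. This is only a sketch assuming Python's `re` module and a made-up sample string; it is not taken from the linked regex101 demo.

```python
import re

# Hypothetical sample: tab-separated words, no newline after the last one.
text = "Denver\tBoulder\nAspen\tCheyenne"

# Match every run of characters that are neither a newline nor a tab.
# No flags are needed, and the pattern never produces empty matches.
words = re.findall(r"[^\n\t]+", text)
print(words)
```

Because the pattern describes the words themselves rather than their delimiters, it does not matter whether the last line ends with a line feed.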