I have the following sample text:

mystr = r'''\documentclass[12pt]{article}
\usepackage{amsmath}
\title{\LaTeX}
\begin{document}
\section{Introduction}
This is introduction paragraph
\section{Non-Introduction}
This is non-introduction paragraph
\section{Sample section}
This is sample section paragraph
\begin{itemize}
  \item Item 1
  \item Item 2
\end{itemize}
\end{document}'''

What I'm trying to accomplish is to write a regular expression that extracts the following lines from mystr:

['This is introduction paragraph','This is non-introduction paragraph','    This is sample section paragraph\n \begin{itemize}\n\item Item 1\n\item Item 2\n\end{itemize}']
  • That's what `split()` does. Why does it have to be a regex? – Ma0 Oct 28 '16 at 11:47
  • Your example does not illustrate the question. "quick elephant" does not have an occurrence of the word "a" after it. – roarsneer Oct 28 '16 at 11:47
  • http://stackoverflow.com/questions/743806/split-string-into-a-list-in-python has more detailed description, but the answer above is correct... – A. N. Other Oct 28 '16 at 11:50
  • Good job on the edit; this helps people understand what you're after. Have you attempted anything thus far to tackle this problem? It might help posting it if you have. – Dimitris Fasarakis Hilliard Oct 28 '16 at 12:07

2 Answers

Perhaps there is a reason you need to use a regex; for instance, the splitting string may be more involved than just "a". The re module has a split function too:

import re
str_ = "a quick brown fox jumps over a lazy dog than a quick elephant"


print(re.split(r'\s?\ba\b\s?',str_))

# ['', 'quick brown fox jumps over', 'lazy dog than', 'quick elephant']

EDIT: expanded answer with the new information you provided...

Your edit describes the problem better and includes text that looks like LaTeX, so I think you need to extract the lines that do not start with a \ (those are the LaTeX commands). In other words, you need the lines that contain only text. Try the following, still using regular expressions:

import re

mystr = r'''\documentclass[12pt]{article}
\usepackage{amsmath}
\title{\LaTeX}
\begin{document}
\section{Introduction}
This is introduction paragraph
\section{Non-Introduction}
This is non-introduction paragraph
\section{Sample section}
This is sample section paragraph
\end{document}'''

pattern = r"^[^\\]*\n"


matches = re.findall(pattern, mystr, flags=re.M)

print(matches)

# ['This is introduction paragraph\n', 'This is non-introduction paragraph\n', 'This is sample section paragraph\n']
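If the section bodies may themselves contain LaTeX commands, as with the itemize block in the question, the line-based pattern above will not return them. Here is a rough sketch of one alternative, assuming that every body follows a \section{...} line and runs up to the next \section or to \end{document}. The names section_pattern and section_bodies are only illustrative.

```python
import re

mystr = r'''\documentclass[12pt]{article}
\usepackage{amsmath}
\title{\LaTeX}
\begin{document}
\section{Introduction}
This is introduction paragraph
\section{Non-Introduction}
This is non-introduction paragraph
\section{Sample section}
This is sample section paragraph
\begin{itemize}
  \item Item 1
  \item Item 2
\end{itemize}
\end{document}'''

# Capture everything after a \section{...} line, lazily, until the next
# \section{ or \end{document} (assumption: no nested or starred sections).
section_pattern = r"\\section\{[^}]*\}\n(.*?)(?=\\section\{|\\end\{document\})"

# re.S lets "." match newlines so a body can span several lines.
section_bodies = [body.strip() for body in re.findall(section_pattern, mystr, flags=re.S)]

for body in section_bodies:
    print(repr(body))
```

The third item keeps the whole itemize environment, with its original indentation, because the lookahead only stops at \section{ or \end{document}, not at \end{itemize}.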
chapelo

You can use the split method from str:

my_string = "a quick brown fox jumps over a lazy dog than a quick elephant"
word = "a "
my_string.split(word)

Results in:

['', 'quick brown fox jumps over ', 'lazy dog than ', 'quick elephant']

Note: Don't use str as a variable name as it is a keyword in Python.
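One caveat, as a small sketch: splitting on the literal "a " also cuts inside any word that ends in "a". The sentence below is a variation with "lama" in it, and the re.split call with word boundaries is shown only for comparison; plain_parts and regex_parts are illustrative names.

```python
import re

my_string = "a quick brown fox jumps over a lazy lama than a quick elephant"

# Literal split: the trailing "a " of "lama " is treated as a separator too.
plain_parts = my_string.split("a ")
print(plain_parts)
# ['', 'quick brown fox jumps over ', 'lazy lam', 'than ', 'quick elephant']

# Word-boundary split: only the standalone word "a" separates the pieces.
regex_parts = re.split(r"\s?\ba\b\s?", my_string)
print(regex_parts)
# ['', 'quick brown fox jumps over', 'lazy lama than', 'quick elephant']
```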

José Sánchez
  • str is NOT a keyword in Python. It's just a built-in class. So technically there is no issue in using str as a name – Rebhu Johymalyo Josh Oct 28 '16 at 11:58
  • @RebhuJohymalyoJosh while you are right that it is not a keyword, you are wrong in stating that there's no issue in using it. Using `str` as a name for your variable will mask the built-in `str` and might lead to unexpected issues in the long run. In short, *avoid using it*. – Dimitris Fasarakis Hilliard Oct 28 '16 at 12:03
  • @Jose Sanchez: Your solution would give strange results if you feed it a word that ends in "a", for example "a quick brown fox jumps over a lazy lama than a quick elephant". You could split on " a " instead, if you use " "+my_string – am2 Oct 28 '16 at 12:21