I'm trying to verify the first two and last two characters of a line in a file with a regex.

I've tried this and many other things, but it's not working. How can I do it?

regex = r"^[.B]?{2}"
regexEnd = r"[);]?{2}$"
regexC = re.compile(regex)
regexC1 = re.compile(regexEnd)

for filename in os.listdir(path1):
    f = gzip.open(path1 + filename, "rb")
    for line in f:
        if regexC.search(line) is not None & regexC1.search(line is 
        not None):
            file = open("db.txt", "w")
            file.write(line)

Thanks in advance guys :)

pierreafranck
  • What are you trying to accomplish? – Stephen Rauch Mar 30 '18 at 16:28
  • Why use regex? Index them directly: `line[:2]` and `line[-2:]`. – Prune Mar 30 '18 at 16:28
  • What are you trying to verify? Are you just trying to get the first and last 2 characters? Or are you trying to check that they match something? If so, what is the pattern you're trying to match? – divibisan Mar 30 '18 at 16:28
  • I have to match this pattern: ".B something blablabla );" @divibisan – pierreafranck Mar 30 '18 at 16:29
  • Why not simply use `line.startswith("FirstTwoCharacters") and line.endswith("LastTwoCharacters")`? – Rehan Azher Mar 30 '18 at 16:30
  • How do startswith and endswith work? I can't specify characters with those methods, right? – pierreafranck Mar 30 '18 at 16:33
  • @Prune can you explain it? I'm not very good at Python; I'm just writing a script for my C program, so... – pierreafranck Mar 30 '18 at 16:34
  • @pierreafranck: Search for "Python string slice"; you will get many hits at least as good as my writing Yet Another Explanation. Stack Overflow doesn't need more tutorials. – Prune Mar 30 '18 at 16:37
  • Look into list slices in Python. Then use `line[:2]` to get the first two characters (as a string) and `line[-2:]` to get the last two, and check them individually in your if statement: `if line[:2] == '.B' and line[-2:] == ');':` – scnerd Mar 30 '18 at 16:37
  • @pierreafranck In Python, strings behave much like lists of characters. You can slice them (extract a substring between certain indices) using square brackets. `line[:2]` extracts the substring up to, but not including, index 2 (the first two chars), and `line[-2:]` extracts the substring starting from the second-to-last index (the last two chars). If a regex solution is mandatory for some reason, you may check my answer below. – Nablezen Mar 30 '18 at 16:38
  • @pierreafranck If you are writing a `C` program, why is the question tagged with `Python`? – Wiktor Stribiżew Mar 30 '18 at 16:38
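
The `startswith`/`endswith` suggestion from the comments can be sketched as follows. This is only an illustration: `has_markers` is a hypothetical helper name, and the sample lines are made up from the ".B something blablabla );" pattern mentioned above. Both methods accept any string, so the two characters can be specified freely.

```python
def has_markers(line, prefix=".B", suffix=");"):
    """Return True if line starts with prefix and ends with suffix.

    has_markers is a hypothetical helper; the line ending is stripped
    first because lines read from a file still carry it.
    """
    stripped = line.rstrip("\r\n")
    return stripped.startswith(prefix) and stripped.endswith(suffix)


print(has_markers(".B something blablabla );\n"))  # True
print(has_markers(".B something blablabla\n"))     # False
print(has_markers("B. something );\n"))            # False
```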

2 Answers

While I agree that indexing is preferable in simple cases, here is a regex solution that captures the first two and last two characters of a line (that is, before a newline). Note: plain indexing does not directly handle multiline strings, where newlines occur in the middle of the string. That does not appear to be the case in this question, but it might be relevant for future reference.

import re

text = "test\nwell"
# MULTILINE makes ^ and $ match at the start and end of every line.
regex = re.compile(r"^(?P<first>..).*(?P<last>..)$", re.MULTILINE)

# match() only tries the start of the string, so this reports the first line.
m = regex.match(text)
print(m)
print(m.group("first"))
print(m.group("last"))
Nablezen
  • Slicing will work because Python doesn't fail on too-large slices: `'\n'[:2] == '\n'` without any error, so slice-based approaches would still work here – scnerd Mar 30 '18 at 16:39
  • Woops, deleted my comment after you deleted yours. Well, technically the slicing approach would work on multi-line strings as well. Using regex, switching on and off the multi-line flag, would be equivalent to using slicing, switching whether or not you split the string on newlines first. Both methods support both cases with the proper tweaks. – scnerd Mar 30 '18 at 16:45
  • Yes, of course. More than one way to skin a cat. However, in the particular case of multiline strings, I have more frequently seen regex than splitting and slicing -- maybe because multiline strings usually mean there is more (possibly nontrivial) text processing going on that usually requires regex usage either way. – Nablezen Mar 30 '18 at 16:48
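
The equivalence discussed in these comments can be sketched roughly as below. The sample text is invented for illustration, and the ".B" / ");" markers are taken from the question; this is one possible way to compare the two routes, not the only one.

```python
import re

text = ".B first line );\nplain line\n.B second );"

# Regex route: MULTILINE lets ^ and $ anchor at every line boundary.
pattern = re.compile(r"^\.B.*\);$", re.MULTILINE)
regex_hits = [m.group(0) for m in pattern.finditer(text)]

# Slicing route: split on newlines first, then compare the slices.
slice_hits = [
    line for line in text.splitlines()
    if line[:2] == ".B" and line[-2:] == ");"
]

print(regex_hits == slice_hits)
```

Both lists end up holding the same two lines here, which is the "proper tweaks" point made above: the multiline flag on one side corresponds to splitting on newlines on the other.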

So you technically can do this with regex, but it isn't advised since you're just checking to see if two characters are equal to something.

If you want to use regex:

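import gzip
import os
import re

# \s*$ anchors the end while tolerating the newline each line carries.
pattern = r"^\.B.*\);\s*$"
regex = re.compile(pattern)

# path1 is the input directory from the question.
with open("db.txt", "w") as out:  # open once so earlier matches are not overwritten
    for filename in os.listdir(path1):
        # "rt" yields str lines, which a str pattern requires in Python 3.
        with gzip.open(os.path.join(path1, filename), "rt") as f:
            for line in f:
                if regex.match(line):
                    out.write(line)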

You don't actually need two different regular expressions; you can just check that the line starts with .B, is followed by anything, and ends with );.
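
As a rough check of what a single pattern accepts, here is a small sketch on invented sample lines. The `strict` variant, which adds `\s*$` to require that ");" is the last non-blank text on the line, is an assumption about what "ending with );" should mean.

```python
import re

# Must start with ".B" and contain ");" somewhere later on the line.
loose = re.compile(r"^\.B.*\);")
# Assumed variant: \s*$ also requires ");" to be the last non-blank text.
strict = re.compile(r"^\.B.*\);\s*$")

samples = [
    ".B something blablabla );\n",
    ".B something ); trailing text\n",
    "x .B something );\n",
]

for line in samples:
    print(repr(line), bool(loose.match(line)), bool(strict.match(line)))
```

The second sample is the interesting one: without an end anchor it still counts as a match, because `match` only pins the start of the line.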

The other option is to avoid regular expressions altogether if you're not comfortable with them and do something like this instead:

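import gzip
import os

# path1 is the input directory from the question.
with open("db.txt", "w") as out:  # open once so earlier matches are not overwritten
    for filename in os.listdir(path1):
        # "rt" yields str lines; "rb" would yield bytes, which never equal ".B".
        with gzip.open(os.path.join(path1, filename), "rt") as f:
            for line in f:
                line = line.rstrip("\n")  # the newline would otherwise be the last character
                if line[:2] == ".B" and line[-2:] == ");":
                    out.write(line + "\n")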

This compares string slices directly. line[:2] takes all the characters in line up to, but not including, index 2 (that is, the first two characters) and checks whether they are equal to ".B". line[-2:] takes the last two characters of line and checks whether they are equal to ");".
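
A few slice results on a made-up sample line may help here. The sketch assumes lines come from iterating over a file, so they still end in a newline, and it strips that line ending before checking the tail.

```python
line = ".B something blablabla );\n"

print(repr(line[:2]))    # '.B'
print(repr(line[-2:]))   # ';\n'  (the newline counts as a character)

# Assumption: strip the line ending before slicing the tail.
stripped = line.rstrip("\r\n")
print(repr(stripped[-2:]))  # ');'

# Slices never raise on short input; they return whatever exists.
print(repr("x"[:2]))     # 'x'

# In Python 3, bytes read in "rb" mode never compare equal to a str.
print(b".B" == ".B")     # False
```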

Chrispresso