
I have a string in Perl: 'CCCCCCCC^hC^iC^*C^"C^8A'.

I want to split this string using a regular expression: "^[any_character]C". In other words, I want to split it by the actual character ^, followed by any character, followed by a specific letter (in this case C, but it could be A, or any other character).

I have looked at other questions/posts and finally came up with my @split_str = split(/\^(\.)C/, $letters), but this does not seem to work.

I'm sure I'm doing something wrong, but I don't know what.

Ruslan Osmanov
user3799576
  • Try `/\^.C/s`. – Wiktor Stribiżew Oct 19 '16 at 10:08
  • As a follow-up, I'm trying to figure out the most efficient way to count the number of ^.C (or whatever letter) occurrences in my string. If my string were about 1000 characters long, would it be faster to split and subtract one from the length of the array, or to simply count the regex matches in the string? – user3799576 Oct 19 '16 at 10:14
  • See [*Is there a Perl shortcut to count the number of matches in a string?*](http://stackoverflow.com/questions/1849329/is-there-a-perl-shortcut-to-count-the-number-of-matches-in-a-string). – Wiktor Stribiżew Oct 19 '16 at 10:15
  • So in Python, I'm using the same regular expression and it's not working. I'm importing re and, with the same string, doing re.findall("\^.C/", string), and it's not working. Is it a different regex for Python? (I didn't think it was.) – user3799576 Oct 19 '16 at 19:15
  • If you have a new question, please post it as a new question. – Dave Cross Oct 20 '16 at 07:30
  • What about `perldoc split` and `man perlre`? – U. Windl Aug 09 '21 at 08:55
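On the counting follow-up: counting matches directly tends to be simpler than counting split fields, and it sidesteps a pitfall. By default split discards trailing empty fields, so "fields minus one" undercounts when the string ends with a marker, unless you pass a negative LIMIT. A minimal sketch comparing the two (the sample strings are made up and neither approach has been benchmarked here; for real data it is worth measuring, for example with the core Benchmark module):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count ^<any char>C markers by looping over global matches.
sub count_matches {
    my ($str) = @_;
    my $count = 0;
    $count++ while $str =~ /\^.C/sg;
    return $count;
}

# Count markers as "number of split fields minus one".
# A negative LIMIT keeps trailing empty fields; without it, markers at
# the very end of the string leave no field behind and are missed.
sub count_by_split {
    my ($str, $keep_trailing) = @_;
    return 0 if $str eq '';    # splitting '' always yields zero fields
    my @fields = $keep_trailing ? split(/\^.C/s, $str, -1)
                                : split(/\^.C/s, $str);
    return @fields - 1;
}

for my $str ('CCCCCCCC^hC^iC^*C^"C^8A', 'AB^xC^yC') {
    printf "%-26s matches=%d split=%d split(-1)=%d\n",
        $str, count_matches($str),
        count_by_split($str, 0), count_by_split($str, 1);
}
```

With the second string, the plain split reports 0 markers instead of 2, because both fields after 'AB' are trailing empty fields and get dropped.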

4 Answers


You were very close. There were just a couple of errors in your code. Before I explain them, here's the code I was using to test solutions.

#!/usr/bin/perl

use strict;
use warnings;
use 5.010;

use Data::Dumper;

$_ = 'CCCCCCCC^hC^iC^*C^"C^8A';

my @data = split /\^(\.)C/;

say Dumper @data;

Running this with your original regex, we get this output:

$VAR1 = 'CCCCCCCC^hC^iC^*C^"C^8A';

No splitting has taken place at all. That's because your regex includes \. (an escaped dot). An unescaped dot matches any character except a newline, but by escaping it with a backslash you have told Perl to treat it as an ordinary dot. There are no dots in your string, so the regex doesn't match and the string is not split.

If we remove the backslash, we get this output:

$VAR1 = 'CCCCCCCC';
$VAR2 = 'h';
$VAR3 = '';
$VAR4 = 'i';
$VAR5 = '';
$VAR6 = '*';
$VAR7 = '';
$VAR8 = '"';
$VAR9 = '^8A';

This is better. Some splitting has taken place. But because we have parentheses around the dot ((.)), Perl has "captured" the characters that the dot matches and added them to the list of values that split() returns.

If we remove those parentheses, we get only the values between the split markers.

$VAR1 = 'CCCCCCCC';
$VAR2 = '';
$VAR3 = '';
$VAR4 = '';
$VAR5 = '^8A';

Note that we get a few empty elements. That's because in places like "^hC^iC" in your string, there is no data between two adjacent split markers.
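If the empty elements get in the way, one option is to drop them with grep after splitting. A minimal sketch using the same test string (keep in mind that filtering also throws away the information that two markers were adjacent):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = 'CCCCCCCC^hC^iC^*C^"C^8A';

# Split on ^<any char>C, then keep only the non-empty fields.
my @fields = grep { length } split /\^.C/, $str;

print join('|', @fields), "\n";    # CCCCCCCC|^8A
```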

By moving the parentheses around the whole of the regex (split /(\^.C)/), we can get a list which includes all of the split markers together with the data between them.

$VAR1 = 'CCCCCCCC';
$VAR2 = '^hC';
$VAR3 = '';
$VAR4 = '^iC';
$VAR5 = '';
$VAR6 = '^*C';
$VAR7 = '';
$VAR8 = '^"C';
$VAR9 = '^8A';

Which of these options is most useful to you depends on exactly what you're trying to do.
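The question mentions that the final letter could be A or some other character instead of C. One way to handle that is to build the pattern from a variable, wrapping it in \Q and \E so the letter is matched literally even if it happens to be a regex metacharacter. The helper name split_on_marker below is hypothetical, chosen just for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split $str on a literal '^', then any one
# character, then the literal character in $letter.
# The capturing group keeps the markers in the result, as in the
# split /(\^.C)/ version; /s lets the dot match a newline too.
sub split_on_marker {
    my ($str, $letter) = @_;
    return split /(\^.\Q$letter\E)/s, $str;
}

my $str = 'CCCCCCCC^hC^iC^*C^"C^8A';

my @on_c = split_on_marker($str, 'C');
my @on_a = split_on_marker($str, 'A');

print scalar(@on_c), " fields when splitting on ^.C\n";
print join('|', @on_a), "\n";
```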

Dave Cross

When you say [any_character], you must mean the . pattern. A dot matches any character except a newline, and if you use the s modifier, it matches any character at all.

So, in your case, you just should not have escaped the dot:

@split_str = split /\^.C/, $letters;
                      ^

Or, with an s modifier:

@split_str = split /\^.C/s, $letters;
                         ^

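The caret, on the other hand, must be escaped to denote a literal caret symbol; unescaped, ^ is an anchor that matches at the start of the string (or of a line, under the m modifier).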
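To see what the s modifier changes, here is a small sketch with a made-up value for $letters in which a newline sits between the caret and the C:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up input: the second marker has a newline as its "any character".
my $letters = "AA^xCBB^\nCDD";

# Without /s, . does not match "\n", so only ^xC acts as a separator.
my @plain  = split /\^.C/,  $letters;

# With /s, . matches any character, so ^\nC is a separator too.
my @dotall = split /\^.C/s, $letters;

print scalar(@plain),  " fields without /s\n";   # 2 fields
print scalar(@dotall), " fields with /s\n";      # 3 fields
```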

Wiktor Stribiżew

A follow-up comment asked about counting rather than splitting. That can be done with a global substitution, s///g, which returns the number of substitutions it made. Note that it modifies the string it works on (here $_), so each count starts from a fresh copy:

my $text = 'CCCCCCCC^hC^iC^*C^"C^8C^9A^!B'; # a little longer than yours
$_ = $text;
my $countanychar = s/\^.C//g;
print "counting any char and C:\t $countanychar in $text\n";

$_ = $text;
my $countnormalchar = s/\^\wC//g; # ^hC, ^iC and ^8C here; ^*C and ^"C are skipped
print "counting normal char and C:\t $countnormalchar in $text\n";

$_ = $text;
my $countnumber = s/\^\dC//g; # only ^8C here
print "counting number and C:\t $countnumber in $text\n";

$_ = $text;
my $countextended = s/\^.\w//g; # any char, then any word char (C, A and B here)
print "counting extended C and A and B:\t $countextended in $text\n";
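If you would rather leave the string untouched, a common Perl idiom is to run the global match in list context, assign it to an empty list, and read that assignment in scalar context, which yields the number of matches. A sketch with the same sample string and patterns:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = 'CCCCCCCC^hC^iC^*C^"C^8C^9A^!B';

# () = LIST in scalar context evaluates to the number of elements on
# the right-hand side, i.e. the number of matches here.
# (This relies on the patterns having no capture groups.)
my $count_any      = () = $text =~ /\^.C/g;    # ^hC ^iC ^*C ^"C ^8C
my $count_word     = () = $text =~ /\^\wC/g;   # ^hC ^iC ^8C
my $count_digit    = () = $text =~ /\^\dC/g;   # ^8C
my $count_extended = () = $text =~ /\^.\w/g;   # all seven markers

print "$count_any $count_word $count_digit $count_extended\n";  # 5 3 1 7
print "$text\n";    # unchanged
```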

Try it like this: @split_str = split(/\^/, $letters)

  • "I want to split it by the actual character ^, followed by any character, followed by a specific letter (in this case C, but it could be A, or any other character)." Your answer doesn't do that. – Dave Cross Oct 19 '16 at 10:51