A few tests:
use Benchmark;
timethese (10_000_000,{
   'A' => q|while ($data =~ m ~~g) { ++$open; }|,
   'B' => q|while ($data =~ m ~~go) { ++$open; }|,
   'C' => q|$count = ($data =~ m ~~g)|,
   'D' => q|$count = ($data =~ m ~~go)|});
Benchmark: timing 10000000 iterations of A, B, C, D...
         A:  5 wallclock secs ( 6.10 usr + -0.01 sys =  6.09 CPU) @ 1642305.80/s (n=10000000)
         B:  6 wallclock secs ( 6.09 usr +  0.00 sys =  6.09 CPU) @ 1642036.12/s (n=10000000)
         C:  3 wallclock secs ( 4.61 usr +  0.01 sys =  4.62 CPU) @ 2165908.60/s (n=10000000)
         D:  3 wallclock secs ( 4.56 usr +  0.00 sys =  4.56 CPU) @ 2194426.16/s (n=10000000)
=))
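
One caveat when reading these numbers: in variants C and D the match is evaluated in scalar context, so `$count` receives only a true/false flag for a single match, not the number of matches. The parentheses on the right-hand side do not create list context. Also, `/o` only matters for patterns that interpolate variables, which probably explains why A and B (and C and D) come out almost identical. Below is a rough sketch of the difference. The sample string and the pattern `<` are my own assumptions, since the post does not show `$data` or the actual pattern.

```perl
use strict;
use warnings;

# Hypothetical sample data; the original $data and pattern are not shown in the post.
my $data = '<a><b></b></a>';

# Scalar context: m//g reports success of ONE match (as in variants C and D).
my $flag = ($data =~ m/</g);
pos($data) = undef;    # clear the position left behind by the scalar //g match

# Assigning to an empty list forces list context; the outer scalar
# assignment then yields the number of matches.
my $count = () = $data =~ m/</g;

# Explicit loop, same result as the counting done in variants A and B.
my $open = 0;
++$open while $data =~ m/</g;

print "flag=$flag count=$count open=$open\n";    # flag=1 count=4 open=4
```

So a fair comparison against the `while` loop would probably use the `= () =` form rather than the plain scalar assignment.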