
Regular expressions tester for your regex fiddling

I was teaching a good friend a bit about regular expressions and wrote up a regular expression tester. It was a blast to write and simple to implement. It’s all in one HTML page so you can download it and fiddle away. Hope it helps someone else as well:

http://www.zenovations.com/blog/misc/regex.html

Find previous occurrence of string using PHP’s strrev() and preg_match()

Today I wrote a class to iterate over the words in a string. One challenge was working backwards through the string. Specifically, given a starting position inside the string, I wanted to find the previous “word” and return it. However, since this needs to work with localized text (not just a-z), and the definition of a “word” is configurable, it was not as simple as looking back for the previous space character.

So here is what I came up with, a method that finds the next or previous word given a starting position in the string:

   /**
    * Abstracted method for finding the next/prev word. This method assumes that 
    * $pos is greater than zero and less than the length of $text (check before calling)
    *
    * @param string $text the string of text to find next/prev word in
    * @param int $pos the position of first character in current word
    * @param string $wordPattern the regex definition of a word without any matching parens
    * @param bool $reverse if true, looks backward instead of forward (finds the last word before $pos)
    * @return mixed false if no more words or array( "the word matched with junk", "the word only")
    */
   private static function nextWordMatch($text, $pos, $wordPattern, $reverse = false) {
      // we get the substring of text, starting at the current position
      if( $reverse ) {
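         // in this case, we look at the text before $pos; rather than reversing it
         // or trying to deal with the craziness of looking backwards in a string,
         // we anchor the regex at the end of this substring (see below);
         // note that $pos-1 also leaves out the character immediately before $pos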
         $text = substr($text, 0, $pos-1);
      }
      else {
         // in this case, we look at everything after $pos
         $text = substr($text, $pos);
      }
 
      // we escape the regex delimiter (@) in case the word pattern contains it
      // we add in two sets of match parens, one for the whole match and one for the word
      // when looking backwards, we anchor at the end ($) rather than the start (^)
      $wordPattern = str_replace('@', '\\@', $wordPattern);
      $pattern = "(({$wordPattern})".self::NON_WORD_CHARS.")";
      if( $reverse ) { $pattern = "@{$pattern}\$@"; }
      else { $pattern = "@^{$pattern}@"; }
 
      // perform the match now and figure out what to do with it
      preg_match($pattern, $text, $matches);
      if( count($matches) < 3 ) { // $matches[0] is the full match, so two capture groups give three entries
         // we didn't find any words, so return false
         return false;
      }
 
      // strip off the raw text, leaving our two matches
      return array_slice($matches, 1);
   }
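
The title mentions strrev(), while the method above anchors the pattern at the end of the substring instead. For comparison, here is a minimal sketch of the strrev() variant. The function name prevWordViaStrrev() is hypothetical and not part of the class, and the sketch assumes single-byte text, since strrev() reverses bytes and would scramble multibyte (e.g. UTF-8) characters:

```php
<?php
// Hypothetical helper (not from the original class): finds the word before
// $pos by reversing the preceding text and matching from the start.
// Assumption: single-byte text only, because strrev() works on bytes.
function prevWordViaStrrev(string $text, int $pos, string $wordPattern = '\b[\w]+(?:[-\']\w+)*\b')
{
    // Everything before $pos, reversed, so the previous word now comes first.
    $reversed = strrev(substr($text, 0, $pos));

    // Escape the delimiter in case the word pattern contains it.
    $wordPattern = str_replace('@', '\\@', $wordPattern);

    // Skip leading non-word characters, then capture the (reversed) word.
    if (!preg_match("@^\\W*({$wordPattern})@", $reversed, $matches)) {
        return false; // no word before $pos
    }

    // Reverse the capture to restore the original character order.
    return strrev($matches[1]);
}

var_dump(prevWordViaStrrev('hello big world', 10)); // string(3) "big"
var_dump(prevWordViaStrrev("don't stop", 6));       // string(5) "don't"
var_dump(prevWordViaStrrev('hello', 0));            // bool(false)
```

This only works because the default word pattern reads the same in either direction (runs of word characters joined by a single hyphen or apostrophe). A word pattern that is not symmetric would need to be rewritten for the reversed text, which is one reason anchoring with $ may be the simpler choice.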

Here is the default value for $wordPattern and the constant NON_WORD_CHARS used in the example:

   private $wordPattern = '\b[\w]+(?:[-\']\w+)*\b';
   const NON_WORD_CHARS = '\W*';
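
To see how these two values combine, here is a small standalone sketch that builds the forward and backward patterns the same way the method does and runs them on a sample string. The pattern values are copied from above; the sample text and variable names are only illustrative:

```php
<?php
// Values copied from the post; everything else here is illustrative.
$wordPattern  = '\b[\w]+(?:[-\']\w+)*\b';
$nonWordChars = '\W*';

$text = "well-known, isn't it?!";

// Same composition as the method: the outer group is the word plus any
// trailing non-word characters, the inner group is the word alone.
$group = "(({$wordPattern}){$nonWordChars})";

// Forward: anchored at the start of the text.
preg_match("@^{$group}@", $text, $forward);

// Backward: anchored at the end of the text.
preg_match("@{$group}\$@", $text, $backward);

print_r(array_slice($forward, 1));  // [ "well-known, ", "well-known" ]
print_r(array_slice($backward, 1)); // [ "it?!", "it" ]
```

The hyphen and apostrophe handling in the word pattern is what keeps "well-known" and "isn't" together as single words instead of splitting them at the punctuation.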