Find previous occurrence of string using PHP’s strrev() and preg_match()
Today I wrote a class to iterate over the words in a string. One challenge was finding my way backwards in a string. Specifically, given a starting position inside the string, I wanted to find the previous “word” and return it. However, since this needs to work with localized text (not just a-z), and the definition of a “word” is configurable, it was not a simple matter of looking back for the previous space character.
So here is what I came up with: a method that finds the next or previous word given a starting position in the string:
/**
 * Abstracted method for finding the next/prev word. This method assumes that
 * $pos is greater than zero and less than the length of $text (check before calling)
 *
 * @param string $text the string of text to find next/prev word in
 * @param int $pos the position of first character in current word
 * @param string $wordPattern the regex definition of a word without any matching parens
 * @param bool $reverse looks backward instead of forward (finds last word in string)
 * @return mixed false if no more words or array( "the word matched with junk", "the word only")
 */
private static function nextWordMatch($text, $pos, $wordPattern, $reverse = false) {
    // we get the substring of text, starting at the current position
    if( $reverse ) {
        // in this case, we look at the text before $pos; the regex below is
        // anchored to the end of this substring, so we can run a simple regex
        // on it rather than trying to deal with the craziness of looking
        // backwards in the string. Note that $pos-1 also drops the character
        // immediately before $pos (normally a separator)
        $text = substr($text, 0, $pos-1);
    } else {
        // in this case, we look at everything from $pos onward
        $text = substr($text, $pos);
    }

    // we escape the preg delimiter character just in case
    // we add in two sets of match parens, one for the word and one for the whole match
    // when looking backwards, we need to anchor at the end rather than the start
    $wordPattern = str_replace('@', '\\@', $wordPattern);
    $pattern = "(({$wordPattern})".self::NON_WORD_CHARS.")";
    if( $reverse ) {
        $pattern = "@{$pattern}\$@";
    } else {
        $pattern = "@^{$pattern}@";
    }

    // perform the match now and figure out what to do with it
    preg_match($pattern, $text, $matches);
    // remember that the first match is the raw text, so we add one
    if( count($matches) < 3 ) {
        // we didn't find any words, so return false
        return false;
    }

    // strip off the raw text, leaving our two matches
    return array_slice($matches, 1);
}
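To try the backward lookup outside the class, here is a minimal standalone sketch of the same idea. The function name `previousWord()` and its default arguments are my own assumptions for illustration, not part of the original class. Unlike the method above, it cuts the text at exactly `$pos` rather than `$pos-1`.

```php
<?php
// Hypothetical standalone helper (not part of the original class): returns
// the last word that ends before $pos, or false if there is none.
function previousWord($text, $pos, $wordPattern = '\b[\w]+(?:[-\']\w+)*\b', $nonWordChars = '\W*')
{
    // Keep only the text before $pos.
    $before = substr($text, 0, $pos);

    // Escape the regex delimiter in case the word pattern contains it.
    $wordPattern = str_replace('@', '\\@', $wordPattern);

    // Outer group: the word plus trailing non-word characters.
    // Inner group: the word only. The "$" anchors the match to the end.
    $pattern = "@(({$wordPattern}){$nonWordChars})\$@";

    if (preg_match($pattern, $before, $matches) !== 1) {
        return false;
    }
    return array($matches[1], $matches[2]);
}

// Position 20 is the "f" of "fox", so the previous word is "brown-ish".
$result = previousWord("The quick brown-ish fox", 20);
print_r($result); // array("brown-ish ", "brown-ish")
```

Because `preg_match()` returns the leftmost match and the pattern must end at the end of the substring, the match starts at the beginning of the last word rather than somewhere in the middle of it.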
Here is the default value for $wordPattern and the constant NON_WORD_CHARS used in the example:
private $wordPattern = '\b[\w]+(?:[-\']\w+)*\b';
const NON_WORD_CHARS = '\W*';
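To see what this default pattern treats as a single word, here is a small sketch that runs it over a sample sentence. The helper name `extractWords()` and the sample text are assumptions made up for this illustration.

```php
<?php
// Hypothetical helper (not from the original class): lists every substring
// that the given word pattern matches.
function extractWords($text, $wordPattern = '\b[\w]+(?:[-\']\w+)*\b')
{
    // Escape the delimiter, as the method in the post does.
    $wordPattern = str_replace('@', '\\@', $wordPattern);
    preg_match_all("@{$wordPattern}@", $text, $matches);
    // $matches[0] holds the full matches, one per word.
    return $matches[0];
}

$words = extractWords("Don't re-use half-baked ideas, OK?");
print_r($words); // Don't, re-use, half-baked, ideas, OK
```

The `(?:[-\']\w+)*` part is what keeps contractions and hyphenated words together: a hyphen or apostrophe only counts as part of the word when more word characters follow it.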
