January 6, 2012 at 1:48 pm · Filed under Programming
I was teaching a good friend a bit about regular expressions and wrote up a regular expression tester. It was a blast to write and simple to implement. It’s all in one HTML page so you can download it and fiddle away. Hope it helps someone else as well:
http://www.zenovations.com/blog/misc/regex.html
January 10, 2010 at 5:59 pm · Filed under Programming
Today I wrote a class to iterate over the words in a string. One challenge was finding my way backwards in a string. Specifically, given a starting position inside the string, I wanted to find the previous “word” and return it. However, since this needs to work with localized text (not just a-z), and the definition of a “word” is configurable, it was no simple matter of looking back for the previous space character.
So here is what I came up with: a method that finds the next or previous word given a starting position in the string:
/**
 * Abstracted method for finding the next/prev word. This method assumes that
 * $pos is greater than zero and less than the length of $text (check before calling)
 *
 * @param string $text the string of text to find next/prev word in
 * @param int $pos the position of first character in current word
 * @param string $wordPattern the regex definition of a word without any capturing parens
 * @param bool $reverse looks backward instead of forward (finds the last word before $pos)
 * @return mixed false if no more words or array( "the word matched with junk", "the word only")
 */
private static function nextWordMatch($text, $pos, $wordPattern, $reverse = false) {
    // we get the substring of text, relative to the current position
    if( $reverse ) {
        // in this case, we look at everything before $pos; anchoring the regex
        // to the end of this substring lets us run a simple match rather than
        // trying to deal with the craziness of looking backwards in a string
        $text = substr($text, 0, $pos);
    }
    else {
        // in this case, we look at everything from $pos onward
        $text = substr($text, $pos);
    }
    // we escape the regex delimiter (@) just in case the word pattern contains it
    $wordPattern = str_replace('@', '\\@', $wordPattern);
    // we add two sets of capturing parens: the inner one for the word only and
    // the outer one for the word plus any trailing non-word characters
    $pattern = "(({$wordPattern})".self::NON_WORD_CHARS.")";
    // when looking backwards, we anchor at the end rather than the start
    if( $reverse ) { $pattern = "@{$pattern}\$@"; }
    else { $pattern = "@^{$pattern}@"; }
    // perform the match now and figure out what to do with it
    preg_match($pattern, $text, $matches);
    if( count($matches) < 3 ) { // $matches[0] is the full match, so a hit has three entries
        // we didn't find any words, so return false
        return false;
    }
    // strip off the full match, leaving our two captured groups
    return array_slice($matches, 1);
}
Here is the default value for $wordPattern and the constant NON_WORD_CHARS used in the example:
private $wordPattern = '\b[\w]+(?:[-\']\w+)*\b';
const NON_WORD_CHARS = '\W*';
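To try the idea outside the class, here is a minimal standalone sketch. It assumes the default word pattern and NON_WORD_CHARS shown above; the function name adjacentWord and the WORD_PATTERN constant are made up for this illustration and are not part of the original class. When looking backward it assumes the slice is everything before $pos.

```php
<?php
// Hypothetical standalone version of the technique, for illustration only.
const WORD_PATTERN   = '\b[\w]+(?:[-\']\w+)*\b'; // default word pattern from the post
const NON_WORD_CHARS = '\W*';                    // trailing "junk" after a word

// Returns array("word plus trailing junk", "word only"), or false if no word is found.
function adjacentWord(string $text, int $pos, bool $reverse = false)
{
    // Backward: everything before $pos. Forward: everything from $pos onward.
    $slice = $reverse ? substr($text, 0, $pos) : substr($text, $pos);

    // Escape the delimiter, then wrap the word in two capture groups.
    $word = str_replace('@', '\\@', WORD_PATTERN);
    $core = "(({$word})" . NON_WORD_CHARS . ")";

    // Anchor at the end when looking backward, at the start when looking forward.
    $pattern = $reverse ? "@{$core}\$@" : "@^{$core}@";

    if (preg_match($pattern, $slice, $matches) !== 1) {
        return false;
    }
    // Drop the full match, keep the two captured groups.
    return array_slice($matches, 1);
}

$text = "well-known fact, isn't it";
var_dump(adjacentWord($text, 11));       // forward from the "f" in "fact"
var_dump(adjacentWord($text, 11, true)); // backward from the same position
```

With this input, the forward call returns array("fact, ", "fact") and the backward call returns array("well-known ", "well-known"). The first element's length tells the caller how far to move $pos to reach the next or previous word.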