inicio mail me! sindicaci;ón

Find previous occurrence of string using PHP’s strrev() and preg_match()

Today I wrote a class to iterate words in a string. One challenge was finding my way backwards in a string. Specifically, given a starting position inside the string, I wanted to find the previous “word” and return it. However, since this needs to work localized (not just a-z), and the definition of a “word” is configurable, it was no simple matter of looking back for the previous space character.

So here is what I came up with; a method that finds the next or previous word given a starting position in the string:

   /**
    * Abstracted method for finding the next/prev word. This method assumes that 
    * $pos is greater than zero and less than the length of $text (check before calling)
    *
    * @param string $text the string of text to find next/prev word in
    * @param int $pos the position of first character in current word
    * @param string $wordPattern the regex definition of a word without any matching parens
    * @param string $reverse looks backward instead of forward (finds last word in string)
    * @return mixed false if no more words or array( "the word matched with junk", "the word only")
    */
   private static function nextWordMatch($text, $pos, $wordPattern, $reverse = false) {
      // we get the substring of text, starting at the current position
      if( $reverse ) {
         // in this case, we look at everything before $pos; we reverse it so that
         // we can run a simple regex on it rather than trying to deal with craziness
         // of looking backwards in string
         $text = substr($text, 0, $pos-1);
      }
      else {
         // in this case, we look at everything after $pos
         $text = substr($text, $pos);
      }
 
      // we escape the preg character just in case
      // we add in two sets of match parens, one for the word and one for the whole match
      // when looking backwards, we need to look from the end rather than the start
      $wordPattern = str_replace('@', '\\@', $wordPattern);
      $pattern = "(({$wordPattern})".self::NON_WORD_CHARS.")";
      if( $reverse ) { $pattern = "@{$pattern}\$@"; }
      else { $pattern = "@^{$pattern}@"; }
 
      // perform the match now and figure out what to do with it
      preg_match($pattern, $text, $matches);
      if( count($matches) < 3 ) { // remember that the first match is the raw text, so we add one
         // we didn't find any words, so return false
         return false;
      }
 
      // strip off the raw text, leaving our two matches
      return array_slice($matches, 1);
   }

Here is the default value for $wordPattern and the constant NON_WORD_CHARS used in the example:

   private $wordPattern = '\b[\w]+(?:[-\']\w+)*\b';
   const NON_WORD_CHARS = '\W*';

Quick and Dirty Database Conversion Tool

I created a quick tool for importing database data and modifying structure and data as it goes. It populates a single table at a time, offers decent flexibility, and allows you to specify your own callback functions to deal with data conversion. It can handle inserts and allows you to specify customized queries if desired. It allows any complexity of joins on the source data. Alas, it only works in mysql because that’s all I needed it for. Use an adapt for any non-evil purpose you like – Creative Commons. Usage:

  1. Open configure.php
  2. Set $fake_inserts = true (for testing)
  3. Probably want to set $max_rows (for testing)
  4. Set up your table conversions using the $tablemap arrays
  5. Call the script to test results: php ./converter.php
  6. When satisfied, set $fake_inserts = false (got a backup??)
  7. Whammo!

Here is the config.php file you’ll need to tweak:

<?
 
/*************************************************************
   CONFIGURABLE PARAMETERS
**********************************************************/
 
/**
 * These can also be configured per-query below, if more granularity is needed
 */
$from_db_connection = array('localhost', 'username', 'password', 'db-instance');
$to_db_connection = array('localhost', 'username', 'password', 'db-instance');
 
/**
 * We either output by printing insert statements to the screen for testing(true),
 * or by running the insert queries directly on the target db(false-default). Set this
 * to false when you're ready to import for real
 */
$fake_inserts = false;
 
/**
 * Max rows allows you to only run a certain number of queries per table, for testing. Leave
 * this set to 0 for the real run
 */
$max_rows = 0;
 
/**
 * If true, script will exist on any failed insert query.
 * If false, will continue to run even if queries fail and insert the other records.
 */
$stop_on_failed_query = false;
 
/**
 * $tablemap is an array containing one entry for each table to be converted.
 * Note that tables don't necessarily have to be 1:1 to make this work, since
 * the source query can use whatever sort of joins needed to set up the source
 * fields. There must, of course, be exactly one output table.
 *
 * <p>Here are the fields for each conversion element:
 * <ul>
 *   <li>connection info - an array containin the source database and target database. It contains two
 *       elements, each is an array containing array( host, username, password, db_instance )
 *   <li>source_query - the query to be run against source db, set it up however you like, the field names should correspond with the 'fields' array's keys
 *   <li>fields - see below
 *   <li>defaults - default values to apply to any fields in target table, overwritten by values in 'fields'
 * </ul>
 *
 * <p>If you just want to create insert statements with the data, use dest_table. If you
 * need custom queries or updates to run, you can specify dest_query. Values for any output fields
 * can be substituted using {field_name} in the query.
 *
 * <p>The 'fields' element is an indexed array. The key is the source table's column name. The value is either a string indicating the name of hte column in the destination table, or an array containing exactly two elements: 1) a function to call, which will be passed the value from source table, 2) the destination column name
 *
 * <p>The function is passed a host of arguments, in order:
 * <ol>
 *  <li>value from originating column
 *  <li>entire data row
 *  <li>original column name
 *  <li>destination column name
 * </ol>
 *
 * <p>To list the same field twice (since the keys would be identical) you can use an array index which will be stripped from the column name before querying, such as "field_id[1]", "field_id[2]", which both refer to a column named "field_id"
 */
$tablemap = array();
 
/**
* This example is a simple, straight copy from one table to another
*/
$tablemap[] = array(
	 "connection_info" => array(
				    "source" => $from_db_connection,
				    "dest" => $to_db_connection
	 ),
	 "source_query" => 'select * from source_table',
	 "dest_table" => "destination_table",
	 'fields' => array(
            'source1' => 'dest1',
            'source2' => 'dest2',
			   'posted'  => array('toSqlDate', 'date'), // convert unix timestamp to sql date
			   'id[1]'   => 'id', // here we set two dest fields from "id", using array index
			   'id[2]'   => array('getParentId', 'parent') // another custom function
	 ),
	 'defaults' => array(
			   'dest3' => '0',
            'dest4' => ''
	 )
);
 
/*
 * This example runs a customized query instead of a normal insert
 */
 
$tablemap[] = array(
	 "connection_info" => array(
          "source" => $from_db_connection,
          "dest" => $to_db_connection
	 ),
	 "source_query" => '
         select 
            count(*) as src_total, category
         from 
            source_table_1 s1, source_table_2 s2
         where
            s1.id = s2.id AND s2.id > 100
         group by
            category',
	 "dest_query" => 'update dest_table set total = {dest_total} where catid = {cat}',
	 'fields' => array(
			   'category' => 'cat',
			   'src_total' => 'dest_total'
	 ),
	 'defaults' => array() // these are still applicable, just didn't need any
);
 
 
/*****************************************************************************
  USER FUNCTIONS (methods used by the 'fields' array to convert values)
***************************************************************************/
 
/**
 * Just an example function that creates a value by combining two cols from source table
 */
function getParentId($src_id, $data_row, $src_col_name, $dest_col_name) {
   if( $src_id > 0 ) {
      // if it exists, use it
      return $src_id;
   }
   // otherwise, create one
   return $data_row["number_1"] + $data_row["number_2"];
}
 
/**
 * Example function to create an incremental id for the result data
 */
function getIncrementalId() {
   global $currentIncId;
   return ++$currentIncId;
}
$currentIncId = 0;
 
/**
 * Converts a unix timestamp to a sql datetime
 */
function toSqlDate($utime) {
  return date("Y-m-d H:i:s", $utime);
}
 
?>

And here is the convert.php script you’ll be running:

<?
/*********************************
 GO TO THIS FILE TO CONFIGURE
 ********************************/
  include("config.php");
 
/*********************************
 YOU WON'T NEED TO CHANGE ANYTHING BELOW HERE
 ********************************/
 
/**
 * Prints out contents and adds html chars if this isn't cli
 */
function printIt($txt) {
  $e = php_sapi_name() == "cli"? "\n" : "<br>\n";
  print $txt.$e;
}
 
/**
 * Generates a list of insertable values, escaped for sql
 */
function make_vals($row) {
  $s = '';
  foreach($row as $k=>$v) {
    if( $s ) { $s .= ", "; }
    $s .= fixMySquirrelVal($v);
  }
  return $s;
}
 
/**
 * Adds a data row to the dest table
 * @param resource $dbin the db connection
 * @param string $table dest table
 * @param array $row indexed array of (string)col => (mixed)value - do not escape these
 */
function addRow($dbin, $table, $row) {
  $vals = make_vals($row);
  $cols = '`'.join('`,`',array_keys($row)).'`';
  $query = "INSERT INTO `$table` ($cols) VALUES($vals)";
  return runMySquirrelQuery($dbin, $query);
}
 
/**
 * Runs a custom query rather than a simple insert
 * @param resource $dbin the db connection
 * @param string $query the custom query, with {field_name} anywhere the field vals will be substituted
 * @param array $row indexed array of (string)col => (mixed)value - do not escape these
 */
function customQuery($dbin, $query, $row) {
  // substitute our new vals
  foreach($row as $k=>$v) {
    $query = str_replace("{{$k}}", fixMySquirrelVal($v), $query);
  }
  // run query with custom vals
  return runMySquirrelQuery($dbin, $query);
}
 
function fixMySquirrelVal($v) {
  if(is_null($v)) {
    return "NULL";
  }
  return sprintf(" '%s' ", mysql_real_escape_string($v));
}
 
function runMySquirrelQuery($dbin, $query) {
  global $fake_inserts;
  global $stop_on_failed_query;
  if( $fake_inserts ) {
    printIt($query);
    return true;
  }
  else if( !mysql_query($query, $dbin) ) {
    printIt("!!ERROR!! Unable to insert record ($query): ".mysql_error());
    if( $stop_on_failed_query ) { exit; }
    return false;
  }
  return true;
}
 
/**
 * Given a row from the source table, converts field names and values
 * to be compatible for dest table
 * @param array $row indexed array of (string)col => (mixed)value - do not escape these
 * @param array $fields indexed array of (string)source_field => (mixed)dest_field
 * @param array $defaults indexed array of (string)dest_field => (mixed)value -- can be null
 * @return array indexed by column names for dest table
 */
function convertRow($row, $fields, $defaults) {
  // start by applying default values, these can be overwritten if fields contains same value
  $newrow = is_array($defaults)? $defaults : array();
  foreach($fields as $k=>$v) {
    // for multiple fields with same key, strips off the [n] from end
    $k = preg_replace("@\[[0-9]+\]$@", "", $k); 
    // run user functions as needed to modify values
    $key = is_array($v)? $v[1] : $v;
    $val = is_array($v)? call_user_func($v[0], $row[$k], $row, $k, $key) : $row[$k];
    if( !is_null($val) ) { 
      // create the new column in data row
      $newrow[$key] = $val;
    }
  }
  return $newrow;
}
 
function connectToDb($connection_info) {
  list($host, $user, $pass, $dbname) = $connection_info;
  $dbh = mysql_connect($host, $user, $pass);
 
  if (!$dbh) {
    printIt("Unable to connect to DB: " . mysql_error());
    exit;
  }
 
  if (!mysql_select_db($dbname, $dbh)) {
    printIt("Unable to select $dbname: " . mysql_error());
    exit;
  }
 
  return $dbh;
}
 
function convertTable($map) {
  global $max_rows;
 
  printIt('');
  printIt("-------------------------------------------------");
  if( !empty($map['dest_query']) ) {
    printIt("Running custom query: {$map['dest_query']}");
  }
  else {
    printIt("Migrating values to {$map['dest_table']}");
  }
  printIt("-------------------------------------------------");
  printIt('');
 
  // we call this each time because we can't just store a separate connection for the from instance
  // and to instance... chances are they point to the same server, so php/mysql will re-use the connection
  // and just switch the instance being operated on... this will cause our queries to go on the fritz
  // the upside is that it doesn't have to reconnect since it's reused, so there's no big cost to this approach
  $dbout = connectToDb( $map['connection_info']['source'] );
 
  // select the old data
  $result = mysql_query($map['source_query'], $dbout);
  if (!$result) {
    printIt("Unable to load source data ({$map['source_query']}): " . mysql_error());
    exit;
  }
 
  if (mysql_num_rows($result) == 0) {
    printIt("No rows found, that's weird so I'm exiting");
    exit;
  }
 
  // here again, we call this right before use, because we're going to switch instances, but
  // it's probably to the same mysql server, so connection will get reused
  $dbin = connectToDb( $map['connection_info']['dest'] );
 
  // While a row of data exists, put that row in $row as an associative array
  $i=0;
  $j=0;
  while ($row = mysql_fetch_assoc($result)) {
    $vals = convertRow($row, $map['fields'], $map['defaults']);
    if( !empty($map['dest_query']) ) {
      customQuery($dbin, $map['dest_query'], $vals) || $j++;
    }
    else {
      addRow($dbin, $map['dest_table'], $vals) || $j++;
    }
    $i++;
    if( $max_rows && $i > $max_rows ) { break; }
  }
 
  // recover query memory
  mysql_free_result($result);
 
  // show the user what we did
  printIt("Converted ".($i-$j)." rows");
  if( $j ) { printIt("!!!! $j errors !!!!!!"); }
}
 
// actually run the conversions
foreach($tablemap as $t) {
  convertTable($t);
}
 
printIt("Finished without blowing up the planet");
 
?>

phpize: command not found

Ran into this error (phpize: command not found) while trying to install json extension for PHP.

Here is the solution on CentOS 5/RHEL:

yum install php-devel

On Debian/Ubuntu:

apt-get install php5-dev

Viola! Why do you have to install dev tools to use extensions? Dunno. Very irritating. But it solves it with no side effects.

“undefined function mysql_connect()” While Migrating Virtualmin

While migrating virtualmin to our shiney new CentOS 5  box, I ran into this error on all our PHP sites:

If I look at php info, it shows –with-mysql, but there is no mysql section farther down. It looked like PHP’s Apache module was having trouble finding the mysql.so file. I immediately thought to check for the php-mysql module (there is no php5-mysql for CentOS 5, I noted) and found it installed:

[root@server apf]# yum install php-mysql
...
Parsing package install arguments
Package php-mysql - 5.1.6-20.el5_2.1.i386 is already installed.
Nothing to do

After a short dive into the php.ini settings, I noted that all the restored sites had this in their php.ini file:

extension_dir=/usr/lib/php5/modules

But that dir doesn’t exist. Connecting the dots, I realized that CentOS 5 has moved all the PHP 5 data to the default. So the dir is now at /usr/lib/php/modules. And this solved the rest:

[root@server lib]# ln -s /usr/lib/php /usr/lib/php5
[root@server lib]# ls -ld /usr/lib/php*
drwxr-xr-x 4 root root 4096 Jul 16  2008 php
lrwxrwxrwx 1 root root    3 Mar 30 21:01 php5 -&gt; php
[root@server lib]# /etc/init.d/httpd restart
Stopping httpd:                                            [  OK  ]
Starting httpd:                                            [  OK  ]

Yay!

PHP Class for Accessing Twitter API

I wrote up a quick class for interacting with the twitter API. Right now it returns everything as json, but that would be a five minute project to change out for anyone with m4d sk1||z0rz.

Read the rest of this entry »