inicio mail me! sindicaci;ón

Quick and Dirty Database Conversion Tool

I created a quick tool for importing database data and modifying structure and data as it goes. It populates a single table at a time, offers decent flexibility, and allows you to specify your own callback functions to deal with data conversion. It can handle inserts and allows you to specify customized queries if desired. It allows any complexity of joins on the source data. Alas, it only works in mysql because that’s all I needed it for. Use an adapt for any non-evil purpose you like – Creative Commons. Usage:

  1. Open configure.php
  2. Set $fake_inserts = true (for testing)
  3. Probably want to set $max_rows (for testing)
  4. Set up your table conversions using the $tablemap arrays
  5. Call the script to test results: php ./converter.php
  6. When satisfied, set $fake_inserts = false (got a backup??)
  7. Whammo!

Here is the config.php file you’ll need to tweak:

<?
 
/*************************************************************
   CONFIGURABLE PARAMETERS
**********************************************************/
 
/**
 * These can also be configured per-query below, if more granularity is needed
 */
$from_db_connection = array('localhost', 'username', 'password', 'db-instance');
$to_db_connection = array('localhost', 'username', 'password', 'db-instance');
 
/**
 * We either output by printing insert statements to the screen for testing(true),
 * or by running the insert queries directly on the target db(false-default). Set this
 * to false when you're ready to import for real
 */
$fake_inserts = false;
 
/**
 * Max rows allows you to only run a certain number of queries per table, for testing. Leave
 * this set to 0 for the real run
 */
$max_rows = 0;
 
/**
 * If true, script will exist on any failed insert query.
 * If false, will continue to run even if queries fail and insert the other records.
 */
$stop_on_failed_query = false;
 
/**
 * $tablemap is an array containing one entry for each table to be converted.
 * Note that tables don't necessarily have to be 1:1 to make this work, since
 * the source query can use whatever sort of joins needed to set up the source
 * fields. There must, of course, be exactly one output table.
 *
 * <p>Here are the fields for each conversion element:
 * <ul>
 *   <li>connection info - an array containin the source database and target database. It contains two
 *       elements, each is an array containing array( host, username, password, db_instance )
 *   <li>source_query - the query to be run against source db, set it up however you like, the field names should correspond with the 'fields' array's keys
 *   <li>fields - see below
 *   <li>defaults - default values to apply to any fields in target table, overwritten by values in 'fields'
 * </ul>
 *
 * <p>If you just want to create insert statements with the data, use dest_table. If you
 * need custom queries or updates to run, you can specify dest_query. Values for any output fields
 * can be substituted using {field_name} in the query.
 *
 * <p>The 'fields' element is an indexed array. The key is the source table's column name. The value is either a string indicating the name of hte column in the destination table, or an array containing exactly two elements: 1) a function to call, which will be passed the value from source table, 2) the destination column name
 *
 * <p>The function is passed a host of arguments, in order:
 * <ol>
 *  <li>value from originating column
 *  <li>entire data row
 *  <li>original column name
 *  <li>destination column name
 * </ol>
 *
 * <p>To list the same field twice (since the keys would be identical) you can use an array index which will be stripped from the column name before querying, such as "field_id[1]", "field_id[2]", which both refer to a column named "field_id"
 */
$tablemap = array();
 
/**
* This example is a simple, straight copy from one table to another
*/
$tablemap[] = array(
	 "connection_info" => array(
				    "source" => $from_db_connection,
				    "dest" => $to_db_connection
	 ),
	 "source_query" => 'select * from source_table',
	 "dest_table" => "destination_table",
	 'fields' => array(
            'source1' => 'dest1',
            'source2' => 'dest2',
			   'posted'  => array('toSqlDate', 'date'), // convert unix timestamp to sql date
			   'id[1]'   => 'id', // here we set two dest fields from "id", using array index
			   'id[2]'   => array('getParentId', 'parent') // another custom function
	 ),
	 'defaults' => array(
			   'dest3' => '0',
            'dest4' => ''
	 )
);
 
/*
 * This example runs a customized query instead of a normal insert
 */
 
$tablemap[] = array(
	 "connection_info" => array(
          "source" => $from_db_connection,
          "dest" => $to_db_connection
	 ),
	 "source_query" => '
         select 
            count(*) as src_total, category
         from 
            source_table_1 s1, source_table_2 s2
         where
            s1.id = s2.id AND s2.id > 100
         group by
            category',
	 "dest_query" => 'update dest_table set total = {dest_total} where catid = {cat}',
	 'fields' => array(
			   'category' => 'cat',
			   'src_total' => 'dest_total'
	 ),
	 'defaults' => array() // these are still applicable, just didn't need any
);
 
 
/*****************************************************************************
  USER FUNCTIONS (methods used by the 'fields' array to convert values)
***************************************************************************/
 
/**
 * Just an example function that creates a value by combining two cols from source table
 */
function getParentId($src_id, $data_row, $src_col_name, $dest_col_name) {
   if( $src_id > 0 ) {
      // if it exists, use it
      return $src_id;
   }
   // otherwise, create one
   return $data_row["number_1"] + $data_row["number_2"];
}
 
/**
 * Example function to create an incremental id for the result data
 */
function getIncrementalId() {
   global $currentIncId;
   return ++$currentIncId;
}
$currentIncId = 0;
 
/**
 * Converts a unix timestamp to a sql datetime
 */
function toSqlDate($utime) {
  return date("Y-m-d H:i:s", $utime);
}
 
?>

And here is the convert.php script you’ll be running:

<?
/*********************************
 GO TO THIS FILE TO CONFIGURE
 ********************************/
  include("config.php");
 
/*********************************
 YOU WON'T NEED TO CHANGE ANYTHING BELOW HERE
 ********************************/
 
/**
 * Prints out contents and adds html chars if this isn't cli
 */
function printIt($txt) {
  $e = php_sapi_name() == "cli"? "\n" : "<br>\n";
  print $txt.$e;
}
 
/**
 * Generates a list of insertable values, escaped for sql
 */
function make_vals($row) {
  $s = '';
  foreach($row as $k=>$v) {
    if( $s ) { $s .= ", "; }
    $s .= fixMySquirrelVal($v);
  }
  return $s;
}
 
/**
 * Adds a data row to the dest table
 * @param resource $dbin the db connection
 * @param string $table dest table
 * @param array $row indexed array of (string)col => (mixed)value - do not escape these
 */
function addRow($dbin, $table, $row) {
  $vals = make_vals($row);
  $cols = '`'.join('`,`',array_keys($row)).'`';
  $query = "INSERT INTO `$table` ($cols) VALUES($vals)";
  return runMySquirrelQuery($dbin, $query);
}
 
/**
 * Runs a custom query rather than a simple insert
 * @param resource $dbin the db connection
 * @param string $query the custom query, with {field_name} anywhere the field vals will be substituted
 * @param array $row indexed array of (string)col => (mixed)value - do not escape these
 */
function customQuery($dbin, $query, $row) {
  // substitute our new vals
  foreach($row as $k=>$v) {
    $query = str_replace("{{$k}}", fixMySquirrelVal($v), $query);
  }
  // run query with custom vals
  return runMySquirrelQuery($dbin, $query);
}
 
function fixMySquirrelVal($v) {
  if(is_null($v)) {
    return "NULL";
  }
  return sprintf(" '%s' ", mysql_real_escape_string($v));
}
 
function runMySquirrelQuery($dbin, $query) {
  global $fake_inserts;
  global $stop_on_failed_query;
  if( $fake_inserts ) {
    printIt($query);
    return true;
  }
  else if( !mysql_query($query, $dbin) ) {
    printIt("!!ERROR!! Unable to insert record ($query): ".mysql_error());
    if( $stop_on_failed_query ) { exit; }
    return false;
  }
  return true;
}
 
/**
 * Given a row from the source table, converts field names and values
 * to be compatible for dest table
 * @param array $row indexed array of (string)col => (mixed)value - do not escape these
 * @param array $fields indexed array of (string)source_field => (mixed)dest_field
 * @param array $defaults indexed array of (string)dest_field => (mixed)value -- can be null
 * @return array indexed by column names for dest table
 */
function convertRow($row, $fields, $defaults) {
  // start by applying default values, these can be overwritten if fields contains same value
  $newrow = is_array($defaults)? $defaults : array();
  foreach($fields as $k=>$v) {
    // for multiple fields with same key, strips off the [n] from end
    $k = preg_replace("@\[[0-9]+\]$@", "", $k); 
    // run user functions as needed to modify values
    $key = is_array($v)? $v[1] : $v;
    $val = is_array($v)? call_user_func($v[0], $row[$k], $row, $k, $key) : $row[$k];
    if( !is_null($val) ) { 
      // create the new column in data row
      $newrow[$key] = $val;
    }
  }
  return $newrow;
}
 
function connectToDb($connection_info) {
  list($host, $user, $pass, $dbname) = $connection_info;
  $dbh = mysql_connect($host, $user, $pass);
 
  if (!$dbh) {
    printIt("Unable to connect to DB: " . mysql_error());
    exit;
  }
 
  if (!mysql_select_db($dbname, $dbh)) {
    printIt("Unable to select $dbname: " . mysql_error());
    exit;
  }
 
  return $dbh;
}
 
function convertTable($map) {
  global $max_rows;
 
  printIt('');
  printIt("-------------------------------------------------");
  if( !empty($map['dest_query']) ) {
    printIt("Running custom query: {$map['dest_query']}");
  }
  else {
    printIt("Migrating values to {$map['dest_table']}");
  }
  printIt("-------------------------------------------------");
  printIt('');
 
  // we call this each time because we can't just store a separate connection for the from instance
  // and to instance... chances are they point to the same server, so php/mysql will re-use the connection
  // and just switch the instance being operated on... this will cause our queries to go on the fritz
  // the upside is that it doesn't have to reconnect since it's reused, so there's no big cost to this approach
  $dbout = connectToDb( $map['connection_info']['source'] );
 
  // select the old data
  $result = mysql_query($map['source_query'], $dbout);
  if (!$result) {
    printIt("Unable to load source data ({$map['source_query']}): " . mysql_error());
    exit;
  }
 
  if (mysql_num_rows($result) == 0) {
    printIt("No rows found, that's weird so I'm exiting");
    exit;
  }
 
  // here again, we call this right before use, because we're going to switch instances, but
  // it's probably to the same mysql server, so connection will get reused
  $dbin = connectToDb( $map['connection_info']['dest'] );
 
  // While a row of data exists, put that row in $row as an associative array
  $i=0;
  $j=0;
  while ($row = mysql_fetch_assoc($result)) {
    $vals = convertRow($row, $map['fields'], $map['defaults']);
    if( !empty($map['dest_query']) ) {
      customQuery($dbin, $map['dest_query'], $vals) || $j++;
    }
    else {
      addRow($dbin, $map['dest_table'], $vals) || $j++;
    }
    $i++;
    if( $max_rows && $i > $max_rows ) { break; }
  }
 
  // recover query memory
  mysql_free_result($result);
 
  // show the user what we did
  printIt("Converted ".($i-$j)." rows");
  if( $j ) { printIt("!!!! $j errors !!!!!!"); }
}
 
// actually run the conversions
foreach($tablemap as $t) {
  convertTable($t);
}
 
printIt("Finished without blowing up the planet");
 
?>

“undefined function mysql_connect()” While Migrating Virtualmin

While migrating virtualmin to our shiney new CentOS 5  box, I ran into this error on all our PHP sites:

If I look at php info, it shows –with-mysql, but there is no mysql section farther down. It looked like PHP’s Apache module was having trouble finding the mysql.so file. I immediately thought to check for the php-mysql module (there is no php5-mysql for CentOS 5, I noted) and found it installed:

[root@server apf]# yum install php-mysql
...
Parsing package install arguments
Package php-mysql - 5.1.6-20.el5_2.1.i386 is already installed.
Nothing to do

After a short dive into the php.ini settings, I noted that all the restored sites had this in their php.ini file:

extension_dir=/usr/lib/php5/modules

But that dir doesn’t exist. Connecting the dots, I realized that CentOS 5 has moved all the PHP 5 data to the default. So the dir is now at /usr/lib/php/modules. And this solved the rest:

[root@server lib]# ln -s /usr/lib/php /usr/lib/php5
[root@server lib]# ls -ld /usr/lib/php*
drwxr-xr-x 4 root root 4096 Jul 16  2008 php
lrwxrwxrwx 1 root root    3 Mar 30 21:01 php5 -&gt; php
[root@server lib]# /etc/init.d/httpd restart
Stopping httpd:                                            [  OK  ]
Starting httpd:                                            [  OK  ]

Yay!

Dropping multiple tables in mysql (drop tables with wildcard)

I wanted to drop all mysql tables from a db that had a certain prefix, such as xx_.

Well this turned out to be an adventure…

The First Attempt

The first solution I found was to use sed/grep sort of logic:

#mysqlshow -u username -p dbname xx\\_% |sed 's/[|+-]//g'|sed 's/[ ]*$/,/'>droptables.sql

(xx\\_% is how you tell it to show tables starting with xx_)

This produced a semi-useful list. But I had to manually edit out the extra commas and line feeds to get a pure list, then add “DROP TABLE ” to the beginning of the list. Then feed this into mysql as follows:

#mysql -u username -p dbname < droptables.sql

I wasn’t real happy with that answer, naturally, so I researched more…

A Better Solution

I came across an alternative in this command:

mysqldump -u username -p --add-drop-table --no-data dbname | grep "^DROP.*\`xx_" | mysql -u username -p dbname

(replace xx_ with the prefix you want to remove, to do all tables, try just “grep ^DROP”

You can test it first by cutting off the last “| mysql…” bit and see the output, which is useful for debugging, before you go blowing away your tables.