Quick and Dirty Database Conversion Tool
I created a quick tool for importing database data, modifying structure and data as it goes. It populates a single table at a time, offers decent flexibility, and lets you specify your own callback functions to handle data conversion. It can handle inserts and lets you specify customized queries if desired. It allows joins of any complexity on the source data. Alas, it only works with MySQL because that’s all I needed it for. Use and adapt it for any non-evil purpose you like – Creative Commons. Usage:
- Open config.php
- Set $fake_inserts = true (for testing)
- Probably want to set $max_rows (for testing)
- Set up your table conversions using the $tablemap arrays
- Call the script to test results: php ./convert.php
- When satisfied, set $fake_inserts = false (got a backup??)
- Whammo!
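For the test pass described in the steps above, the two settings in config.php might look like the fragment below. This is only a sketch: the variable names come from the config file, but the value 10 is an arbitrary choice for illustration.

```php
<?php
// Dry-run settings for config.php (10 is an arbitrary example value)
$fake_inserts = true;  // print the generated queries instead of running them
$max_rows = 10;        // only process a handful of rows per table
```

Even in this mode the script still connects to both databases and runs the source query. Only the writes to the destination are skipped.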
Here is the config.php file you’ll need to tweak:
<?php
/*************************************************************
 CONFIGURABLE PARAMETERS
 **********************************************************/

/**
 * These can also be configured per-query below, if more granularity is needed
 */
$from_db_connection = array('localhost', 'username', 'password', 'db-instance');
$to_db_connection = array('localhost', 'username', 'password', 'db-instance');

/**
 * We either output by printing insert statements to the screen for testing (true),
 * or by running the insert queries directly on the target db (false, the default). Set this
 * to false when you're ready to import for real
 */
$fake_inserts = false;

/**
 * Max rows allows you to only run a certain number of queries per table, for testing. Leave
 * this set to 0 for the real run
 */
$max_rows = 0;

/**
 * If true, script will exit on any failed insert query.
 * If false, will continue to run even if queries fail and insert the other records.
 */
$stop_on_failed_query = false;

/**
 * $tablemap is an array containing one entry for each table to be converted.
 * Note that tables don't necessarily have to be 1:1 to make this work, since
 * the source query can use whatever sort of joins needed to set up the source
 * fields. There must, of course, be exactly one output table.
 *
 * <p>Here are the fields for each conversion element:
 * <ul>
 * <li>connection_info - an array containing the source database and target database. It contains two
 *     elements ('source' and 'dest'), each is an array containing array( host, username, password, db_instance )
 * <li>source_query - the query to be run against source db, set it up however you like, the field names
 *     should correspond with the 'fields' array's keys
 * <li>fields - see below
 * <li>defaults - default values to apply to any fields in target table, overwritten by values in 'fields'
 * </ul>
 *
 * <p>If you just want to create insert statements with the data, use dest_table. If you
 * need custom queries or updates to run, you can specify dest_query. Values for any output fields
 * can be substituted using {field_name} in the query.
 *
 * <p>The 'fields' element is an indexed array. The key is the source table's column name. The value is
 * either a string indicating the name of the column in the destination table, or an array containing
 * exactly two elements: 1) a function to call, which will be passed the value from source table,
 * 2) the destination column name
 *
 * <p>The function is passed a host of arguments, in order:
 * <ol>
 * <li>value from originating column
 * <li>entire data row
 * <li>original column name
 * <li>destination column name
 * </ol>
 *
 * <p>To list the same field twice (since the keys would be identical) you can use an array index which
 * will be stripped from the column name before querying, such as "field_id[1]", "field_id[2]", which
 * both refer to a column named "field_id"
 */
$tablemap = array();

/**
 * This example is a simple, straight copy from one table to another
 */
$tablemap[] = array(
    "connection_info" => array(
        "source" => $from_db_connection,
        "dest" => $to_db_connection
    ),
    "source_query" => 'select * from source_table',
    "dest_table" => "destination_table",
    'fields' => array(
        'source1' => 'dest1',
        'source2' => 'dest2',
        'posted' => array('toSqlDate', 'date'),       // convert unix timestamp to sql date
        'id[1]' => 'id',                              // here we set two dest fields from "id", using array index
        'id[2]' => array('getParentId', 'parent')     // another custom function
    ),
    'defaults' => array(
        'dest3' => '0',
        'dest4' => ''
    )
);

/*
 * This example runs a customized query instead of a normal insert
 */
$tablemap[] = array(
    "connection_info" => array(
        "source" => $from_db_connection,
        "dest" => $to_db_connection
    ),
    "source_query" => '
        select count(*) as src_total, category
        from source_table_1 s1, source_table_2 s2
        where s1.id = s2.id AND s2.id > 100
        group by category',
    "dest_query" => 'update dest_table set total = {dest_total} where catid = {cat}',
    'fields' => array(
        'category' => 'cat',
        'src_total' => 'dest_total'
    ),
    'defaults' => array() // these are still applicable, just didn't need any
);

/*****************************************************************************
 USER FUNCTIONS (methods used by the 'fields' array to convert values)
 ***************************************************************************/

/**
 * Just an example function that creates a value by combining two cols from source table
 */
function getParentId($src_id, $data_row, $src_col_name, $dest_col_name) {
    if( $src_id > 0 ) {
        // if it exists, use it
        return $src_id;
    }
    // otherwise, create one
    return $data_row["number_1"] + $data_row["number_2"];
}

/**
 * Example function to create an incremental id for the result data
 */
function getIncrementalId() {
    global $currentIncId;
    return ++$currentIncId;
}
$currentIncId = 0;

/**
 * Converts a unix timestamp to a sql datetime
 */
function toSqlDate($utime) {
    return date("Y-m-d H:i:s", $utime);
}
?>
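To make the callback contract a bit more concrete, here is a minimal sketch of one more user function. The name trimOrDefault and the columns 'nickname' and 'display_name' are made up for illustration and are not part of the tool. It relies on one detail of convert.php: convertRow() skips any callback result that is null, so returning null leaves the value from 'defaults' in place.

```php
<?php
/**
 * Hypothetical callback (not part of the original config): trims a string
 * value and returns null when nothing is left, so that the entry in
 * 'defaults' for the destination column is kept instead.
 * It takes the same four arguments the converter passes to every callback.
 */
function trimOrDefault($value, $data_row, $src_col_name, $dest_col_name) {
    $trimmed = trim((string) $value);
    return $trimmed === '' ? null : $trimmed;
}

// Wiring it into a 'fields' entry (made-up column names):
// 'nickname' => array('trimOrDefault', 'display_name'),
```

A callback that only needs the value can declare a single parameter, as toSqlDate does above. PHP ignores the extra arguments for user-defined functions.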
And here is the convert.php script you’ll be running:
<?php
/*********************************
 GO TO THIS FILE TO CONFIGURE
 ********************************/
include("config.php");

/*********************************
 YOU WON'T NEED TO CHANGE ANYTHING BELOW HERE
 ********************************/

/**
 * Prints out contents and adds html chars if this isn't cli
 */
function printIt($txt) {
    $e = php_sapi_name() == "cli"? "\n" : "<br>\n";
    print $txt.$e;
}

/**
 * Generates a list of insertable values, escaped for sql
 */
function make_vals($row) {
    $s = '';
    foreach($row as $k=>$v) {
        if( $s ) {
            $s .= ", ";
        }
        $s .= fixMySquirrelVal($v);
    }
    return $s;
}

/**
 * Adds a data row to the dest table
 * @param resource $dbin the db connection
 * @param string $table dest table
 * @param array $row indexed array of (string)col => (mixed)value - do not escape these
 */
function addRow($dbin, $table, $row) {
    $vals = make_vals($row);
    $cols = '`'.join('`,`',array_keys($row)).'`';
    $query = "INSERT INTO `$table` ($cols) VALUES($vals)";
    return runMySquirrelQuery($dbin, $query);
}

/**
 * Runs a custom query rather than a simple insert
 * @param resource $dbin the db connection
 * @param string $query the custom query, with {field_name} anywhere the field vals will be substituted
 * @param array $row indexed array of (string)col => (mixed)value - do not escape these
 */
function customQuery($dbin, $query, $row) {
    // substitute our new vals
    foreach($row as $k=>$v) {
        $query = str_replace("{{$k}}", fixMySquirrelVal($v), $query);
    }
    // run query with custom vals
    return runMySquirrelQuery($dbin, $query);
}

/**
 * Quotes and escapes a value for use in a query; null becomes NULL
 */
function fixMySquirrelVal($v) {
    if(is_null($v)) {
        return "NULL";
    }
    return sprintf(" '%s' ", mysql_real_escape_string($v));
}

/**
 * Runs the query on the dest db, or just prints it if $fake_inserts is set
 */
function runMySquirrelQuery($dbin, $query) {
    global $fake_inserts;
    global $stop_on_failed_query;
    if( $fake_inserts ) {
        printIt($query);
        return true;
    } else if( !mysql_query($query, $dbin) ) {
        printIt("!!ERROR!! Unable to insert record ($query): ".mysql_error());
        if( $stop_on_failed_query ) {
            exit;
        }
        return false;
    }
    return true;
}

/**
 * Given a row from the source table, converts field names and values
 * to be compatible for dest table
 * @param array $row indexed array of (string)col => (mixed)value - do not escape these
 * @param array $fields indexed array of (string)source_field => (mixed)dest_field
 * @param array $defaults indexed array of (string)dest_field => (mixed)value -- can be null
 * @return array indexed by column names for dest table
 */
function convertRow($row, $fields, $defaults) {
    // start by applying default values, these can be overwritten if fields contains same value
    $newrow = is_array($defaults)? $defaults : array();
    foreach($fields as $k=>$v) {
        // for multiple fields with same key, strips off the [n] from end
        $k = preg_replace("@\[[0-9]+\]$@", "", $k);
        // run user functions as needed to modify values
        $key = is_array($v)? $v[1] : $v;
        $val = is_array($v)? call_user_func($v[0], $row[$k], $row, $k, $key) : $row[$k];
        if( !is_null($val) ) {
            // create the new column in data row
            $newrow[$key] = $val;
        }
    }
    return $newrow;
}

/**
 * Connects to the server and selects the db instance, exits on failure
 */
function connectToDb($connection_info) {
    list($host, $user, $pass, $dbname) = $connection_info;
    $dbh = mysql_connect($host, $user, $pass);
    if (!$dbh) {
        printIt("Unable to connect to DB: " . mysql_error());
        exit;
    }
    if (!mysql_select_db($dbname, $dbh)) {
        printIt("Unable to select $dbname: " . mysql_error());
        exit;
    }
    return $dbh;
}

/**
 * Runs one $tablemap entry: reads the source rows, converts them, writes them to dest
 */
function convertTable($map) {
    global $max_rows;
    printIt('');
    printIt("-------------------------------------------------");
    if( !empty($map['dest_query']) ) {
        printIt("Running custom query: {$map['dest_query']}");
    } else {
        printIt("Migrating values to {$map['dest_table']}");
    }
    printIt("-------------------------------------------------");
    printIt('');

    // we call this each time because we can't just store a separate connection for the from instance
    // and to instance... chances are they point to the same server, so php/mysql will re-use the connection
    // and just switch the instance being operated on... this will cause our queries to go on the fritz
    // the upside is that it doesn't have to reconnect since it's reused, so there's no big cost to this approach
    $dbout = connectToDb( $map['connection_info']['source'] );

    // select the old data
    $result = mysql_query($map['source_query'], $dbout);
    if (!$result) {
        printIt("Unable to load source data ({$map['source_query']}): " . mysql_error());
        exit;
    }
    if (mysql_num_rows($result) == 0) {
        printIt("No rows found, that's weird so I'm exiting");
        exit;
    }

    // here again, we call this right before use, because we're going to switch instances, but
    // it's probably to the same mysql server, so connection will get reused
    $dbin = connectToDb( $map['connection_info']['dest'] );

    // While a row of data exists, put that row in $row as an associative array
    $i=0; // rows processed
    $j=0; // rows that failed
    while ($row = mysql_fetch_assoc($result)) {
        $vals = convertRow($row, $map['fields'], $map['defaults']);
        if( !empty($map['dest_query']) ) {
            customQuery($dbin, $map['dest_query'], $vals) || $j++;
        } else {
            addRow($dbin, $map['dest_table'], $vals) || $j++;
        }
        $i++;
        // stop once exactly $max_rows rows have been processed (0 means no limit)
        if( $max_rows && $i >= $max_rows ) {
            break;
        }
    }

    // recover query memory
    mysql_free_result($result);

    // show the user what we did
    printIt("Converted ".($i-$j)." rows");
    if( $j ) {
        printIt("!!!! $j errors !!!!!!");
    }
}

// actually run the conversions
foreach($tablemap as $t) {
    convertTable($t);
}
printIt("Finished without blowing up the planet");
?>
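To see what the field mapping does to a single row without touching a database, here is a small stand-alone sketch. mapRowSketch is a simplified copy of the logic in convertRow(), and the function names, the doubleIt callback and the sample data are all made up for this walk-through.

```php
<?php
// Simplified, database-free copy of the mapping step in convertRow().
// mapRowSketch is a hypothetical name used only for this illustration.
function mapRowSketch(array $row, array $fields, array $defaults) {
    $out = $defaults; // defaults first, fields may overwrite them
    foreach ($fields as $src => $dest) {
        $src = preg_replace('@\[[0-9]+\]$@', '', $src); // "id[2]" becomes "id"
        $col = is_array($dest) ? $dest[1] : $dest;
        $val = is_array($dest)
            ? call_user_func($dest[0], $row[$src], $row, $src, $col)
            : $row[$src];
        if (!is_null($val)) { // null values never overwrite a default
            $out[$col] = $val;
        }
    }
    return $out;
}

// Made-up callback: only uses the first of the four arguments it receives.
function doubleIt($value) {
    return $value * 2;
}

// Made-up source row and mapping.
$row = array('id' => 7, 'title' => 'Hello', 'notes' => null);
$fields = array(
    'id[1]' => 'id',                        // plain copy
    'id[2]' => array('doubleIt', 'parent'), // same source column, via callback
    'title' => 'name',                      // rename
    'notes' => 'notes',                     // null in the source row
);
$defaults = array('notes' => 'n/a', 'status' => '0');

$result = mapRowSketch($row, $fields, $defaults);
print_r($result);
```

In the result, 'id' is copied as is, 'parent' comes from the callback, 'title' is renamed to 'name', and 'notes' keeps its default because the source value is null.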
