What interactive console commands never do

If you’ve ever used an interactive console command and screwed up some of the settings, the following kind of exchange should look familiar to you:

[niko@localhost:~]$ animal-generator

Welcome to the interactive stuff generator!
Please give a latin name for an animal: Canis Lupus Familiaris
How many instances do you want to generate [1]: 50
What is your favorite color [lavender]: red
Write here the word "inconvenience" [croissants]: inconvenience
Are the above settings correct? [no]: yes

ERROR: You already have the file canis-lupus-familiaris.dat in your home directory. Please delete it and type in all the options again.
Usage: animal-generator [--animal="..."] [--instances="..."] [--color="..."] [--no-inconvenience] [--confirm]

It’s fine and dandy that the command suggests a way to skip the interactive part, but that’s not much help after we’ve already screwed up the process: once we’ve cleaned up, we have to either go through the wizard a second time or type all the options as command line parameters. Copying and pasting the parameter names and their values takes about as long as typing them manually, if not longer.

Why can’t the error message look like this?

ERROR: You already have the file canis-lupus-familiaris.dat in your home directory. Please delete it, then either type in all the options again or copy the command below to run it again with the exact same parameters.
Usage: animal-generator [--animal="..."] [--instances="..."] [--color="..."] [--no-inconvenience] [--confirm]
Try again: animal-generator --animal="Canis Lupus Familiaris" --instances="50" --color="red" --no-inconvenience --confirm
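As a minimal sketch of how a tool could produce that line: build the command string back up from the options it already collected, quoting each value so it survives the shell. The function name and option arrays below are my own invention for illustration, not from any real CLI framework.

```php
// Hypothetical helper: rebuild a runnable command line from already-collected options.
function build_retry_command($command, array $options, array $flags) {
    $parts = array($command);
    foreach ($options as $name => $value) {
        // Quote each value so spaces and shell metacharacters survive
        $parts[] = "--$name=" . escapeshellarg($value);
    }
    foreach ($flags as $flag) {
        $parts[] = "--$flag";
    }
    return implode(' ', $parts);
}

echo "Try again: " . build_retry_command('animal-generator',
    array('animal' => 'Canis Lupus Familiaris', 'instances' => '50', 'color' => 'red'),
    array('no-inconvenience', 'confirm')) . "\n";
```

Note that escapeshellarg() uses single quotes on Unix-like systems, so the output quoting differs cosmetically from the double quotes in the usage string above; either form is safe to paste back into the shell.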

Posted in Ideas, Usability

Writing PHP the European way

I read about a rather brilliant PHP function today, one called __halt_compiler(). By brilliant I of course mean capable of causing all kinds of havoc.

What it does is make the PHP compiler ignore everything after the call, and define a constant __COMPILER_HALT_OFFSET__ that marks the position in the current PHP file right after the call, i.e. where the ignored data begins.

How can you use it, you ask? Here’s an example. There’s been discussion about what it would be like if PHP were British, but see how easy it would be to make it European:

// How to code PHP the European (or Finnish) way

$fp = fopen(__FILE__, 'r');
fseek($fp, __COMPILER_HALT_OFFSET__);
eval(preg_replace('/(€|FIM)/', '$', stream_get_contents($fp)));
__halt_compiler();

// From now on we can use € or FIM instead of dollars!

for(€i = 0; €i < 10; €i++) {
        echo "Hello FIMi!\n";
}

There are of course valid uses for the function as well, such as packaging code and binary data in one file for system installations and the like. On the other hand, since PHP 5.3 we can package everything with Phar, which is basically Jar for PHP. But it won’t let you write in €uropean… ;)
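For instance, an installer script could carry its data payload in the same file and read it out at runtime. This is only a sketch; the file name, payload, and output path are all invented:

```php
// installer.php (hypothetical): extract the raw data that is
// appended after the __halt_compiler() call at the end of this file.
$fp = fopen(__FILE__, 'rb');
fseek($fp, __COMPILER_HALT_OFFSET__);           // skip past the PHP code
file_put_contents('payload.dat', stream_get_contents($fp));
fclose($fp);
echo "Payload extracted to payload.dat\n";
__halt_compiler(); ...raw binary data goes here...
```

Since the compiler never sees anything after the call, the payload does not need to be escaped or encoded in any way.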

Posted in Uncategorized

CRUD with entities from multiple namespaces in Symfony 2 and Doctrine

Using the CRUD generator in Symfony 2 is a breeze, as long as your entities are all in the same database and namespace. In one of my current projects they are not, and we ran into problems. We finally got it working like this:

First set up your bundles. You’ll probably have one set up already, but create another one to hold the entities that represent your secondary database. In this example, the primary bundle is OurCompanyMainBundle and the secondary one is OurCompanySecondaryBundle.

Then create your database definitions in /app/config/parameters.ini:

database_driver = pdo_mysql
database_host = localhost
database_port =
database_name = my_main_db
database_user = mainuser
database_password = verysecretpassword

extra_database_driver = pdo_mysql
extra_database_host = localhost
extra_database_port =
extra_database_name = my_secondary_db
extra_database_user = secondaryuser
extra_database_password = alsoverysecret

Now define a new connection in your /app/config/config.yml. You may need to add the “connections” level yourself if you only have one database set up so far:

doctrine:
    dbal:
        default_connection: default
        connections:
            default:
                driver:   %database_driver%
                host:     %database_host%
                port:     %database_port%
                dbname:   %database_name%
                user:     %database_user%
                password: %database_password%
                charset:  UTF8
            extra:
                driver:   %extra_database_driver%
                host:     %extra_database_host%
                port:     %extra_database_port%
                dbname:   %extra_database_name%
                user:     %extra_database_user%
                password: %extra_database_password%
                charset:  UTF8
                mapping_types:
                    enum: string

    orm:
        auto_generate_proxy_classes: %kernel.debug%
        entity_managers:
            default:
                connection: default
                mappings:
                    OurCompanyMainBundle: ~
            extra:
                connection: extra
                mappings:
                    OurCompanySecondaryBundle: ~

Congratulations! You now have two entity managers which both manage entities in their own respective databases.

If your secondary database is a legacy one, you may want to use the opportunity to generate your entities right from the database by reverse engineering the structure. This can be done pretty easily as described in the Symfony2 documentation, as long as your database meets the following criteria:

  • All your tables must have a primary key.
  • If you have references to external tables, you must have set up foreign keys for them.
  • If your database has fields that Doctrine has problems with (for example enum), you must map them to another field type with mapping_type. I have done this in config.yml above, mapping all enum fields to strings.
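In our case the cookbook steps also needed the entity manager specified explicitly. Roughly, the workflow looks like this (command syntax from memory, so double-check it against your Symfony version’s documentation; the --em=extra flag points the commands at our secondary entity manager):

```shell
# Import mapping information from the legacy database into the secondary bundle
php app/console doctrine:mapping:import OurCompanySecondaryBundle annotation --em=extra
# Generate the entity classes (with getters/setters) from that mapping
php app/console doctrine:generate:entities OurCompanySecondaryBundle
```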

After generating the entities and their CRUDs, you may encounter this depressing error message when accessing them:

Unknown Entity namespace alias 'OurCompanySecondaryBundle'.
500 Internal Server Error - ORMException

This error can be a bit confusing. It means the default entity manager can’t see the bundle we created. To fix this, we must specify which entity manager we want to use every time we access these entities.

To do this, change this…

$em = $this->getDoctrine()->getEntityManager();

…to this:

$em = $this->getDoctrine()->getEntityManager('extra');
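With the manager selected by name, queries against the secondary database work as usual. Here LegacyItem is a hypothetical entity class generated into the secondary bundle:

```php
// Inside a Symfony2 controller action; 'LegacyItem' is a made-up entity name
$em = $this->getDoctrine()->getEntityManager('extra');
$items = $em->getRepository('OurCompanySecondaryBundle:LegacyItem')->findAll();
```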

That’s it – your entities should be good to go! Enjoy managing your legacy data in the fancy new Symfony2 environment! :)

Posted in PHP, Troubleshooting

SVN repository relocation woes

After moving an SVN repository to another server, a strange problem surfaced when relocating the working copy on one of our hosts.

[root@whatever wwwroot]# svn sw --relocate https://svn.newhost.fi/repos
svn: PROPFIND request failed on '/repos/importantproject'
svn: PROPFIND of '/repos/importantproject': 301 Moved Permanently (https://svn.newhost.fi)

This is a common problem with misconfigured servers, but what made this case special is the fact that relocating the very same repository on other hosts worked perfectly.

Nothing was output to the error_log, so I decided to check the access_log. Here’s what it looked like:

- - [21/Sep/2011:13:41:19 +0300] "PROPFIND /repos/importantproject HTTP/1.1" 401 487 "-" "SVN/1.4.2 (r22196) neon/0.25.5"
- myusername [21/Sep/2011:13:41:19 +0300] "PROPFIND /repos/importantproject HTTP/1.1" 301 339 "-" "SVN/1.4.2 (r22196) neon/0.25.5"

The key here is the version number. For some reason, old Subversion clients may access the repository at too shallow a level, whereas a newer client (1.6.6) will do PROPFIND /repos/importantproject/site/trunk directly. The server responds with a redirect that the old client can’t or won’t follow. After upgrading the client to 1.6, the relocation worked like a charm.

If you encounter problems like this, check the version of your Subversion client and upgrade if possible.

Posted in Troubleshooting, Version control

Fetch random rows faster with recycled IDs

Say you need to quickly fetch a random row from a database table. What’s your approach? If you answered:

SELECT * FROM tablename ORDER BY rand() LIMIT 1

…you’re probably in the majority, and in many cases right. However, the larger the table, the higher the performance penalty of this solution will be. That’s because you’re not really selecting a random row – you’re selecting all of them, sorting them by an arbitrary number and selecting the one that gets sorted as the first one.

How are you supposed to do it, then? You have many options, but no single one has stood out so far. One commonly used solution is playing a random number against the table’s primary ID. Generate a random number between the minimum and maximum IDs, fetch that row and you should be good, right? Absolutely, but only if you have no gaps in your ID sequence. If you hit an ID that does not exist, you’ll have to either reroll or use WHERE id >= $myrandomnumber. Imagine an exaggerated example of a gap of one million rows in your ID sequence and you’ll see where the problem is: the row right after the gap is a million times more likely to be selected than the others.

The solution I’ve come up with is adding a secondary indexed column called id2. This is actively kept as gapless as possible so a random number between MIN(id2) and MAX(id2) should hit almost always, or at least with a minimal number of rerolls. This way random rows can be selected lightning-fast, while maintaining an even and unbiased distribution.
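Under that scheme, a lookup would go something like this. This is a sketch, assuming the same test_data table and old mysql_* functions used later in this post; the reroll loop handles the rare leftover gap:

```php
// Pick a random row via the (nearly) gapless, indexed id2 column
$res = mysql_query("SELECT MIN(id2) AS min_id, MAX(id2) AS max_id FROM test_data");
$range = mysql_fetch_assoc($res);
do {
    $candidate = mt_rand($range['min_id'], $range['max_id']);
    $res = mysql_query("SELECT * FROM test_data WHERE id2 = $candidate");
} while (mysql_num_rows($res) == 0);  // reroll if we hit a gap
$row = mysql_fetch_assoc($res);
```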

Now all we need is a way to keep the table gapless with minimal overhead. Constantly rebuilding the sequence every time a row is inserted or deleted will make both operations really slow, effectively negating the benefits. Instead, let’s set up a recycling bin for the id2s of deleted rows:

CREATE TABLE `recycle_ids` (
  `table_name` varchar(64) default NULL,
  `id2` int(11) default NULL,
  KEY `table_name` (`table_name`,`id2`)
);

NOTE: Though I’m using MySQL and PHP in these examples, the solution itself is language agnostic.

Assuming your id2 column in your target table is already set up, you’re good to go! Whenever you delete a row, insert the deleted row’s id2 into the recycler:

// Delete a row...
mysql_query("DELETE FROM test_data WHERE id2 = $random_row");
// ...and add the deleted row's id2 to the list of recycled ids
mysql_query("INSERT INTO recycle_ids (table_name, id2) VALUES ('test_data', $random_row)");

When inserting a new row, check the recycler first. If there are no id2s available, create a new one:

function _get_next_id($table = 'test_data') {
	// Check if a recycled ID is available
	$res = mysql_query("SELECT id2 FROM recycle_ids WHERE table_name = '$table' LIMIT 1");
	if(mysql_num_rows($res) == 0) {
		// No recycled IDs available, let's use a new one
		$res = mysql_query("SELECT MAX(id2) AS max_id FROM $table");
		$row = mysql_fetch_assoc($res);
		return $row['max_id'] + 1;
	}
	$row = mysql_fetch_assoc($res);
	// The ID has been recycled so we remove it from the list
	mysql_query("DELETE FROM recycle_ids WHERE table_name='$table' AND id2='{$row['id2']}'");
	return $row['id2'];
}

That’s all there is to it! Well, in a real world scenario we’d of course have better error handling and probably use a smarter database interface than the basic MySQL client library (PDO would be my choice) but for this example I chose the one I think most are familiar with.

Now for the most important question: What are the benchmarks? First of all, here’s the test script I used to insert the test data:

// The test loop. Add rows and every five iterations delete a random row.
for($i = 1; $i <= $amount; $i++) {
	mysql_query("INSERT INTO test_data (id2) VALUES ('" . _get_next_id('test_data') . "')");
	if($i % 5 == 0)
		delete_random_row('test_data'); // hypothetical helper wrapping the delete + recycle queries shown earlier
}

Running the loop 10,000 times against a table with 150,000 rows already in it takes 18.8 seconds on my low-spec virtual server. That’s 0.0018 seconds per row, and that includes deleting a random row every 5 iterations.

How about the speed of random row selection compared to ORDER BY rand()? Selecting a random row 100 times using the “traditional” method takes 75.5 seconds on my test table. In comparison, doing the same using id2 takes 0.03 seconds.

While this isn’t an all-powerful tool to tackle every situation, it’s definitely something you should try out if random queries are giving you performance woes. Whether it works for you or not, I’d be happy to hear from you!

Posted in Ideas, Optimization, SQL