10 things you (probably) didn’t know about PHP.

PHP is simultaneously the most infuriating and the most joyful language I’ve ever worked with. I say “infuriating” primarily because the function names are so inconsistant. Even though I use it almost every day, I still have to ask myself “Is it str_pos or strpos? str_split or strsplit?” On the other hand, occasionally I’ll stumble across a gem that perfectly solves the problem at hand with a single line of code.

Here’s a short list of cool features that might have slipped under your radar as well:

  1. Use ip2long() and long2ip() to store IP addresses as integers instead of strings in a database. This will reduce the storage space by almost a factor of four (15 bytes for char(15) vs. 4 bytes for the integer), make it easier to calculate whether a certain address falls within a range, and speed up searches and sorts (sometimes by quite a bit).
  2. Partially validate email addresses by checking that the domain name exists with checkdnsrr(). This built-in function checks to ensure that a specified domain name resolves to an IP address. A simple user-defined function that builds on checkdnsrr() to partially validate email addresses can be found in the user comments section in the PHP docs. This is handy for catching those occasional folks who think their email address is ‘joeuser@wwwphp.net’ instead of ‘joeuser@php.net’.
  3. If you’re using PHP 5 with MySQL 4.1 or above, consider ditching the mysql_* functions for the improved mysqli_* functions. One nice feature is that you can use prepared statements, which may speed up queries if you maintain a database-intensive website. Some benchmarks.
  4. Learn to love the ternary operator.
  5. If you get the feeling that you might be reinventing the wheel during a project, check PEAR before you write another line. PEAR is a great resource that many PHP developers are aware of, yet many more are not. It’s an online repository containing over 400 reusable snippets that can be dropped right into your PHP application. Unless your project is truly unique, you ought to be able to find a PEAR package that saves at least a little time. (Also see PECL.)
  6. Automatically print a nicely formatted copy of a page’s source code with highlight_file(). This function is handy when you need to ask for some assistance with a script on a message board, IRC, etc. Obviously, some care must be taken not to accidentally show your source when it contains DB connection information, passwords, etc.
  7. Prevent potentially sensitive error messages from being shown to users with error_reporting(0). Ideally, error reporting should be completely disabled on a production server from within php.ini. However, if you’re on a shared webhost and you aren’t given your own php.ini, then your best bet is to add error_reporting(0); as the first line in each of your scripts (or put it in a file that you pull in with require_once()). This will prevent potentially sensitive SQL queries and path names from being displayed if things go awry.
  8. Use gzcompress() and gzuncompress() to transparently compress/decompress large strings before storing them in a database. These built-in functions use zlib (DEFLATE) compression, the same algorithm that gzip uses, and can compress plaintext up to 90%. I use these functions almost every time I read/write to a BLOB field within PHP. The only exception is when I need full text indexing capabilities.
  9. Return multiple values from a function with “by reference” parameters. Like the ternary operator, most PHP developers who come from a more formalized programming background already know this one. However, those whose background is more HTML than Pascal have probably wondered at one time, “How do I get multiple values back from a function I wrote, even though I can only use one return value?” The answer is that you precede a parameter with “&” in the function definition so that it is passed “by reference” instead of “by value”.
  10. Fully understand “magic quotes” and the dangers of SQL injection. I’m hoping that most developers reading this are already familiar with SQL injection. However, I list it here because it’s absolutely critical to understand. If you’ve never heard the term before, spend the entire rest of the day googling and reading.
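
As a rough illustration of tip 1, the sketch below converts dotted addresses to numbers and checks whether one falls inside a range. The helper names ip_to_number() and ip_in_range() are made up for this example and are not part of PHP. It assumes a PHP version where ip2long() returns false for invalid input, and it only handles IPv4.

```php
<?php
// Hypothetical helper: dotted IPv4 string -> unsigned number, or null if invalid.
function ip_to_number($ip) {
    $long = ip2long($ip);
    if ($long === false) {
        return null;
    }
    // sprintf('%u') reinterprets a negative 32-bit result as unsigned;
    // the float cast keeps the value exact on 32- and 64-bit builds.
    return (float) sprintf('%u', $long);
}

// Hypothetical helper: true if $ip lies between $start and $end (inclusive).
function ip_in_range($ip, $start, $end) {
    $n  = ip_to_number($ip);
    $lo = ip_to_number($start);
    $hi = ip_to_number($end);
    if ($n === null || $lo === null || $hi === null) {
        return false;
    }
    return $n >= $lo && $n <= $hi;
}

var_dump(ip_in_range('192.168.1.10', '192.168.1.0', '192.168.1.255'));
// long2ip() reverses the conversion when the value is read back.
echo long2ip(ip2long('192.168.1.10')), "\n";
```

Stored in an unsigned integer column, the same comparison can be done by the database itself, which is where the range and sorting benefits come from.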

76 Replies to “10 things you (probably) didn’t know about PHP.”

  1. It’s a huge mistake to use error_reporting(0). Yes, it hides errors from the user, but then you cannot log them! error_reporting should be E_ALL (dev or prod), with display_errors set to “off” and log_errors set to “on”.
    If you have bugs, or someone is trying to hack your site, the log can help you. If error_reporting is set to “0”, no errors are logged.

    If you cannot change php.ini, use set_error_handler to log errors and hide them from the user.

    NEVER, NEVER, NEVER set error_reporting to “0”!!
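
    A minimal sketch of that setup for a script that cannot touch php.ini, assuming the host permits ini_set(). The handler name log_error_quietly is made up for illustration:

    ```php
    <?php
    // Report everything, show nothing to visitors, and write to the log.
    error_reporting(E_ALL);
    ini_set('display_errors', '0');
    ini_set('log_errors', '1');

    // Hypothetical handler: record the error and tell PHP it was handled,
    // so nothing is echoed to the page.
    function log_error_quietly($errno, $errstr, $errfile, $errline) {
        error_log("[$errno] $errstr in $errfile on line $errline");
        return true;
    }
    set_error_handler('log_error_quietly');
    ```

    Fatal errors never reach a user-defined handler, so log_errors still matters even with a handler installed.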

  2. Federic,

    I think it’s more a matter of personal taste in how you want your errors delivered to you. While I can see the advantages of your method, I normally prefer to trap errors one at a time with my own function. With error reporting disabled, I might use something like this:

    define("ERROR_EMAIL", "myemail@example.com");
    $result = mysql_query($q); handleDBError($q);

    // Email the failed query and the MySQL error if the last query failed
    function handleDBError($query) {
        global $link;
        if (mysql_errno($link) != 0) {
            $msg = "\n\n".date("r").str_repeat("-", 10)."\n".$query."\n".str_repeat("-", 20)."\n";
            $msg .= str_repeat("=", 20)."\n[".mysql_errno($link)."] ".mysql_error($link)."\n".str_repeat("=", 20)."\n\n";

            mail(ERROR_EMAIL, "Database error!", $msg, 'From: PHP');
        }
    }

    This method has served me well for many years. However, as I do more and more work with PHP5, I’m trying to break old habits by using the new exception handling features (which should have been a part of PHP long ago, IMHO).

  3. Of course, you MUST handle your errors before they turn into a “php” error, but you wrote “Prevent potentially sensitive error messages from being shown to users”, which translates to “display_errors = off”, not “error_reporting(0)”.

    Personally, I changed my error handler to throw exceptions and use try/catch inside my code.
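
    One way this can look, as a sketch rather than the exact code described above: the handler name throw_as_exception is hypothetical, while ErrorException is PHP's built-in exception class for this purpose (PHP 5.1 and later).

    ```php
    <?php
    // Convert every reported PHP error into an ErrorException.
    function throw_as_exception($errno, $errstr, $errfile, $errline) {
        throw new ErrorException($errstr, 0, $errno, $errfile, $errline);
    }
    set_error_handler('throw_as_exception');

    $caught = null;
    try {
        // Stand-in for any code that raises a warning or notice.
        trigger_error('Something went wrong', E_USER_WARNING);
    } catch (ErrorException $e) {
        $caught = $e->getMessage();
        error_log($caught); // log the details; show the user a generic message
    }
    ```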

  4. I think #9 is misleading. PHP functions have only a single return value. There is an important distinction between mutating a variable via a reference and a genuine return value.

    The least of which is that the return value is useless unless captured in the calling scope, whereas with references that isn’t true. Another would be that referenced variables are modified in real time, and not when the function exits. Also, for the reference trick you have to actually pass them in. If PHP had real multiple return values then that wouldn’t be the case and you could leave your function’s signature alone.

    So, yes, in the end you effectively manipulate multiple variables in the calling scope. But I would not call this technique “multiple return values”.

    The best way to fake multiple return values is to just return an array of those values, and you can even use list() in the calling scope to extract them to separate variables if you want.
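
    A small sketch of the array-plus-list() approach. The function divide_with_remainder() is a made-up example, not anything from the original post:

    ```php
    <?php
    // Hypothetical example: hand back quotient and remainder together as an array.
    function divide_with_remainder($dividend, $divisor) {
        $remainder = $dividend % $divisor;
        $quotient  = (int) (($dividend - $remainder) / $divisor);
        return array($quotient, $remainder);
    }

    // list() unpacks the returned array into separate variables.
    list($q, $r) = divide_with_remainder(17, 5);
    echo "$q remainder $r\n"; // prints "3 remainder 2"
    ```

    The function's signature stays free of output parameters, and the caller decides what to name the results.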

  5. Peter,

    Yeah, I probably could have worded it a bit more clearly.

    In my experience, user-defined functions are used in one of two ways: 1) to take input X, do something, and return a status code or a function of X (“function” in the mathematical sense), or 2) primarily as a way of organizing code, without much regard for independent scope. I picture the first as a black box (much like OOP). The second is more of a filter that acts on multiple variables in parallel. I think by-reference variables are better suited to the filter context.

  6. Quiton,
    Interesting… I’ll have to play with it a bit this weekend. Other than the diagnostic functions listed in the user comments, what other uses have you for this? Nothing is immediately jumping out at me.

  7. Apart from attempting to handle every error scenario, I believe that adding the following lines to your .htaccess is probably the cleanest way to handle unexpected errors within your apps:

    # Boolean settings use php_flag; php_value is for everything else
    php_flag display_errors off
    php_flag log_errors on
    php_value log_errors_max_len 1024
    php_value error_log PHP_ERROR.log

    This will turn off the displaying of errors to the end-user but log them into a file so you can still have access to them at any time.

    Note that you should check that file periodically and make sure it isn’t growing out of control.

    Adding those directives in .htaccess is a lot more portable than php.ini and a lot less intrusive than adding ini_set and other such calls to every one of your files, although I do not believe it would work on IIS.

    Some other interesting directives you can pass through .htaccess are

    php_flag magic_quotes_gpc off
    php_flag magic_quotes_runtime off
    php_flag magic_quotes_sybase off
    php_flag zlib.output_compression off
    php_value memory_limit 32M
    php_flag track_errors on

    One final pointer I would give to anyone who generates large pages that tend to suffer from latency over the network is to implement output compression on your server before looking at ways to speed up your code itself, as this tends to be a bottleneck with many applications. Apache provides mod_deflate and mod_gzip and IIS has some other ways of doing this but I don’t remember how.

  9. Nice tips!

    I don’t agree with #8, though. The actual effort to write the blob does not depend directly on the size of the blob, but on the fact that the heads of the HD must be positioned and the write started. Compressing adds a lot of CPU overhead and won’t reduce execution time, IMHO. As for space in the database, I think space is actually the cheapest variable in most systems.

    Best regards,
    diego.

  10. One catch with point 1: ip2long doesn’t work with IPv6.

    If you ever put your code in an IPv6 environment, it’s far easier to resize a field in the database than to replace who knows how many “ip2long”s.

  11. Diego,
    Yeah, I guess it depends on which you’re trying to conserve more of: CPU cycles or storage space. In my rsswebwatcher.com project, I’m planning to cache thousands and thousands of HTML documents. The websites are crawled non-interactively from a queue. In this case, I’m willing to spare a few milliseconds per record to save many gigs down the road. Thanks for the feedback!
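
    A quick round trip makes the trade-off easy to measure on your own data. This is only a sketch: it assumes the zlib extension is loaded, and the repetitive string is a made-up stand-in for a crawled HTML document, so real pages will compress less dramatically.

    ```php
    <?php
    // Made-up sample data standing in for a fetched HTML page.
    $html = str_repeat('<p>Hello, world!</p>', 500);

    $compressed = gzcompress($html, 9);    // 9 = highest compression level
    $restored   = gzuncompress($compressed);

    printf("original: %d bytes, compressed: %d bytes\n",
        strlen($html), strlen($compressed));
    ```

    The compressed string is binary, so it belongs in a BLOB column rather than a TEXT one.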

  12. JarFil,
    Good point. I would hope that PHP would evolve to take IPv6 into account before it becomes prevalent though. Of course, MySQL would also have to support 128-bit integers.

    It’s worth noting that Postgres already supports a native IPv6 datatype. Another reason not to simply default to MySQL when building a web app.

  13. Instead of using error_reporting(0) you should use a custom error reporting function using set_error_handler()

    Also, compression is not a good idea when you deal with a lot of traffic, because any compression algorithm will increase the overhead.

  14. Another thing some don’t know about PHP is some of the great frameworks you can use. Rapid development, rails-esque, MVC flavored frameworks are making PHP a better place to be.

    My personal favorite is CakePHP.

  15. I think the issue here is every project is going to have different requirements. The important thing is to be consistent within that project, such as using the text compression tools all the time or none of the time. Imagine what would happen if you compressed text into a field but then later on forgot to decompress it. Sure it’s a quick fix, but you get the idea.

    Another one I’m surprised people haven’t mentioned from your tips is the ternary operator. I would use it for a personal project or one at work where there are only a couple of people, but I’d never use it for a project I planned to open-source with several developers from everywhere. While nice and “to the point”, it can cause needless confusion for people who have no idea what it is, or when someone tries to do something more complicated than it’s really meant for. Which goes back to consistency: if you want to add an ‘s’ when there is more than one, would you use $count > 1 ? 's' : '' or if ($count > 1) { echo 's'; }?

    And I’ll hype up PEAR a little more. We use it for all our database stuff at work and it works a lot better than the built-in PHP functions. Also the PEAR mail class works well.

  16. Using a Ternary Operator will not make your work easy to read…

    I would stick with the If/Else since
    1. it’s easier to understand,
    2. easier to read,
    3. more programmers know about it
    4. used in other languages

  17. Wow… where did all of these comments come from?

    Nick: I agree on the set_error_handler() point. Though I still think choosing to use compression or not is best considered on a per-project basis.

    John: I’ve briefly looked at CakePHP, but for some reason I can’t seem to wrap my head around it. I’ve been tinkering with Ruby (and RoR) a bit recently and am thinking about using it to develop some of my side projects. It seems more sensical to use a language that was designed to be OOP and MVC based from the start, rather than tacking on a framework like Cake or Symfony. Do you have enough experience with both Cake and Ruby to recommend one over the other?

    Todd/marcel: I’m kinda surprised at the resistance to the ternary operator. It’s not an esoteric PHP thing: the ternary operator is also used in Perl, Java, JavaScript, Ruby, C, and (recently added to) Python. I guess the syntax does take a little getting used to, but I find it more intuitive than using an if statement when doing a simple variable assignment. But the fact that two PHP developers (somewhat randomly sampled from a small pool of readers) don’t care for it could be a reason not to use it in an OSS or other collaborative project.

  18. You should try to avoid magic_quotes in all its flavors. Use addslashes() and stripslashes() with user input instead and you will save time and avoid the common problems that come along with it.

  19. There’s some great tips here. I’d like to put in my vote in favour of the ternary operator. I’m with you – if it’s a simple variable assignment, it’s much more concise.

    Here’s something I just learned recently that I’ve been using quite a bit: Variable variables (http://ca3.php.net/manual/en/language.variables.variable.php)

    An example where this is handy is when you have a navigation include file and you want to style the link to the current page differently than the other nav links.

    At the top of every page include this:


    // this page's codename
    $page_name = 'contactus';

    The nav include would look like this:

    Home
    About Us
    Portfolio
    Contact Us

  20. Western Infidels, thanks for catching that.

    daba,
    I normally use a function that I adapted from an example I found in the PHP docs:

    function smart_quote($to_quote) {
        if ($to_quote == '') return 'NULL';

        if (get_magic_quotes_gpc()) {
            $to_quote = stripslashes($to_quote);
        }

        if (!is_numeric($to_quote)) {
            $to_quote = "'" . mysql_real_escape_string($to_quote) . "'";
        }
        return $to_quote;
    }

  21. Great read!

    I didn’t know about some of these ten, so my code will probably see some major improvements now that I’m actually playing around with IP (v4) addresses for my project. 🙂

  22. Using a Ternary Operator will not make your work easy to read…

    I disagree, there are many cases where the ternary operator is *far* clearer than if/else, and it’s available in many languages. For example:

    $a = ($b == null) ? "default" : $b;

    Or,

    some_fn($a, $b, $c > 0 ? "a" : "b");

    It is very useful where the logic is trite. If/else would require temporary variables and would slow down the important flow of what the code is actually supposed to do. The trite logic only exists as an aside to a function’s purpose. Of course, though, it can be horribly misused (ternary-in-ternary!).

  23. Sorry, my code didn’t display properly above. I’m going to try again:


    // THIS IS AT THE TOP OF EVERY PAGE:
    // this page's codename
    $page_name = 'contactus';
    include('nav.php');

    // THIS IS THE NAV INCLUDE:
    $$page_name = " class='thispage'";
    $nav = <<<EOF
    <ul>
    <li><a$home href=''>Home</a></li>
    <li><a$about href='about'>About Us</a></li>
    <li><a$portfolio href='portfolio'>Portfolio</a></li>
    <li><a$contactus href='contact'>Contact Us</a></li>
    </ul>
    EOF;
    print $nav;

  24. I totally agree with MX.

    Ternary operators make the code more, not less, readable, in particular at a higher level, by not using branching statements for simple assignments or minor alternate choices.
    If you use if/else statements only for serious flow control and not also for trivial details your code will be more readable by reducing the visual impact of the details at the higher levels.

    Even in Todd’s example, I’ll take:

    echo $count.' book'.($count > 1 ? 's' : '').' found.';

    over

    echo $count.' book';
    if ($count > 1) echo 's';
    echo ' found';

    any time. Not to mention cases in which the else part of the statement also plays a role, like here:

    echo $count.' strawberr'.($count > 1 ? 'ies' : 'y').' found.';

    or

    echo $count.' strawberr';
    if ($count > 1) echo 'ies'; else echo 'y';
    echo ' found';

    Personally I’d want any programmer that works on a project I’m involved in to know how to use the ternary operator.

  25. Great comments – I knew about the 10 things already, although there’s nothing wrong with a good review once in a while. I found the discussion on the ternary operator great. My very first freelance article was on “Our Friend ?:”.

    I liked the snippet for magic quotes too.

  26. About point 1 (“Use ip2long() and long2ip() to store IP addresses as integers instead of strings in a database”), some database systems support functions like “INET_ATON()” and “INET_NTOA()” to do these conversions, and they may be a better approach for inserting IP addresses.

  27. Like the list; however, as a bit of a PHP newbie, some of it just goes way over my head at present. Can’t wait till I’m more experienced.

    Paul

  29. For checkdnsrr, you can take it one step further and use the most useful PEAR class, PEAR::Validate. It can validate email addresses, urls, numbers, and a whole host of other data.

    It’s really nice for email addresses since it will detect common injection techniques used by spammers.

    And while mysqli is useful, if you can’t rely on a custom PHP 5 build (mysqli isn’t compiled in by default), then use PEAR::DB or PEAR::MDB2, which also allow prepared statements and have the benefit of working in PHP 4 and 5.

  30. “the function names are so inconsistant.” Actually, it’s inconsistent, but who’s checking spelling these days…

    That’s what we get when we use open source: every person’s differing opinion about how to name and do something.

  31. The by-reference “trick” can lead to seriously difficult-to-read, bug-prone code by others or even yourself. Use sparingly and document profusely. You’re breaking encapsulation — only do so for a good reason.

  32. I’m concerned about #2, using checkdnsrr to help prevent false email addresses.

    It is possible that a domain name may not resolve to an IP address but still accept mail through the use of a DNS MX record. This setup is quite common.

    Am I missing something?

  33. Never use mysql_* or any database-specific API directly. Use PEAR DB instead. It makes it much easier to port to a different backend when needed.

  34. 1. ip2long and long2ip only handle IPv4. At least with strings you can migrate to IPv6 relatively seamlessly. (Obviously the logic to determine subnets must be modified.) It’s 2007! You should at least address IPv6 (no pun intended) in your standard library. It’s not like detecting which type has been passed is terribly complex or resource-intensive.

    Oh yeah, the fact that the function returns -1 on error is dumb too, since the error will likely be ignored and MySQL doesn’t support CHECK syntax as a safety catch.

    2. No comment.

    3. You shouldn’t use either. Your queries should not be hard-coded in your code, and you should not intimately tie your app to any particular database without good reason. (Use PEAR’s DB layer instead.) This is of course completely glossing over the fact that PHP lacks proper namespacing, which is why you have to have “mysql_” and “mysqli_” prefixes in the first place. But I digress.

    4. The ternary operator does not enhance performance. Use it only when it enhances readability. When an if-else statement would be more readable, avoid the ternary. A better suggestion would be to learn to love easily readable code. Code is for humans, not computers. Computers are more than willing to accept a long series of ones and zeros. It’s we humans that don’t handle the ones and zeros too well.

    5. Agreed, code reuse is best. PEAR is far better than reinventing the wheel. Too bad the author missed that PEAR also includes a database abstraction layer.

    6. Not “some care,” a huge amount of care. In fact, you should make sure calls like that are only from authenticated or otherwise protected URLs.

    7. You’ve suppressed the error, but you haven’t mentioned that you should always log any errors. Users shouldn’t ever see SQL queries, but you damn well better be logging them for you, the other developers, and the sysadmins.

    8. Depends on the database. PostgreSQL and some other databases compress and decompress textual data automatically in the background. This tip only helps work around one of MySQL’s limitations.

    9. Good tip, just don’t overuse it.

    10. This tip should be, “Don’t ever use magic quotes, period!” String concatenation should *never* be used for database access in a public-facing app. Scratch that. String concatenation should *never* be used for database access, period! Use either prepared statements or simple parameterized statements with SQL. Don’t wait for that one time you were coding at 4am and forgot to escape the query input. Let the built-in libraries do that for you.

    String concatenation for SQL queries is not a choice, it’s always a bug. Use examples like this instead:

    $res = $db->query('SELECT id FROM users WHERE login=?', array($name));

    No muss, no fuss, just as efficient (if not more so), easy to read, and always safe from an SQL injection attack.
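
    For readers without PEAR DB, here is a comparable sketch using PDO's prepared statements. It assumes the pdo_sqlite driver is available and uses an in-memory database with a made-up users table, so treat it as an illustration rather than a drop-in for any particular setup:

    ```php
    <?php
    // In-memory SQLite database with a hypothetical users table.
    $db = new PDO('sqlite::memory:');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT)');
    $db->exec("INSERT INTO users (login) VALUES ('joeuser')");

    // Hostile input is bound as data and never spliced into the SQL text.
    $name = "joeuser' OR '1'='1";
    $stmt = $db->prepare('SELECT id FROM users WHERE login = ?');
    $stmt->execute(array($name));
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

    echo count($rows), " row(s) matched\n";
    ```

    Because the whole string is compared against the login column as a single value, the injection attempt matches nothing.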

  35. #2 is wrong. You don’t use A records to validate email addresses. You can’t get an A record for the domain in my email address, but you can send me email.

  36. Heh. This is pretty newbie stuff. So is that comment post about ticks (handy, those). But I guess everyone has to start somewhere.

    Two notable things about using ip2long that you didn’t mention: yes, it speeds up searching and sorting, but more importantly it allows natural sorting of IPs for display purposes. It also allows you to search within a RANGE of IP addresses using BETWEEN, which you certainly couldn’t do with strings 😉

    There are MySQL functions that do the same as the PHP ones too: INET_ATON turns a string IP into an integer, and INET_NTOA turns the integer back into a string.

  37. Not to display an AOL tendency, but I have to throw a cup of “me too” into the MX/Andrea fray.

    I find the only people who think the ternary operator is difficult to read are people who are new to languages. Perhaps that’s the case with Marcel.

    Personally, the ternary operator makes code very brief; I can see the condition, and the two return values at a glance without having to look at a conditional structure.

    Much faster to read. Easier on the brain. I love it.

  38. I think I will stick with this method.


    $action = 'default';
    if (isset($_POST['action']) && $_POST['action'] != '') {
    $action = $_POST['action'];
    }

  39. Oddly you never mentioned the strange issue with ip2long & long2ip: they return negative integers!

    Since PHP’s integers are signed, if the integer gets too large, it goes negative. A simple fix by using sprintf / printf will allow you to resolve this issue, and allows you to make a database row unsigned if you desire…


    function _ip2long ($ip) {
    return sprintf("%u", ip2long($ip));
    }

    Now there are faster & more efficient ways of doing this, but generally this will get you through the day 🙂

  40. I didn’t know about the mysqli functions. They make rollback very easy to do! I’m sure I’ll find other great uses for them too. Thanks!

  41. jjbegin

    If you are going to save thousands of HTML documents, please do yourself a service and save them as files.

    Performance will be much better and it will result in a much cleaner design.

    If you are going to make the content searchable, do yourself another favor by using a real indexer. Either use Lucene or Xapian (which has PHP bindings)

  42. It’s really sad to see so many people still clinging onto PHP for all it’s worth. You know, PHP was inspired mainly by Perl but was designed for a single purpose. PHP is the 1992 Ford Aerostar of web development: a big fat american piece of shit. Sure, a lot of people bought [into] it, but that doesn’t mean it’s sound technology.

  43. My addition to the list: serialize() and unserialize()

    I knew about what they did for a long time before figuring out how to use them to make my life easier. If you’ve ever made various simple PHP forms with database access over time, you’ll know it’s pretty tedious to create a table with all the fields and then write long-ish queries that insert, update and/or select each and every one (not to mention cleaning each one separately to avoid SQL injections).

    But eventually I realised you could just have one long text field in which you would keep something like serialize($_POST) and later do unserialize($string_from_db) and you’ve got the array of the values ready to use, as if you’d never left the code that saved them.

    Of course, it’s not a very good solution for larger systems where you’ll need to fetch data from different fields, etc. But for a simple survey or guestbook for a small site this tip will save you a lot of time.
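
    A minimal sketch of the round trip, with a hand-written array standing in for $_POST (the field names are invented for the example):

    ```php
    <?php
    // Stand-in for $_POST; in a form handler this would be the submitted data.
    $form = array('name' => 'Joe User', 'rating' => '5', 'comments' => "It's great");

    $stored = serialize($form);      // one string, suitable for a single TEXT column
    $loaded = unserialize($stored);  // back to the original array

    echo $loaded['name'], "\n";
    ```

    The serialized string still has to be escaped or bound as a parameter when it is written to the database, and unserialize() should only be fed strings your own code produced.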

  44. Uhh… #2 is wrong.

    The domain in an email address does not need to resolve to an IP to be valid. An e-mail address domain can be perfectly good without an IP address, so if you follow this “tip” you will reject some perfectly good e-mail addresses.

    To properly validate the domain in an e-mail, you must do an MX record lookup.
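
    A rough sketch of such a check. The helper names email_domain() and domain_accepts_mail() are hypothetical; checkdnsrr() is the built-in, and the fallback to an A record reflects the way mail servers deliver to a host's address when no MX record is published:

    ```php
    <?php
    // Hypothetical helper: the part after the last "@", or null if there is none.
    function email_domain($email) {
        $at = strrpos($email, '@');
        if ($at === false || $at === strlen($email) - 1) {
            return null;
        }
        return substr($email, $at + 1);
    }

    // Hypothetical helper: true if the domain publishes an MX record,
    // or failing that an A record. Needs a working DNS resolver.
    function domain_accepts_mail($email) {
        $domain = email_domain($email);
        if ($domain === null) {
            return false;
        }
        return checkdnsrr($domain, 'MX') || checkdnsrr($domain, 'A');
    }
    ```

    This only confirms that the domain could receive mail, not that the mailbox exists.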

    Only problem with checkdnsrr is that it’s an Apache-specific function. So if you are like most, and first create your code on Windows and then upload it to an Apache server, it doesn’t work on your test server, so you can never tell what will happen.

  46. Why only 10 things?
    2. I would check for MX, A, CNAME entries for a host
    3. I think it’s better to use PDO now if you want to make your code database-independent to some extent.

  47. I’m still learning PHP, but I’ve already come to love the ternary operator for simple (non-nested) assignments…it keeps code simpler and less visually jarring. To help visually parse the ternary statement, however, I place parens around each of the pieces, like so:

    $var = (testvalue) ? (truevalue) : (falsevalue);

    which makes it quite easy to read.
