[Explained]: What does “binary safe” mean in PHP?
John Mwaniki / Updated on 07 Jul 2024
When going through the documentation of the various in-built functions in PHP, you will often come across the term "binary safe".
In this article, we will cover what "binary safe" means and, with the aid of several examples, how binary-safe functions differ from those that are not.
In order to get a better understanding, let's first cover some concepts briefly:
- ASCII character encoding
- The null character
- PHP functions
ASCII character encoding
ASCII (American Standard Code for Information Interchange) is a character encoding standard (character set) used to exchange text between computers and other electronic devices.
It consists of 128 characters, which include the digits 0 to 9, the upper and lower case letters from A to Z, punctuation and other special characters, and a set of control characters. Each character is assigned a 7-bit binary code, from 000 0000 to 111 1111 (0 to 127 in decimal).
It is the basis for modern character sets such as UTF-8 (default encoding for HTML5) and ISO-8859-1 (default for HTML 4.01).
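To make the idea of a binary code per character concrete, here is a minimal sketch using PHP's in-built ord(), chr() and sprintf() functions. The variable names are only illustrative.

```php
<?php
// ord() returns the numeric code of the first byte of a string.
$code = ord("A");                 // 65

// Show the code as a 7-digit binary number, matching the 7-bit ASCII range.
$binary = sprintf("%07b", $code); // "1000001"
echo "A => $code => $binary\n";

// chr() goes the other way: from a code to a one-byte string.
$char = chr(97);                  // "a"
echo "97 => $char\n";
```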
The null character
The null character is a control character representing nothing, with the value of binary zero, but may have special meaning when interpreted as text, as in marking the end of character strings.
In a double-quoted PHP string, the null character can be written as "\x00": the backslash starts the escape sequence, the x indicates hexadecimal notation, and 00 is the character's hex code.
A PHP string with a null character will look as below.
<?php
$string = "I like coding \x00 in PHP";
$length = strlen($string);
echo "The string length is: $length";
Output:
The string length is: 22
You will notice that if you manually count the characters of the string above you will get 25 instead. However, the strlen() function returns the string length as 22. This is because "\x00" is parsed to mean null character and thus occupies one byte instead of 4. That's why the length is less by 3.
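If you want to verify this yourself, the sketch below compares the double-quoted string with a single-quoted copy (where PHP does not interpret "\x00") and inspects the byte at the position of the null character. The variable names are only illustrative.

```php
<?php
$double = "I like coding \x00 in PHP"; // \x00 is parsed into a single null byte
$single = 'I like coding \x00 in PHP'; // single quotes keep the 4 characters \, x, 0, 0

$doubleLength = strlen($double); // 22
$singleLength = strlen($single); // 25

// "I like coding " takes up positions 0 to 13, so position 14 holds the null byte.
$nullByteCode = ord($double[14]); // 0

echo "Double-quoted length: $doubleLength\n";
echo "Single-quoted length: $singleLength\n";
echo "Code of byte 14: $nullByteCode\n";
```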
PHP functions
A function is a named group of reusable code that performs a specific task, which can be called anywhere in your program.
Functions usually take in data, process it, and return a result.
The code in the function does not get executed until the function is called.
There are two types of functions in PHP: in-built functions and user-defined functions. The in-built functions already exist internally in PHP and all you have to do is to call them when you want to use them. On the other hand, user-defined functions are created by the programmer to accomplish a custom task.
For instance, strlen() as used above is an in-built function for getting the length of a string. We pass the string when calling it and it returns its length.
In our case here, we are more interested in the in-built PHP functions. They are all documented on the PHP official website.
While going through their documentation, you will find many of them described as "binary-safe", and a few explicitly described as not binary-safe.
What does binary-safe mean?
Traditionally there are two ways to mark the end of a string: by adding a null character (\x00) at the end of the string (C language uses this method) or by storing its length along with the string data (PHP uses this method).
The limitation of the former (using the null character to mark the end of the string) is that you cannot use a null byte/character anywhere else in the string but at the end.
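The difference between the two approaches can be sketched in PHP itself. The nullTerminatedLength() function below is a hypothetical helper written for this illustration (it is not part of PHP); it only imitates how a C-style function would measure a string.

```php
<?php
// Hypothetical helper: measures a string the way a C function would,
// by stopping at the first null byte.
function nullTerminatedLength(string $bytes): int
{
    $position = strpos($bytes, "\x00");
    // No null byte found: the whole string counts.
    return $position === false ? strlen($bytes) : $position;
}

$data = "abc\x00xyz";

$storedLength = strlen($data);               // 7: PHP uses the stored length
$cStyleLength = nullTerminatedLength($data); // 3: everything after \x00 is ignored

echo "Stored-length view: $storedLength\n";
echo "Null-terminated view: $cStyleLength\n";
```

With the stored length, the null byte is just another byte in the data. With null termination, it cuts the string short.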
PHP is an interpreted language with its interpreter written in C language. Therefore, some of its functions might be based on C functions.
Any function can correctly process a string that contains only printable ASCII characters and no null characters.
However, some functions may process strings containing non-ASCII bytes and/or null bytes incorrectly. For instance, a PHP function might be based on a C function that expects null-terminated strings, so if the string contains a null character, the function would ignore anything after it. This is because it treats the null character as the end of the string.
Binary safety can therefore be defined as a property of functions which means they process any string correctly.
Most of C's standard library string functions can be classified as "non-binary safe" since they rely on the null character for termination.
A "binary safe function" is a function that works correctly even when you pass it arbitrary binary data, e.g. a string containing non-ASCII bytes and/or null bytes.
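As a quick illustration, the sketch below passes a string containing a non-ASCII byte and a null byte to a few in-built functions that are binary-safe (strlen(), bin2hex(), substr() and strrev()). The sample data is made up for the demonstration.

```php
<?php
// Arbitrary binary data: a non-ASCII byte, a null byte, then some ASCII text.
$data = "\xFF\x00PHP";

$length   = strlen($data);          // 5: every byte is counted
$hex      = bin2hex($data);         // "ff00504850"
$tail     = substr($data, 2);       // "PHP"
$reversed = bin2hex(strrev($data)); // "50485000ff"

echo "Length: $length\n";
echo "Hex: $hex\n";
echo "Tail: $tail\n";
echo "Reversed (hex): $reversed\n";
```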
Non-binary safe function example
A good example of a function that is not binary-safe is strcoll().
strcoll() is an in-built, case-sensitive PHP function that compares two strings based on the current locale.
Syntax
strcoll(str1,str2)
Parameters
Parameter | Requirement | Description |
---|---|---|
str1 | Required | The first string to compare |
str2 | Required | The second string to compare |
The function returns:
- 0 - if the two strings are equal
- <0 - if str1 is less than str2
- >0 - if str1 is greater than str2
Example
<?php
$str1 = "Hello";
$str2 = "Hello\x00 world!";
// strcoll() is not binary-safe: it stops reading $str2 at the null byte.
if (strcoll($str1, $str2) === 0) {
    echo "The two strings are the same.";
} else {
    echo "The two strings are different.";
}
Output:
The two strings are the same.
From the example above, you can clearly see that the two strings are different. $str1 has value "Hello" while $str2 has value "Hello\x00 world!". The reason why the output of the strcoll() function is 0 (which means they are the same) is that it is not binary safe and interprets the null character "\x00" as the end of the string $str2 and ignores everything after. So it assumes the value of $str2 to be "Hello".
Note: The comparison of the strings may vary depending on the locale settings. If the current locale is C or POSIX, this function works the same way as strcmp() (explained below) for strings that contain no null bytes.
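If you need locale-aware comparison but cannot be sure your strings are free of null bytes, one possible approach is to check for them first. The strcollChecked() function below is a hypothetical helper (not part of PHP), shown only as a sketch of that idea.

```php
<?php
// Hypothetical helper: refuses strings that strcoll() would silently truncate.
function strcollChecked(string $first, string $second): int
{
    foreach ([$first, $second] as $value) {
        if (strpos($value, "\x00") !== false) {
            throw new InvalidArgumentException("String contains a null byte.");
        }
    }
    return strcoll($first, $second);
}

try {
    strcollChecked("Hello", "Hello\x00 world!");
    $message = "Compared without problems.";
} catch (InvalidArgumentException $e) {
    $message = "Rejected: " . $e->getMessage();
}
echo $message;
```

Throwing an exception is just one design choice here. Depending on your application, falling back to a binary-safe comparison may be more appropriate.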
Binary safe function example
Most of the functions in PHP are binary-safe. For the sake of demonstration, we will use the strcmp() function, which is very similar to strcoll() in its functionality except that it is binary-safe.
strcmp() is an in-built, case-sensitive PHP function used to compare two strings.
Syntax
strcmp(str1,str2)
Parameters
Parameter | Requirement | Description |
---|---|---|
str1 | Required | The first string to compare |
str2 | Required | The second string to compare |
The function returns:
- 0 - if the two strings are equal
- <0 - if str1 is less than str2
- >0 - if str1 is greater than str2
Example
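<?php
$str1 = "Hello";
$str2 = "Hello\x00 world!";
// strcmp() is binary-safe: it compares every byte of $str2, including those after the null byte.
if (strcmp($str1, $str2) === 0) {
    echo "The two strings are the same.";
} else {
    echo "The two strings are different.";
}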
Output:
The two strings are different.
The strcmp() function correctly processes the string $str2 up to the last character (does not terminate it at the null character "\x00") and thus the two strings are not considered equal.
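You can also look at the sign of the value strcmp() returns. The sketch below reuses the same two strings. Only the sign of the result is checked, because the exact number returned can differ between PHP versions.

```php
<?php
$str1 = "Hello";
$str2 = "Hello\x00 world!";

// strcmp() sees all 13 bytes of $str2, so the shorter $str1 counts as "less".
$result = strcmp($str1, $str2);
$isLess = $result < 0;          // true

// The identity operator also compares the full contents of both strings.
$identical = ($str1 === $str2); // false

echo $isLess ? "str1 is less than str2\n" : "str1 is not less than str2\n";
echo $identical ? "The strings are identical\n" : "The strings are not identical\n";
```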
That's all!
It's my hope that this explanation helps you to get a better understanding.