Memory challenge for Phalcon - user space code wins?

Hi guys,

Recently I discovered something interesting in the PHP engine: the memory consumption of returned values is optimized for user-space code, but not for internal code. This deeply affects Phalcon, because passing large amounts of data through it may become quite inefficient.

The issue affects any internal method/function (in any PHP extension) that returns a value. If you look at how the 'return_value' variable is passed in and out of an internal PHP function in C, you realize that all the returned data has to be fully copied into this variable. There is no way to simply increase the 'refcount' of an already existing zval and return it. This means that PHP's memory optimization mechanism (refcounting for "copy-on-write") cannot be engaged.

At the same time, returning values from a user-space method/function is fully optimized by PHP: the "copy-on-write" technique is engaged there.
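To make the user-space behaviour concrete, here is a minimal sketch (not part of the original test script) of copy-on-write for a plain string: assignment only bumps the refcount, and the real copy happens at the first write. The exact figures depend on the PHP version and allocator, so treat the printed values as approximate.

```php
<?php
// Minimal sketch of copy-on-write for a plain string in user-space PHP.
// Figures are approximate and depend on the PHP version and allocator.

$before = memory_get_usage();
$original = str_repeat('a', 5000000);   // ~5 MB is allocated once
$afterCreate = memory_get_usage();

$copy = $original;                      // no data copied: only the refcount goes up
$afterAssign = memory_get_usage();

$copy[0] = 'b';                         // first write separates $copy: a real 5 MB copy is made now
$afterWrite = memory_get_usage();

printf("Create: %.2fM\n", ($afterCreate - $before) / 1048576);
printf("Assign: %.2fM\n", ($afterAssign - $afterCreate) / 1048576);
printf("Write:  %.2fM\n", ($afterWrite - $afterAssign) / 1048576);
```

The "Assign" line should stay close to zero, while "Write" should show roughly the size of the string.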

I've written a script that reproduces the issue (http://pastebin.com/MZ0AZmGA), and you can see the results of running it here: http://pastebin.com/uhgfpLUn

In the test I took the simplest object in Phalcon and measured how memory changes when data is passed in and out unmodified. Essentially, this is what is tested:

```php
$string = str_repeat('a', 5000000);
$rawValue = new Phalcon\Db\RawValue($string);
$tmp = $rawValue->getValue();
```

The test confirms that memory consumption increases by 5 MB when the value is returned from the Phalcon object. The same object implemented in plain user-space PHP (see the links above; the test includes it) doesn't suffer from this issue: no additional memory is consumed on return, because the returned value is just a refcounted reference to the original data.
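For reference, a user-space counterpart might look like the sketch below. The class name PhpRawValue matches the one used in the test script, but its body and the measurement code are my assumptions, not a copy of the pastebin script. On a stock CLI build one would expect roughly 5 MB for creating the string and close to zero for the object and for each getValue() call.

```php
<?php
// Hypothetical user-space counterpart of Phalcon\Db\RawValue. The class name
// matches the one in the test script; the body is an assumption.
class PhpRawValue
{
    protected $value;

    public function __construct($value)
    {
        $this->value = $value;   // shares the string (refcount + 1), no copy
    }

    public function getValue()
    {
        return $this->value;     // returned by refcount as well, no copy
    }
}

$m0 = memory_get_usage();
$string = str_repeat('a', 5000000);
$m1 = memory_get_usage();
$phpRawValue = new PhpRawValue($string);
$m2 = memory_get_usage();
$tmpPhp = $phpRawValue->getValue();
$m3 = memory_get_usage();
$tmpPhp2 = $phpRawValue->getValue();
$m4 = memory_get_usage();

printf("Creating \$string: %.2fM\n", ($m1 - $m0) / 1048576);
printf("Creating PhpRawValue: %.2fM\n", ($m2 - $m1) / 1048576);
printf("Getting PHP string back: %.2fM\n", ($m3 - $m2) / 1048576);
printf("Getting PHP string back again: %.2fM\n", ($m4 - $m3) / 1048576);
```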

The good news: the issue matters only for string and array zvals; it doesn't affect internal methods/functions that return objects or return values by reference. The bad news: the issue is not obvious to PHP developers, because an experienced programmer knows about reference counting in PHP and relies on it heavily in everyday work, so their expectations about the application's performance turn out to be wrong.
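As I understand it, objects are unaffected because an object zval carries only a handle into the object store, so even a full zval copy into return_value duplicates the handle, not the object. The same handle semantics are easy to see from user space; here is a small sketch with hypothetical Payload and Holder classes (illustration only, not Phalcon code):

```php
<?php
// Sketch: an object value is just a handle, so returning it copies the
// handle, never the object's data. Payload and Holder are hypothetical.
class Payload
{
    public $data;

    public function __construct(string $data)
    {
        $this->data = $data;
    }
}

class Holder
{
    private $payload;

    public function __construct(Payload $payload)
    {
        $this->payload = $payload;
    }

    public function getPayload(): Payload
    {
        return $this->payload;   // copies the object handle only
    }
}

$holder = new Holder(new Payload('original'));

$returned = $holder->getPayload();
$returned->data = 'changed';                              // modifies the one shared object
$sameInstance = ($returned === $holder->getPayload());    // identity, not equality

echo $holder->getPayload()->data, PHP_EOL;                // changed
```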

Overall, this brings a challenge to Phalcon, because internal functions/methods cannot compete with user-space code in scenarios where large amounts of data travel across the system mostly for read-only purposes. Under those conditions Phalcon not only consumes much more memory, it also spends considerable CPU time on unnecessary duplication, which is especially costly for complex structures like associative arrays.

So I'm just wondering, guys, what do you think about this issue?




Why do you assume that Phalcon isn't using highly optimized copy-on-write?

Phalcon uses copy-on-write where it can, but it is not possible when returning values from Phalcon methods. To be more precise, this is not an issue in Phalcon itself but a drawback of the current PHP engine, so other extensions have exactly the same problem. User-space PHP code, meanwhile, works perfectly and has no such issue at all.

I've provided the test script in the post, so it can be easily verified by anyone.




Your point is valid, and I think you know this (http://stackoverflow.com/questions/17844379/how-is-to-return-array-from-a-php-extension-without-copying-it-in-memory): it would be better if return_value_ptr had a valid address, so that the property could be shared copy-on-write instead of being fully copied. I think we can pass a valid return_value_ptr to most methods, since we have our own functions for calling methods and functions.

Are you absolutely sure that your measurements are correct? Looking at the opcode dump, I fail to see where PHP optimizes the return value:

```
number of ops:  31
compiled vars:  !0 = $string, !1 = $rawValue, !2 = $tmp, !3 = $tmp2, !4 = $phpRawValue, !5 = $tmpPhp, !6 = $tmpPhp2
line     # *  op                           fetch          ext  return  operands
---------------------------------------------------------------------------------
   2     0  >   SEND_VAL                                                 'a'
         1      SEND_VAL                                                 5000000
         2      DO_FCALL                                      2  $0      'str_repeat'
         3      ASSIGN                                                   !0, $0
   4     4      ECHO                                                     '%0A'
   5     5      ZEND_FETCH_CLASS                              4  :2      'Phalcon%5CDb%5CRawValue'
         6      NEW                                              $3      :2
         7      SEND_VAR                                                 !0
         8      DO_FCALL_BY_NAME                              1
         9      ASSIGN                                                   !1, $3
   6    10      ZEND_INIT_METHOD_CALL                                    !1, 'getValue'
        11      DO_FCALL_BY_NAME                              0  $7
        12      ASSIGN                                                   !2, $7
   7    13      ZEND_INIT_METHOD_CALL                                    !1, 'getValue'
        14      DO_FCALL_BY_NAME                              0  $10
        15      ASSIGN                                                   !3, $10
   9    16      ECHO                                                     '%0A'
  10    17      ZEND_FETCH_CLASS                              4  :12     'PhpRawValue'
        18      NEW                                              $13     :12
        19      SEND_VAR                                                 !0
        20      DO_FCALL_BY_NAME                              1
        21      ASSIGN                                                   !4, $13
  11    22      ZEND_INIT_METHOD_CALL                                    !4, 'getValue'
        23      DO_FCALL_BY_NAME                              0  $17
        24      ASSIGN                                                   !5, $17
  12    25      ZEND_INIT_METHOD_CALL                                    !4, 'getValue'
        26      DO_FCALL_BY_NAME                              0  $20
        27      ASSIGN                                                   !6, $20
  15    28      ECHO                                                     '%0A'
  18    29      NOP
  32    30    > RETURN                                                   1
```

The opcodes generated for both cases look the same, and in both cases PHP first assigns the return value to a temporary variable and then assigns it to the real variable.

Phalcon,

Yep, I'm somewhat familiar with that SO question.

My intent was to learn the opinion of the team and of the framework's users on this matter: are you aware of this issue, are you actively working to help get it fixed in the PHP core, do you plan to describe the issue somewhere in the Phalcon documentation, any other thoughts?

Vladimir, the measurements are correct. I intentionally prepared and published the script so that anyone can verify the measurements and either confirm them or find a mistake.

Regarding the opcodes: I can't give you an answer right now, because I don't have enough experience to read this raw VM data properly. However, when examining the PHP VM, I noticed that all user-space functions are called with a zval_ptr_ptr for the return value. I suspect the main difference is not visible in the opcodes: what matters is not where the data is assigned, but what kind of data it is. In the user-space case it is the same zval; in the internal-function case it is a copied zval. So my guess is that the details of copying and reference counting are simply not visible in the opcodes.

Give me a couple of hours and I will prove to you that you are underestimating Phalcon :-)

PS: return_value_ptr_ptr is used only when the function is declared to return a reference, i.e. function& getValue()
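For readers less familiar with the function& getValue() form, here is a minimal user-space sketch of what declaring a method as returning by reference means at the PHP level. The class name RefRawValue is hypothetical, and this shows only the language semantics, not the extension-level return_value_ptr mechanics discussed here.

```php
<?php
// Sketch of a user-space method declared to return by reference.
// RefRawValue is a hypothetical class used only for illustration.
class RefRawValue
{
    protected $value;

    public function __construct($value)
    {
        $this->value = $value;
    }

    public function &getValue()
    {
        return $this->value;   // returns a reference to the property itself
    }
}

$raw = new RefRawValue('abc');

$copy = $raw->getValue();      // plain assignment: $copy gets its own value
$copy = 'changed copy';        // the property is untouched
echo $raw->getValue(), PHP_EOL;   // abc

$alias = &$raw->getValue();    // reference assignment: $alias is bound to the property
$alias = 'changed alias';      // writes through to the property
echo $raw->getValue(), PHP_EOL;   // changed alias
```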

Please try the latest 1.3.0 when you have a free minute ;-)

Creating $string: 5.00M
Creating Phalcon RawValue: 0.00M
Getting Phalcon string back: 0.00M
Getting Phalcon string back again: 0.00M

Creating $string2: 5.00M
Creating PhpRawValue: 0.00M
Getting PHP string back: 0.00M
Getting PHP string back again: 0.00M

Good job and an interesting commit, Vladimir :) I haven't had time to verify how it works yet; I just looked briefly at the diffs. I'm going to test it tomorrow.

Meanwhile, I wonder: does it work for internal calls, i.e. calling extension methods from inside an extension? If I understand it right, the implemented solution was to hack into the internals of the PHP VM, ignore the incoming "return_value" pointer and use a global state structure instead. That works for calls from user-space code, where the VM properly prepares the state, but may not work for calls from internal code, where the VM is not involved and all data is passed via ordinary C function parameters and return values.

I will build the updated Phalcon 1.3.0 and look into it tomorrow.

This works both for calls from user space and for calls from the extension (in PHP 5.5; in 5.3 and 5.4 Zend does not invoke zend_execute_internal() from zend_call_function(), but this is not an issue in Phalcon because we use our own implementation of method calls).

The implemented solution is not a hack: the incoming return_value is NOT ignored. What the code does is pass a pointer to return_value in return_value_ptr; no global structures are modified (we do not try to modify EG(return_value_ptr_ptr), etc.).

A cleaner solution is to set the 'return_reference' flag in ZEND_BEGIN_ARG_INFO_EX(). Note that you are NOT obliged to return a reference in this case, but it allows you to use return_value_ptr.