Phalcon Resultsets and large MySQL tables

Hi there! We tried to execute the Model::find() method on a mysql dataset with over a million entries and got an memory overflow before iterating over the Resultset. The overflow does not happen, when querying the same table with mysql_query() or $phalconDb->query().

After setting PDO::MYSQL_ATTR_USE_BUFFERED_QUERY = false, we didn't get an overflow, but the Resultset was empty.

$phalconDb = new \Phalcon\Db\Adapter\Pdo\Mysql(...);
$result = $phalconDb->query("SELECT * FROM schinkens");
while($schinken = $result->fetch()) 
    print_r($schinken);

Any ideas what the problem is? What do we need to consider when handling huge tables like this?

Thanks for your help!
Lukas



83.0k

What is a memory overflow?

The memory is exhausted. The limit is set to 512MB.

Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 24 bytes) in C:\path\to\file.php on line 55


83.0k

OK, but there's no such thing as a memory overflow in CS

Phalcon ORM only keeps one record in memory, it's detailed in the docs:

While findFirst() returns directly an instance of the called class (when there is data to be returned), the find() method returns a Phalcon\Mvc\Model\Resultset\Simple. This is an object that encapsulates all the functionality a resultset has like traversing, seeking specific records, counting, etc. These objects are more powerful than standard arrays. One of the greatest features of the Phalcon\Mvc\Model\Resultset is that at any time there is only one record in memory. This greatly helps in memory management especially when working with large amounts of data.

http://docs.phalconphp.com/en/latest/reference/models.html#model-resultsets

Maybe the record is being referenced somewhere in the app causing the GC cannot remove it from memory?

edited Oct '14

That is the reason why we chose Phalcon. But the issue happens before the resultset is returned from phalcon. The table has 10 columns. 1,700,00 entries and the size is 262 MiB.

$visitors = Visitors::find();
echo $visitors->count() // this line is never reached.

The same issue appears if we use the queryBuilder using the modelsManager.

If we do the exact same thing using the (deprecated but simple) mysql_query, everything works fine, and the result is fetched within 3 sec.

$result = mysql_query("SELECT * FROM visitors", $link);
while($row = mysql_fetch_assoc($result)) ... // this works as expected and no fatal errors are thrown.


83.0k

But $visitors = Visitors::find(); does not return the whole resultset in memory it just creates a cursor pointing to the first record

Yes, I know it points/should point to just one.

Have you some experience with that amount of data? Can you tell us, if there might be some issues we could look into?

We could also reproduce it using with plain PDO. After using the $pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false); it worked, also when added the attribute in the options when initializing the phalcon mysql adapter.



83.0k

Yeah, the ORM is designed to handle such amounts of data and even more and we have had experience with bigger data than that.

  • Are you using some kind of cache?
  • XDebug?
  • Can you post the relevant code of your app somewhere?

Cache and XDebug are disabled. Let me sleep over that and if I can't fix it tomorrow I will come back to you with a complete test script :)



305
edited Oct '14

Hi,

I´m experiencing the same problem. The Phalcon Mysql Adapter uses PDO.

From this link http://php.net/manual/de/mysqlinfo.concepts.buffering.php

Queries are using the buffered mode by default. This means that query results are immediately transferred from the MySQL Server to PHP and is then kept in the memory of the PHP process. This allows additional operations like counting the number of rows, and moving (seeking) the current result pointer. It also allows issuing further queries on the same connection while working on the result set. The downside of the buffered mode is that larger result sets might require quite a lot memory. The memory will be kept occupied till all references to the result set are unset or the result set was explicitly freed, which will automatically happen during request end the latest. The terminology "store result" is also used for buffered mode, as the whole result set is stored at once.

This means for me, that there is a high chance that this is the reason why our memory limit is reached. Therefor we used the PDO::MYSQL_ATTR_USE_BUFFERED_QUERY = false. But the problem with that is, that Phalcon then is not working correctly anymore. It then says that the $result->count() is 0.

We tried now a test case with just 100 results. If PDO::MYSQL_ATTR_USE_BUFFERED_QUERY = false in the constructor of the mysql adapter, count() results in 0... That sounds like a bug to me. Or do you have some workarounds?



305
edited Oct '14

Tried on

  • Windows 7, PHP 5.5.11, Phalcon 1.2.6 & Phalcon 1.3.3
  • Linux, PHP 5.5.18 Phalcon 1.2.6

A super simple test case where the memory limit is reached (Before actually fetching anything).

We created a robots table with a range of column from a-j and filled it with 1.000.000 entries the size was 800mb.

The first code is (Buffered Table ind PDO is true, memory limit reached):

    use Phalcon\DI;
    use Phalcon\Db\Adapter\Pdo\Sqlite as Connection;
    use Phalcon\Mvc\Model\Manager as ModelsManager;
    use Phalcon\Mvc\Model\Metadata\Memory as MetaData;
    use Phalcon\Mvc\Model;

    $di = new DI();

    //Setup a connection
    $di->set('db', function(){
        return new \Phalcon\Db\Adapter\Pdo\Mysql(array(
            "host" => "localhost",
            "username" => "root",
            "password" => "secret",
            "dbname" => "test"
        ));
    });

    //Set a models manager
    $di->set('modelsManager', new ModelsManager());

    //Use the memory meta-data adapter or other
    $di->set('modelsMetadata', new MetaData());

    //Create a model
    class Robots extends Model {}

    //Use the model
    $robots = Robots::find(); // memory limit is reached somewhere in here.

During Robots::find() the memory of 512mb is reached.

Another try (Buffered Query False -> count = 0):

    use Phalcon\DI;
    use Phalcon\Db\Adapter\Pdo\Sqlite as Connection;
    use Phalcon\Mvc\Model\Manager as ModelsManager;
    use Phalcon\Mvc\Model\Metadata\Memory as MetaData;
    use Phalcon\Mvc\Model;

    $di = new DI();

    //Setup a connection
    $di->set('db', function(){
        return new \Phalcon\Db\Adapter\Pdo\Mysql(array(
            "host" => "localhost",
            "username" => "root",
            "password" => "secret",
            "dbname" => "test",
            "options" => array(
                // Added
                \PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false
            )
        ));
    });

    //Set a models manager
    $di->set('modelsManager', new ModelsManager());

    //Use the memory meta-data adapter or other
    $di->set('modelsMetadata', new MetaData());

    //Create a model
    class Robots extends Model {}

    //Use the model
    $robots = Robots::find(); // this now does not reach the memory limit

    print_r($robots->count()); // gives 0

Hi, did you had time to look into this yet? We are still having this issue...



1.8k

Anything new here? Having the same problem.

edited Nov '14

Just confirmed this on a database with ~1.7million rows.

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 24 bytes)
7   0.4107  1045864 Postcodes::find( )  ..\Product.php:599
8   0.4107  1046448 find ( )    ..\Postcodes.php:206
9   0.4107  1048392 execute ( ) ..\Postcodes.php:206
10  0.4108  1053784 _executeSelect ( )  ..\Postcodes.php:0
11  0.4173  1067456 query ( )   ..\Postcodes.php:0
12  0.4173  1067728 query ( )   ..\Postcodes.php:0

Knowing that my memory_limit is set to 128M I decided to limit my find query until the error didn't appear and I was able to get to a value of 38,839.

    var_dump(count(Postcodes::find(array(
        'limit' => 38839
    ))));
int 38839

@phalcon it seems that those lines you reference to in the code don't seem to be applying. At least not on version 1.3.3 anyway.



305

Hey @phalcon will you ever look into this or do we have to change the framework?



83.0k
edited Nov '14

As I stated before, the framework is using a cursor that only puts one record at time in memory to work with big resultsets of data.

The resultset switch to cursor mode when count() shows that the resultset has more than 32 rows:

In 1.3.4:

In 2.0.0:



1.8k

Thats all fine in theory, but it obviously doesn't work, hence the error messages.

As I stated before, the framework is using a cursor that only puts one record at time in memory to work with big resultsets of data.

The resultset switch to cursor mode when count() shows that the resultset has more than 32 rows:

In 1.3.4:

In 2.0.0:



305

@phalcon yes as you stated before, it seems you does not read the problem. Again: The problem is in executing the query, not during fetching. Your stated lines of code, are never reached, as the problem happens during query execution. As in the example of @Rewt0r you can clearly see, that there it has nothing to do with rows.

edited Nov '14

Hey guys,

I tested $customers = Customer::find(); print_r($customers->count()); exit; on a table with 5804011 rows and 1.4 GiB of data The model for this table was generated by console - phalcon model ...

  • PHP 5.5.15-1~dotdeb.1 (cli) (built: Jul 24 2014 16:44:04)
  • mysql Ver 14.14 Distrib 5.5.38, for debian-linux-gnu (x86_64) using readline 6.2
  • Linux Debian 7.0
  • Phalcon 1.3.3
  • memory_limit = 512M

Everything worked fine :)

CREATE TABLE customer ( country char(2) NOT NULL, id int(10) unsigned NOT NULL AUTOINCREMENT, customer_id int(11) unsigned DEFAULT NULL, locale varchar(5) NOT NULL, email varchar(128) NOT NULL, first_name varchar(64) DEFAULT NULL, last_name varchar(64) DEFAULT NULL, gender tinyint(3) unsigned DEFAULT NULL, date_of_birth datetime DEFAULT NULL, birthday_date datetime DEFAULT NULL, password_encryption_type char(10) DEFAULT NULL, password_hash varchar(64) DEFAULT NULL, registration_uri varchar(255) DEFAULT NULL, registration_ip varchar(16) DEFAULT NULL, confirmation_hash varchar(255) DEFAULT NULL, confirmation_ip varchar(16) DEFAULT NULL, activation_date datetime DEFAULT NULL, last_login datetime DEFAULT NULL, login_count int(11) DEFAULT NULL, corporate_code varchar(32) DEFAULT NULL, referrer_id int(10) unsigned DEFAULT '0', affiliate_id int(10) unsigned DEFAULT NULL, tradedoubler_uid varchar(32) DEFAULT NULL, default_billing_address_id int(10) unsigned DEFAULT NULL, default_shipping_address_id int(10) unsigned DEFAULT NULL, newsletter_subscription_type tinyint(3) unsigned DEFAULT NULL, email_change_address varchar(128) DEFAULT NULL, email_change_ip varchar(16) DEFAULT NULL, email_change_date datetime DEFAULT NULL, email_change_confirmation_date datetime DEFAULT NULL, complaints smallint(5) unsigned DEFAULT '0', bounces smallint(5) unsigned DEFAULT '0', soft_bounces smallint(5) unsigned DEFAULT '0', hard_bounces smallint(5) unsigned DEFAULT '0', block_bounces smallint(5) unsigned DEFAULT '0', is_half_registered tinyint(3) unsigned DEFAULT NULL, has_uncanceled_orders tinyint(3) unsigned DEFAULT NULL, is_created_in_admin tinyint(3) unsigned DEFAULT '0', is_active tinyint(3) unsigned NOT NULL DEFAULT '1', created_at datetime DEFAULT NULL, modified_at datetime DEFAULT NULL, first_purchase datetime DEFAULT NULL, last_purchase datetime DEFAULT NULL, is_buyer tinyint(1) DEFAULT '0', is_deleted tinyint(3) unsigned NOT NULL DEFAULT '0', has_accepted tinyint(4) NOT NULL DEFAULT '0' COMMENT '1 = cookies policy', manual_fidelity_score smallint(5) unsigned DEFAULT NULL, system_fidelity_score smallint(5) unsigned DEFAULT NULL, PRIMARY KEY (id), UNIQUE KEY i_email_country (email,country), UNIQUE KEY i_country_customer_id (country,customer_id), KEY i_created_at (created_at), KEY i_referrer_id (referrer_id), KEY i_customer_id (customer_id), KEY i_email_change_address (email_change_address), KEY i_is_active (is_active), KEY i_is_deleted (is_deleted), KEY is_buyer (is_buyer), KEY first_purchase (first_purchase), KEY last_purchase (last_purchase), KEY i_modified_at (modified_at) ) ENGINE=InnoDB AUTOINCREMENT=6718314 DEFAULT CHARSET=utf8;



83.0k
edited Nov '14

I was checking the issue with PDO/Db and I see the following:

Scenario 1 (PDO + MYSQL_ATTR_USE_BUFFERED_QUERY:false):

$instance = new PDO('mysql:host=localhost;dbname=assets;charset=utf8', 'root', 'secret');
$instance->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$instance->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

echo memory_get_usage(), PHP_EOL;

$x = $instance->prepare('SELECT * FROM audit');
$p = $x->execute();

echo memory_get_usage(), PHP_EOL;

$x->rowCount();

echo memory_get_usage(), PHP_EOL;

Results in:

234832
242296
242296

Scenario 1 (PDO + MYSQL_ATTR_USE_BUFFERED_QUERY:true):

$instance = new PDO('mysql:host=localhost;dbname=assets;charset=utf8', 'root', 'secret');
$instance->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$instance->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, true);

echo memory_get_usage(), PHP_EOL;

$x = $instance->prepare('SELECT * FROM audit');
$p = $x->execute();

echo memory_get_usage(), PHP_EOL;

$x->rowCount();

echo memory_get_usage(), PHP_EOL;

Results in:

234832
73682784
73682784

Scenario 3 (PDO + MYSQL_ATTR_USE_BUFFERED_QUERY:default):

<?php

$db = new \Phalcon\Db\Adapter\Pdo\MySQL(array(
        'username' => 'root',
        'password' => 'secret',
        'dbname' => 'assets',
));

$r = $db->query('SELECT * FROM audit');
echo memory_get_usage(), PHP_EOL;

$r->getInternalResult()->rowCount();

echo memory_get_usage(), PHP_EOL;

Shows:

73685824
73685872

Scenario 4 (PDO + MYSQL_ATTR_USE_BUFFERED_QUERY:false):

$db = new \Phalcon\Db\Adapter\Pdo\MySQL(array(
        'username' => 'root',
        'password' => 'assets',
        'dbname' => 'secret',
        'options' => array(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false)
));

$r = $db->query('SELECT * FROM audit');
echo memory_get_usage(), PHP_EOL;

$r->getInternalResult()->rowCount();

echo memory_get_usage(), PHP_EOL;

exit;

Results:

246160
246208

It seems the driver only fetches all the data when querying if MYSQLATTRUSEBUFFEREDQUERY is not false.

So changing this field in the connect parameters will fix the bug with Model::find()?

It looks like it won't make it crash & burn, but you don't get a row count either with the buffering disabled?



7.1k
edited Nov '14

This is behavior of PDO.

this code has same result

<?php

ini_set('memory_limit', '-1');

$conn = new PDO('mysql:dbname=polaris;host=127.0.0.1', 'root', 'corner',array(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY=>false));

echo $conn->getAttribute(constant('PDO::MYSQL_ATTR_USE_BUFFERED_QUERY')) . "\n";

echo $conn->getAttribute(constant('PDO::ATTR_DRIVER_NAME')) . "\n";

$sql = 'SELECT * from parts;';

echo memory_get_peak_usage(true), PHP_EOL; 
$res = $conn->query($sql);

echo memory_get_peak_usage(true), PHP_EOL;
var_dump($res->rowCount());

I think, the only way to solve this problem - refusal of the PDO in favor MySQLi

Sorry, It will not work too...

The behaviour of mysqli_num_rows() depends on whether buffered or unbuffered result sets are being used. For unbuffered result sets, mysqli_num_rows() will not return the correct number of rows until all the rows in the result have been retrieved.


7.1k
edited Nov '14

Nonetheless... This code use mysqli directly without PDO

echo "Memory limit : ", ini_get('memory_limit'), PHP_EOL;

$mysqli = new mysqli("localhost", "root", "corner", "polaris");


echo "Memory use before query : ", memory_get_peak_usage(true)/1024, "K", PHP_EOL;

$res = $mysqli->query("SELECT * from parts");

echo "Memory use after query : ", memory_get_peak_usage(true)/1024, "K", PHP_EOL;


echo "Number of rows : ", $res->num_rows, PHP_EOL;

$res->data_seek(1234);

$record = $res->fetch_object();

var_dump($record);

Result :

Memory limit : 512M
Memory use before query : 256K
Memory use after query : 178432K
Number of rows : 3249787
object(stdClass)#3 (6) {
  ["id"]=>
  string(4) "1235"
  ["assembly_id"]=>
  string(2) "41"
  ["tag"]=>
  string(2) "29"
  ["sku"]=>
  string(7) "5220588"
  ["name"]=>
  string(18) "PLATE,REAR MOUNT  "
  ["qty"]=>
  string(1) "1"
}

And PDO code

ini_set('memory_limit', '-1');

$conn = new PDO('mysql:dbname=polaris;host=127.0.0.1', 'root', 'corner',array(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY=>true));

echo $conn->getAttribute(constant('PDO::MYSQL_ATTR_USE_BUFFERED_QUERY')) . "\n";

echo $conn->getAttribute(constant('PDO::ATTR_DRIVER_NAME')) . "\n";

$sql = 'SELECT * from parts;';

echo "Memory use before query : ", memory_get_peak_usage(true)/1024, "K", PHP_EOL; 
$res = $conn->query($sql);

echo "Memory use after query : ", memory_get_peak_usage(true)/1024, "K", PHP_EOL;
var_dump($res->rowCount());

Result :

1
mysql
Memory use before query : 256K
Memory use after query : 1092864K
int(3249787)

http://jpauli.github.io/2014/07/21/php-and-mysql-communication-mysqlnd.html#mysqli



1.8k

Sooooo...? Do we just have to cross this "only one entry in memory" thing from the feature list? Or will there be support for unbuffered queries (not relying on the result size) in Phalcon?



83.0k

There's no bug in Model::find() or in Phalcon. Support for unbuffered queries depends on the initialization flags for PDO, if you see my previous comment, PDO itself (without Phalcon ORM/Db) fetches the whole resultset depending on whether buffered or unbuffered queries is enabled.



1.8k
edited Nov '14

The Problem with it is that it just doesn't work as advertised. The documenation clearly states:

These objects are more powerful than standard arrays. One of the greatest features of the Phalcon\Mvc\Model\Resultset is that at any time there is only one record in memory. This greatly helps in memory management especially when working with large amounts of data.

With buffered queries, thats just not true as you said yourself. The PDO fetches all records and so they are all in memory.

But Phalcon doesn't work with unbuffered queries either as you can see in this example (robots is a db table with ~800MB worth of entries):

use Phalcon\DI;
use Phalcon\Db\Adapter\Pdo\Sqlite as Connection;
use Phalcon\Mvc\Model;

$di = new DI\FactoryDefault();

//Setup a connection
$di->set('db', function(){
    return new Phalcon\Db\Adapter\Pdo\Mysql(array(
        "host" => "localhost",
        "username" => "root",
        "password" => "secret",
        "dbname" => "test",
        "options" => array(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false)
    ));
 });

//Create a model
class Robots extends Model {}

//Use the model
$robots = Robots::find();

echo ($robots->valid() ? 'true' : 'false') . PHP_EOL;
echo count($robots) . PHP_EOL;
var_dump($robots->toArray());

$first = $robots->getFirst();
var_dump($first ? $first->toArray() : null);

The output is: ``` false 0 array(0) { } NULL ```

Thats clearly not the desired behavior and it would be awesome if Phalcon could recognized an unbuffered query and act accordingly.

I don't know why its behaving so strangely. But in my case if I rewind the pointer after fetching the result. This error never occur. Although before executing rewind() and after executing it. The pointer key is at 0. So its something like this:

$data = Data::find();
$data->rewind();
while($data->valid()){
    // do whatever you want
    $data->next();
}

It's actually not possible to write a long-running process with Phalcon due to this situation. Every query allocates 1mb (default buffer size) of memory. When processing jobs in a loop, memory usage increases at 1mb increments until all memory is exhausted.

Disabling the buffer using \PDO::MYSQLATTRUSEBUFFEREDQUERY breaks all ->count() operations within the ORM.

It would be amazing if somehow count() could fall back to some other method that doesn't require the buffer.



477
edited Dec '16

I think I have the same problem on version 3. can't fix.