Product import takes too long

kdim95
Advanced
Posts: 210
Joined: 26 Aug 2022, 12:17

Product import takes too long

Post by kdim95 » 17 Apr 2024, 11:00

Laravel framework version: 11.3.1
Aimeos Laravel version: 2023.10.8
PHP Version: 8.2.17
Environment: Linux
aimeoscom/ai-elastic: 2023.04.*

Hello,

I am running an import of 122,000 products and the entire import process takes around 2 hours.
Is it normal for the import process to take this long?
What can I do to make the import time shorter?

I'm using only the elastic index, this is my configuration:

return [
    'resource' => [
        'es' => [
            'hosts' => [
                '127.0.0.1:9200',
            ],
            'index' => 'aimeos',
            // 'SSLVerification' => false, // for self-signed certificates
            // 'basicAuthentication' => ['elastic', '<password>'], // ElasticSearch 8+
            'selectorClass' => '\Elasticsearch\ConnectionPool\Selectors\StickyRoundRobinSelector',
            'settings' => [
                'number_of_shards' => 4, // Distribute data across multiple nodes ( large indexes are split into smaller 'shards' )
                'number_of_replicas' => 3, // Number of copies of primary shards ( redundancy and search speed )
                'max_result_window' => 200000, // maximum number of results retrieved
                // 'refresh_interval' => -1, // for initial indexing only
            ],
            // 'norefresh' => false, // for initial indexing only
        ],
    ],

    'mshop' => [
        'index' => [
            'manager' => [
                'name' => 'Elastic',
                'attribute' => [
                    'name' => 'Elastic',
                ],
                'catalog' => [
                    'name' => 'Elastic',
                ],
                'price' => [
                    'name' => 'Elastic',
                ],
                'supplier' => [
                    'name' => 'Elastic',
                ],
                'text' => [
                    'name' => 'Elastic',
                ],
            ],
        ],
        'product' => [
            'manager' => [
                'name' => 'Elastic',
                'lists' => [
                    'name' => 'Elastic',
                    'type' => [
                        'name' => 'Elastic',
                    ],
                ],
                'property' => [
                    'name' => 'Elastic',
                    'type' => [
                        'name' => 'Elastic',
                    ],
                ],
                'type' => [
                    'name' => 'Elastic',
                ],
            ],
        ],
    ]
];
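
The two commented-out options in this configuration are marked "for initial indexing only", and they are the obvious candidates to try for a one-off bulk import. The following is only a rough sketch of how such a variant could be derived from the array above. The function name bulkImportConfig() is made up, and the chosen values (in particular that 'norefresh' => true skips the refresh after each write, and that dropping replicas to 0 is acceptable during the import) are assumptions that should be checked against the ai-elastic documentation. Note also that ElasticSearch never places a replica on the same node as its primary shard, so with a single node the 3 replicas would stay unassigned anyway.

```php
<?php
// Sketch only: derive a config variant for a one-off bulk import.
// The key names come from the configuration above; the values set here
// are assumptions, not documented recommendations.
function bulkImportConfig(array $config): array
{
    $es = $config['resource']['es'] ?? [];

    // Same as uncommenting 'refresh_interval' => -1 (no periodic refresh).
    $es['settings']['refresh_interval'] = -1;
    // Assumption: no replica copies need to be written during the import.
    $es['settings']['number_of_replicas'] = 0;
    // Assumption: true disables the explicit refresh after each write.
    $es['norefresh'] = true;

    $config['resource']['es'] = $es;

    return $config;
}

// Reduced copy of the configuration above, for illustration.
$config = [
    'resource' => [
        'es' => [
            'index' => 'aimeos',
            'settings' => ['number_of_shards' => 4, 'number_of_replicas' => 3],
        ],
    ],
];

$bulk = bulkImportConfig($config);
```

After the import, the normal values would have to be restored and the index refreshed once, otherwise new products may not show up in searches.
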
Best regards

aimeos
Administrator
Posts: 7932
Joined: 01 Jan 1970, 00:00

Re: Product import takes too long

Post by aimeos » 19 Apr 2024, 12:39

Which importer do you use? CSV, XML or an own implementation?
Professional support and custom implementation are available at Aimeos.com
If you like Aimeos, give us a star

Post by kdim95 » 19 Apr 2024, 13:22

Hello,

I am using the default XML importer included with Aimeos.
I run it with the command: php artisan aimeos:jobs product/import/xml

Post by kdim95 » 22 Apr 2024, 11:34

Hello,

Do you have any update on this?

Best regards

nos3
Posts: 89
Joined: 01 Sep 2015, 13:26

Post by nos3 » 22 Apr 2024, 11:44

The XML importer is the fastest standard option but it still fetches the products, updates them and stores everything back. When using ElasticSearch and your extension, it's much faster to assign an ID locally and just store/overwrite the data in ES without fetching the products first. Then, it's possible to import 100k products in minutes instead of two hours.

Post by kdim95 » 22 Apr 2024, 12:20

By "storing locally" you mean having the products in both the database and elastic?
E.g. changing the config to have the products in both the database and in the elastic index?

Post by nos3 » 23 Apr 2024, 07:04

No, I've said "assign an ID locally and just overwrite the data in ES (without fetching the products first)". Products need to be in ES only but not with an autogenerated ID from ES. Then, you can overwrite the products in ES without the need to fetch them first.
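
To illustrate the idea, here is a minimal sketch, not the Aimeos implementation. It builds the body of an ElasticSearch bulk request in which every document carries an ID derived locally from the product code, so importing the same product again replaces the stored document without a prior search. The helper name buildBulkBody(), the sha1-based ID and the document fields are assumptions made for this example only; the real document layout used by ai-elastic will differ.

```php
<?php
// Hypothetical helper: build the body of an ElasticSearch bulk request
// where each document gets a locally assigned, repeatable ID.
function buildBulkBody(string $index, array $products): array
{
    $body = [];

    foreach ($products as $product) {
        // Assumption: the product code is unique, so an ID derived from it
        // is the same on every import run.
        $id = sha1($product['product.code']);

        // The "index" action creates the document or replaces an existing
        // one with the same _id, so no fetch is needed beforehand.
        $body[] = ['index' => ['_index' => $index, '_id' => $id]];
        $body[] = $product;
    }

    return $body;
}

// Illustrative data only.
$body = buildBulkBody('aimeos', [
    ['product.code' => 'demo-1', 'product.label' => 'Demo product 1'],
    ['product.code' => 'demo-2', 'product.label' => 'Demo product 2'],
]);

// With the elasticsearch-php client this array would be sent as
// $client->bulk(['body' => $body]); nothing is sent in this sketch.
```

Each product produces two entries in the body (the action line and the document), and many products go to ES in a single request instead of one save per product.
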

Post by kdim95 » 23 Apr 2024, 11:24

Hello,

I'm afraid I don't understand. Am I supposed to change the default import logic of your XML importer?
What and where exactly should I change?

Best regards

Post by aimeos » 24 Apr 2024, 14:10

We've added a new config option for the product XML importer in the master branch which allows replacing products by their "ref" value when using document-oriented storages like ElasticSearch. You can check the commit to see what that means:
https://github.com/aimeos/ai-controller ... 3f96094a41

Post by kdim95 » 15 May 2024, 08:11

Hello,

I have the updated code you mentioned.
The import process still takes too long.

I took some measurements in the importNodes() function.
The call that takes the most time is $manager->save( $item );

	/**
	 * Imports the given DOM nodes
	 *
	 * @param \DomElement[] $nodes List of nodes to import
	 */
	protected function importNodes( array $nodes )
	{
		$codes = [];

		$size = sizeof( $nodes );

		foreach( $nodes as $index => $node )
		{
			if( ( $attr = $node->attributes->getNamedItem( 'ref' ) ) !== null ) {
				$codes[$attr->nodeValue] = null;
			}
		}
		
		$start = microtime(true);
		$manager = \Aimeos\MShop::create( $this->context(), 'index' );
		$search = $manager->filter()->slice( 0, count( $codes ) )->add( ['product.code' => array_keys( $codes )] );
		$map = $manager->search( $search, $this->domains() )->col( null, 'product.code' );
		$index_search_time = microtime(true) - $start;
		$this->total_execution_time += $index_search_time;

		$product_process_time = 0;
		$product_save_time = 0;
		$type_add_time = 0;

		foreach( $nodes as $node )
		{
			if( ( $attr = $node->attributes->getNamedItem( 'ref' ) ) !== null && isset( $map[$attr->nodeValue] ) ) {
				$start = microtime(true);
				$item = $this->process( $map[$attr->nodeValue], $node );
				$product_process_time += microtime(true) - $start;
			} else {
				$start = microtime(true);
				$item = $this->process( $manager->create(), $node );
				$product_process_time += microtime(true) - $start;
			}

			$start = microtime(true);
			$manager->save( $item );
			$product_save_time += microtime(true) - $start;

			$start = microtime(true);
			$this->addType( 'product/type', 'product', $item->getType() );
			$type_add_time += microtime(true) - $start;
		}

		$this->total_execution_time += $product_process_time;
		$this->total_execution_time += $product_save_time;
		$this->total_execution_time += $type_add_time;

		// Print execution times with equal distances
		printf(
			"Execution times:\n\tTotal execution time: %20.14f\n\tIndex search time: %20.14f\n\tProduct process time: %20.14f\n\tProduct save time: %20.14f\n\tType add time: %20.14f\n",
			$this->total_execution_time,
			$index_search_time,
			$product_process_time,
			$product_save_time,
			$type_add_time
		);
	}
Here is some measurement data (the values are in seconds):

Execution times:
        Total execution time:     7.09538602828979
        Index search time:     0.06340694427490
        Product process time:     0.14792037010193
        Product save time:     6.88343572616577
        Type add time:     0.00062298774719
Execution times:
        Total execution time:    14.13299822807312
        Index search time:     0.03748011589050
        Product process time:     0.15118074417114
        Product save time:     6.84836721420288
        Type add time:     0.00058412551880
Execution times:
        Total execution time:    21.25851416587830
        Index search time:     0.03790497779846
        Product process time:     0.14960741996765
        Product save time:     6.93744492530823
        Type add time:     0.00055861473083
Execution times:
        Total execution time:    28.37132883071899
        Index search time:     0.04441905021667
        Product process time:     0.15412592887878
        Product save time:     6.91371273994446
        Type add time:     0.00055694580078
Each group of execution times corresponds to one batch of 100 imported products; the total execution time is cumulative across batches, the other values are per batch.
Is it normal for it to take so much time?
For about 120,000 products it would take about 2.5 hours to import.
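
The extrapolation can be checked with a few lines of PHP. This is a rough sketch that uses only the numbers printed above and assumes every later batch behaves like the first four; the variable names are made up for the example.

```php
<?php
// Product save time per batch of 100 products, copied from the output above (seconds).
$saveTimes = [6.88343572616577, 6.84836721420288, 6.93744492530823, 6.91371273994446];
// Cumulative total execution time after the fourth batch (seconds).
$totalAfterFour = 28.37132883071899;

$batchSize = 100;
$products = 120000;

// Average wall time of one batch and of one single save() call.
$avgBatch = $totalAfterFour / count($saveTimes);
$avgSave = array_sum($saveTimes) / count($saveTimes) / $batchSize;

// Share of the measured time that is spent in save().
$saveShare = array_sum($saveTimes) / $totalAfterFour;

// Projected duration of the whole import, assuming constant batch times.
$estimate = $avgBatch * ($products / $batchSize);

printf(
    "save(): %.1f ms per product, %.0f%% of the time, projected import: %.0f minutes\n",
    $avgSave * 1000,
    $saveShare * 100,
    $estimate / 60
);
```

This works out to roughly 69 ms per save() call, about 97% of the measured time, and a projected import of around 2 hours 20 minutes, which is in line with the estimate above.
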

Best regards
