Developers/Controller/Import products from CSV

From Aimeos documentation

Developers
Other languages:
English 100%


2017.x+ version

Most of the time, you already have an ERP system which manages all your products and most of the information required to fill your shop are stored there. In this case, you can either create a new product manager to access the required information directly from the ERP system or - more likely - you need to import the product information from the ERP system into your shop and update it regularly. In the later case, you can use the product import job controller which processes CSV files and imports/updates the products accordingly.

The product import is triggered via a cronjob/scheduler that executes the "product/import/csv" job controller. The import button in the administration interface only imports texts for existing products but no products itself!

Data location and format

When you export your product data from your ERP system, you need to store the files in a location where they are accessible by the importer. This location has to be configured by the controller/jobs/product/import/csv/location setting and it must point to the import file or directory (depending on the container type you would like to use).

Several container types are supported as long as their content consists of CSV-like data:

  • Directory container / CSV files
  • Zip container / compressed CSV files
  • PHPExcel container / PHPExcel sheets

You can configure the container type and content format via the controller/jobs/product/import/csv/container/type and controller/jobs/product/import/csv/container/content settings. Depending on the container/content, you are able to use additional options which are described in the article about creating and reading container/content files

The default container type is "directory", so you need to configure a directory where one or more import files are stored! To be able to use the PHPExcel container, you need to install the ai-container extension.

For CSV files, there exists a wide range of possibilities about their format because it's not standardized besides the fact that fields are separated by comma (,) but even that isn't set in stone. As a guideline, you should use the following format, which is able to handle all edge cases and which can be also created by Excel/OpenOffice as well:

"a string","an ""escaped quote""","a new
line","a second string"\n

The basic rules are:

  • All field data must be surrounded by double quotation marks (")
  • Fields should be separated by comma
  • Double quotation marks need to be escaped by a double quotation mark (duplicating the character)
  • A new line character within the field data doesn't need to be treated specially
  • The line ending must be a new line character (or a "\r\n" produced by Windows)

Your CSV files can start with a header describing the columns so they are more readable by humans. In this case, you need to configure the import to skip these lines using the controller/jobs/product/import/csv/skip-lines configuration.

For additional tuning of the CSV import, the controller/common/product/import/csv/max-size setting can help you to speed up the import or to reduce the memory consumption of the importer. If you set the number of CSV lines processed in one chunk higher, it will save some database queries at the cost of more memory required to cache the result sets. Similarly, lower values will result in more database queries but also to less allocated memory, which is useful if you have a hard memory limit.

If you are using TYPO3, you have to put all configuration into the TSConfig field of the appropriate scheduler task. For all other frameworks, the settings must be added to the configuration file.

If something goes wrong or for the progress status when importing big files, messages will be written to the "madmin_log" table in your database. You can see them in the "Overview" tab of the expert mode administration interface.

Default mapping

You can freely configure how your data is organized in the CSV file but for a quick start, there's a default mapping available that can also be used as example:

  1. array(
  2. 	'item' => array(
  3. 		0 => 'product.code', // e.g. unique EAN code
  4. 		1 => 'product.label', // UTF-8 encoded text, also used as product name
  5. 		2 => 'product.type', // type of the product, e.g. "default" or "selection"
  6. 		3 => 'product.status', // enabled (1) or disabled (0)
  7. 	),
  8. 	'text' => array(
  9. 		4 => 'text.type', // e.g. "short" for short description
  10. 		5 => 'text.content', // UTF-8 encoded text
  11. 		6 => 'text.type', // e.g. "long" for long description
  12. 		7 => 'text.content', // UTF-8 encoded text
  13. 	),
  14. 	'media' => array(
  15. 		8 => 'media.url', // relative URL of the product image on the server
  16. 	),
  17. 	'price' => array(
  18. 		9 => 'price.currencyid', // three letter ISO currency code
  19. 		10 => 'price.quantity', // the quantity the price (for block pricing)
  20. 		11 => 'price.value', // price with decimals separated by a dot
  21. 		12 => 'price.taxrate', // tax rate with decimals separated by a dot
  22. 	),
  23. 	'attribute' => array(
  24. 		13 => 'attribute.code', // code of an attribute, will be created if not exists
  25. 		14 => 'attribute.type', // e.g. "size", "length", "width", "color", etc.
  26. 	),
  27. 	'product' => array(
  28. 		15 => 'product.code', // e.g. EAN code of another product
  29. 		16 => 'product.lists.type', // e.g. "suggestion" for suggested product
  30. 	),
  31. 	'property' => array(
  32. 		17 => 'product.property.value', // arbitrary value for the corresponding type
  33. 		18 => 'product.property.type', // e.g. "package-weight"
  34. 	),
  35. 	'catalog' => array(
  36. 		19 => 'catalog.code', // e.g. Unique category code
  37. 		20 => 'catalog.lists.type', // e.g. "promotion" for top seller products
  38. 	),
  39. )

As you can see, the data from the CSV file is mapped according to its field position in the line to the key of a MShop domain item (e.g. "product.code"). Additionally, fields that belong together are grouped together. The keys of these groups are the names of the data processors that cares about importing the data.

Test import

You can test the CSV import using the example file and default mapping: product-import-example.csv

Before you start, add these settings to your configuration:

The CSV file must be the only file in your configured directory. If you are using TYPO3, the configuration must be added to the TS-Config field of the scheduler task.

Adapt the mapping

In order to change the default mapping, you can either use the controller/common/product/import/csv/mapping or - specific for the job controller - the controller/jobs/product/import/csv/mapping setting.

You can freely rearrange the group names and e.g. put "media" before "text" or after "attribute". This changes the order at which the data is imported and in this example, first the product item, than the text and media data is imported and so on. The only hard requirement is that the "item" group must be the first one!

Similarly, the mapping inside the groups can be changed to your needs and can be reordered as well. The indexes of the mapping in the groups don't have to be consecutive either, e.g. the "product.code" is at index "0", the next field contains some irrelevant data and afterwards the "media.url" is stored at position "2" before the rest of the product item data follows at index "3" and above. This would lead to a mapping like this:

  1. array(
  2. 	'item' => array(
  3. 		0 => 'product.code', // e.g. unique EAN code
  4. 		5 => 'product.status', // enabled (1) or disabled (0)
  5. 		3 => 'product.label', // UTF-8 encoded text, also used as product name
  6. 		4 => 'product.type', // type of the product, e.g. "default" or "selection"
  7. 	),
  8. 	// ...
  9. 	'media' => array(
  10. 		2 => 'media.url', // relative URL of the product image on the server
  11. 	),
  12. 	// ...
  13. )

In fact, you can leave out indexes that shouldn't be imported and use an arbitrary index order. For the reason to understand the mapping immediately, you should at least use a more or less consecutive indexing within the processor group.

Processor groups

Each group in the mapping (e.g. "item", "text" or "media") defines the CSV fields that will be evaluated by the corresponding processor implementation. A processor is a class that cares about importing "its" data for the product, e.g. the "text" processor imports the data that will be stored in the mshop_text table and to associate this data via the mshop_product_list table to the corresponding product.

To speed up importing the data, all existing product related data is fetched at once for each product item. You can change the retrieved relations via the controller/common/product/import/csv/domains or - specific for the job controller - the controller/jobs/product/import/csv/domains setting. This is especially useful if you don't want to import certain relations like associated products as it reduces the amount of data retrieved from the storage and speeds up the import.

If you remove a name from the "domains" configuration but still have a mapping defined and your CSV file contains data for this domain, the import is likely to fail because the importer than tries to import duplicate entries. On the other hand, retrieving more domain items than necessary only slows down the importer.

All processors besides the "item" processor are able to import multiple sets at once, e.g. the "text" processor can store several texts available in the CSV line for one product. This means that you can configure the mapping to reference an arbitrary number of text, media, etc. data in one processor group:

  1. array(
  2. 	// ...
  3. 	'text' => array(
  4. 		5 => 'text.content',
  5. 		8 => 'text.type',
  6. 		6 => 'text.content',
  7. 		9 => 'text.type',
  8. 		7 => 'text.content',
  9. 		10 => 'text.type',
  10. 	),
  11. 	// ...
  12. )

This mapping tells the processor to expect three texts ("text.content") and their types ("text.type") in the CSV fields.

The data that belongs together needs to be listed together in the mapping! If the processor finds the same domain item key (e.g. "text.content") in the mapping, it assumes that a new data set begins.

The position of the data in the CSV fields is arbitrary again like shown in the example. It would expect a CSV structure like this:

...,"text 1","text 2","text 3","name","short","long",...

Contrary, this mapping won't work:

  1. array(
  2. 	// ...
  3. 	'text' => array(
  4. 		5 => 'text.content',
  5. 		6 => 'text.content',
  6. 		7 => 'text.type',
  7. 		8 => 'text.type',
  8. 	),
  9. 	// ...
  10. )

In this case the processor would assume that the first field (position 5) doesn't have a text type associated and the last field (position 8) doesn't contain a text content. Both will fail during the import.

Item

The "item" group is the most important data group because it contains the mapping for the product item that must be stored in the database before the rest of the processors can start to insert or update the data they are managing. It accepts data for these domain item keys:

  1. array(
  2. 	'item' => array(
  3. 		0 => 'product.code',
  4. 		1 => 'product.typeid',
  5. 		2 => 'product.type',
  6. 		3 => 'product.label',
  7. 		4 => 'product.status',
  8. 		5 => 'product.datestart',
  9. 		6 => 'product.dateend',
  10. 	),
  11. )

Values for code and type (or typeid) are the absolute minimum that is required to create or update a product item. If no status value is available, the value for "enabled" ("1") will be automatically added. The product label is also used for the product name if no text of type "name" is imported.

Keep in mind that the "product.code" item key must be the first field in each CSV line!

Text

Several product related texts can be part of each CSV line. Supported domain item keys are:

  1. array(
  2. 	'text' => array(
  3. 		0 => 'text.languageid',
  4. 		1 => 'text.typeid',
  5. 		2 => 'text.type',
  6. 		3 => 'text.label',
  7. 		4 => 'text.content',
  8. 		5 => 'text.status',
  9. 	),
  10. )

The content and the type (or typeid) is required as the minimum amount of data. If you don't have a CSV field for the language ID, the text is imported with the language of the first locale item of the site the importer is running for. Similarly, the label will be the content shorten to max. 100 bytes and the status is set to enabled ("1") if not available.

Additionally, you can import values for the product list relation as well:

  1. array(
  2. 	'text' => array(
  3. 		// ...
  4. 		6 => 'product.lists.typeid',
  5. 		7 => 'product.lists.type',
  6. 		8 => 'product.lists.datestart',
  7. 		9 => 'product.lists.dateend',
  8. 		10 => 'product.lists.config',
  9. 		11 => 'product.lists.position',
  10. 		12 => 'product.lists.status',
  11. 	),
  12. )

Here, the type (or typeid) is absolutely necessary. If no value for the position is available, the automatically calculated position is used. The status is set to "enabled" ("1") if not set explicitly.

If one or more relations should stay untouched, you can explicitly configure the list of product list types that will be inserted, updated or deleted via the controller/common/product/import/csv/processor/text/listtypes setting.

Media

Several product related images and other media types can be part of each CSV line. Supported domain item keys are:

  1. array(
  2. 	'media' => array(
  3.  		0 => 'media.languageid',
  4. 		1 => 'media.typeid',
  5. 		2 => 'media.type',
  6. 		3 => 'media.label',
  7. 		4 => 'media.mimetype',
  8. 		5 => 'media.preview',
  9. 		6 => 'media.url',
  10. 		7 => 'media.status',
  11. 	),
  12. )

The URL and the type (or typeid) is required as the minimum amount of data. If you don't have a CSV field for the language ID, the media item is imported with no language and is then considered language independent. The label and preview fields will be filled with the URL and the status is set to enabled ("1") if not available.

Additionally, you can import values for the product list relation as well:

  1. array(
  2.  	'media' => array(
  3. 		// ...
  4. 		8 => 'product.lists.typeid',
  5. 		9 => 'product.lists.type',
  6. 		10 => 'product.lists.datestart',
  7. 		11 => 'product.lists.dateend',
  8. 		12 => 'product.lists.config',
  9. 		13 => 'product.lists.position',
  10. 		14 => 'product.lists.status',
  11. 	),
  12. )

Here, the type (or typeid) is absolutely necessary. If no value for the position is available, the automatically calculated position is used. The status is set to "enabled" ("1") if not set explicitly.

If one or more relations should stay untouched, you can explicitly configure the list of product list types that will be inserted, updated or deleted via the controller/common/product/import/csv/processor/media/listtypes setting.

Price

Several product related prices can be part of each CSV line. Supported domain item keys are:

  1. array(
  2. 	'price' => array(
  3. 		0 => 'price.typeid',
  4. 		1 => 'price.type',
  5. 		2 => 'price.label',
  6. 		3 => 'price.currencyid',
  7. 		4 => 'price.value',
  8. 		5 => 'price.costs',
  9. 		6 => 'price.rebate',
  10. 		7 => 'price.taxrate',
  11. 		8 => 'price.status',
  12. 	),
  13. )

The value, currencyid and type (or typeid) are required as the minimum amount of data. If you don't have a CSV field for the status, it's set to enabled ("1").

Additionally, you can import values for the product list relation as well:

  1. array(
  2. 	'price' => array(
  3. 		// ...
  4. 		9 => 'product.lists.typeid',
  5. 		10 => 'product.lists.type',
  6. 		11 => 'product.lists.datestart',
  7. 		12 => 'product.lists.dateend',
  8. 		13 => 'product.lists.config',
  9. 		14 => 'product.lists.position',
  10. 		15 => 'product.lists.status',
  11. 	),
  12. )

Here, the type (or typeid) is absolutely necessary. If no value for the position is available, the automatically calculated position is used. The status is set to "enabled" ("1") if not set explicitly.

If one or more relations should stay untouched, you can explicitly configure the list of product list types that will be inserted, updated or deleted via the controller/common/product/import/csv/processor/price/listtypes setting.

Attribute

Several product related attributes can be part of each CSV line. Supported domain item keys are:

  1. array(
  2. 	'attribute' => array(
  3. 		0 => 'attribute.typeid',
  4. 		1 => 'attribute.type',
  5. 		2 => 'attribute.code',
  6. 		3 => 'attribute.label',
  7. 		4 => 'attribute.position',
  8. 		5 => 'attribute.status',
  9. 	),
  10. )

The value and the type (or typeid) is required as the minimum amount of data.

Attributes are usually shared among several products and therefore, the attribute processor associates them to the imported product. Nevertheless, if an attribute doesn't exist, it will be created automatically. You will then be able to enrich these attributes manually by adding texts for different languages or prices if it's an optional attribute. Furthermore, if you don't have a CSV field for the status, it's set to enabled ("1").

Additionally, you can import values for the product list relation as well:

  1. array(
  2. 	'attribute' => array(
  3. 		// ...
  4. 		6 => 'product.lists.typeid',
  5. 		7 => 'product.lists.type',
  6. 		8 => 'product.lists.datestart',
  7. 		9 => 'product.lists.dateend',
  8. 		10 => 'product.lists.config',
  9. 		11 => 'product.lists.position',
  10. 		12 => 'product.lists.status',
  11. 	),
  12. )

Here, the type (or typeid) is absolutely necessary. If no value for the position is available, the automatically calculated position is used. The status is set to "enabled" ("1") if not set explicitly.

If one or more relations should stay untouched, you can explicitly configure the list of product list types that will be inserted, updated or deleted via the controller/common/product/import/csv/processor/attribute/listtypes setting.

Product

Several product relations can be part of each CSV line. Supported domain item key is only

  1. array(
  2. 	'product' => array(
  3. 		0 => 'product.code',
  4. 	),
  5. )

and it's also the minimum amount of data. The real power of the product relations is in the values for the product list relation:

  1. array(
  2. 	'product' => array(
  3. 		// ...
  4. 		1 => 'product.lists.typeid',
  5. 		2 => 'product.lists.type',
  6. 		3 => 'product.lists.datestart',
  7. 		4 => 'product.lists.dateend',
  8. 		5 => 'product.lists.config',
  9. 		6 => 'product.lists.position',
  10. 		7 => 'product.lists.status',
  11. 	),
  12. )

Here, the type (or typeid) is absolutely necessary. If no value for the position is available, the automatically calculated position is used. The status is set to "enabled" ("1") if not set explicitly.

If one or more relations should stay untouched, you can explicitly configure the list of product list types that will be inserted, updated or deleted via the controller/common/product/import/csv/processor/product/listtypes setting.

Property

Several product related properties (non-shared product data) can be part of each CSV line. Supported domain item keys are:

  1. array(
  2. 	'property' => array(
  3. 		0 => 'product.property.typeid',
  4. 		1 => 'product.property.type',
  5. 		2 => 'product.property.languageid',
  6. 		3 => 'product.property.value',
  7. 	),
  8. )

The value and the type (or typeid) is required as the minimum amount of data. If you don't have a CSV field for the language ID, the property is imported with no language and is then considered language independent.

Catalog

Several category relations can be part of each CSV line. Supported domain item key is only

  1. array(
  2. 	'catalog' => array(
  3. 		0 => 'catalog.code',
  4. 	),
  5. )

and it's also the minimum amount of data. The real power of the catalog relations is in the values for the catalog list relation:

  1. array(
  2. 	'catalog' => array(
  3. 		// ...
  4. 		1 => 'catalog.lists.typeid',
  5. 		2 => 'catalog.lists.type',
  6. 		3 => 'catalog.lists.datestart',
  7. 		4 => 'catalog.lists.dateend',
  8. 		5 => 'catalog.lists.config',
  9. 		6 => 'catalog.lists.position',
  10. 		7 => 'catalog.lists.status',
  11. 	),
  12. )

Here, the type (or typeid) is absolutely necessary. If no value for the position is available, the automatically calculated position is used. The status is set to "enabled" ("1") if not set explicitly.

If one or more relations should stay untouched, you can explicitly configure the list of product list types that will be inserted, updated or deleted via the controller/common/product/import/csv/processor/catalog/listtypes setting.

Stock

If you want to set or update the stock levels during the product import as well, you can configure the mapping for your CSV file to support these domain item keys:

  1. array(
  2. 	'stock' => array(
  3. 		0 => 'stock.stocklevel',
  4. 		1 => 'stock.type',
  5. 		2 => 'stock.typeid',
  6. 		3 => 'stock.dateback',
  7. 	),
  8. )

The stock level is required as the minimum amount of data. If you don't have a CSV field for the type or type ID, the "default" type is assumed. Both, the "stock.stocklevel" and the "stock.dateback" values can be empty. For the stock level this means an unlimited value while for the "dateback" value it's an unknown date.

Resolving types

Most of the domain items need a type before they can be stored in the database. If no type is configured in the mapping or the field in the CSV line is empty, the type "default" is assumed. This only works if the "default" type is available and valid for the domain item. You can see what types are available for the different kind of domain items in the administration interface in the "Type" tab.

For domain items like texts or attributes, a type must be explicitly part of the CSV fields as there's no implicit assumption about the types for the fields. This means that the e.g. for the text data, you need to have a field in your CSV file that tells the processor of which type the associated text content is.

If you know the type IDs for the data, you can add a column with the type IDs for the data to the CSV file and map it to the "*.typeid" domain keys. Easier and more convenient is to specify the type codes instead as these values don't change - independent of the database used. The type code is then automatically translated to the corresponding type ID and added to the domain item before it's stored in the database.

  1. array(
  2. 	'item' => array(
  3. 		// ...
  4. 		2 => 'product.type', // type of the product, e.g. "default" or "selection"
  5. 		// ...
  6. 	),
  7. 	'text' => array(
  8. 		4 => 'text.type', // e.g. "short" for short description
  9. 		// ...
  10. 		6 => 'text.type', // e.g. "long" for long description
  11. 		// ...
  12. 	),
  13. 	// ...
  14. 	'attribute' => array(
  15. 		12 => 'attribute.type', // e.g. "size", "length", "width", "color", etc.
  16. 		// ...
  17. 	),
  18. 	'product' => array(
  19. 		// ...
  20. 		15 => 'product.lists.type', // e.g. "suggestion" for suggested product
  21. 	),
  22. 	'property' => array(
  23. 		// ...
  24. 		17 => 'product.property.type', // e.g. "package-weight"
  25. 	),
  26. 	'catalog' => array(
  27. 		// ...
  28. 		19 => 'catalog.lists.type', // e.g. "promotion" for top seller products
  29. 	),
  30. )

All the field values these "*.type" keys are mapped to will be translated into their type ID value automatically, so you don't have to care about changing IDs depending on the database you are importing your products.

Data converters

Not all data in the CSV file is already in the required format. Maybe the text encoding isn't UTF-8, the date is not in ISO format or something similar. In order to convert the data before it's imported, you can specify a list of converter objects that should be applied to the data from the CSV file.

To each field in the CSV file, you can apply one or more converters, e.g. to encode a Latin text to UTF8 for the second CSV field via the controller/common/product/import/csv/converter or - specific for the job controller - controller/jobs/product/import/csv/converter settings:

  1. array(
  2. 	1 => 'Text/LatinUTF8'
  3. )

Similarly, you can also apply several converters at once to the same field:

  1. array(
  2. 	1 => array(
  3. 		'Text/LatinUTF8',
  4. 		'DateTime/EnglishISO'
  5. 	)
  6. )

It would convert the data of the second CSV field first to UTF-8 and afterwards try to translate it to an ISO date format.

Keep in mind that the position of the CSV fields start at zero (0). If you only need to convert a few fields, you don't have to configure all fields. Only specify the positions in the array you really need!

The available converter objects are named "MW_Convert_<type>_<conversion>" where <type> is the data type and <conversion> the way of the conversion. In the configuration, the type and conversion must be separated by a slash (<type>/<conversion>).

To create your own custom converter, you can use the UTF-8 converter class as skeleton:

  1. namespace Aimeos\MW\Convert\Text;
  2.  
  3. class LatinUTF8 implements \Aimeos\MW\Convert\Interface
  4. {
  5. 	/**
  6. 	 * Translates a value to another one.
  7. 	 *
  8. 	 * @param mixed $value Value to translate
  9. 	 * @return mixed Translated value
  10. 	 */
  11. 	public function translate( $value )
  12. 	{
  13. 		return utf8_encode( $value );
  14. 	}
  15.  
  16. 	/**
  17. 	 * Reverses the translation of the value.
  18. 	 *
  19. 	 * @param mixed $value Value to reverse
  20. 	 * @return mixed Reversed translation
  21. 	 */
  22. 	public function reverse( $value )
  23. 	{
  24. 		return utf8_decode( $value );
  25. 	}
  26. }

The reverse() method isn't necessary for the product importer, but the converter classes are used for the search plugins as well which translates data to keep the defined semantics.

Backups

After a CSV file was imported successfully, you can move it to another location, so it won't be imported again and isn't overwritten by the next file that is stored at the same location in the file system. You should use an absolute path for the controller/jobs/product/import/csv/backup setting to be sure but can be relative path if you absolutely know from where the job will be executed from.

The name of the new backup location can contain placeholders understood by the PHP strftime() function to create dynamic paths, e.g. "backup/%Y-%m-%d" which would create "backup/2000-01-01". For more information about the strftime() placeholders, please have a look into the PHP documentation of strftime() function.

If no backup name is configured, the file or directory won't be moved away. Please make also sure that the parent directory and the new directory are writable so the file or directory could be moved.