Configure the following to handle large-volume data.
- Configure larger Kernel memory
- Use high-speed multi-core processors
- DB lookups - reuse the same connection rather than opening a new connection per record
- DB lookups - enable database caching inside the Data Mapper
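The two DB-lookup tips above can be sketched as follows. This is a minimal Python illustration using an in-memory sqlite3 database and a made-up `country` table, not Adeptia's actual lookup API: one connection is opened once and reused for every record, and a dictionary cache plays the role of the Data Mapper's database caching.

```python
import sqlite3

# Hypothetical lookup table; in Adeptia this would be the external lookup DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO country VALUES (?, ?)",
                 [("US", "United States"), ("DE", "Germany")])

cache = {}  # in-memory cache, analogous to enabling DB caching in the mapper

def lookup_country(code):
    """Look up a value, reusing the single open connection and caching results."""
    if code not in cache:
        row = conn.execute(
            "SELECT name FROM country WHERE code = ?", (code,)).fetchone()
        cache[code] = row[0] if row else None
    return cache[code]

# Every record reuses the same connection; repeated codes never hit the DB again.
records = ["US", "DE", "US", "US"]
names = [lookup_country(c) for c in records]
```

With per-record connections, a million-record file would open a million connections; here connection setup happens once and repeated keys are served from memory.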
- XML lookups - pre-fetch database records into XML and perform XML lookups instead; this performs better than repeated DB lookups
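The XML-lookup pattern above can be sketched like this. The table, element names, and helper function are illustrative, not Adeptia APIs: all rows are fetched from the database once into an in-memory XML document, and subsequent lookups query the XML rather than the database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical source table standing in for the external lookup database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [("A1", 9.99), ("B2", 19.5)])

# Pre-fetch all rows into an XML document once, before mapping starts.
root = ET.Element("products")
for sku, price in conn.execute("SELECT sku, price FROM product"):
    ET.SubElement(root, "product", sku=sku, price=str(price))

def xml_lookup(sku):
    """During mapping, resolve lookups against the in-memory XML, not the DB."""
    node = root.find(f"./product[@sku='{sku}']")
    return float(node.get("price")) if node is not None else None
```

The database is hit exactly once regardless of record count, trading a little memory for the elimination of per-record round trips.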
- Schema - skip non-required fields in the source schema
- File events - handle a large number of files by tuning the abpm.event.concurrency property
- Mail source activity - handle errors by tuning the abpm.mailEvent.retry property
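The two properties above are set in the Adeptia server configuration. The values below are placeholders to show the shape of the settings, not recommended values; tune them for your environment:

```
abpm.event.concurrency=10
abpm.mailEvent.retry=3
```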
- Maintenance - schedule frequent cleanup of logs and the repository
- Mapping - use splitting and parallel processing in the Data Mapper. You can enable splitting in the following scenarios:
- Multiple concurrent jobs may be running, or the mapping rules are complex or use external DB lookups: enable splitting when the file size exceeds 1% of the Kernel heap size. For example, with a 12GB Kernel heap, use splitting for files larger than 120MB.
- Multiple concurrent jobs may be running, the mapping rules are simple, and there are no external DB lookups: enable splitting when the file size exceeds 2% of the Kernel heap size. For example, with a 12GB Kernel heap, use splitting for files larger than 240MB.
- Splitting may not be possible if the mapping rules use aggregate functions that require all records to be processed together. In that case, use one of the solution design approaches described below.
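As a quick sanity check of the thresholds above, the sizing rule can be written as a small helper. The function name and parameters are illustrative, and the 12GB heap is expressed as 12000MB to match the document's 120MB/240MB examples:

```python
def needs_splitting(file_size_mb, heap_size_mb, complex_mapping):
    """Return True when Data Mapper splitting is recommended.

    complex_mapping: True if multiple concurrent jobs may run and the rules
    are complex or use external DB lookups (1% threshold); False for simple
    rules with no external DB lookups (2% threshold).
    """
    threshold = 0.01 if complex_mapping else 0.02
    return file_size_mb > heap_size_mb * threshold

# With a 12GB (12000MB) Kernel heap:
needs_splitting(150, 12000, complex_mapping=True)   # True: 150MB > 120MB
needs_splitting(150, 12000, complex_mapping=False)  # False: 150MB < 240MB
```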
- Solution design - split the data and process it in parallel within the process flow. If the file size is more than 8% of the Kernel heap size, consider the processing approach carefully. For example, with a 12GB Kernel heap, use one of the design approaches below for files larger than 1GB.
- Design the process flow to process data in smaller chunks from the start, for example by triggering more frequently to pick up smaller chunks of source data, or by adding a pre-processing step that splits the source file before the schema and mapping stages.
- Alternatively, avoid the Adeptia Data Mapper altogether: use a custom plugin or a database bulk loader to load the file into a staging database, then apply the processing rules through DB queries and stored procedures. This handles very large data volumes more efficiently.
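The staging-database approach above can be sketched as follows. This is a minimal illustration using Python's csv and sqlite3 modules with an invented two-column file; a production setup would use the database vendor's bulk loader (or a custom plugin) against a real staging schema:

```python
import csv
import io
import sqlite3

# Hypothetical CSV source; in practice this would be the large input file.
source = io.StringIO("id,amount\n1,10\n2,25\n3,40\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, amount REAL)")

# Bulk-load the file into the staging table instead of mapping row by row.
rows = ((r["id"], float(r["amount"])) for r in csv.DictReader(source))
conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)

# Apply the processing rules in SQL, letting the database do the heavy lifting.
total = conn.execute(
    "SELECT SUM(amount) FROM staging WHERE amount > 15").fetchone()[0]
```

The mapping logic moves out of per-record Java processing and into set-based SQL, which is typically far cheaper for very large files.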