Case Study Team
|Id||Name||Institution (type*)||Related WG||Role in the team|
|1||Elisabeth Larsson||Uppsala University, SE (A)||WG4, WG2||Coordinator|
|Id||Name||Institution (type*)||Related WG|
|1||Corinne Ancourt||MINES ParisTech, FR (A)||WG2|
|2||Clemens Grelck||University of Amsterdam, NL (A),||WG2|
|3||Giorgio Giordanengo||Politecnico di Torino, IT (A)||WG4|
|4||Christoph Kessler||Linköping University, SE (A)||WG2|
|5||Francesca Vipiana||Politecnico di Torino, IT (A)||WG4|
|6||Afshin Zafari,||Uppsala University, SE (A)||WG2|
In industrial antenna design and electromagnetic compatibility simulation, large 3-D problems need to be solved. A commonly used approach is the Fast Multipole Method (FMM) and other methods from the same family. The algorithm has a hierarchical structure and is difficult to implement with traditional methods. The main challenges are
- All-to-all communication across all levels.
- How to overlap with computations and communication efficiently.
- Unstructured and problem dependent granularity of tasks.
- Variable levels of parallelism across the execution.
- Distributed load balancing. Migration of data between iterations could be one solution.
The case study has in part been carried in collaboration with a company whose interest it is to understand what is the best path to take for future development of their software products. Large costs are associated with rewriting an industrial code, both with respect to changes in the algorithm, and with respect to the methods of parallelization. The current implementation does not support distributed computing, and this is in many cases needed for the relevant problem sizes.
Existing Solution(-s) (Models, Tools)
An efficient 3-D FMM solver has been implemented in StarPU, but it is for problems where particles are distributed in a volume, and in the typical electromagnetic case, charges are distributed on a surface, making the work size at the finest level much smaller.
Industrial codes have been parallelized with OpenMP (not using tasks), but the resulting implementations do not manage to fully exploit parallel hardware.
There are open source implementations like PUMA-EM. Licenses do not currently allow for industrial use (unless open source).
Proposed Solution(-s) (Models, Tools)
Proposed directions for the implementation are
- Task based parallelism, merging small tasks, mixing of stages.
- Memory management (e.g., some duplication of data for speed).
- Use of heterogeneous computing with some parts of the code being run on accelerator hardware.
There are (at least) two different formulations of the problem. Using the kernel independent formulation, all tasks can be of a similar size, and with operations of matrix-vector product type. This is beneficial for the parallelization.
For the kernel-based formulations, the translation operations in the middle of the algorithm are scalar products, which have a lower computational density, and impact the efficiency. Furthermore, the tasks at the courser levels are orders of magnitudes larger, computationally heavy, and will need to be split over different threads/processes in order to have enough parallelism.
Practical Scenarios (-s)
For solving industrial scale problems such as the simulation of a whole aircraft or satellite, the numbers of unknowns range from millions to billions. And simulation times can be of the order of months, making efficient solvers highly relevant.
Task parallel implementation of a solver for electromagnetic scattering problems