How does task scheduling by HCTM work?

HCTM is a self-scheduling task manager and does not apply any load balancing between the threads (cores) selected by the user for task processing. Once the number of threads is selected in the task queue file (hctm.in), maximum throughput is achieved when all allocated cores are kept busy concurrently for as long as possible. To maximize throughput across the selected number of cores, consider the expected duration of each task in the task queue (hctm.in). A simple rule is to place the longest-running tasks at the head of the queue and the shortest tasks at the end. Ordering tasks by decreasing runtime avoids the situation in which the longest task starts last and occupies a single core while all other cores sit idle because the shorter tasks have already completed.

Three examples of task queues are included in this distribution to demonstrate HCTM operation and throughput performance with different models and data. Exact throughput results will depend on the processor generation and the total number of cores allocated.

How does HCTM deal with multithreaded models?

Both single-threaded (serial) and multithreaded (parallel) tasks are allowed in the input queue. A prudent approach that may optimize workload throughput is to split them into separate task queue files based on the number of threads each model will use. The following cautions should be noted:
In this last case, when the number of threads demanded by any (or all) parallel tasks in the queue is comparable to the number of cores available on the platform, the conservative approach is to select 1 as the number of threads in the input file hctm.in.

What about the memory capacity required?

The memory footprint of HCTM itself is very small. For the models it processes, however, the total memory required when tasks execute concurrently should be estimated in advance. To estimate it, multiply the memory requirement of one model by the number of threads allocated by HCTM (if all tasks run the same model type), or sum the requirements of the individual models that execute concurrently (if the models differ). When the total memory requirement of all models executed concurrently by the thread team exceeds the physical memory available on the computer, page swapping will lower throughput performance.
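The queue-ordering rule and the concurrent-memory estimate described above can be tried out with a small simulation. This is only a sketch under stated assumptions and is not HCTM code: it assumes every task uses exactly one thread, that a free thread always takes the next task from the head of the queue, and that runtimes and memory needs are known in advance. The function name simulate and all numbers are hypothetical, and the hctm.in file format is not modelled.

```python
import heapq


def simulate(tasks, n_threads):
    """Simulate a self-scheduling queue (illustrative only, not HCTM itself).

    tasks: list of (runtime, memory) tuples in queue order (hypothetical units).
    n_threads: number of worker threads, each running one task at a time.
    Returns (makespan, peak_memory) for the whole queue.
    """
    running = []   # min-heap of (finish_time, memory) for tasks in flight
    now = 0        # start time of the task currently being dispatched
    in_use = 0     # memory held by tasks still running at time `now`
    peak = 0
    makespan = 0
    for runtime, memory in tasks:
        if len(running) == n_threads:
            # All threads busy: the next task waits for the earliest finish.
            now = running[0][0]
        # Release the memory of every task that has finished by `now`.
        while running and running[0][0] <= now:
            _, freed = heapq.heappop(running)
            in_use -= freed
        heapq.heappush(running, (now + runtime, memory))
        in_use += memory
        peak = max(peak, in_use)
        makespan = max(makespan, now + runtime)
    return makespan, peak


# Hypothetical queue: four short tasks and one long, memory-hungry task.
short_first = [(2, 1), (2, 1), (2, 1), (2, 1), (8, 6)]
# Same tasks with the longest-running task moved to the head of the queue.
longest_first = sorted(short_first, key=lambda t: t[0], reverse=True)

if __name__ == "__main__":
    for name, queue in (("short first", short_first),
                        ("longest first", longest_first)):
        makespan, peak = simulate(queue, n_threads=2)
        print(f"{name}: makespan={makespan}, peak memory={peak}")
```

With these made-up numbers and two threads, the short-first queue finishes at 12 time units because the long task runs alone at the end, while the longest-first queue finishes at 8. The peak memory figure is the largest sum of memory held by tasks running at the same time, which is the quantity to compare against the physical memory of the machine.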
Send mail to george@hiclas1.com with questions or comments about this web site.