HiCLAS1 Technical Reports
HPC-2007-3: AERMOD-HPCS and AERMOD-EPA Workload Through-put (dual CPU node)
Copyright © 2007 HiCLAS1
George Delic and Arnold R. Srackangast
1.0 INTRODUCTION

This report analyzes workload throughput on IA-32 commodity platforms for the Air Quality Model (AQM) AERMOD. Results are presented for two versions of AERMOD: the executable model released by the U.S. EPA (hereafter AERMOD-EPA) and the High Performance Computing (HPC) version developed by HiCLAS1 (AERMOD-HPCS). Both versions are designed to execute the U.S. EPA's regulatory AERMOD model on a single processor (or core). The purpose of this report is to present throughput results observed for both versions of AERMOD on commodity architectures when the workload is a second (concurrent) execution of the same model.

2.0 CHOICE OF HARDWARE AND OPERATING SYSTEM

The hardware used for the results reported here is the Intel Pentium 4 Xeon processor identified as Machine B in Table 1 of the report HPC-2007-1. This hardware is a node with two single-core processors on a front-side bus (FSB) architecture, running the Microsoft Windows™ operating system for 32-bit architectures. The results are typical of such platforms and may be considered representative. Subsequent reports will investigate results for multi-core CPUs.

3.0 CHOICE OF COMPILERS

The compiler used for the AERMOD-EPA executable distributed by the U.S. EPA is not known, but is assumed to be the Compaq Visual Fortran (CVF) compiler, as discussed in a previous report (HPC-2007-2). The executable was obtained from the U.S. EPA distribution center at http://www.epa.gov/scram001 and was used for all results designated here as AERMOD-EPA. Results designated here as AERMOD-HPCS were obtained by compiling the AERMOD-HPCS source code, which is a modified version of the U.S. EPA source distribution available at the U.S. EPA SCRAM Web portal named above. The compiler used for AERMOD-HPCS in this analysis (and distribution) is not named here, but was chosen after testing the most popular compilers currently available.
For clarity, Table 1 summarizes the notation used here, the compilers, and the arithmetic precision used in the compilation.
4.0 CHOICE OF BENCHMARKS

The AERMOD model describes pollutant dispersion and deposition and is now an approved regulatory model for new source reviews and other permitting applications. The AERMOD-EPA version is available from the U.S. EPA’s Support Center for Regulatory Air Models at the Web portal named above. The version used here is AERMOD 07026, designated as described in Table 1. The High Performance Computing (HPC) version of AERMOD is designated AERMOD-HPCS. For workload throughput testing, the four cases listed in Table 2 were used as benchmarks. These benchmarks are considered representative of actual applications of AERMOD. The workload impact test consisted of first executing a single model run on a dedicated platform, and then executing two model runs concurrently on the same platform. In the latter case, the workload seen by each run is the other (concurrent) execution of the model.
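The test procedure described above (one dedicated run, then two concurrent runs) can be sketched as a small timing harness. This is only an illustration in Python: the report does not say how the runs were launched or timed, the function names `timed_run` and `time_concurrent` are hypothetical, and the stand-in command is a placeholder for the model executable. In a real test each concurrent run would presumably also need its own input and output files.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def timed_run(command):
    """Run one command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start


def time_concurrent(command, copies):
    """Launch `copies` instances of `command` together; return one time per instance."""
    with ThreadPoolExecutor(max_workers=copies) as pool:
        return list(pool.map(timed_run, [command] * copies))


if __name__ == "__main__":
    # Stand-in command (assumption): a real test would launch the model instead.
    stand_in = [sys.executable, "-c", "import time; time.sleep(0.2)"]
    dedicated = time_concurrent(stand_in, 1)[0]  # single run on a dedicated platform
    loaded = time_concurrent(stand_in, 2)        # two runs, each the other's workload
    print(f"dedicated:  {dedicated:.2f} s")
    print("concurrent:", ", ".join(f"{t:.2f} s" for t in loaded))
```

Each run is timed in its own thread, so the time reported for one concurrent run does not depend on when the other finishes.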
5.0 BENCHMARK RESULTS

For Machine B described in the previous report and the cases listed in Table 2, single and dual concurrent executions of AERMOD-HPCS were performed. Table 3 shows the results and the overhead observed for the dual concurrent executions in two ways. First, the wall-clock time increases for both concurrent runs compared to a single execution on the same resource. Second, the resulting increase in wall-clock time is customarily measured by the expansion factor, defined as the ratio of the wall-clock times observed with and without a workload, respectively. The last column of Table 3 shows the expansion factor for the cases of Table 2. Values larger than 1 are to be expected when a workload is present, since any application competes with the workload for resources. In this case the workload is the other execution of AERMOD-HPCS. Note that since there are two CPUs, the competition is not for arithmetic processing or on-board resources such as cache, but for memory and I/O access via the FSB. While such accesses are pending, the corresponding CPU pipeline stalls until the waiting instructions can be processed.
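The expansion factor defined above is a simple quotient, shown here as a minimal Python sketch. The function name `expansion_factor` is illustrative, and the timings are hypothetical values chosen for the example. They are not taken from Table 3.

```python
def expansion_factor(t_with_workload, t_dedicated):
    """Ratio of wall-clock time with a workload to wall-clock time without one."""
    if t_dedicated <= 0:
        raise ValueError("dedicated wall-clock time must be positive")
    return t_with_workload / t_dedicated


# Hypothetical wall-clock times in seconds, NOT values from Table 3.
dedicated = 400.0            # single run on a dedicated node
concurrent = [440.0, 450.0]  # the two runs of a dual concurrent execution

factors = [expansion_factor(t, dedicated) for t in concurrent]
print(factors)  # [1.1, 1.125]
```

A factor of 1.0 would mean the concurrent run was not slowed at all. Values above 1 measure the overhead caused by the competing run.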
Table 4 summarizes the corresponding results for the U.S. EPA model AERMOD-EPA. Here the time was captured only to the nearest minute, so the timing granularity is coarse for the first two cases. Apart from the obvious feature that AERMOD-HPCS is much faster than AERMOD-EPA (as discussed in report HPC-2007-1), the increase in wall-clock time is also much larger for the concurrent AERMOD-EPA executions, indicating a larger overhead for that model when a workload is present. Another way of seeing this impact is to compare the ratios of the AERMOD-EPA and AERMOD-HPCS execution times for single and dual model executions. For Case 5, AERMOD-HPCS shows speed-up factors over AERMOD-EPA of 2.19 (single execution) and 2.34 (dual executions).
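The two Case 5 speed-up factors also fix the ratio of the two models' expansion factors, because the single-run times cancel when the dual speed-up is divided by the single speed-up. The sketch below works this out in Python. It assumes one representative wall-clock time per model for the dual executions, and the function name is illustrative rather than anything used in the report.

```python
def expansion_factor_ratio(speedup_single, speedup_dual):
    """Ratio E_EPA / E_HPCS implied by the single and dual speed-up factors.

    With speed-up S = T_EPA / T_HPCS and expansion factor E = T_dual / T_single:
    S_dual / S_single = (T_EPA_dual / T_HPCS_dual) / (T_EPA_single / T_HPCS_single)
                      = E_EPA / E_HPCS
    """
    if speedup_single <= 0 or speedup_dual <= 0:
        raise ValueError("speed-up factors must be positive")
    return speedup_dual / speedup_single


# Case 5 speed-up factors quoted in the text.
ratio = expansion_factor_ratio(2.19, 2.34)
print(f"{ratio:.3f}")  # 1.068
```

Under that assumption, the AERMOD-EPA expansion factor for Case 5 is roughly 7% larger than that of AERMOD-HPCS. The figure is only as precise as the two rounded speed-up factors.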
Fig. 1 shows a graphical representation of the ratio of the increase in wall-clock time for AERMOD-EPA to that for AERMOD-HPCS. It shows clearly that, for longer-running cases, two concurrent runs of the EPA model suffer a larger expansion in run time than two concurrent runs of AERMOD-HPCS.
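The quantity plotted in Fig. 1 can be reproduced with one division per case. The short Python sketch below uses only the Case 5 values quoted in the Fig. 1 caption (145 for AERMOD-EPA and 29.5 for AERMOD-HPCS). The function name is illustrative, and the values for the other cases are in Tables 3 and 4 and are not repeated here.

```python
def increase_ratio(increase_epa, increase_hpcs):
    """Ratio of the wall-clock time increase for AERMOD-EPA to that for AERMOD-HPCS."""
    if increase_hpcs <= 0:
        raise ValueError("AERMOD-HPCS increase must be positive")
    return increase_epa / increase_hpcs


# Case 5 values as quoted in the Fig. 1 caption.
case5 = increase_ratio(145, 29.5)
print(round(case5, 1))  # 4.9
```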
Fig. 1: For each of the four cases listed in Tables 3 and 4, the ratio of the value in the fourth column of Table 4 to the corresponding value in the fourth column of Table 3. Thus Case 5 has the value 145 / 29.5 = 4.9, showing that the relative increase in wall-clock time due to a workload being present is much larger for the U.S. EPA model than for AERMOD-HPCS.

6.0 CONCLUSIONS

This workload analysis of AERMOD shows that both the EPA and AERMOD-HPCS versions take longer to complete when a workload is present (two executions are concurrent) on a Microsoft Windows™ platform. This is due to competition for resources other than arithmetic processing units and registers. However, while the expansion factors are similar for the shorter-running cases, they differ in the longest-running case. Specifically, AERMOD-HPCS is not only more than two times faster than AERMOD-EPA, but also suffers considerably less from a competing workload on the same resource.