System Name System 6
System Availability Available
System Category Datacenter · —
System Size 8x Accelerator 6
Model Name Llama-3.1-8B
Division Closed
Model Precision INT4
Model Link —
Transformation Link —
Model Notes INT4 precision weights
Dataset Name CNN_DailyMail
Dataset Type Accuracy + Performance
Average Input Tokens 870
Average Output Tokens 128
Dataset Link Hugging Face
Measured Accuracy Score
rouge1 38.7781
rouge2 16.135
rougeL 24.5952
rougeLsum 35.8385
gen_len 8,193,529
gen_num 13,368
Charts (not reproduced here): Throughput vs Interactivity; Throughput vs Concurrency; Time to First Token vs Concurrency; Interactivity vs Concurrency
Processor
Processor Model Name Processor 6
Processors per Node 2
Cores Per Processor 32
VCPUs Per Processor 112
Accelerator
Accelerator Model Name Accelerator 6
Accelerators per Node 8
Memory Type
Memory Capacity 256 GB
Accelerator Interconnect
Host-Accelerator Interconnect
Host / Storage
Host Memory Capacity 1 TB
Memory Configuration
Storage Capacity
Storage Type
Cooling Air-cooled
Hardware Notes
Framework vLLM
Operating System Linux
Other Software Inference Backend v1.0
Software Notes —
| Field Name | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Run 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Run Date | 02/25/2026 | 02/25/2026 | 02/25/2026 | 02/24/2026 | 02/24/2026 | 02/24/2026 | 02/24/2026 | 02/24/2026 | 02/24/2026 | 02/24/2026 | 02/24/2026 |
| Concurrency | 0.02 | 0.04 | 0.08 | 0.16 | 0.32 | 0.65 | 1.29 | 2.58 | 5.16 | 10.33 | 20.65 |
| System Tokens/Second | 0.8 | 1.5 | 3.0 | 5.6 | 9.8 | 15.8 | 22.6 | 28.9 | 33.0 | 34.3 | 34.1 |
| Tokens/Second per User | 37.5 | 37.2 | 36.8 | 34.4 | 30.5 | 24.5 | 17.5 | 11.2 | 6.4 | 3.3 | 1.7 |
| TTFT P99 (ms) | 26908.7 | 27402.5 | 29185.2 | 31673.7 | 36086.2 | 39429.6 | 47732.6 | 68134.4 | 451034.7 | 1194731.5 | 3113226.7 |
| Utilization | 2.2% | 4.4% | 8.7% | 16.2% | 28.7% | 46.2% | 65.9% | 84.5% | 96.4% | 100.0% | 99.5% |
| Configuration Summary | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 |
| QPS | 0.0059 | 0.0117 | 0.0232 | 0.0434 | 0.0769 | 0.1236 | 0.1764 | 0.2261 | 0.2580 | 0.2677 | 0.2665 |
| Total Output Tokens | 1,711,102 | 1,711,102 | 1,711,102 | 1,711,102 | 1,711,102 | 1,711,102 | 1,711,102 | 1,711,102 | 1,711,102 | 1,711,102 | 1,711,102 |
| Run Duration (s) | 2,259,982.38 | 1,141,485.97 | 576,223.41 | 307,982.80 | 173,785.54 | 108,186.87 | 75,802.35 | 59,127.94 | 51,815.82 | 49,934.08 | 50,169.11 |
| Total Requests | 13,368 | 13,368 | 13,368 | 13,368 | 13,368 | 13,368 | 13,368 | 13,368 | 13,368 | 13,368 | 13,368 |
| Time To First Token (TTFT) (ms) | | | | | | | | | | | |
| Minimum | 2853.0 | 2789.6 | 2948.3 | 3196.0 | 3766.4 | 4002.5 | 4724.4 | 6043.5 | 7391.1 | 8280.5 | 139440.2 |
| Average | 11170.8 | 11588.5 | 12018.1 | 13256.1 | 14154.0 | 15585.6 | 18785.5 | 24378.0 | 38803.5 | 216163.1 | 1791993.7 |
| P50 | 9548.7 | 10727.4 | 10837.1 | 11690.2 | 12511.8 | 13337.2 | 16099.5 | 19949.7 | 28654.0 | 71737.5 | 1778153.4 |
| P90 | 18411.8 | 19277.9 | 20267.0 | 21765.2 | 23382.8 | 25727.0 | 29788.2 | 36443.7 | 50932.1 | 728659.3 | 2210943.7 |
| P95 | 22283.7 | 22669.0 | 23326.8 | 25250.9 | 27298.6 | 29959.8 | 34743.1 | 43103.5 | 61597.1 | 863516.7 | 2319031.9 |
| P99 | 26908.7 | 27402.5 | 29185.2 | 31673.7 | 36086.2 | 39429.6 | 47732.6 | 68134.4 | 451034.7 | 1194731.5 | 3113226.7 |
| P999 | 30889.4 | 31295.5 | 34471.3 | 43249.7 | 55020.2 | 176489.8 | 277677.9 | 430841.1 | 773465.8 | 1561933.0 | 3510064.9 |
| Maximum | 97148.2 | 138855.5 | 135940.0 | 131553.2 | 150596.1 | 227363.1 | 333904.2 | 516537.2 | 855429.3 | 1646303.7 | 3696981.3 |
| Time Per Output Token (TPOT) (ms) | | | | | | | | | | | |
| Minimum | 1206.4 | 1210.3 | 1207.7 | 1209.1 | 1225.3 | 1270.7 | 1484.3 | 1610.4 | 1812.0 | 2560.5 | 2736.4 |
| Average | 1243.0 | 1253.2 | 1262.7 | 1346.2 | 1525.3 | 1913.8 | 2704.5 | 4257.5 | 7482.8 | 13160.0 | 15477.1 |
| P50 | 1219.5 | 1227.6 | 1229.7 | 1324.6 | 1509.4 | 1895.5 | 2681.7 | 4238.9 | 7568.0 | 13714.6 | 15686.2 |
| P90 | 1308.5 | 1324.5 | 1353.0 | 1481.1 | 1713.6 | 2179.5 | 3155.8 | 5076.5 | 9076.2 | 16005.4 | 15964.9 |
| P95 | 1415.3 | 1377.5 | 1399.1 | 1530.1 | 1773.2 | 2268.2 | 3300.6 | 5326.9 | 9663.2 | 16136.6 | 16310.0 |
| P99 | 1569.1 | 1514.2 | 1489.7 | 1626.8 | 1889.9 | 2440.2 | 3602.5 | 5869.1 | 11001.3 | 16527.0 | 17347.7 |
| P999 | 1619.2 | 1673.8 | 1600.2 | 1725.6 | 2014.6 | 2689.9 | 3963.4 | 6498.4 | 12398.0 | 17795.9 | 18150.7 |
| Maximum | 1648.7 | 1746.0 | 1730.9 | 1846.3 | 2208.8 | 3111.7 | 4213.5 | 6725.1 | 12907.3 | 18637.7 | 19061.6 |
| Request Latency (ms) | | | | | | | | | | | |
| Minimum | 156356.0 | 156856.4 | 157695.9 | 159173.1 | 163560.2 | 174309.1 | 212693.1 | 228987.9 | 252268.4 | 363538.8 | 1944601.4 |
| Average | 169025.7 | 170747.1 | 172380.9 | 184225.2 | 207867.9 | 258640.5 | 362257.6 | 565084.4 | 989118.2 | 1887483.0 | 3757582.9 |
| P50 | 165713.2 | 168435.6 | 169881.0 | 181998.5 | 205649.8 | 256044.2 | 358827.1 | 560736.8 | 996886.7 | 1882921.6 | 3765010.8 |
| P90 | 181493.3 | 183122.0 | 186334.3 | 202945.5 | 233677.7 | 294345.7 | 422170.0 | 671468.7 | 1198526.4 | 2709295.2 | 4202160.8 |
| P95 | 191650.0 | 188899.7 | 192346.3 | 209639.4 | 242036.7 | 306292.2 | 441772.7 | 705595.0 | 1283463.2 | 2858369.6 | 4300685.2 |
| P99 | 211857.3 | 204773.7 | 204855.8 | 224111.4 | 258775.9 | 331074.9 | 492153.7 | 801334.9 | 1521684.7 | 3038176.8 | 5084278.0 |
| P999 | 224347.5 | 226696.7 | 219617.3 | 242265.8 | 289769.4 | 443582.8 | 642870.1 | 1039406.8 | 1799376.3 | 3344532.9 | 5485008.0 |
| Maximum | 251524.2 | 314254.8 | 313454.2 | 329896.9 | 362202.9 | 523346.8 | 680619.3 | 1113839.4 | 1897840.0 | 3441031.1 | 5658673.0 |
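
The summary rows in the table appear to follow from three raw fields: Total Output Tokens, Total Requests, and Run Duration (s). The sketch below recomputes System Tokens/Second, QPS, and Utilization for Runs 1, 10, and 11 as a consistency check. The formulas are assumptions inferred from the published numbers, not a documented definition: throughput is taken as output tokens divided by run duration, QPS as requests divided by run duration, and utilization as throughput relative to the highest throughput among the runs compared. The `Run` class and `utilization_pct` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Raw per-run fields copied from the results table (hypothetical container)."""
    name: str
    total_output_tokens: int
    total_requests: int
    duration_s: float

    @property
    def system_tokens_per_second(self) -> float:
        # Assumed definition: output tokens generated per second of wall time.
        return self.total_output_tokens / self.duration_s

    @property
    def qps(self) -> float:
        # Assumed definition: completed requests per second of wall time.
        return self.total_requests / self.duration_s


# Values taken from the Run 1, Run 10, and Run 11 columns.
RUNS = [
    Run("Run 1", 1_711_102, 13_368, 2_259_982.38),
    Run("Run 10", 1_711_102, 13_368, 49_934.08),
    Run("Run 11", 1_711_102, 13_368, 50_169.11),
]


def utilization_pct(run: Run, runs: list[Run]) -> float:
    """Throughput as a percentage of the best throughput in `runs`.

    Assumption: the table's Utilization row is normalized to the peak run,
    which is Run 10 in the published data.
    """
    peak = max(r.system_tokens_per_second for r in runs)
    return 100.0 * run.system_tokens_per_second / peak


if __name__ == "__main__":
    for run in RUNS:
        print(
            f"{run.name}: "
            f"{run.system_tokens_per_second:.1f} tokens/s, "
            f"{run.qps:.4f} QPS, "
            f"{utilization_pct(run, RUNS):.1f}% utilization"
        )
```

Under these assumptions the recomputed values round to the figures reported in the table for all three runs, which suggests the summary rows are derived from the raw counts rather than measured independently.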