System Name System 5
System Availability Available
System Category Datacenter · —
System Size 8x Accelerator 5
Model Name Llama-3.1-8B
Division Closed
Model Precision INT4
Model Link —
Transformation Link —
Model Notes INT4 precision weights
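As a rough illustration of what INT4 weights mean for memory, the following minimal Python sketch estimates weight storage from the parameter count. It rests on several assumptions that the page does not state. The parameter count is a nominal 8 × 10^9, read from the "8B" in the model name rather than taken as an exact figure. Every weight is assumed to be stored at the stated bit width. The sketch adds no overhead for quantization scales, KV cache, or activations. The helper name `weight_bytes` is illustrative only.

```python
# Rough weight-storage estimate for a quantized model (illustrative only).
# Assumptions: a nominal 8e9 parameters, every weight stored at the given
# bit width, and no overhead for quantization scales, KV cache, or activations.

NOMINAL_PARAMS = 8e9  # from the "8B" in the model name, not an exact count


def weight_bytes(num_params: float, bits_per_weight: int) -> float:
    """Bytes needed to store num_params weights at bits_per_weight bits each."""
    return num_params * bits_per_weight / 8


for label, bits in [("16-bit", 16), ("8-bit", 8), ("INT4", 4)]:
    print(f"{label}: ~{weight_bytes(NOMINAL_PARAMS, bits) / 1e9:.1f} GB of weights")
# prints ~16.0, ~8.0 and ~4.0 GB for the three widths

# With tensor parallelism (TP=4 in the run configuration), sharded weights are
# split roughly evenly across the TP group (ignoring any replicated layers).
TP = 4
print(f"INT4 per accelerator at TP={TP}: ~{weight_bytes(NOMINAL_PARAMS, 4) / TP / 1e9:.1f} GB")
# prints ~1.0 GB
```

Under these assumptions, the INT4 weights need about a quarter of the storage of 16-bit weights. Real footprints will be somewhat larger once quantization metadata and runtime buffers are included.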
Dataset Name CNN_DailyMail
Dataset Type Accuracy + Performance
Average Input Tokens 870
Average Output Tokens 128
Dataset Link Hugging Face
Measured Accuracy Score
rouge1 38.7781
rouge2 16.135
rougeL 24.5952
rougeLsum 35.8385
gen_len 8,193,529
gen_num 13,368
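Assuming the usual convention, the ROUGE values above are F-measures scaled to 0 to 100. rouge1 and rouge2 count the unigrams and bigrams that each generated summary shares with its reference. rougeL and rougeLsum are based on the longest common subsequence instead. The minimal ROUGE-N sketch below uses plain lowercase whitespace tokenization with no stemming. The helper names are illustrative. Because of these simplifications, it will not reproduce the reported scores, which depend on a scoring script this page does not specify.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of the n-grams in a token sequence."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate: str, reference: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1: lowercase whitespace tokens, no stemming."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # matches clipped to the smaller count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


ref = "the cat sat on the mat"
cand = "the cat lay on the mat"
print(round(100 * rouge_n_f1(cand, ref, 1), 2))  # 83.33 (5 of 6 unigrams match)
print(round(100 * rouge_n_f1(cand, ref, 2), 2))  # 60.0 (3 of 5 bigrams match)
```

A dataset-level score would typically average the per-sample values over all 13,368 samples before scaling.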
Charts (not reproduced): Throughput vs Interactivity; Throughput vs Concurrency; Time to First Token vs Concurrency; Interactivity vs Concurrency
Processor
Processor Model Name Processor 5
Processors per Node 1
Cores per Processor 32
vCPUs per Processor 72
Accelerator
Accelerator Model Name Accelerator 5
Accelerators per Node 8
Memory Type
Memory Capacity 256 GB
Accelerator Interconnect
Host-Accelerator Interconnect
Host / Storage
Host Memory Capacity 1 TB
Memory Configuration
Storage Capacity
Storage Type
Cooling Air-cooled
Hardware Notes
Framework vLLM
Operating System Linux
Other Software Inference Backend v1.0
Software Notes —
| Field Name | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Run 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Run Date | 02/25/2026 | 02/25/2026 | 02/25/2026 | 02/23/2026 | 02/23/2026 | 02/23/2026 | 02/23/2026 | 02/23/2026 | 02/23/2026 | 02/23/2026 | 02/23/2026 |
| Concurrency | 0.02 | 0.04 | 0.08 | 0.16 | 0.32 | 0.65 | 1.29 | 2.58 | 5.16 | 10.33 | 20.65 |
| System Tokens/Second | 1.3 | 2.6 | 5.0 | 9.3 | 16.7 | 28.5 | 42.2 | 55.0 | 63.5 | 67.2 | 67.0 |
| Tokens/Second per User | 66.4 | 65.0 | 61.9 | 57.5 | 51.8 | 44.2 | 32.7 | 21.3 | 12.3 | 6.5 | 3.2 |
| TTFT P99 (ms) | 14307.4 | 14568.2 | 14812.7 | 15241.0 | 17508.8 | 19781.0 | 25207.3 | 31961.5 | 244075.7 | 563573.6 | 1464105.9 |
| Utilization | 2.0% | 3.9% | 7.4% | 13.8% | 24.9% | 42.4% | 62.8% | 81.9% | 94.5% | 100.0% | 99.7% |
| Configuration Summary | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 |
| QPS | 0.0105 | 0.0205 | 0.0390 | 0.0724 | 0.1306 | 0.2227 | 0.3297 | 0.4298 | 0.4961 | 0.5247 | 0.5231 |
| Total Output Tokens | 1,711,102 | 1,711,102 | 1,711,102 | 1,711,102 | 1,711,102 | 1,711,102 | 1,711,102 | 1,711,102 | 1,711,102 | 1,711,102 | 1,711,102 |
| Run Duration (s) | 1,277,224.07 | 652,616.46 | 342,837.91 | 184,581.06 | 102,367.05 | 60,030.72 | 40,551.03 | 31,102.84 | 26,947.15 | 25,476.65 | 25,557.41 |
| Total Requests | 13,368 | 13,368 | 13,368 | 13,368 | 13,368 | 13,368 | 13,368 | 13,368 | 13,368 | 13,368 | 13,368 |
| Time to First Token (TTFT) (ms) | | | | | | | | | | | |
| Minimum | 1782.4 | 1853.6 | 1803.4 | 1959.2 | 2196.0 | 2182.1 | 2405.3 | 3201.5 | 4455.6 | 4261.7 | 75584.5 |
| Average | 6000.1 | 6203.3 | 6297.0 | 6591.7 | 7066.7 | 7836.2 | 9890.7 | 12603.2 | 20014.6 | 44447.6 | 481567.9 |
| P50 | 5059.0 | 5505.1 | 5481.2 | 5753.9 | 6232.2 | 6788.3 | 8378.4 | 10342.7 | 14693.3 | 25190.5 | 401338.9 |
| P90 | 9574.1 | 9748.8 | 10045.8 | 10751.4 | 11755.8 | 12810.9 | 15607.7 | 18718.8 | 25459.2 | 66028.9 | 793085.7 |
| P95 | 11857.3 | 12090.4 | 12244.4 | 12547.4 | 13610.3 | 14835.7 | 18387.6 | 21752.5 | 30453.7 | 100155.0 | 867331.6 |
| P99 | 14307.4 | 14568.2 | 14812.7 | 15241.0 | 17508.8 | 19781.0 | 25207.3 | 31961.5 | 244075.7 | 563573.6 | 1464105.9 |
| P999 | 15694.2 | 16382.8 | 17463.2 | 20669.6 | 69332.5 | 84593.6 | 135089.9 | 225352.4 | 408852.8 | 778556.3 | 1723303.9 |
| Maximum | 61266.2 | 77066.1 | 79560.0 | 85803.7 | 90723.6 | 120887.2 | 159252.0 | 278166.0 | 434801.1 | 829670.7 | 1810980.2 |
| Time per Output Token (TPOT) (ms) | | | | | | | | | | | |
| Minimum | 692.4 | 684.0 | 697.7 | 722.0 | 761.8 | 784.9 | 849.4 | 901.5 | 1088.2 | 1304.2 | 1920.0 |
| Average | 705.0 | 719.9 | 758.0 | 817.6 | 908.4 | 1068.8 | 1448.2 | 2240.6 | 3893.8 | 7301.5 | 11299.4 |
| P50 | 704.5 | 712.0 | 752.8 | 812.1 | 902.2 | 1059.9 | 1437.5 | 2221.7 | 3845.8 | 6848.9 | 11605.9 |
| P90 | 708.5 | 747.5 | 793.5 | 867.2 | 982.3 | 1179.7 | 1654.7 | 2659.5 | 4864.1 | 10866.4 | 11703.5 |
| P95 | 709.7 | 762.1 | 809.7 | 885.8 | 1008.4 | 1220.0 | 1718.4 | 2776.3 | 5104.7 | 11340.2 | 11727.9 |
| P99 | 724.2 | 794.0 | 839.4 | 926.2 | 1062.4 | 1305.8 | 1842.2 | 2978.6 | 5587.8 | 11597.4 | 11778.0 |
| P999 | 770.4 | 820.0 | 870.8 | 976.5 | 1125.1 | 1413.4 | 1987.9 | 3368.9 | 5823.8 | 11677.6 | 11836.7 |
| Maximum | 829.0 | 834.2 | 902.4 | 1025.6 | 1205.8 | 1537.5 | 2081.8 | 3695.1 | 5931.2 | 11688.6 | 11857.6 |
| Request Latency (ms) | | | | | | | | | | | |
| Minimum | 89712.0 | 89785.2 | 91318.6 | 96279.5 | 102115.6 | 108817.9 | 120585.7 | 120932.0 | 163055.8 | 197397.3 | 608648.4 |
| Average | 95534.8 | 97627.0 | 102558.9 | 110423.0 | 122435.9 | 143572.6 | 193813.8 | 297157.2 | 514529.1 | 971738.6 | 1916585.8 |
| P50 | 94598.7 | 96893.8 | 101739.2 | 109680.7 | 121563.7 | 142274.4 | 192135.8 | 293809.3 | 505315.7 | 897076.1 | 1865249.5 |
| P90 | 99607.0 | 103013.6 | 108541.8 | 117877.5 | 132690.2 | 158606.2 | 220955.5 | 351277.0 | 641370.9 | 1450410.6 | 2250152.8 |
| P95 | 101839.4 | 105195.0 | 110774.5 | 120776.8 | 136558.4 | 164342.0 | 230063.6 | 367688.8 | 679688.3 | 1508837.8 | 2340533.1 |
| P99 | 104731.2 | 109239.5 | 115175.5 | 126537.6 | 143571.8 | 176633.3 | 250673.5 | 400148.5 | 754846.7 | 1601823.7 | 2921688.3 |
| P999 | 107338.6 | 114554.5 | 120879.0 | 133681.6 | 184259.1 | 211439.8 | 318532.2 | 597134.8 | 987869.5 | 1924953.9 | 3178950.8 |
| Maximum | 151486.1 | 169167.7 | 175869.1 | 209798.4 | 208807.1 | 258287.2 | 340403.3 | 632273.4 | 1027301.4 | 2013520.9 | 3261327.2 |
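The derived columns of the run table can be recomputed from its raw totals. The minimal Python sketch below uses values copied from the table and makes three checks:

- System Tokens/Second and QPS are Total Output Tokens and Total Requests divided by Run Duration.
- Utilization matches each run's throughput divided by the highest throughput across the runs. This is inferred from the numbers, not a definition given on the page.
- The latency check assumes that TPOT is averaged over the output tokens after the first. Under that assumption, mean request latency should be close to mean TTFT + (tokens per request − 1) × mean TPOT.

```python
# Recompute derived columns of the run table from its raw totals.
# All numbers are copied from the table (Run 1 .. Run 11, left to right).

TOTAL_OUTPUT_TOKENS = 1_711_102
TOTAL_REQUESTS = 13_368
TOKENS_PER_REQUEST = TOTAL_OUTPUT_TOKENS / TOTAL_REQUESTS  # ~128

run_duration_s = [
    1_277_224.07, 652_616.46, 342_837.91, 184_581.06, 102_367.05, 60_030.72,
    40_551.03, 31_102.84, 26_947.15, 25_476.65, 25_557.41,
]
avg_ttft_ms = [
    6000.1, 6203.3, 6297.0, 6591.7, 7066.7, 7836.2,
    9890.7, 12603.2, 20014.6, 44447.6, 481567.9,
]
avg_tpot_ms = [
    705.0, 719.9, 758.0, 817.6, 908.4, 1068.8,
    1448.2, 2240.6, 3893.8, 7301.5, 11299.4,
]
avg_latency_ms = [
    95534.8, 97627.0, 102558.9, 110423.0, 122435.9, 143572.6,
    193813.8, 297157.2, 514529.1, 971738.6, 1916585.8,
]

# System Tokens/Second and QPS: totals divided by wall-clock run duration.
throughput = [TOTAL_OUTPUT_TOKENS / d for d in run_duration_s]
qps = [TOTAL_REQUESTS / d for d in run_duration_s]

# Utilization: throughput relative to the highest throughput among these runs
# (inferred from the numbers; the page does not define it).
peak = max(throughput)
utilization_pct = [100 * t / peak for t in throughput]

# Assumption: TPOT is averaged over the output tokens after the first, so
# mean latency ~= mean TTFT + (tokens per request - 1) * mean TPOT.
est_latency_ms = [
    ttft + (TOKENS_PER_REQUEST - 1) * tpot
    for ttft, tpot in zip(avg_ttft_ms, avg_tpot_ms)
]

for i in range(len(run_duration_s)):
    print(
        f"Run {i + 1}: {throughput[i]:.1f} tok/s, QPS {qps[i]:.4f}, "
        f"utilization {utilization_pct[i]:.1f}%, "
        f"latency est/reported {est_latency_ms[i] / avg_latency_ms[i]:.4f}"
    )
```

With these inputs, every recomputed throughput, QPS, and utilization value agrees with the table to within its printed rounding. Every latency ratio also comes out within 0.01% of 1.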