System Name System 8
System Availability Available
System Category Datacenter · Cloud
System Size 8x Accelerator 8
Model Name Llama-3.1-70B
Division Closed
Model Precision BF16
Model Link —
Transformation Link —
Model Notes —
Dataset Name OpenOrca
Dataset Type Accuracy + Performance
Average Input Tokens 222.33
Average Output Tokens 218.89
Dataset Link GitHub
Measured Accuracy Score —
Charts Throughput vs Interactivity · Throughput vs Concurrency · Time to First Token vs Concurrency · Interactivity vs Concurrency
Processor
Processor Model Name Processor 8
Processors per Node 2
Cores Per Processor 32
VCPUs Per Processor —
Accelerator
Accelerator Model Name Accelerator 8
Accelerators per Node 8
Memory Type
Memory Capacity 256 GB
Accelerator Interconnect
Host-Accelerator Interconnect
Host / Storage
Host Memory Capacity 1 TB
Memory Configuration
Storage Capacity
Storage Type
Cooling Air-cooled
Hardware Notes
Framework SGLang
Operating System Linux
Other Software Inference Backend v1.0
Software Notes —
| Field Name | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Run Date | 02/25/2026 | 02/25/2026 | 02/25/2026 | 02/25/2026 | 02/25/2026 | 02/25/2026 | 02/25/2026 | 02/25/2026 | 02/25/2026 | 02/25/2026 |
| Concurrency | 0.91 | 1.82 | 2.73 | 3.64 | 5.46 | 7.28 | 10.92 | 14.56 | 21.83 | 29.11 |
| System Tokens/Second | 32.5 | 60.3 | 84.5 | 105.9 | 141.1 | 169.2 | 165.1 | 215.9 | 210.7 | 206.8 |
| Tokens/Second per User | 35.7 | 33.1 | 31.0 | 29.1 | 25.9 | 23.2 | 15.1 | 14.8 | 9.6 | 7.1 |
| TTFT P99 (ms) | 160424.0 | 94628.5 | 75703.5 | 67214.4 | 78077.5 | 141026.3 | 203491.0 | 250019.3 | 383753.9 | 568735.2 |
| Utilization | 15.1% | 27.9% | 39.1% | 49.1% | 65.4% | 78.3% | 76.4% | 100.0% | 97.6% | 95.8% |
| Configuration Summary | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 |
| QPS | 0.1339 | 0.2490 | 0.3487 | 0.4368 | 0.5821 | 0.6978 | 0.6808 | 0.8906 | 0.8690 | 0.8530 |
| Total Output Tokens | 5,965,062 | 5,949,986 | 5,956,127 | 5,960,490 | 5,958,391 | 5,958,417 | 5,958,417 | 5,958,417 | 5,958,417 | 5,958,417 |
| Run Duration (s) | 183,532.39 | 98,699.00 | 70,475.88 | 56,257.69 | 42,217.87 | 35,221.64 | 36,098.56 | 27,593.61 | 28,282.22 | 28,810.00 |
| Total Requests | 24,576 | 24,576 | 24,576 | 24,576 | 24,576 | 24,576 | 24,576 | 24,576 | 24,576 | 24,576 |
| Time To First Token (TTFT) (ms) | | | | | | | | | | |
| Minimum | 3804.2 | 3870.3 | 3885.0 | 3933.8 | 3942.8 | 3918.1 | 4061.5 | 4001.1 | 4186.3 | 3979.4 |
| Average | 16612.9 | 14477.5 | 13237.5 | 12994.0 | 13276.5 | 13821.0 | 18252.7 | 20447.6 | 30608.3 | 49760.2 |
| P50 | 7336.0 | 7781.1 | 7925.8 | 8249.2 | 8707.9 | 8995.8 | 11218.4 | 10976.9 | 13437.0 | 15557.7 |
| P90 | 39348.4 | 34464.7 | 28613.5 | 27112.9 | 24656.8 | 22656.5 | 26358.0 | 24353.3 | 30126.4 | 115098.6 |
| P95 | 77394.0 | 53346.6 | 44717.7 | 39311.9 | 34046.2 | 31178.2 | 37216.1 | 35919.6 | 149861.7 | 310115.5 |
| P99 | 160424.0 | 94628.5 | 75703.5 | 67214.4 | 78077.5 | 141026.3 | 203491.0 | 250019.3 | 383753.9 | 568735.2 |
| P999 | 257146.8 | 150344.3 | 128007.4 | 98512.9 | 127481.5 | 166551.6 | 233459.8 | 302208.0 | 443234.8 | 875006.1 |
| Maximum | 432156.7 | 253484.4 | 191495.3 | 177778.9 | 246347.1 | 215834.3 | 239227.1 | 314870.7 | 445388.9 | 986368.3 |
| Time Per Output Token (TPOT) (ms) | | | | | | | | | | |
| Minimum | 1739.6 | 1817.9 | 433.2 | 852.7 | 1303.9 | 4.5 | 521.0 | 4.7 | 4080.8 | 12.9 |
| Average | 1913.6 | 2062.7 | 2189.7 | 2318.0 | 2574.1 | 2819.9 | 4209.8 | 4170.3 | 6056.8 | 7683.0 |
| P50 | 1895.8 | 2043.5 | 2170.6 | 2297.7 | 2552.6 | 2798.8 | 4174.8 | 4126.4 | 5975.5 | 7576.5 |
| P90 | 1971.5 | 2136.2 | 2281.5 | 2421.3 | 2702.4 | 2958.0 | 4376.5 | 4364.7 | 6267.5 | 7942.6 |
| P95 | 2014.1 | 2193.2 | 2348.0 | 2496.3 | 2786.8 | 3049.7 | 4512.5 | 4505.2 | 6446.7 | 8176.5 |
| P99 | 2301.5 | 2526.5 | 2718.5 | 2945.3 | 3244.0 | 3609.4 | 5176.9 | 5359.0 | 7921.3 | 10225.8 |
| P999 | 3787.8 | 3913.6 | 4063.6 | 4544.1 | 6169.5 | 6971.7 | 10242.0 | 10569.6 | 17329.4 | 21611.3 |
| Maximum | 5906.5 | 10572.8 | 7840.7 | 10317.3 | 11354.8 | 16287.1 | 33604.1 | 65392.8 | 179052.6 | 140525.5 |
| Request Latency (ms) | | | | | | | | | | |
| Minimum | 6464.4 | 7437.5 | 6103.9 | 6189.9 | 6327.9 | 7459.1 | 11522.5 | 8906.3 | 13885.9 | 14818.1 |
| Average | 475068.8 | 507246.2 | 537206.7 | 567696.6 | 629275.8 | 688793.6 | 1024810.8 | 1015719.2 | 1470435.5 | 1877016.0 |
| P50 | 450588.0 | 480478.9 | 507270.6 | 537205.5 | 594482.9 | 652228.2 | 967515.9 | 960000.7 | 1387169.9 | 1773763.1 |
| P90 | 798950.6 | 849053.4 | 902221.4 | 949972.5 | 1053911.6 | 1158329.8 | 1719060.9 | 1706131.1 | 2466617.3 | 3149608.3 |
| P95 | 982583.1 | 1053935.9 | 1109809.9 | 1181066.9 | 1309625.7 | 1436293.4 | 2140146.7 | 2116034.3 | 3067674.3 | 3900937.7 |
| P99 | 1892682.2 | 1946212.8 | 2070760.6 | 2239232.9 | 2456970.4 | 2662766.4 | 3992459.5 | 3942232.0 | 5689630.4 | 7217454.0 |
| P999 | 2008062.3 | 2145613.6 | 2268195.7 | 2394851.0 | 2670375.1 | 2923674.6 | 4353892.8 | 4317420.5 | 6249831.7 | 7957346.9 |
| Maximum | 2240850.4 | 2281521.1 | 2356602.8 | 2470723.8 | 2728848.5 | 3017352.2 | 4434690.4 | 4431438.4 | 6487397.8 | 8335438.8 |
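
The System Tokens/Second, QPS, and Utilization rows appear to be derived from the Total Output Tokens, Run Duration (s), and Total Requests rows. The sheet does not state the formulas, so the definitions in this sketch are assumptions inferred from the numbers: throughput as output tokens divided by run duration, QPS as requests divided by run duration, and utilization as throughput relative to the best run. The function name `derive_metrics` is hypothetical.

```python
# Raw per-run values copied from the results table (Runs 1-10).
TOTAL_OUTPUT_TOKENS = [
    5_965_062, 5_949_986, 5_956_127, 5_960_490, 5_958_391,
    5_958_417, 5_958_417, 5_958_417, 5_958_417, 5_958_417,
]
RUN_DURATION_S = [
    183_532.39, 98_699.00, 70_475.88, 56_257.69, 42_217.87,
    35_221.64, 36_098.56, 27_593.61, 28_282.22, 28_810.00,
]
TOTAL_REQUESTS = 24_576  # identical for every run in the table


def derive_metrics(output_tokens, durations_s, total_requests):
    """Recompute throughput, QPS, and utilization from the raw counts.

    Assumed definitions (inferred from the table, not confirmed by it):
      system tokens/s = total output tokens / run duration
      QPS             = total requests / run duration
      utilization     = system tokens/s as a percentage of the best run
    """
    tokens_per_s = [t / d for t, d in zip(output_tokens, durations_s)]
    qps = [total_requests / d for d in durations_s]
    peak = max(tokens_per_s)
    utilization_pct = [100.0 * v / peak for v in tokens_per_s]
    return tokens_per_s, qps, utilization_pct


tokens_per_s, qps, utilization_pct = derive_metrics(
    TOTAL_OUTPUT_TOKENS, RUN_DURATION_S, TOTAL_REQUESTS
)

for run, (tps, q, u) in enumerate(zip(tokens_per_s, qps, utilization_pct), start=1):
    print(f"Run {run:2d}: {tps:6.1f} tok/s  {q:.4f} QPS  {u:5.1f}% utilization")
```

Under these assumptions the recomputed values agree with the reported rows to within rounding, and Run 8 is the peak-throughput run that the Utilization row treats as 100%.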