System Name System 7
System Availability Available
System Category Datacenter · Cloud
System Size 8x Accelerator 7
Model Name Llama-3.1-70B
Division Closed
Model Precision BF16
Model Link —
Transformation Link —
Model Notes —
Dataset Name OpenOrca
Dataset Type Accuracy + Performance
Average Input Tokens 222.33
Average Output Tokens 218.89
Dataset Link GitHub
Measured Accuracy Score —
Charts Throughput vs Interactivity · Throughput vs Concurrency · Time to First Token vs Concurrency · Interactivity vs Concurrency
Processor
Processor Model Name Processor 7
Processors per Node 2
Cores Per Processor 32
VCPUs Per Processor —
Accelerator
Accelerator Model Name Accelerator 7
Accelerators per Node 8
Memory Type
Memory Capacity 256 GB
Accelerator Interconnect
Host-Accelerator Interconnect
Host / Storage
Host Memory Capacity 1 TB
Memory Configuration
Storage Capacity
Storage Type
Cooling Air-cooled
Hardware Notes
Framework vLLM
Operating System Linux
Other Software Inference Backend v1.0
Software Notes —
| Field Name | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Run Date | 02/26/2026 | 02/26/2026 | 02/26/2026 | 02/26/2026 | 02/26/2026 | 02/26/2026 | 02/26/2026 | 02/26/2026 | 02/26/2026 | 02/26/2026 |
| Concurrency | 0.91 | 1.82 | 2.73 | 3.64 | 5.46 | 7.28 | 10.92 | 14.56 | 21.83 | 29.11 |
| System Tokens/Second | 39.6 | 73.3 | 84.1 | 109.2 | 163.7 | 190.6 | 218.7 | 236.9 | 228.6 | 241.9 |
| Tokens/Second per User | 43.5 | 40.3 | 30.8 | 30.0 | 30.0 | 26.2 | 20.0 | 16.3 | 10.5 | 8.3 |
| TTFT P99 (ms) | 133445.4 | 83413.8 | 74078.9 | 75133.9 | 83998.6 | 118872.3 | 165691.5 | 241619.9 | 359637.5 | 466422.3 |
| Utilization | 16.4% | 30.3% | 34.8% | 45.1% | 67.7% | 78.8% | 90.4% | 97.9% | 94.5% | 100.0% |
| Configuration Summary | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 | DP=1 TP=4 |
| QPS | 0.1628 | 0.3025 | 0.3463 | 0.4500 | 0.6747 | 0.7858 | 0.8998 | 0.9747 | 0.9406 | 0.9956 |
| Total Output Tokens | 5,979,513 | 5,954,161 | 5,970,998 | 5,962,724 | 5,964,063 | 5,961,100 | 5,972,600 | 5,972,600 | 5,972,600 | 5,972,600 |
| Run Duration (s) | 150,963.54 | 81,235.06 | 70,967.05 | 54,618.72 | 36,426.83 | 31,275.86 | 27,313.73 | 25,214.74 | 26,128.80 | 24,685.83 |
| Total Requests | 24,576 | 24,576 | 24,576 | 24,576 | 24,576 | 24,576 | 24,576 | 24,576 | 24,576 | 24,576 |
| Time To First Token (TTFT) (ms) | | | | | | | | | | |
| Minimum | 3726.7 | 3803.3 | 4366.5 | 4359.7 | 4199.9 | 4438.6 | 5704.0 | 6090.6 | 4410.1 | 4690.1 |
| Average | 14992.0 | 12962.8 | 13090.3 | 13834.0 | 12967.7 | 14155.8 | 18175.0 | 23303.8 | 35371.5 | 48930.5 |
| P50 | 6252.3 | 6975.7 | 8387.2 | 9423.5 | 9648.8 | 10650.4 | 12345.9 | 14582.5 | 19051.9 | 22007.0 |
| P90 | 37320.2 | 29735.6 | 26746.2 | 28303.0 | 21659.4 | 20903.8 | 22185.6 | 24271.0 | 29637.7 | 35251.5 |
| P95 | 67432.0 | 45667.5 | 42925.4 | 41517.7 | 30567.3 | 28833.5 | 30027.6 | 33416.4 | 197879.3 | 307987.3 |
| P99 | 133445.4 | 83413.8 | 74078.9 | 75133.9 | 83998.6 | 118872.3 | 165691.5 | 241619.9 | 359637.5 | 466422.3 |
| P999 | 197054.5 | 131536.6 | 116279.2 | 96915.9 | 106459.1 | 150562.6 | 221889.8 | 287324.2 | 383690.3 | 519255.1 |
| Maximum | 442065.8 | 202092.7 | 175194.4 | 144182.0 | 156950.5 | 207603.2 | 222008.6 | 287469.9 | 425555.4 | 538872.5 |
| Time Per Output Token (TPOT) (ms) | | | | | | | | | | |
| Minimum | 1259.6 | 1431.0 | 2012.5 | 1966.0 | 4.2 | 2.9 | 1522.0 | 6.1 | 5.5 | 5.8 |
| Average | 1565.5 | 1691.3 | 2206.7 | 2240.5 | 2224.0 | 2499.4 | 3161.1 | 3764.8 | 5526.5 | 6556.6 |
| P50 | 1550.7 | 1674.7 | 2186.7 | 2218.9 | 2202.5 | 2476.1 | 3132.9 | 3733.3 | 5464.1 | 6473.4 |
| P90 | 1602.2 | 1750.1 | 2272.7 | 2324.9 | 2334.5 | 2632.2 | 3304.1 | 3972.5 | 5724.8 | 6774.9 |
| P95 | 1633.2 | 1794.0 | 2323.9 | 2390.2 | 2402.5 | 2715.6 | 3398.8 | 4108.2 | 5889.0 | 7012.9 |
| P99 | 1872.3 | 2077.1 | 2659.9 | 2754.6 | 2772.5 | 3245.5 | 4044.2 | 4815.4 | 7066.1 | 8781.9 |
| P999 | 3039.9 | 3225.0 | 4137.6 | 4246.2 | 4940.0 | 5515.5 | 8397.0 | 9647.3 | 13101.9 | 20088.5 |
| Maximum | 7002.5 | 7325.0 | 12207.3 | 9731.5 | 13796.5 | 15767.5 | 81413.2 | 59776.9 | 143470.2 | 148014.9 |
| Request Latency (ms) | | | | | | | | | | |
| Minimum | 5735.2 | 6506.2 | 7602.6 | 7947.3 | 7475.3 | 7542.8 | 10881.9 | 9093.2 | 18396.3 | 16422.2 |
| Average | 391054.3 | 417343.3 | 542269.5 | 550120.0 | 545458.4 | 611965.5 | 774725.0 | 924771.9 | 1355382.3 | 1609145.5 |
| P50 | 366872.8 | 392851.6 | 509601.5 | 517135.7 | 512043.6 | 575381.4 | 732043.7 | 870901.6 | 1280337.0 | 1516147.9 |
| P90 | 654348.6 | 699849.1 | 914524.9 | 924473.4 | 918762.0 | 1026108.7 | 1310366.5 | 1560283.2 | 2289377.7 | 2717536.7 |
| P95 | 815229.8 | 870127.6 | 1125792.1 | 1141530.3 | 1143891.7 | 1285161.1 | 1608454.5 | 1911163.5 | 2806738.8 | 3322266.1 |
| P99 | 1521273.2 | 1585862.9 | 2138591.6 | 2110431.2 | 2042757.5 | 2423927.2 | 3035012.3 | 3566366.2 | 5302460.4 | 6272371.8 |
| P999 | 1622269.4 | 1754948.7 | 2277201.3 | 2313726.6 | 2300925.9 | 2595558.6 | 3265347.3 | 3912793.1 | 5752133.4 | 6794360.0 |
| Maximum | 1781630.8 | 1895749.8 | 2325655.1 | 2383943.7 | 2344497.3 | 2693176.9 | 3335827.6 | 4062430.0 | 5835240.1 | 6922201.4 |
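Several summary rows in the table appear to be derivable from the raw per-run fields. The sketch below recomputes them for Runs 1, 5, and 10. The formulas are inferred from the published numbers rather than stated by the source, so treat them as assumptions: System Tokens/Second as Total Output Tokens divided by Run Duration, QPS as Total Requests divided by Run Duration, Tokens/Second per User as System Tokens/Second divided by Concurrency, and Utilization as System Tokens/Second relative to the highest-throughput run. The `Run` class and the function names are hypothetical helpers, not part of any benchmark tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Raw per-run fields copied from the results table."""
    name: str
    concurrency: float
    total_output_tokens: int
    total_requests: int
    duration_s: float


def system_tokens_per_second(run: Run) -> float:
    # Assumed definition: output tokens generated per wall-clock second.
    return run.total_output_tokens / run.duration_s


def queries_per_second(run: Run) -> float:
    # Assumed definition: completed requests per wall-clock second.
    return run.total_requests / run.duration_s


def derive(runs: list[Run]) -> dict[str, dict[str, float]]:
    """Recompute the summary rows, rounded the way the table displays them."""
    # Assumption: utilization is throughput relative to the best run supplied.
    peak = max(system_tokens_per_second(r) for r in runs)
    rows = {}
    for r in runs:
        tps = system_tokens_per_second(r)
        rows[r.name] = {
            "system_tps": round(tps, 1),
            "qps": round(queries_per_second(r), 4),
            "tps_per_user": round(tps / r.concurrency, 1),
            "utilization_pct": round(100.0 * tps / peak, 1),
        }
    return rows


# Values taken from the table; Run 10 is the peak-throughput run there,
# so including it keeps the utilization figures comparable.
RUNS = [
    Run("Run 1", 0.91, 5_979_513, 24_576, 150_963.54),
    Run("Run 5", 5.46, 5_964_063, 24_576, 36_426.83),
    Run("Run 10", 29.11, 5_972_600, 24_576, 24_685.83),
]

if __name__ == "__main__":
    for name, row in derive(RUNS).items():
        print(name, row)
```

Under these assumed definitions the recomputed values for the three runs match the table's System Tokens/Second, QPS, Tokens/Second per User, and Utilization rows to the displayed precision. The Concurrency values themselves are taken as given, since the source does not say how they were measured.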