1 Node

System 1

Submitted by Submitter 1 on 2026-02-19. Published on 2026-03-16.

SUT Summary
System Name: System 1
System Availability: Available
System Category: Datacenter · On-premise
System Size: 8x Accelerator 1
Model
Model Name: GPT-OSS 120B
Division: Closed
Model Precision: mxfp4
Model Link:
Transformation Link:
Model Notes:
Datasets
Dataset Name: MLPerf Inference GPT-OSS Performance Dataset
Dataset Type: Performance
Average Input Tokens: 5010.64
Average Output Tokens: 1300
Dataset Link: Link
Measured Accuracy Score:

Throughput vs Interactivity

[Chart: System TPS (0 to 400) against Interactivity (tok/s/user) (0 to 200)]

Throughput vs Concurrency

[Chart: System TPS (0 to 400) against Concurrency (0 to 80)]

Time to First Token vs Concurrency

[Chart: TTFT P99 (s) (0 to 10k) against Concurrency (0 to 80)]

Interactivity vs Concurrency

[Chart: Interactivity (tok/s/user) (0 to 200) against Concurrency (0 to 80)]
Hardware

Processor

Processor Model Name: Processor 1
Processors per Node: 2
Cores Per Processor: 32
VCPUs Per Processor:

Accelerator

Accelerator Model Name: Accelerator 1
Accelerators per Node: 8
Memory Type:
Memory Capacity: 256 GB
Accelerator Interconnect:
Host-Accelerator Interconnect:

Host / Storage

Host Memory Capacity: 1 TB
Memory Configuration:
Storage Capacity:
Storage Type:
Cooling: Passive OAM
Hardware Notes:
Software
Framework: vLLM
Operating System: Linux
Other Software:
Software Notes:
Run Data
| Field Name | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Run Date | 03/06/2026 | 03/06/2026 | 03/06/2026 | 03/06/2026 | 03/06/2026 | 03/06/2026 | 03/06/2026 |
| Concurrency | 0.17 | 0.69 | 2.76 | 5.52 | 22.09 | 44.18 | 88.36 |
| System Tokens/Second | 33.5 | 81.5 | 185.9 | 245.4 | 396.6 | 420.0 | 421.0 |
| Tokens/Second per User | 193.9 | 118.0 | 67.3 | 44.4 | 18.0 | 9.5 | 4.8 |
| TTFT P99 (ms) | 6827.8 | 4477.1 | 8742.4 | 104930.0 | 495260.4 | 2639222.7 | 13065074.1 |
| Utilization | 7.9% | 19.4% | 44.1% | 58.3% | 94.2% | 99.8% | 100.0% |
| Configuration Summary | TP=4, PP=1, precision=FP4, cont_batch | TP=4, PP=1, precision=FP4, cont_batch | TP=4, PP=1, precision=FP4, cont_batch | TP=4, PP=1, precision=FP4, cont_batch | TP=4, PP=1, precision=FP4, cont_batch | TP=4, PP=1, precision=FP4, cont_batch | TP=4, PP=1, precision=FP4, cont_batch |
| QPS | 0.0095 | 0.0233 | 0.0534 | 0.0700 | 0.1137 | 0.1198 | 0.1203 |
| Total Output Tokens | 22,427,551 | 22,387,661 | 22,246,057 | 22,407,098 | 22,315,593 | 22,417,180 | 22,387,231 |
| Run Duration (s) | 670,138.61 | 274,800.66 | 119,680.23 | 91,317.49 | 56,266.66 | 53,371.01 | 53,173.75 |
| Total Requests | 6,396 | 6,396 | 6,396 | 6,396 | 6,396 | 6,396 | 6,396 |
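
The page does not state how the rate columns are derived. As a rough cross-check, the sketch below assumes System Tokens/Second is Total Output Tokens divided by Run Duration, and QPS is Total Requests divided by Run Duration. Both formulas and the helper name `derived_rates` are assumptions made for illustration, not something the report confirms. Under those assumptions the recomputed values agree with the reported ones to the printed precision.

```python
# Values copied from the Run Data rows above (Run 1 .. Run 7).
total_output_tokens = [22_427_551, 22_387_661, 22_246_057, 22_407_098,
                       22_315_593, 22_417_180, 22_387_231]
run_duration_s = [670_138.61, 274_800.66, 119_680.23, 91_317.49,
                  56_266.66, 53_371.01, 53_173.75]
total_requests = [6_396] * 7
reported_tps = [33.5, 81.5, 185.9, 245.4, 396.6, 420.0, 421.0]
reported_qps = [0.0095, 0.0233, 0.0534, 0.0700, 0.1137, 0.1198, 0.1203]


def derived_rates(tokens, requests, duration_s):
    """Return (tokens per second, requests per second) for one run.

    Assumed definitions; the source page does not document them.
    """
    return tokens / duration_s, requests / duration_s


derived = [derived_rates(t, r, d)
           for t, r, d in zip(total_output_tokens, total_requests, run_duration_s)]

for run, ((tps, qps), rep_tps, rep_qps) in enumerate(
        zip(derived, reported_tps, reported_qps), start=1):
    print(f"Run {run}: TPS {tps:.1f} (reported {rep_tps}), "
          f"QPS {qps:.4f} (reported {rep_qps})")
```

Because every run issues the same 6,396 requests, the shrinking run duration alone accounts for the rise in both rates as concurrency increases.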
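Time To First Token (TTFT) (ms)
| Field Name | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Minimum | 1102.4 | 1465.5 | 2238.5 | 3123.6 | 6984.3 | 67289.0 | 170792.0 |
| Average | 2202.5 | 2341.8 | 3721.9 | 6111.5 | 33358.1 | 1801413.8 | 8584735.1 |
| P50 | 1891.0 | 2172.4 | 2950.7 | 3928.8 | 8866.9 | 1997308.6 | 10312446.2 |
| P90 | 2852.0 | 2935.0 | 3957.3 | 5275.7 | 12947.1 | 2251361.2 | 10597636.7 |
| P95 | 4255.8 | 3379.6 | 4482.1 | 6069.8 | 249806.5 | 2334694.8 | 10717307.7 |
| P99 | 6827.8 | 4477.1 | 8742.4 | 104930.0 | 495260.4 | 2639222.7 | 13065074.1 |
| P999 | 19545.2 | 16497.0 | 65084.4 | 128672.6 | 515076.7 | 3665757.0 | 13464134.1 |
| Maximum | 77085.8 | 16635.5 | 65091.9 | 128682.3 | 515994.7 | 4736661.4 | 13494291.7 |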
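
The TTFT values are reported in milliseconds and span several orders of magnitude, which makes the rows hard to compare by eye. The sketch below converts the P50 and P99 rows to seconds and computes a P99/P50 ratio per run as one simple indicator of tail spread. The ratio and the helper name `summarize` are illustrative choices, not metrics defined by the report.

```python
# TTFT percentiles in milliseconds, copied from the table above (Run 1 .. Run 7).
concurrency = [0.17, 0.69, 2.76, 5.52, 22.09, 44.18, 88.36]
ttft_p50_ms = [1891.0, 2172.4, 2950.7, 3928.8, 8866.9, 1997308.6, 10312446.2]
ttft_p99_ms = [6827.8, 4477.1, 8742.4, 104930.0, 495260.4, 2639222.7, 13065074.1]


def summarize(p50_ms, p99_ms):
    """Convert one run's P50/P99 TTFT to seconds and compute the tail ratio."""
    return {
        "p50_s": p50_ms / 1000.0,
        "p99_s": p99_ms / 1000.0,
        "p99_over_p50": p99_ms / p50_ms,
    }


summary = [summarize(p50, p99) for p50, p99 in zip(ttft_p50_ms, ttft_p99_ms)]

for conc, row in zip(concurrency, summary):
    print(f"concurrency {conc:>6.2f}: "
          f"P50 {row['p50_s']:>10.1f} s, "
          f"P99 {row['p99_s']:>10.1f} s, "
          f"P99/P50 {row['p99_over_p50']:.2f}")
```

Read this way, P50 TTFT stays under 10 s up to concurrency 22.09 and then rises to roughly 2,000 s at concurrency 44.18, where the utilization row reports 99.8%.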
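Time Per Output Token (TPOT) (ms)
| Field Name | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Minimum | 145.5 | 352.0 | 432.5 | 702.5 | 2211.6 | 1897.7 | 3549.9 |
| Average | 2137.7 | 3553.5 | 6202.2 | 9444.5 | 22406.8 | 32341.4 | 32421.9 |
| P50 | 2001.6 | 3304.2 | 5823.3 | 8870.4 | 21061.5 | 30083.7 | 30179.2 |
| P90 | 3220.7 | 5309.1 | 9254.0 | 14217.3 | 33700.6 | 48990.8 | 49132.2 |
| P95 | 3743.8 | 6191.8 | 10568.7 | 16546.4 | 38556.7 | 56705.5 | 56656.6 |
| P99 | 4887.4 | 8190.9 | 13739.7 | 22185.0 | 51386.5 | 78101.7 | 74276.5 |
| P999 | 7580.5 | 15704.9 | 18697.7 | 30332.4 | 76476.4 | 111552.4 | 101017.4 |
| Maximum | 17399.3 | 25161.0 | 34471.8 | 35229.3 | 177460.1 | 147740.0 | 178619.6 |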
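Request Latency (ms)
| Field Name | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Minimum | 31834.7 | 66850.1 | 52051.7 | 87834.0 | 356598.9 | 1431147.1 | 1782787.1 |
| Average | 418971.3 | 686824.5 | 1191955.6 | 1808173.1 | 4370744.0 | 8050099.1 | 14864139.3 |
| P50 | 400901.5 | 658928.1 | 1147500.6 | 1728456.9 | 4199659.4 | 7802771.3 | 15639162.0 |
| P90 | 590815.2 | 968200.8 | 1673188.9 | 2551220.0 | 6126371.9 | 10714231.9 | 18714110.3 |
| P95 | 662359.8 | 1075910.8 | 1852093.8 | 2841350.8 | 6838981.1 | 11793383.9 | 19785209.6 |
| P99 | 816140.3 | 1370790.7 | 2319437.0 | 3602457.0 | 8686452.7 | 14407677.9 | 22351816.5 |
| P999 | 1214458.2 | 1977796.9 | 2903047.0 | 5180534.7 | 11359379.7 | 19005547.7 | 25715757.2 |
| Maximum | 1263405.5 | 2068099.4 | 3528129.1 | 5417045.0 | 12910347.2 | 23026557.6 | 28913186.6 |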