1 Node

System 2

Submitted by Submitter 2 on 2026-02-19. Published on 2026-03-16.

SUT Summary
System Name System 2
System Availability Available
System Category Datacenter · On-premise
System Size 8x Accelerator 2
Model
Model Name GPT-OSS 120B
Division Closed
Model Precision mxfp4
Model Link
Transformation Link
Model Notes
Datasets
Dataset Name MLPerf Inference GPT-OSS Performance Dataset
Dataset Type Performance
Average Input Tokens 5010.64
Average Output Tokens 1300
Dataset Link Link
Measured Accuracy Score

Throughput vs Interactivity

[Chart: System TPS vs Interactivity (tok/s/user)]

Throughput vs Concurrency

[Chart: System TPS vs Concurrency]

Time to First Token vs Concurrency

[Chart: TTFT P99 (s) vs Concurrency]

Interactivity vs Concurrency

[Chart: Interactivity (tok/s/user) vs Concurrency]
Hardware

Processor

Processor Model Name Processor 2
Processors per Node 2
Cores Per Processor 32
VCPUs Per Processor

Accelerator

Accelerator Model Name Accelerator 2
Accelerators per Node 8
Memory Type
Memory Capacity 256 GB
Accelerator Interconnect
Host-Accelerator Interconnect

Host / Storage

Host Memory Capacity 1TB
Memory Configuration
Storage Capacity
Storage Type
Cooling Passive & Active
Hardware Notes
Software
Framework Inference Framework
Operating System Linux
Other Software Inference Backend v1.0
Software Notes
Run Data
| Field Name | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Run Date | 03/07/2026 | 03/07/2026 | 03/07/2026 | 03/07/2026 | 03/07/2026 | 03/06/2026 | 03/06/2026 |
| Concurrency | 0.17 | 0.69 | 2.76 | 5.52 | 22.09 | 44.18 | 88.36 |
| System Tokens/Second | 31.7 | 73.4 | 151.7 | 210.7 | 350.8 | 376.2 | 372.9 |
| Tokens/Second per User | 183.6 | 106.4 | 54.9 | 38.2 | 15.9 | 8.5 | 4.2 |
| TTFT P99 (ms) | 7376.1 | 8202.7 | 11045.0 | 117445.0 | 566319.4 | 2744948.4 | 15132101.2 |
| Utilization | 8.4% | 19.5% | 40.3% | 56.0% | 93.2% | 100.0% | 99.1% |
| Configuration Summary | TP=4, PP=1, precision=FP4, cont_batch | TP=4, PP=1, precision=FP4, cont_batch | TP=4, PP=1, precision=FP4, cont_batch | TP=4, PP=1, precision=FP4, cont_batch | TP=4, PP=1, precision=FP4, cont_batch | TP=4, PP=1, precision=FP4, cont_batch | TP=4, PP=1, precision=FP4, cont_batch |
| QPS | 0.0091 | 0.0210 | 0.0433 | 0.0605 | 0.1002 | 0.1080 | 0.1067 |
| Total Output Tokens | 22,330,923 | 22,403,343 | 22,395,866 | 22,291,611 | 22,400,639 | 22,287,872 | 22,359,894 |
| Run Duration (s) | 704,769.16 | 305,025.16 | 147,636.54 | 105,795.26 | 63,854.21 | 59,237.60 | 59,959.01 |
| Total Requests | 6,396 | 6,396 | 6,396 | 6,396 | 6,396 | 6,396 | 6,396 |
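
The derived rows above (System Tokens/Second, QPS, Tokens/Second per User, Utilization) appear to follow from the raw totals. The page does not state these formulas; they are inferred here from the published numbers, so treat the sketch below as a consistency check under that assumption rather than the benchmark's official definitions. All variable names are illustrative.

```python
# Values copied from the Run Data table, Run 1 through Run 7.
total_output_tokens = [22_330_923, 22_403_343, 22_395_866, 22_291_611,
                       22_400_639, 22_287_872, 22_359_894]
run_duration_s = [704_769.16, 305_025.16, 147_636.54, 105_795.26,
                  63_854.21, 59_237.60, 59_959.01]
total_requests = [6_396] * 7
concurrency = [0.17, 0.69, 2.76, 5.52, 22.09, 44.18, 88.36]

# Assumed formula: system throughput = output tokens / wall-clock duration.
system_tps = [tok / dur for tok, dur in zip(total_output_tokens, run_duration_s)]

# Assumed formula: QPS = completed requests / wall-clock duration.
qps = [req / dur for req, dur in zip(total_requests, run_duration_s)]

# Assumed formula: per-user rate = system throughput / concurrency.
# The published concurrency is rounded to two decimals, so low-concurrency
# runs only match the table to within a few percent.
tps_per_user = [tps / c for tps, c in zip(system_tps, concurrency)]

# Assumed formula: utilization = throughput relative to the best run.
peak_tps = max(system_tps)
utilization_pct = [100.0 * tps / peak_tps for tps in system_tps]

for run, (tps, q, u) in enumerate(zip(system_tps, qps, utilization_pct), start=1):
    print(f"Run {run}: {tps:.1f} tok/s, {q:.4f} QPS, {u:.1f}% utilization")
```

Under these assumptions the recomputed throughput, QPS, and utilization agree with the published rows to the displayed precision, which suggests that Utilization is normalized to the highest-throughput run (Run 6) rather than measured on the hardware.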
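
Time To First Token (TTFT) (ms)

| Field Name | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Minimum | 1207.9 | 1712.3 | 2670.7 | 3615.5 | 7406.9 | 45461.3 | 89405.5 |
| Average | 2412.8 | 3037.6 | 4465.2 | 7085.5 | 36925.8 | 1982966.9 | 9690425.2 |
| P50 | 2037.9 | 2563.8 | 3602.7 | 4626.0 | 10193.6 | 2202151.1 | 11589523.0 |
| P90 | 3280.8 | 3806.0 | 4900.9 | 6260.9 | 14862.2 | 2509911.0 | 11942149.5 |
| P95 | 4566.0 | 5113.7 | 5500.6 | 7108.0 | 264765.3 | 2620285.3 | 12637373.6 |
| P99 | 7376.1 | 8202.7 | 11045.0 | 117445.0 | 566319.4 | 2744948.4 | 15132101.2 |
| P999 | 20430.0 | 62938.6 | 71212.7 | 146420.0 | 587808.0 | 4306026.5 | 15466318.2 |
| Maximum | 80479.0 | 63080.0 | 71216.6 | 146426.4 | 588829.6 | 4856981.9 | 15499261.8 |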
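
The TTFT table reports milliseconds, while the "Time to First Token vs Concurrency" chart is labelled in seconds. A small conversion helper makes the two easier to compare. This is only an illustrative sketch; the function names `ms_to_seconds` and `format_duration` are hypothetical and not part of any benchmark tooling.

```python
def ms_to_seconds(ms: float) -> float:
    """Convert milliseconds to seconds."""
    return ms / 1000.0


def format_duration(ms: float) -> str:
    """Render a millisecond value as hours, minutes, and whole seconds."""
    total_seconds = int(round(ms / 1000.0))
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:d}h {minutes:02d}m {seconds:02d}s"


# TTFT P99 per run, in milliseconds, copied from the table above.
ttft_p99_ms = [7376.1, 8202.7, 11045.0, 117445.0, 566319.4, 2744948.4, 15132101.2]
ttft_p99_s = [ms_to_seconds(v) for v in ttft_p99_ms]

for run, ms in enumerate(ttft_p99_ms, start=1):
    print(f"Run {run}: {ms_to_seconds(ms):.1f} s ({format_duration(ms)})")
```

For Run 7, the P99 value of 15132101.2 ms is about 15132 s, or roughly 4 hours 12 minutes before the first token arrives.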
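
Time Per Output Token (TPOT) (ms)

| Field Name | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Minimum | 208.6 | 285.4 | 580.5 | 1024.9 | 2462.4 | 3112.3 | 3618.8 |
| Average | 2235.1 | 3848.5 | 7447.6 | 10666.3 | 25096.6 | 35986.3 | 37346.0 |
| P50 | 2087.1 | 3619.0 | 7029.1 | 9968.7 | 23597.8 | 33468.7 | 34524.2 |
| P90 | 3347.2 | 5749.7 | 11140.2 | 15912.8 | 37342.6 | 54886.8 | 57178.2 |
| P95 | 3861.2 | 6629.2 | 12782.2 | 18420.7 | 42956.1 | 63297.9 | 66992.4 |
| P99 | 5205.9 | 8807.8 | 16761.7 | 24646.9 | 57081.6 | 86215.9 | 89453.9 |
| P999 | 7781.7 | 14330.1 | 25677.3 | 38501.1 | 86461.2 | 124100.5 | 127450.4 |
| Maximum | 11687.0 | 23560.8 | 31628.6 | 99617.7 | 220982.8 | 267943.5 | 198467.8 |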
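
Request Latency (ms)

| Field Name | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Minimum | 37316.7 | 73242.8 | 116075.6 | 205669.1 | 532022.7 | 1639480.2 | 1859259.4 |
| Average | 440623.1 | 762347.8 | 1470411.1 | 2098297.8 | 4955298.9 | 8922224.6 | 16807261.7 |
| P50 | 421480.3 | 732175.5 | 1407452.8 | 2011210.3 | 4754899.8 | 8700339.4 | 17643330.6 |
| P90 | 624313.2 | 1073267.1 | 2069158.6 | 2964609.7 | 6960485.7 | 11852587.0 | 21171967.4 |
| P95 | 694314.8 | 1196354.9 | 2318143.5 | 3282161.1 | 7835440.8 | 13119819.9 | 22380592.9 |
| P99 | 865105.2 | 1493647.6 | 2900401.8 | 4066207.2 | 9849525.8 | 15826930.2 | 24866288.1 |
| P999 | 1082802.2 | 2095815.0 | 3530041.7 | 5529029.4 | 13079076.1 | 20137359.4 | 28998216.2 |
| Maximum | 1302289.9 | 2284929.8 | 4351186.1 | 6185412.0 | 14710440.2 | 24253892.7 | 32271645.3 |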
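
Read as a whole, the Run Data show throughput flattening while latency keeps growing as concurrency rises. One way to locate that knee is to find the last run that still improves throughput by some minimum fraction over the previous run. The sketch below does this; the 10% threshold and the function name `saturation_index` are arbitrary assumptions chosen for illustration, not something the report defines.

```python
def saturation_index(tps: list[float], min_gain: float = 0.10) -> int:
    """Return the index of the last run before throughput gains fall
    below `min_gain` (fractional gain over the previous run)."""
    for i in range(1, len(tps)):
        if tps[i] < tps[i - 1] * (1.0 + min_gain):
            return i - 1
    return len(tps) - 1


# Values copied from the Run Data rows, Run 1 through Run 7.
concurrency = [0.17, 0.69, 2.76, 5.52, 22.09, 44.18, 88.36]
system_tps = [31.7, 73.4, 151.7, 210.7, 350.8, 376.2, 372.9]

knee = saturation_index(system_tps)
print(f"Throughput gains drop below 10% after Run {knee + 1} "
      f"(concurrency {concurrency[knee]}, {system_tps[knee]} tok/s)")
```

With the 10% threshold this selects Run 5 (concurrency 22.09): Run 6 adds only about 7% more throughput and Run 7 adds none, while TTFT P99 rises from roughly 566 s to over 2700 s and then over 15000 s. A different threshold would move the knee, so the choice should reflect the latency budget of the intended workload.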