Benchmark Testing

Now the fun begins.  If an SSD has a heatsink, all benchmark testing is done before taking the drive apart for pictures.  For all benchmarking, we use either the manufacturer's heatsink or our motherboard's SSD heat spreader.  We also swivel the case fan so it blows over the heat spreader, ensuring we capture the absolute best performance from the SSD during our benchmarking sessions.  Benchmarking is repeated several times to ensure consistent results.  For temperature testing, we remove the fan and measure with the heatsink alone and no active airflow.

UL PCMark 10

PCMark 10 full system drive benchmark

UL’s PCMark 10 is a powerful application benchmark for storage devices, built on real-world application traces, making it a good overall application-type benchmark for drives.  We use the Full System Drive Benchmark, which takes a while to run.  It uses 23 real-world traces from applications and common tasks, running 3 passes with each trace.

The traces tested include: booting Windows 10, Adobe Acrobat, Adobe After Effects, Adobe Illustrator, Adobe InDesign, Adobe Lightroom, Adobe Photoshop, Adobe Premiere Pro, Battlefield V, Call of Duty: Black Ops 4, Overwatch, Microsoft Excel, Microsoft PowerPoint, copying ISO images, and copying JPEGs.

We report the overall score, the bandwidth result, and the access time result.  Higher scores are better, except for access time, where lower is better.  Note that the overall score is derived mathematically from the bandwidth and access time results, which are themselves derived from formulas; bandwidth, for example, is based on the amount of time the storage device spends executing I/Os rather than total elapsed time.  To learn more about this benchmark, check out the technical guide.
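As a rough illustration of that relationship (a minimal Python sketch, not UL's actual scoring formula – the trace numbers and variable names here are our own assumptions):

```python
# Sketch: bandwidth derived from device "busy time" (time spent executing
# I/Os) rather than wall-clock time, per the description above.

def trace_bandwidth(bytes_transferred: int, busy_seconds: float) -> float:
    """Bandwidth in MB/s over the time the drive was actually busy."""
    return bytes_transferred / busy_seconds / 1_000_000

# Hypothetical trace result: 4.2 GB moved while the drive was busy for 9.6 s
bw = trace_bandwidth(4_200_000_000, 9.6)   # ~437 MB/s
access_time_us = 55.0                      # average access time; lower is better
print(f"bandwidth: {bw:.0f} MB/s, access time: {access_time_us} µs")
```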

PassMark PerformanceTest

PassMark PerformanceTest Disk Bench

We run only the Disk Bench portion of PassMark’s PerformanceTest.  This is a good synthetic, overall application-type benchmark for drives.  It derives its overall score from four subtests: Disk Sequential Read, Disk Sequential Write, IOPS 32KQD20, and IOPS 4KQD1.  We graph the overall score; higher is better.
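The exact weighting behind the composite is PassMark's own, so as a purely hypothetical illustration of how four subtest results can be folded into one score (the geometric mean and the numbers below are our own invention, not PassMark's formula):

```python
from math import prod

def composite_score(subtests: dict[str, float]) -> float:
    """Hypothetical composite: the geometric mean of the subtest results."""
    values = list(subtests.values())
    return prod(values) ** (1 / len(values))

results = {                      # made-up subtest results for illustration
    "seq_read_mbps": 6800.0,
    "seq_write_mbps": 5100.0,
    "iops_32k_qd20": 90000.0,
    "iops_4k_qd1": 18000.0,
}
print(f"illustrative composite: {composite_score(results):.0f}")
```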

CrystalDiskMark


CrystalDiskMark is a common program for testing the throughput of sequential and random reads and writes.  For NVMe drives we use the program’s NVMe profile; for SATA drives we use the default profile.  Because the testing configuration differs, results cannot be compared between SATA and PCI-Express drives, but they can be compared between PCIe Gen3 and Gen4 devices.  We show the read and write results for each test; higher results are better.

The NVMe configuration tests SEQ1M Q8T1 read and write, SEQ128K Q32T1 read and write, RND4K Q32T16 read and write, and RND4K Q1T1 read and write.

The default SATA configuration tests SEQ1M Q8T1 read and write, SEQ1M Q1T1 read and write, RND4K Q32T1 read and write, and RND4K Q1T1 read and write.
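The test labels encode the access pattern, block size, queue depth (Q), and thread count (T) – SEQ1M Q8T1, for instance, is sequential access with 1 MiB blocks at queue depth 8 on a single thread.  A small sketch makes the convention explicit (the parser is our own helper, not part of CrystalDiskMark):

```python
import re

def parse_cdm_label(label: str) -> dict:
    """Decode a CrystalDiskMark test label like 'SEQ1M Q8T1' or 'RND4K Q32T16'."""
    m = re.fullmatch(r"(SEQ|RND)(\d+)([KM]) Q(\d+)T(\d+)", label)
    if not m:
        raise ValueError(f"unrecognized label: {label}")
    pattern, size, unit, queue, threads = m.groups()
    return {
        "access": "sequential" if pattern == "SEQ" else "random",
        "block_size_bytes": int(size) * (1024 if unit == "K" else 1024**2),
        "queue_depth": int(queue),
        "threads": int(threads),
    }

print(parse_cdm_label("SEQ1M Q8T1"))    # sequential, 1 MiB blocks, QD8, 1 thread
print(parse_cdm_label("RND4K Q32T16"))  # random, 4 KiB blocks, QD32, 16 threads
```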

AS SSD

AS SSD Copy Benchmark

AS SSD is a throughput tester similar to CrystalDiskMark, so we skip its overlapping tests.  However, AS SSD has a unique subset of tests called Copy Tests, which we use for drives.  The categories are ISO, Program, and Game.  The official description is as follows:  “In the copy tests (menu-tools-copy benchmark) the following test folders are created: ISO (two large files), programs (typical program folder with many small files) and games (folder of a game with small and large files). These three folders are copied with a simple copy command of the operating system. The cache remains switched on for this test. The practical tests show the performance of the SSD with simultaneous read and write operations.”

We report the duration, or the time it took to complete these copies.  Lower time is better.  This is a good benchmark to show how fast drives are at copying various file sizes.
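The same idea can be approximated by hand: build folders matching the three patterns and time an ordinary copy of each.  A minimal sketch (the file counts and sizes are our own rough stand-ins, not AS SSD's actual test data):

```python
import os, shutil, time

def make_files(folder: str, sizes: list[int]) -> None:
    """Create a folder of files with the given sizes in bytes."""
    os.makedirs(folder, exist_ok=True)
    for i, size in enumerate(sizes):
        with open(os.path.join(folder, f"file{i}.bin"), "wb") as f:
            f.truncate(size)  # zero-filled; use random data for realistic timings

# Rough stand-ins for AS SSD's three patterns:
make_files("iso", [700 * 2**20] * 2)                  # two large files
make_files("program", [64 * 2**10] * 500)             # many small files
make_files("game", [2**20] * 50 + [200 * 2**20] * 3)  # small and large mixed

for folder in ("iso", "program", "game"):
    start = time.perf_counter()
    shutil.copytree(folder, folder + "_copy")         # simultaneous read + write
    print(f"{folder}: {time.perf_counter() - start:.2f} s")
```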

DiskBench


DiskBench is a utility we use to measure the time it takes to copy a large file from the drive being tested back to itself, volume to volume.  We create a 50GB file in Windows with the “fsutil” command and place it on the device being tested.  We then tell DiskBench to copy this file to a new folder on the same drive.  This tells us how fast the SSD is at copying large files.  We report the duration of the task; lower time is better.
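A rough equivalent of that procedure, sketched in Python (paths are placeholders; the 50GB source file is created beforehand, e.g. with `fsutil file createnew D:\testfile.bin 53687091200`):

```python
import os, shutil, time

SRC = r"D:\testfile.bin"            # placeholder: the 50GB test file
DST = r"D:\copytest\testfile.bin"   # new folder on the same volume

os.makedirs(os.path.dirname(DST), exist_ok=True)
start = time.perf_counter()
shutil.copyfile(SRC, DST)           # same-volume copy, as in our DiskBench run
elapsed = time.perf_counter() - start
print(f"copied in {elapsed:.1f} s ({50 * 1024**3 / elapsed / 1e6:.0f} MB/s)")
```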

Final Fantasy XIV: Endwalker Benchmark

Final Fantasy XIV: Endwalker Benchmark is a benchmark built from an actual game.  While it primarily benchmarks the GPU, it has a second function: it times the loading of each scene it benchmarks.  It outputs a load time for each scene, plus an overall average load time across all scenes.  In this way, we get an accurate and repeatable representation of game loading time without resorting to something as imprecise as a stopwatch.

Since this is based on a real game and game engine, it has major relevance.  For this benchmark, we set the graphics settings to “Maximum” and the resolution to 4K so that it loads the largest texture sizes and the most game assets, stressing load times as much as possible.  We graph the total average load time of all scenes; lower time is better.
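Aggregating the output is then simple arithmetic; a minimal sketch (the scene names and times below are hypothetical, not real benchmark output):

```python
# Hypothetical per-scene load times (seconds) from one benchmark pass
scene_loads = {
    "scene 1": 1.82, "scene 2": 2.05, "scene 3": 1.77,
    "scene 4": 2.31, "scene 5": 1.94,
}
average = sum(scene_loads.values()) / len(scene_loads)
print(f"average scene load time: {average:.2f} s (lower is better)")
```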

SPECworkstation 3.1

SPECworkstation 3.1 is a professional workstation benchmark based on diverse professional applications.  We don’t run the entire benchmark, just the storage portion, called WPCstorage.  Per SPEC: “The storage workload is based on storage transaction traces from a wide variety of professional applications engaged in real work. These are grouped according to market segments for scoring purposes.”

It runs 60 subtests across different types of programs.  The market segments it benchmarks are: Media and Entertainment, Product Development, Life Sciences, Energy, and General Operations.  The full list of applications is: ccx, SPECapcMaya, MayaVenice, MandE, handbrake, 3dsm, wpcCFD, prodDev, namd, lammps, energy-02, mozillaVS, mcad, IcePack, and 7z.

It outputs a score for each section, then an average score of everything combined.  We graph and report that overall average.  Higher numbers are better, indicating the SSD will perform better in workstation-class applications.

Temperature

HWiNFO

We use HWiNFO to monitor temperature and report the maximum, or highest, temperature reached.  We test temperature in two ways: Stress Testing and Typical Usage.  Stress Testing involves loading up Active KillDisk and running a Secure Erase across the entire SSD.  This stresses the SSD and heats it up as much as possible (beyond typical usage), as it reads and writes every cell across the drive.  This is a worst-case scenario, but we feel it’s important to see how hot the drive can get.

Second, we run several benchmarks and note the typical temperature under typical workloads.  This temperature is lower than in the stress test and reflects what you will generally encounter in real-world use.  We report these temperatures in a graph, in Celsius.  Lower temps are better.
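The same max-temperature logging can be reproduced outside of HWiNFO; for example, on a Linux box with smartmontools installed, polling an NVMe drive's composite temperature and keeping the peak looks roughly like this (a sketch under those assumptions – HWiNFO itself is a Windows tool, and the device path is a placeholder):

```python
import re, subprocess, time

def nvme_temp(device: str = "/dev/nvme0") -> int:
    """Read the NVMe composite temperature via smartctl (usually needs root)."""
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True, check=True).stdout
    m = re.search(r"Temperature:\s+(\d+)\s+Celsius", out)
    if not m:
        raise RuntimeError("temperature not found in smartctl output")
    return int(m.group(1))

max_temp = 0
for _ in range(60):          # poll once per second for a minute
    max_temp = max(max_temp, nvme_temp())
    time.sleep(1)
print(f"maximum temperature: {max_temp} °C (lower is better)")
```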

Brent Justice

Brent Justice has been reviewing computer components for 20+ years.  Educated in the art and method of the computer hardware review, he brings experience, knowledge, and hands-on testing with a gamer-oriented...

Join the Conversation

3 Comments

  1. I appreciate the effort – it looks well thought out and thorough. I especially applaud the testing standardization and the building up of a database. That is the best thing for all reviews imo, as it gives you the ability to objectively go back and make comparisons.

    I have to admit – personally, I don’t really look at storage benchmarks. I care about SSD vs HDD, but if one SSD is a bit faster than the next – not a consideration for me or my typical uses.

    I recognize I’m not everyone – and some people here do have cases where the difference in performance can make a big impact. I’m just not one of them.

    For me, the three biggest factors in my storage purchases:

    1) Is it reliable? Your temperature testing does get at that, but only tangentially. Here, I rely largely on Backblaze reporting, SSD overprovisioning and brand name reputation (which is not a great indicator of anything really). It’s hard to get this kind of data without some long-term use cases, and particularly with SSDs, a lot of times they just haven’t been around long enough to get that kind of data. If there are other resources that help get at this, I’d be very interested. I don’t know that this is something that FPS could invest in; it takes a good deal of time and resources. For SSDs – for home use I’m a typical consumer, so write endurance isn’t a huge factor, and even at work – it’s light-duty database, and even that doesn’t see a huge amount of writes.

    2) Price per byte. I will look at interface when I look at price – I’d value NVMe over SATA on SSDs, for instance, but only to a point. But for one NVMe drive over another, I probably wouldn’t look at speed benches.

    3) Warranty coverage. I have shucked drives before, when the price per byte was just so low that it was hard to pass up. But more often than not I try to get drives with 5-year warranties. I’ve found most drives will outlive that, but the drive failures I’ve seen tend to lump into three distinct bands – within the first 90 days, around year 3-4, or well after year 7. I never touch rebuilt or used drives.

  2. Choosing to intentionally test drives via the second, chipset-connected slot is a very strange decision, given the compatibility issues that have popped up with drives like the (now resolved) WD SN850 and those which use certain SMI controllers (SM2262/EN, etc). There is also additional command latency for having to traverse the chipset interlink which can impact IOPS, to say nothing of any incidental secondary bandwidth from accessory devices which are switched to the chipset.

    Being able to more easily point a fan at the drive seems like a very weird justification given the potential for more important issues which could, quite literally, invalidate each and every drive test result on this site going forward.

    Many users have also experienced poor SATA performance on X570 boards, with lower than expected random 4K (Q32T16, etc) results as compared to B450/B550 or Intel platforms, so that could throw off reviews of any future hypothetical refreshed SATA SSDs. In fact, the same problem is clearly visible in your review of the TeamGroup T-Force Vulcan SSD. Q32T16 performance sits at about 230MB/s, below what is expected.

    https://www.thefpsreview.com/2020/07/08/teamgroup-t-force-vulcan-500gb-2-5-sata-ssd-review/6/

    To be clear, I’ve been reading your content since you started at [H] and this isn’t some gotcha dig at your credibility, it simply strikes me as a very weird choice and a bit of an unforced error. If anything, I’d expect testing on the main slot by default, with an additional quick sanity check on the chipset slot as well, to check for any glaring compatibility issues.

    Storage reviews haven’t really been a focus here, but if you intend to jump in with both feet it might be worth considering these things. :)

  3. Quoting the previous comment: “To be clear, I’ve been reading your content since you started at [H] and this isn’t some gotcha dig at your credibility, it simply strikes me as a very weird choice and a bit of an unforced error. If anything, I’d expect testing on the main slot by default, with an additional quick sanity check on the chipset slot as well, to check for any glaring compatibility issues.”

    It was considered, and the above check has actually been done. The motherboard we are using has identical performance between the M.2 slots. Plus, all drives are benchmarked on equal turf – the same way, on the same M.2 slot, with the same cooling – so the testing is standardized and the results can be compared directly.

    IF, read IF, any latency issues exist due to this particular M.2 slot, they would be replicated in every test, for every SSD, and the results would still be comparable since the same configuration is used throughout. However, we verified that performance is the same between the two slots before making this decision; I would not have chosen the slot otherwise. The M.2_2 slot provides the full potential of performance: the full PCI-Express 4.0 lanes are open to it, and the slot can maximize PCI-Express 4.0 performance.

    Our system runs lean, and we do not have excessive data running through the chipset that would cause latency or bandwidth degradation. We also run the tests multiple times and verify the results.

    The M.2_1 slot on this motherboard does not have a heatsink; only the M.2_2 slot does. That is another reason for using this slot: we can apply the motherboard’s default heatsink as it is intended.

    In addition, the M.2_2 slot is a configuration that would actually be used in a real-world computer build, so testing on it reflects a real-world setup.

    The use of an X570-based motherboard for the test bench is obvious.

    All system specs and configuration are clearly stated.
