All‑flash arrays and NVMe have pushed storage into the sub‑millisecond world. That is great for apps, but it also means tiny inefficiencies in hosts and fabrics now show up as real dollars.
The good news: most of the wins come from disciplined fabric choices, fair QoS, and a little host surgery. Think of it as removing pebbles from a racetrack.
By the time you are done, you should see tighter latency, smoother tail behavior, and better path utilization without throwing more hardware at the problem.
1) Start with the fabric you actually have
You do not need a religion about transports. You need predictable latency, loss handling that matches your traffic, and clean operations. In many modern SAN environments, the decision comes down to NVMe over Fibre Channel or NVMe/TCP over Ethernet.
Both can deliver very low latency when built correctly. Focus on link quality, buffer behavior, and congestion management rather than chasing theoretical microseconds.
Fibre Channel basics that matter
- Keep inter‑switch links short and clean. Provision enough buffer credits for the round trip on longer links. Undersized credits look like random pauses.
- Use fine‑grained zoning. Single initiator to single target keeps chatter out of the control plane.
- Run links at the highest stable rate your optics and cabling support. Verify forward error correction status and look for incrementing error counters rather than assuming.
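The buffer-credit point above can be made concrete with a back-of-the-envelope calculator. This is a sketch under rule-of-thumb assumptions (light travels roughly 5 µs/km in fiber, a full FC frame is about 2148 bytes on the wire, and the data rates per FC generation are approximate); check your switch vendor's sizing guidance for real deployments.

```python
import math

# Approximate payload data rates per FC generation, in MB/s.
# These figures are rule-of-thumb assumptions, not exact line rates.
DATA_RATE_MBPS = {8: 800, 16: 1600, 32: 3200}

def bb_credits_needed(distance_km: float, speed: int,
                      frame_bytes: int = 2148) -> int:
    """Estimate buffer-to-buffer credits to keep a long ISL busy.

    Credits must cover the round trip: the frame travels to the far
    switch and the R_RDY acknowledgment travels back.
    """
    rtt_s = 2 * distance_km * 5e-6                     # ~5 us/km each way
    frame_time_s = frame_bytes / (DATA_RATE_MBPS[speed] * 1e6)
    return math.ceil(rtt_s / frame_time_s)

print(bb_credits_needed(10, 8))    # 38  (~4 credits per km at 8GFC)
print(bb_credits_needed(10, 32))   # 149 (faster links need far more)
```

The second result is the interesting one: upgrading a long ISL to a faster speed without resizing its credits is exactly how "random pauses" appear.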
Ethernet with NVMe/TCP basics
- Treat it like performance networking. Verify loss, latency, and jitter on every hop.
- Enable ECN in the network and on hosts, then confirm it is actually marking under pressure. ECN is your friend for low‑loss, low‑latency flows.
- Jumbo frames can help if every hop and endpoint agrees. Consistency beats a larger MTU that is only enabled “most places.”
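The jumbo-frame caveat is easy to demonstrate: path MTU is the minimum across every hop, so one forgotten port quietly defeats the whole exercise. A minimal sketch (the hop values are illustrative):

```python
def effective_mtu(hop_mtus: list[int]) -> int:
    """The path MTU is the smallest MTU on any hop."""
    return min(hop_mtus)

def jumbo_safe(hop_mtus: list[int], required: int = 9000) -> bool:
    """True only if every hop supports the required frame size."""
    return all(m >= required for m in hop_mtus)

path = [9000, 9000, 1500, 9000]   # one switch port was never configured
print(effective_mtu(path))         # 1500: jumbos are silently lost here
print(jumbo_safe(path))            # False
```

This is why the bullet above says consistency beats a larger MTU enabled "most places": the one 1500-byte hop sets the ceiling for the whole path.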
2) Build a simple, fair QoS plan
Storage traffic benefits from fairness far more than from absolute priority. Bursts from one database should not drown out everyone else. Keep it simple and observable.
- Define a small number of classes: storage, backup, and everything else. Map them with 802.1p or DSCP on Ethernet, and to logical classes on FC if your switches support it.
- Shape at the edge. Put per‑host caps for backup and analytics so they never steal capacity from latency‑sensitive apps.
- Police microbursts on ingress if your platform supports it. Microbursts cause transient buffer exhaustion that looks like mystery tail latency.
- Audit regularly. QoS that nobody checks quietly drifts out of shape.
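The plan above can be written down as data, which makes the quarterly audit trivial. A sketch of a small class map with edge caps; the DSCP code points and cap figures here are illustrative assumptions, not mandates:

```python
# Class map: DSCP marking plus an optional per-host edge cap in Gbps.
# DSCP 46 (EF) and 8 (CS1) are common choices, but pick values that
# fit your own network policy.
CLASSES = {
    "storage": {"dscp": 46, "cap_gbps": None},  # latency class, uncapped
    "backup":  {"dscp": 8,  "cap_gbps": 5},     # bulk, shaped at the edge
    "default": {"dscp": 0,  "cap_gbps": None},  # best effort
}

def within_cap(traffic_class: str, observed_gbps: float) -> bool:
    """Audit check: is this host's traffic under its class cap?"""
    cap = CLASSES[traffic_class]["cap_gbps"]
    return cap is None or observed_gbps <= cap

print(within_cap("backup", 7.2))   # False: this backup host is over its cap
print(within_cap("storage", 7.2))  # True: the latency class is not capped
```

Running a check like this against observed per-host rates is a cheap way to catch the "QoS that nobody checks" drift.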
3) Tune for tail latency, not just averages
Fast media is unforgiving. The 99.9th percentile often tells the truth your average hides. Work methodically.
Measure first
- Collect histograms, not only mean and p95. Track p99 and p99.9 during peak hours.
- Tag latency with path identity. You want to know which link or HBA was in the hot path when the spike happened.
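To see why histograms matter, consider a toy sample set: a handful of slow IOs barely moves the mean but dominates p99.9. A minimal nearest-rank percentile sketch (sample values are made up for illustration):

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: sort, take element ceil(p/100 * n) - 1."""
    data = sorted(samples)
    k = math.ceil(p / 100 * len(data)) - 1
    return data[max(k, 0)]

# 995 fast IOs at 100 us plus 5 outliers at 5000 us.
lat_us = [100] * 995 + [5000] * 5
print(sum(lat_us) / len(lat_us))   # mean: 124.5 us, looks healthy
print(percentile(lat_us, 99))      # p99: 100 us, still looks healthy
print(percentile(lat_us, 99.9))    # p99.9: 5000 us, the truth comes out
```

The mean moved by a quarter; the 99.9th percentile moved by 50x. That is the "truth your average hides."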
Host queue depth and IO size
- Start conservative per path, then walk the queue depth up while watching p99 and CPU. A deeper queue does not always mean more throughput when the media is already fast.
- Align IO to 4 KiB boundaries and avoid odd sizes that force extra work in the stack.
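A quick alignment check is worth building into any IO-generation or validation tooling. A minimal sketch:

```python
BLOCK = 4096  # 4 KiB, the native block size assumed throughout this section

def aligned(offset: int, length: int, block: int = BLOCK) -> bool:
    """True if both the offset and length sit on block boundaries.

    Misaligned IO forces extra work (e.g. read-modify-write) somewhere
    in the stack or on the device.
    """
    return offset % block == 0 and length % block == 0

print(aligned(8192, 65536))   # True: clean 4 KiB multiples
print(aligned(512, 4096))     # False: the offset lands mid-block
```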
Make the CPU predictable
- Use a performance governor on busy storage hosts. Frequency swings add jitter.
- Pin interrupts and NVMe completion queues to the right NUMA node. Cross‑socket hops are latency you can avoid.
- Prefer the "none" or "mq‑deadline" IO scheduler for NVMe block devices on Linux. Let the device do the work it was designed to do.
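Pinning interrupts to the right NUMA node usually means translating a cpulist string (the format exposed in `/sys/devices/system/node/node*/cpulist`) into the hex bitmask that `/proc/irq/*/smp_affinity` expects. A small sketch of that conversion:

```python
def cpulist_to_mask(cpulist: str) -> str:
    """Convert a Linux cpulist ("0-3,8-11") to a hex affinity mask.

    The result is suitable for writing to /proc/irq/<N>/smp_affinity
    so an NVMe device's interrupts stay on the NUMA node that owns it.
    """
    mask = 0
    for part in cpulist.split(","):
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            for cpu in range(lo, hi + 1):
                mask |= 1 << cpu
        else:
            mask |= 1 << int(part)
    return format(mask, "x")

print(cpulist_to_mask("0-3"))       # f
print(cpulist_to_mask("0-3,8-11"))  # f0f
```

Reading node0's cpulist and writing the resulting mask to the relevant IRQs keeps completions on-socket, removing the cross-socket hops the bullet above warns about.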
4) Multipathing that actually balances
Multipath is not about "more paths," it is about correct paths. NVMe introduces Asymmetric Namespace Access (ANA), which tells the host which paths are optimized.
- Enable native NVMe multipath where your OS supports it. It understands ANA states and keeps IO on optimized paths.
- Choose a path policy that reacts fast but does not thrash. “Queue if no path” can preserve in‑flight IO during brief events. Short, sane timeouts keep apps happy during failover.
- Verify path symmetry. If one link traverses extra optics or a slower hop, it will skew tail latency even if the aggregate bandwidth looks fine.
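The ANA-aware selection logic is roughly: use live optimized paths, and fall back to non-optimized paths only when no optimized path is available. A sketch of that policy (the path names and state labels are illustrative, not an OS API):

```python
def usable_paths(paths: list[dict]) -> list[dict]:
    """Prefer live optimized paths; fall back to live non-optimized ones.

    This mirrors, in spirit, what native NVMe multipath does with ANA
    states, keeping IO off degraded or non-optimized paths.
    """
    live = [p for p in paths if p["state"] == "live"]
    optimized = [p for p in live if p["ana"] == "optimized"]
    return optimized or [p for p in live if p["ana"] == "non-optimized"]

paths = [
    {"name": "nvme0c0n1", "state": "live",       "ana": "optimized"},
    {"name": "nvme0c1n1", "state": "live",       "ana": "non-optimized"},
    {"name": "nvme0c2n1", "state": "connecting", "ana": "optimized"},
]
print([p["name"] for p in usable_paths(paths)])  # ['nvme0c0n1']
```

Note that the connecting path is excluded even though it is marked optimized; only when the last live optimized path disappears does IO shift to the non-optimized one.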
5) Ethernet specifics for NVMe/TCP
NVMe/TCP rides standard TCP, which is a gift for operations. You can use the same tooling and still keep latency tight.
- Turn on ECN end to end. Confirm with packet captures that ECT(0) is set and CE marks appear during congestion. Pair it with a modern congestion control algorithm on the host.
- Keep receive and transmit queues deep enough on NICs to absorb bursts, but not so deep that you create standing queues. Watch for bufferbloat symptoms.
- If you use jumbo frames, enforce consistency with automation. One mismatch can silently fragment traffic and add latency.
- Avoid link‑wide PAUSE. If you must use priority flow control for other classes, scope it narrowly. NVMe/TCP works well with ECN and well‑managed queues.
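When you inspect captures for the ECN behavior described above, you are decoding the low two bits of the IP TOS byte. The codepoints come from RFC 3168; a small decoder makes the check mechanical:

```python
# ECN codepoints in the low 2 bits of the IP TOS byte, per RFC 3168.
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}

def ecn_of(tos: int) -> str:
    """Decode the ECN field from a TOS byte seen in a packet capture."""
    return ECN_NAMES[tos & 0b11]

# DSCP 46 (EF) with ECT(0) is TOS 0xba; a switch marking congestion
# flips the low bits to CE (0b11), giving 0xbb.
print(ecn_of(0xBA))  # ECT(0): the sender advertised ECN capability
print(ecn_of(0xBB))  # CE: a switch marked this packet under congestion
```

Seeing ECT(0) on outbound storage flows and occasional CE marks under load is the healthy pattern; ECT(0) with zero CE marks during obvious congestion means the network is dropping or pausing instead of marking.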
Conclusion
All‑flash and NVMe reward teams that care about the last mile of tuning. The fabric carries the truth about your workload. QoS keeps neighbors from stepping on each other. Hosts remove jitter when interrupts and queues are placed with intent. Put those pieces together and the effect compounds.
Lower tail latency. Faster failure handling. Happier databases. And the best part: you can get most of the gains with settings you already own, not a forklift upgrade. Start small, measure, and let the graphs tell you where to go next. Revisit the plan every quarter as firmware and operating systems evolve. The physics are steady, yet software keeps unlocking free performance when you keep the fundamentals clean.