“Welcome, newcomers. The tradition of Festivus begins with the airing of grievances. I got a lot of problems with you people! And now you’re gonna hear about it!” – Frank Costanza from SEINFELD
Holiday season is finally upon us! A time when friends and family gather around the dinner table expressing gratitude and celebrating life with those we cherish most. But we all have that curmudgeonly uncle whose sole purpose is to remind us that his life isn’t all peaches and cream, and that perhaps some of us had more than just a little hand in that. Yes, he’s your bloodline’s “Frank Costanza” – the character from Seinfeld who, in rebuke of Christmas, created “Festivus” to fill the air with holiday gloom instead of cheer. It’s an anti-holiday during which a tinsel-covered pole replaces the customary tree and “feats of strength” substitute for joyful conversation. But its most notable feature is the ceremonial “airing of grievances” during which he expresses all the ways his family has disappointed him.
Of course, I could use this time to reflect on all my blessings with gratitude. After all, ’tis the season, right? But, naaah. Instead, I’m gonna channel my inner “Frank Costanza” and give all of you in the Performance Community a holiday “piece of my mind.” Buckle up!
My Top 6 Performance Pet Peeves
“Only six? Oh, that’s not so bad.” Don’t be mistaken – there are more where these pet peeves came from. But the spirit of giving moves me to keep this short and sweet for you loyal readers. After all, you got planes to catch and turkeys to prepare. So, without further ado, I present you my Top 6 Performance pet peeves in reverse order of aggravation.
6. Tech Twitter
Imagine that you’re an avid seafood lover but you live in an area dotted with Red Lobster locations (Red Lobster – the McDonald’s of seafood restaurants). But then, lo and behold, a critically acclaimed seafood restaurant opens up within 20 miles of your home! You can barely contain your excitement!
Upon arrival, though, you notice the neighborhood is sketchy. Streets covered in filth and sidewalks strewn with hypodermic needles. Inside, obnoxious patrons engage in loud, lewd conversations. You’re mortified.
But the food! Oh, dear lord, THE FOOD! It’s delectable! Restaurant critics actually downplayed the quality! How could you ever go back to Red Lobster after this gastronomical awakening?!
This is exactly what Tech Twitter is like. It’s a treasure trove of valuable technology insights and up-to-date research rivaled by few outlets. Many of the authors, speakers, and thought leaders you’ve admired for years from the cold distance of a blog, conference video, or whitepaper are right there within a digital arm’s reach. I interact with compiler writers, kernel maintainers, tech founders, book authors, and tool developers on a daily basis now.
But then I look around and realize. . . I’m on Twitter. Uggh! It’s a slum. A hellscape of toxic political discourse. And I’m not just talking about the random threads that Twitter forces upon you w/o prodding. Sometimes, it even originates from the very techies you follow. And trust me, the fringe beliefs espoused by some of them come from both ends of the spectrum.
But the tech info! Oh, how sumptuous the feast of TECH INFO!
5. C++ vs. Java Low Latency Debates
Stop. Please just stop. This is one of my most enduring performance pet peeves. Look, both languages can be used for low latency purposes. And commercial JVMs like Azul Platform Prime (formerly “Zing”) open the door even more for Java in this space (free JVMs have made significant strides, as well).
But let me be clear: When it comes to low jitter, microseconds-level response times, the game is won or lost in the L1d cache. In such a game, data structure layout is of utmost importance. So, until Java offers finer-grained control of data layout for plain old Java objects w/o needing to resort to “unsafe” practices (pun intended), C/C++ will continue to dominate the ultra low latency software space.
Having stated that, I’ll tell you that I’ve worked for successful HFT firms that standardized their trading stack on Java and others that standardized on C/C++. So can we please put this tired subject to rest now? No? Eh, I didn’t think you’d go for it, either. . .
4. Scaling Before Optimizing
Growing up in my family, NFL Sunday and Pay-Per-View Boxing events meant junk food time. Italian Fiesta Pizza, Leon’s BBQ, Popeye’s Chicken, etc. All of us kids started with the wings, and then we’d eventually go after the drumsticks since they had more meat to satisfy us ravenous lil runts. But my dad and uncles would look at our plates, aghast at the sight of half-eaten piles of chicken wings.
“No wonder you’re still hungry – you left so much meat on those wings! No, you can’t have any drumsticks til you finish those wings!”
This reminds me of the tech consumer. Constantly clamoring for better speeds and feeds when they’ve yet to optimize their apps to the HW they already have. It’s been this way for quite a while already, with developers reaching for thread-level parallelism before exploiting the CPU’s instruction- and data-level parallelism. But now with the advent of cloud computing, it’s easier just to spin up a whole batch of extra hosts entirely!
CPUs are powerful beasts these days, guys. You’re only hungry for more compute because you haven’t yet eaten all the compute on your plate. Finish off the wings you have before reaching for the drumsticks, please.
3. Threshold-based Automated Performance Tests
Your SREs dutifully track Service Level Objectives (SLO) defined by the business, enabling a more proactive stance with regard to Customer Experience. But you want to take it a step further. “Let’s monitor SLOs in the CI/CD pipeline and catch performance regressions at inception!” Sounds like a good idea, right? So, you (build|buy) this automated performance testing framework, and it’s been humming along for weeks now.
One day, the automation framework fires an alert – the latest commit has tripped the p95 SLO. You and the devs spend days poring over this innocuous commit but fail to pinpoint anything of consequence. And then it hits you! Weeks’ worth of commits have progressed through the pipeline by this time, providing tens of possible culprits of one or more regressions during those intervening weeks. But since none of them ever crossed the prescribed SLO threshold, they merged w/o incident. . . until the noise from this latest innocuous commit juuust nudged it over the edge. And therein lies the folly of threshold-based triggering, and the source of one of my most recent performance pet peeves. It’s quickly turned into one of the new systems performance myths in our industry.
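To make that failure mode concrete, here’s a toy sketch of the scenario. Every number in it is made up for illustration (the 100 ms SLO, the 90 ms starting p95, the per-commit deltas), and `first_threshold_breach` is a hypothetical helper, not part of any real CI tool. It simply accumulates each commit’s p95 shift and reports which commit finally trips the threshold.

```python
# All values below are hypothetical, chosen only to illustrate the scenario.
SLO_P95_MS = 100.0        # assumed p95 threshold enforced by the pipeline
BASELINE_P95_MS = 90.0    # assumed p95 before any of these commits merged

# Assumed p95 shift (in ms) introduced by each successive commit.
commit_deltas_ms = [3.0, 0.2, 2.5, 0.1, 3.5, 0.3, 0.5]


def first_threshold_breach(baseline, deltas, threshold):
    """Return (commit_index, p95) for the first commit whose cumulative
    p95 exceeds the threshold, or None if the threshold is never crossed."""
    p95 = baseline
    for index, delta in enumerate(deltas):
        p95 += delta
        if p95 > threshold:
            return index, p95
    return None


breach = first_threshold_breach(BASELINE_P95_MS, commit_deltas_ms, SLO_P95_MS)

# The commit that actually did the most damage, which merged without an alert.
largest = max(range(len(commit_deltas_ms)), key=lambda i: commit_deltas_ms[i])

if breach is not None:
    print(f"alert fired on commit #{breach[0]} "
          f"(its own delta: {commit_deltas_ms[breach[0]]} ms)")
print(f"largest single regression: commit #{largest} "
      f"({commit_deltas_ms[largest]} ms)")
```

In this made-up run the alert lands on the last commit, which contributed only half a millisecond, while the three commits that each added 2.5 ms or more sailed through untouched.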
Additionally, this type of automated performance testing only alerts when performance regresses. What about when response times *unexpectedly* improve? Could that commit have induced a correctness issue missed by Functional/Integration Testing?
Read my virtual lips: Employ non-parametric statistical testing. Alert on and investigate statistically-significant differences, not just regressions. And we’ll leave it there to expound upon in another article at a later date.
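Until that article arrives, here’s a minimal sketch of what such a check might look like. I’m assuming a two-sided permutation test on the difference in medians, which is just one of several non-parametric options (Mann-Whitney U is another common pick). The function name, the sample data, and the 2,000-permutation default are all hypothetical choices for illustration.

```python
import random
from statistics import median


def permutation_test(baseline, candidate, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in medians.

    Returns an approximate p-value: the share of random relabelings whose
    median gap is at least as large as the observed one.
    """
    rng = random.Random(seed)  # seeded so repeated CI runs agree
    observed = abs(median(candidate) - median(baseline))
    pooled = list(baseline) + list(candidate)
    n_base = len(baseline)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(median(pooled[n_base:]) - median(pooled[:n_base]))
        if diff >= observed:
            at_least_as_extreme += 1
    # The +1 terms keep the estimate from ever being exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical response-time samples in ms. Note the candidate got *faster*.
baseline_ms = [100 + i for i in range(20)]
candidate_ms = [80 + i for i in range(20)]

p_changed = permutation_test(baseline_ms, candidate_ms)
p_same = permutation_test(baseline_ms, list(baseline_ms))

print(f"baseline vs. faster candidate: p = {p_changed:.4f}")
print(f"baseline vs. itself:           p = {p_same:.4f}")
```

Because the test is two-sided, the unexpectedly faster candidate gets flagged just like a slower one would, which is exactly the point: any statistically significant shift deserves a look.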
2. Arithmetic Mean: The IG Filter of Stats
We’ve all experienced this in some form. Maybe you’re single and you’ve matched with a prospect online, only to sit across from someone who looks like the much older, less healthy version of the person you swiped right on. Or maybe you’ve noticed that your friend’s or colleague’s IG or LinkedIn pic miraculously smooths out the results of years of smoking and frowning. It’s hilarious, yes, and I tease people who do this mercilessly. But let’s keep it a buck – it’s also deceptive. You know you don’t look like that, or at least not since college. Stop it!
And that’s precisely what the arithmetic mean does for your Response Time reports. It smooths out all the peaks and tails, and it trims the bloat. It also pleases the CTO. But when the company’s Web Bounce Rate alerts his superiors, he decides to review the Response Time histograms himself:
“What are these two large peaks here and then again over here? And look how far out that right tail goes! Wait. . . there are virtually no data points around the quoted average? What gives?!”
Best case scenario, he assumes you’re incompetent and puts less trust in your reporting in the future. Worst case scenario, he assumes you deliberately dissembled the facts and fires you. It’s a no-win situation. Stop running your response time data through an IG filter. FYI – this is the oldest of all my performance pet peeves.
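Here’s a small sketch of the CTO’s discovery, using a made-up bimodal sample (the 10 ms, 200 ms, and 900 ms values and their proportions are assumptions for illustration, not real measurements). It compares the mean against a few percentiles and counts how many requests actually landed anywhere near the quoted average.

```python
from statistics import mean, quantiles

# Hypothetical sample in ms: a fast peak, a slow peak, and a long right tail.
response_ms = [10] * 60 + [200] * 35 + [900] * 5

avg = mean(response_ms)

# 99 cut points; cuts[k - 1] is the k-th percentile.
cuts = quantiles(response_ms, n=100, method="inclusive")
p50, p95, p99 = cuts[49], cuts[94], cuts[98]

# How many requests fall within 25% of the reported average?
near_avg = sum(1 for x in response_ms if abs(x - avg) <= 0.25 * avg)

print(f"mean = {avg} ms, p50 = {p50} ms, p95 = {p95} ms, p99 = {p99} ms")
print(f"requests within 25% of the mean: {near_avg} of {len(response_ms)}")
```

In this sample not a single request lands near the average, the median sits far below it, and the tail sits far above it. Report the percentiles (or better yet, the histogram) and let the peaks and tails speak for themselves.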
Ladies and gentlemen
Boys and girls
My #1 most annoying, grating, aggravating Performance Pet Peeve is. . .
1. Clickbait Benchmark Articles
I take Tom Brady and Ryan Tannehill, and I evaluate them on arm strength, 40-yard dash speed, and IQ. At the conclusion, the tally comes out 3 – 0 in Tannehill’s favor. There you have it – Ryan Tannehill is the better NFL QB!
I take Tyson Fury and Deontay Wilder, and I evaluate them on punching power, max bench press, and 40-yard dash speed. At the conclusion, the tally comes out 3 – 0 in Wilder’s favor. There you have it – Deontay Wilder is the better boxer!
Both those conclusions are *clearly* ridiculous. Tom Brady is widely considered the G.O.A.T. of NFL QBs, and should’ve been cast as Captain America instead of that Chris Evans guy (don’t @ me). Tyson Fury is an undefeated Heavyweight Champion who TKO-ed the fearsome puncher Deontay Wilder in back-to-back outings.
So, then, why the disparity between evaluation and real-life outcomes? Because the elements that make up a stellar QB or Heavyweight Boxing Champion comprise a much more complex mix than what we measured. QB is the most difficult position in the NFL, and boxing is called the “Sweet Science” for a reason.
Yet, nearly every month some blogger or tech ezine writes another one of these clickbait-ey microbenchmark showdowns, and readers eat it up. This hash map vs. that hash map, this Linux distro vs. that Linux distro, this CPU vs. that CPU, all haphazardly thrown together using simplistic workloads, summarized from superficial analysis. In fact, the most arduous part of their entire process is the graphing:
“I wonder what gnuplot recipe I should use to produce these graphs. Or should I just use R or matplotlib?”
Extrapolating from oversimplified microbenchmarks can mislead, as we learned from Google’s experience of improving fleet performance despite spending more time at the allocation phase of TCMalloc. I bet that longer “allocation time” would get TCMalloc dinged in one of those silly allocator library showdowns, wouldn’t it?
Truthfully, though, I see no end in sight for this lazy type of benchmarking for two reasons:
- Benchmarking is hard and requires a level of rigor, technical analysis, and attention to detail that even specialists find difficult
- Headlines about “this tech” vs. “that tech” get clicks no matter how unsound the experimentation – after all, who doesn’t love a good showdown and the Hacker News debates which follow?
A couple Performance pet peeves just barely missed this list since I wanted to keep the word count low for the holiday season. I’ll only briefly mention them here:
Useless Use of cat
Every girlfriend in my life, both past and present, squeezes the toothpaste from the middle of the tube. Annoying but I keep it to myself. People often use “literally” in place of “figuratively”, which Merriam-Webster aids and abets because words have no meaning anymore. I let that go, too. But when software developers, the most faithful users of bash aliases for long git or cmake command abbreviations, type “cat /var/log/messages | less” instead of the shorter, less resource-intensive “less /var/log/messages”, I wanna explode!
Poor Frontend Web Performance
Do me a favor, please: Go to https://gtmetrix.com/. In the bar on that page, type in “https://www.nike.com”. Skim over the abysmal Performance Scores. Finally, scroll down and note the recorded size of Nike’s landing page. Just atrocious! Admittedly, Nike isn’t the only company that exhibits a web frontend performance issue. But they’re one of the very few with a brand large enough to ignore it. More than likely, your company isn’t part of that elite few. Please include Frontend Performance Testing in your CI/CD pipelines. Following tips from Timofey Bugaevsky’s “Acing Google’s PageSpeed Insights Assessment” article is a good place to start.
Now Pass the Gravy!
That’s it! That’s all 6 of my Top Performance pet peeves. Ah, so cathartic! It feels good to just let it all out from time to time, doesn’t it? Now everyone go eat, drink, and be merry with your loved ones, and be thankful that you have your health and each other! Happy holidays!