Exploratory Data Analysis: The launch angle hysteria

Nathan Ackerman
8 min read · Mar 8, 2021

The launch angle movement has taken Major League Baseball by storm recently.

Throughout Major League history, hitters — with the occasional exceptions of the Babe Ruths, the Hank Aarons, the Willie Mays of the world—fit one general mold: slap hitters. Singles. Stolen bases. Small ball.

But recently — and partly inspired, in my opinion, by the steroid era — hitters have been told to change that approach entirely. Now, it’s all about the long ball. Hitters are dropping their back shoulders, swinging with an uppercut and aiming for the fences. It’s summed up by the increase in launch angle, or the angle of the ball’s flight when it leaves the bat.

I wanted to explore the effect that the launch angle movement has had on players’ annual statistics in the last century. Is there any significant or noticeable difference in the gap between players’ annual batting averages and slugging percentages as the home run craze took off? One would expect that, due to the all-or-nothing nature of increasing launch angles, batting averages (the proportion of at bats in which a player gets a hit) would go down, but slugging percentages (the number of total bases a hitter gets from hits in an average at bat) would go up.

But is this really the case? Either way, what does that difference look like in the last hundred years versus the last 48 — ever since the designated hitter got introduced to the American League and set off a sequence of rapid changes in the way baseball is played and the way hitters are taught? And, does the rise in home runs in the last few decades correlate with a rise in ISO (more on that later) at the same rate? These are the questions I tried to tackle.

The data I pulled for this analysis were from SeanLahman.com (click on “2020 — comma-delimited version [Baseball Databank],” then open up the file called “batting.csv”).

The data set is huge. It starts in 1871, when pitchers threw underhand and batters could request pitch locations. Using all of it, obviously, wouldn’t work: the game changed too much in 1871, and for decades afterward, for those seasons to be comparable with modern ones. I chose to use only data from 1920 onward, as the Dead Ball Era, a time in baseball marred by very little power hitting, ended in 1919.

I also omitted individual seasons with fewer than 200 at bats — as this could’ve skewed my data as well. For example, if a player came up to the big leagues and had one at bat, in which he homered, his batting average for the season would’ve been 1.000 and his slugging percentage 4.000, though that pace is unsustainable over a full season. That possibility —or a similar one — had to be removed from my data.

I used pandas to wrangle my data to take out the pre-1920 years and the seasons with fewer than 200 at bats for a given player. Here you can see the first and last five entries in the database after my initial wrangling.
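A minimal sketch of that filtering step, assuming the Lahman column names `yearID` and `AB` and a hypothetical helper called `filter_seasons`. The small frame below is made up for illustration; the real input would be the `batting.csv` file loaded with `pd.read_csv`. Note that the Lahman file has one row per player stint, so this sketch applies the 200-at-bat threshold per row, which may or may not match how traded players were handled here.

```python
import pandas as pd


def filter_seasons(batting: pd.DataFrame, first_year: int = 1920,
                   min_ab: int = 200) -> pd.DataFrame:
    """Keep rows from first_year onward with at least min_ab at bats."""
    mask = (batting["yearID"] >= first_year) & (batting["AB"] >= min_ab)
    return batting.loc[mask].reset_index(drop=True)


# Made-up rows for illustration only, not real player data.
sample = pd.DataFrame({
    "playerID": ["a", "b", "c", "d"],
    "yearID": [1919, 1920, 1985, 2019],
    "AB": [450, 199, 200, 1],
})

filtered = filter_seasons(sample)
print(filtered.head())  # first rows after wrangling
print(filtered.tail())  # last rows after wrangling
```

Only player "c" survives: "a" played before 1920, and "b" and "d" fall below 200 at bats.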

Still, the data was lacking in the statistics I wanted to explore. It had players’ hits (H) and at bats (AB) in a season, but it didn’t have batting average (BA)—simply hits divided by at bats. It also didn’t have total bases (TB, necessary to calculate slugging percentage), which is just singles + doubles*2 + triples*3 + home runs*4. (The data didn’t include singles, but baseball stat sheets rarely do — it’s just hits minus doubles minus triples minus homers.) Using the above formulas, I calculated total bases, then calculated slugging percentage (SLG) using TB divided by AB.

Finally, I was interested in finding the difference between slugging percentage and batting average — isolated slugging or ISO, used to measure a player’s raw power. I calculated that using SLG minus BA.
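The four formulas above can be sketched in pandas as follows. The column names `H`, `AB`, `2B`, `3B` and `HR` are the ones used in the Lahman batting file; the helper name `add_rate_stats` and the two sample rows are made up for illustration.

```python
import pandas as pd


def add_rate_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with BA, TB, SLG and ISO columns added."""
    out = df.copy()
    # Singles are not stored, so derive them from hits.
    singles = out["H"] - out["2B"] - out["3B"] - out["HR"]
    out["BA"] = out["H"] / out["AB"]
    out["TB"] = singles + 2 * out["2B"] + 3 * out["3B"] + 4 * out["HR"]
    out["SLG"] = out["TB"] / out["AB"]
    out["ISO"] = out["SLG"] - out["BA"]
    return out


# Made-up lines: a full season, and the one-at-bat home run case.
sample = pd.DataFrame({
    "AB": [500, 1],
    "H": [150, 1],
    "2B": [30, 0],
    "3B": [5, 0],
    "HR": [20, 1],
})

stats = add_rate_stats(sample)
print(stats[["BA", "TB", "SLG", "ISO"]])
```

The first row works out to 95 singles and 250 total bases, so BA is .300, SLG is .500 and ISO is .200. The second row reproduces the 1.000 and 4.000 figures from the one-at-bat example, which is why the at-bat cutoff matters.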

Here’s my data now with the added columns of BA, TB, SLG and ISO.

Now that my data was all wrangled and prepped with the necessary stats, it was time to begin exploring.

First on the agenda was to plot batting average vs. slugging percentage in the four eras since the Dead Ball Era: the Lively Ball Era (1920–1945), Post-War Era (1946–1960), Expansion Era (1961–1972) and Designated Hitter Era (1973–present). I did so in four different subplots, and the chief focus here was to examine the slope of each. My expectation was that the plots would become gradually steeper with each subsequent era, because I anticipated that small increases in batting average would correspond with larger and larger increases in slugging percentage as an increasing proportion of hits went for extra bases.
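One way to tag each season with its era is `pd.cut`. This is a sketch under two assumptions: 1973, the year the designated hitter arrived, is counted in the Designated Hitter Era, and the data ends in 2020. The names `ERA_EDGES`, `ERA_LABELS` and `label_era` are hypothetical.

```python
import pandas as pd

# Bin edges are right-closed: (1919, 1945], (1945, 1960], (1960, 1972], (1972, 2020].
ERA_EDGES = [1919, 1945, 1960, 1972, 2020]
ERA_LABELS = ["Lively Ball", "Post-War", "Expansion", "Designated Hitter"]


def label_era(years: pd.Series) -> pd.Series:
    """Map each year to the name of its era."""
    return pd.cut(years, bins=ERA_EDGES, labels=ERA_LABELS)


# Boundary years, to check that each lands in the intended era.
years = pd.Series([1920, 1945, 1946, 1972, 1973, 2020])
eras = label_era(years)
print(pd.DataFrame({"yearID": years, "era": eras}))
```

With an era column in place, each subplot is just the subset of rows carrying one label.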

What I found — and what I confirmed after printing the slope of the best fit line for each of these plots — wasn’t what I expected. I found that the slopes of the plots generally decreased over time and became flatter with each subsequent era.
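The slope check can be sketched with `np.polyfit`, fitting a straight line within each era. This assumes batting average is on the x-axis and slugging percentage on the y-axis; the helper `slope_by_group` is hypothetical and the numbers below are made up so the slopes are easy to verify by eye.

```python
import numpy as np
import pandas as pd


def slope_by_group(df: pd.DataFrame, x: str, y: str, group: str) -> pd.Series:
    """Least-squares slope of y on x, computed separately for each group."""
    slopes = {}
    for name, part in df.groupby(group, observed=True):
        slope, _intercept = np.polyfit(part[x], part[y], deg=1)
        slopes[name] = slope
    return pd.Series(slopes)


# Made-up points: era "A" gains 2 points of SLG per point of BA, era "B" gains 1.
sample = pd.DataFrame({
    "era": ["A", "A", "A", "B", "B", "B"],
    "BA": [0.250, 0.300, 0.350, 0.250, 0.300, 0.350],
    "SLG": [0.400, 0.500, 0.600, 0.400, 0.450, 0.500],
})

slopes = slope_by_group(sample, x="BA", y="SLG", group="era")
print(slopes)
```

A flatter plot in a later era shows up here as a smaller slope value for that era.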

This caught me off guard, but I had a theory: the way hitting was approached didn’t change as much across these eras as it has in the last couple of decades. Essentially, my data was too broad, and the time period I was trying to analyze was too small a part of it to reveal the impact of launch angle. So, I narrowed the scope.

I decided to compare the same stats, but this time breaking the designated-hitter era into four different time frames: one from 1973–1986, one from 1987–2000, one from 2001–2014 and one from 2016–2020. (The final time frame is shorter because 2016 is when we started to see dramatic differences in launch angle, and I wanted to isolate that data.) Here’s what I found:

Hm. Still nothing significant in support of my hypothesis, as I expected to see a dramatic increase in the slope with each subsequent time frame (especially the last), but I didn’t.

Why are the four subplots in different colors, you might ask? I decided it might be easier for me to visualize these plots compared to each other if they were on the same graph to see if any new trends became noticeable. It didn’t give me anything new, so I’m not including it here.

I thought it might change things up if I focused instead on ISO, to see if it increased as time passed and hitting underwent dramatic philosophy shifts. I decided not to graph it by year this time, but rather by home runs, to answer the simple question: As home runs increased, did ISO increase at a proportional rate?

What I found, again, did not match my expectation: in all four eras, as home runs increased, ISO increased at a consistent rate.

Next, I tried to uncover the relationship between ISO and year, namely whether home runs and ISO have increased at the same rate over baseball history.

To do this, I first plotted years against home runs to make sure my assumption that home runs increased as time went on was accurate.

As I predicted, there is a general upward trend — toward the bottom, it’s pretty stagnant, but there are more players with higher home run totals more recently, as players started swinging for the fences and the launch angle movement really took off.

Next, I wanted to graph the changes in ISO over time. If my hypothesis was correct (that the increase in home runs as hitting philosophies changed corresponded with a dramatic spike in ISO), we would see a steeper slope on the years vs. ISO graph than on the years vs. home runs graph.

Here’s what that years vs. ISO graph looked like:

The result was not what I expected based on my original hypothesis, but it falls in line with my previous findings: As time passed and as baseball became more home-run-oriented, ISO did not accelerate at a disproportionately high rate.
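Home runs are counts and ISO is a rate, so the raw slopes of the two graphs are in different units. One way to put them on a common scale, which is an assumption of this sketch and not necessarily how the comparison above was made, is to fit a line to the yearly mean of each stat and divide the slope by that stat's overall mean. The helper `relative_trend` and the sample numbers are made up for illustration.

```python
import numpy as np
import pandas as pd


def relative_trend(df: pd.DataFrame, value: str, year: str = "yearID") -> float:
    """Slope of the yearly mean of `value`, as a fraction of its overall mean per year."""
    yearly = df.groupby(year)[value].mean()
    x = yearly.index.to_numpy(dtype=float)
    x = x - x.mean()  # centering keeps the fit well conditioned; the slope is unchanged
    slope, _intercept = np.polyfit(x, yearly.to_numpy(dtype=float), deg=1)
    return slope / yearly.mean()


# Made-up seasons in which HR and ISO both grow by the same relative amount.
sample = pd.DataFrame({
    "yearID": [2000, 2000, 2001, 2001, 2002, 2002],
    "HR": [8, 12, 9, 13, 10, 14],
    "ISO": [0.090, 0.110, 0.100, 0.120, 0.110, 0.130],
})

hr_trend = relative_trend(sample, "HR")
iso_trend = relative_trend(sample, "ISO")
print(f"HR: {hr_trend:.4f} per year, ISO: {iso_trend:.4f} per year")
```

If ISO were accelerating disproportionately, its relative trend would come out clearly larger than the one for home runs; in this made-up sample the two are equal.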

All of my findings in this exploratory data analysis went against my hypotheses: that we would see large increases in slugging percentage accompanying smaller increases in batting average, and that ISO would increase dramatically as time passed. Both rested on the idea that a higher proportion of players’ hits would be home runs and other extra-base hits as players started swinging upward and trying to generate higher launch angles.

For me, it put things into context and rationalized the changes we’ve seen in baseball over the past few years. I had started to see the game as an entirely different one than my dad grew up watching, or especially the one my grandfather grew up watching. There is some truth to my preconception. Power has definitely increased, players are being valued more for their ability to hit the long ball, and there is no doubt that the slap-hitter, small-ball-minded player that the league used to value has now had to make way for the batters who can launch the ball 450 feet and beyond.

But overall, these changes aren’t as dramatic as I had thought. Maybe the launch angle movement is still in its initial stages, too early to compare with eras of baseball that lasted decades at a time. But given the sheer number of players in the league, even the last five years contributed so many seasons to my data that I still believe my findings hold significant weight, and we should refrain from blowing the changes baseball has seen in the past few years and decades out of proportion. It’s still the same game, and the data largely reflects that.
