If you play in fantasy baseball dynasty league, you have no doubt raked through countless projections and rankings, attempting to forecast the year to come and break-out players.
There is no shortage of useful information out there, but one of the problems with pre-season fantasy baseball rankings is that they are catch-all’s, intended to appease the masses. And that makes sense, but if you really need insight into the value of a player in your league you need to only be scoring players based on the categories you carry.
Below I’ll show you a fool-proof method for creating solid rankings based on your specific league stats. I mostly used Excel for this exercise, but it can easily be done in R as well.
Step 1 – Get You Some Data
There’s no point in re-inventing the wheel here. I downloaded 2018 projections from fantasypros.com. Their Zeile Consensus Projections are solid and built from a number of sources. I’ve been told FanGraphs does an excellent job with their projections, so those are worth a look too, but the FantasyPros data had complete games easily accessible and that is a category I needed, so there you have it.
Step 2 – Clean it Up
In this step, I simply deleted the categories in each spreadsheet (hitting and pitching data are separated…for now) that I did not want in my way. I also created a calculated field for K/BB using the K and BB columns, since that is a category in my league.
Step 3 – The Super Secret Formula
So, in the end, I ultimately need to come up with value for how a player is contributing in each category and add those values up to give the player an overall score relative to how everyone else in the league scores.
To do this, we need to standardize the data so that each category uses roughly the same scoring range. This can be done by calculating the Z-score for every player in each category. Don’t worry too much if you’re not familiar with Z-Score — it’s basically a way to tell how far away from the average (mean) a player is in each category. But instead of literally using the numbers in the categories, those numbers are standardized.
Example:
Let’s say the mean number of hits and batting average are 150 and .270, respectively. If a batter is projected to get 170 hits and bat .310, we wouldn’t want to say he is 20 hits better and add that to his .040 better batting average for a score of 20.040. That would give so much weight to hits and batting average would be negligible, even if you batted an absurd .400 for the season.
This is where the standardization comes in. We can give each category an even scoring system if instead we say the faux player above is, say, 1 standard deviation above the mean for batting average and 1.4 above the mean for hits. So how do we calculate this?
It’s actually quite easy. The most tedious part is actually adding the scores at the end. First, you will need to get the average and overall standard deviation for each category in order to make this work.
=AVERAGE(data range here) =STDEVP(data range here)
I place these formulas below the row of players for each category. It looks something like this:
Now that we have an average and standard deviation for each category, we can calculate Z-scores. In Excel, there is a formula for this called “=STANDARDIZE.”
I created a new column to the right of my dataset and tallied up each players Z-score for each category to give them an overall score. The STANDARDIZE function in Excel accepts a few arguments that you will need to supply. First, you need to point to the players’ stat for a given category, followed by the average stat for the category and the standard deviation for that category (which we calculated earlier). Doing that for each category will look something like this:
=STANDARDIZE(E2,$E$303,$E$304)+STANDARDIZE(F2,$F$303,$F$304)+STANDARDIZE(G2,$G$303,$G$304)+STANDARDIZE(J2,$J$303,$J$304)+STANDARDIZE(L2,$L$303,$L$304)+STANDARDIZE(M2,$M$303,$M$304)+STANDARDIZE(N2,$N$303,$N$304)
This score basically says that if these projections hold true, then this is how valuable each player is relative to each other for your league. Some of the scores may surprise you, but remember that it is aligned with your league and does not give weight to irrelevant categories, regardless of how big of a name a player might be in reality.
If you’re good with the above, that’s fine. But I made one more adjustment to account for number of at-bat’s and innings pitched. In other words, I wanted to give more value to a player that batted .295 over 500 at-bat’s versus a player who maybe batter .300, but for only 200 at-bats.
In order to do this, I created a new column for any stat that was an average or a ratio (BA, OBP, ERA, WHIP, K/BB, etc.) and basically created a new metric that I would use instead. For each of these, I took the player stat, subtracted it from the average for that stat and then multiplied by the number of at-bats or innings pitched, depending.
ERA, for example, would now be:
(player ERA - league average ERA) * Innings Pitched
I then used that variable instead of the ERA variable to calculate my Z-score.
Step 4 – Merging the Data into one Sheet
If you harken back to the beginning of this post, recall that there are still two spreadsheets at this point — one for hitters and one for throwers. We need to bind these together and sort by the new score we created in order to get our rankings!
Here is my sheet after combining all of the columns together. I also added a new column called “Type” that indicates whether a player is a hitter or pitcher, because I knew I would need it later on for a separate project.
Step 5 – A Dose of Reality
I personally think this is a much better approach than looking at generic rankings that are spewed out annually. With that said, it has it’s flaws.
For some reason, pitchers seemed to be over-valued in my scoring and after some analysis I found that pitchers who were over-indexing in complete games were getting way to high of scores in that category. I tried a number of things to dilute this but ultimately landed on subtracting the number of CGs from each players overall score (most players had zero, of course) and somehow it seemed to work and I feel pretty good about what I’ve produced.
You may run into a similar scenario and may need to make modifications to your scores using trial and error. Hope that’s not the case and you end up with something great!
That is an interesting way to handle it, I am currently doing something similar for a hockey pool I am in. I found an issue with how I am handling it which leads me to this question, now maybe it’s my understanding of how all the math works, but how can you just add each category together and keep it relevant? Is that not the equivalent of saying that 1 HR is worth the same as 1 base hit? even though HR are harder to come by so should be more valuable in a fantasy setting? Just curious on your thoughts about this.
Hi Tyler, thanks for commenting and reading.
Your point above is exactly why you want to standardize using Z-scores. The best way I can explain it is that it tells how far above or below a player is to the average player in a given category. So let’s look at this hypothetical:
Let’s say Xander Bogaerts hits 20 homers and has 180 base hits. If I add those numbers together, it would be 200. Obviously we don’t want to do this because it puts a tremendous amount of weight on hits and players with a lot of hits are going to overshadow every other category.
With Z-score, Bogaerts instead might be 1 standard deviation above the average in home runs (let’s just say he’s barely above average here) and 2 standard deviations above the average player in hits (we’ll just say that he’s well above the average for hits in this example), for a total of 3 points. This is a much more standardized approach because it more or less makes each category apples to apples in the sense that it’s very likely that every player is going to be between the range of -3 to 3 for every category.
If that’s unclear or you need or you need thoughts on your hockey project, let me know.
How would you go about calculating the value of loses as a pitching category. For example, 8 L over 200 IP compared to 3 L over 70 IP.
Hi Andrew, thank you for the comment and apologies for the delay.
It really depends on the category you are using for your league. If you use raw number of Losses, then I would suggest you just consider the number of Ls for each player and not concern yourself with IPs. You would need to find the standard deviation, etc. of Ls like I do in my example. Let me know if you need any help calculating that piece.
On the other hand, if you are using some other type of metric/category in your league that is a ratio or average and includes IPs, then you may want to divide Ls by IPs to come with a new category and then you would want to apply the same process of standardizing to that new category instead of Losses.
I’ve done something similar to what I think you’re describing with some of my categories, and it seems to work well. For example, we use OBP as a category. But it’s more valuable to for a player to have a OBP of .350 with 500 at-bats than it is to have a player with a .350 OBP and only 250 at-bats. And so that I didn’t treat these two equally, we decided to take OBP / ABs and standardize the result of that instead of using straight OBP (even though OBP is the actual stat we use in fantasy).
Let me know if I’m missing the mark on this.