Friday, October 2, 2009

Correlation, Part One

This is the first part of a 10-part series following on this great post to The Hardball Times by Dan Fox. I'm going to use different stats, however: BB%, K%, BB/K, BABIP (batting average on balls in play), Isolated Power (slugging percentage minus batting average), weighted Runs Created, weighted Runs Above Average, and weighted On Base Average (see FanGraphs for those three), home run/fly ball percentage, and line drive percentage.
I'm seeing which one correlates with run scoring the most. Correlation runs from -1 to 1: -1 means a high value in one stat goes with a low value in the other, 0 means there is no correlation, and 1 is a perfect positive correlation. My cutoff for a good correlation is .7, because R^2 (the correlation squared) is then .49, so about half of the variation in one stat is explained by the other, which I consider acceptable.
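For anyone who wants to check the math, here is a minimal sketch of how a correlation and its R^2 can be computed by hand. The function name pearson_r and the sample numbers are made up for illustration; they are not my 2005-2008 data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unscaled covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unscaled variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values only, NOT real team data
stat = [8.1, 9.4, 7.6, 10.2, 8.8]
runs = [4.3, 4.9, 4.4, 5.1, 4.5]

r = pearson_r(stat, runs)
r_squared = r ** 2  # share of variation in one explained by the other
print(round(r, 2), round(r_squared, 2))
```

A correlation of exactly .7 squares to .49, which is where the cutoff above comes from.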
My data is from 2005-2008; I may redo it with 2009 data as well.
The correlation between runs and walk percentage is .41, which is an R^2 of .16. That's not strong, so walk percentage does not relate well to scoring runs.
The equation of the regression line (line of best fit) is y = 0.01x + 3.96 (it's in slope-intercept form: y = mx + b).
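As a rough sketch of where a line of best fit comes from, the slope m and intercept b in y = mx + b can be found with ordinary least squares. The function name fit_line and the sample points are hypothetical, chosen so the answer is easy to verify; this is not necessarily how my spreadsheet computes it.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by variation of x
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # Intercept: the line must pass through the point of means
    b = mean_y - m * mean_x
    return m, b

# Hypothetical points that lie exactly on y = 2x + 1
m, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, b)
```

With real data the points will not sit exactly on the line, and the R^2 tells you how tightly they cluster around it.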

