Returns to skill in professional golf: a quantile regression approach.
Kahane, Leo H.
Introduction
The returns to skill in professional sports have been the focus of
a growing body of research in sport economics. Scully (1974) was the
first paper to attempt to link player skills with compensation in Major
League Baseball. Jones and Walsh (1988) studied salaries in the National
Hockey League, and Kahn and Sherer (1988) examined salaries in the
National Basketball Association. Kahn (1992) studied performance and pay
in the National Football League. These papers represent only a small
sample of research on salary determination in sports, and the field has
made great strides since these early papers were published.
In addition to the above sports, researchers have studied salary
determination in professional golf. For example, one of the earliest
papers (discussed in greater detail in the next section) is by Shmanske
(1992), who used a cross-section of data from 1986 to estimate the
relationship between various golf skills and tournament earnings. One of
the issues ignored by this paper is the fact that golf earnings are
highly positively skewed. Subsequent papers (e.g., Moy & Liaw, 1998;
Shmanske, 2000; Nero, 2001) partially address this problem by
transforming earnings into natural logs before regressing them on
skills. While employing a log transformation may decrease skew, this
comes at a price as it ignores some potentially interesting
characteristics of the earnings distribution that are captured by its
skew. An alternative approach, which is the focus of this paper, is to
examine the linkage between professional golfers' earnings and
their skills with the use of quantile regression. Quantile regression
not only is better equipped to deal with skewed data, but it also allows
us to more fully explore the returns to skill by considering non-central
points on the conditional earnings distribution.
Previous Research on Golf Earnings
There have been a handful of published papers focusing on the
relationship between skills and earnings in professional golf. (1) One
is the aforementioned paper by Shmanske (1992), who uses a cross-section
of the 60 top money winners from the 1986 Professional Golfers
Association (PGA) Tour to study how practice improves the marginal
product of golfers' skills, which in turn affects their earnings.
Based on his empirical findings he notes, among other things, that the
value of the marginal product from putting may be in the range of $500
per hour of practice. Later research by Moy and Liaw (1998) uses 1993
cross-sectional data for golfers on the PGA Tour, the Ladies Professional
Golf Association (LPGA) Tour, and the Senior PGA Tour to
estimate the relationship between earnings and various golfing skills.
They find that long driving, good putting, and iron play are all
important for success in the PGA. (2) In comparison, iron play and short
game skills are more important for players in the LPGA and the Senior
PGA.
On a related topic, Shmanske (2000) considers the earnings
differential of players in the PGA and LPGA by examining a cross-section
of data for each group from 1998. He notes that while men tend to play
for bigger purses in the PGA than do women competing in the LPGA, men
also generally play longer courses and more rounds than do women. He
finds that, controlling for skill levels, women in the LPGA are not
underpaid in comparison to the men in the PGA. A follow-up by Rishe
(2001) examines the same issue of earnings differentials, but in this
case between PGA and Senior PGA players. He finds that the primary
reason for the greater earnings of PGA golfers is that they earn a
greater rate of return on their skills than equally skilled golfers in
the Senior PGA. He proposes that this difference may be attributable to
differing demand conditions (e.g., television viewership, etc.) for
golfing skills, but age discrimination may also be a reason.
Work by Alexander and Kern (2005) is aimed at testing the adage of
"drive for show, putt for dough." The adage implies that
driving distance is less important in determining the performance (and
hence earnings) of professional golfers than is putting and other
"short game" skills. The authors employ a panel dataset for
PGA players covering the period of 1992-2001 to test this hypothesis,
and to see if changes in equipment over the years (e.g., the increased
size of drivers, etc.) have affected the importance of driving versus
putting. They conclude that there is some limited support that the
importance of driving has increased over the years, but that putting
remains the most important skill.
Recent research on the linkage between skill and earnings in golf
focuses on the fact that skills do not directly determine earnings. For
example, Callan and Thomas (2007)--motivated by Scully (2002)--develop a
structural model where skills determine score, which in turn determines
rank performance, which ultimately determines earnings. Using
cross-sectional data from the 2002 PGA Tour they find that their
structural equation approach produces somewhat different estimates for
the marginal product of various skills in comparison to reduced-form,
single-equation studies. (3)
Earnings, Skewness, and the Advantage of Quantile Regression
One of the features that nearly all of the papers described above
share is the use of linear regression models that focus on the behavior
of the conditional mean of the dependent variable, the ordinary
least-squares (OLS) estimation method being the one most commonly
employed. (4) This paper follows a different approach by utilizing
quantile regression for estimating the returns to skill in golf. (5)
Quantile regression has a number of features that may make it a better
choice of estimation methods in this context. For example, one key
advantage of quantile regression is that it is better equipped to handle
cases where skewness and outlier effects are present in the dependent
variable. In the case of golf tournament earnings, it is abundantly
clear that the data are strongly positively skewed. Evidence of skewness
and the presence of outliers are provided graphically in Figures 1A and
1B. The first is a box plot of real earnings per event
("earnings" hereafter) for PGA Tour golfers during the 2004 to
2007 seasons. The graph clearly demonstrates a strong positive skew with
many outliers. Figure 1B shows the kernel density estimate for the same
data and includes an overlay of a normal distribution. Again, the data
appear to be strongly positively skewed and the shape of the kernel
density estimate appears to be non-normal.
[FIGURE 1 OMITTED]
In addition to these figures, Table 1 provides empirical tests for
skewness and non-normality of earnings. The tests, which are based on
skewness, kurtosis, and a joint test for both, strongly reject
normality. (6)
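The moment-based quantities behind such tests can be sketched in a few lines of Python. This is only an illustration of how skewness and kurtosis are computed, and of how a log transformation reduces skew; it does not reproduce the test statistics reported by Stata's "sktest". The earnings values and the function name are invented for the example and are not taken from the paper's data.

```python
import math


def sample_moments(values):
    """Return (skewness, kurtosis) based on central moments.

    Skewness is m3 / m2**1.5 and kurtosis is m4 / m2**2, so a normal
    distribution has skewness 0 and kurtosis 3.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical per-event earnings (thousands); invented for illustration.
earnings = [5, 8, 12, 15, 20, 26, 35, 50, 90, 400]

skew_raw, kurt_raw = sample_moments(earnings)
skew_log, kurt_log = sample_moments([math.log(v) for v in earnings])

print(f"levels: skewness={skew_raw:.2f}, kurtosis={kurt_raw:.2f}")
print(f"logs:   skewness={skew_log:.2f}, kurtosis={kurt_log:.2f}")
```

In this made-up sample the log transformation lowers the skewness but does not remove it, which is qualitatively the pattern Table 1 reports for the actual data.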
As for the causes of this skewness in PGA earnings, two reasons
emerge. First, the payout structure in PGA tournaments is nonlinear.
This point is made clearly in Scully (2002, p. 336) as he notes that
prize money in golf is based on the rank-order finish of tournament
participants with a nonlinear payout structure. Golfers in a tournament
who make the "cut" (i.e., those who finish in the top 50% of
the field after the first two rounds of the tournament) are eligible for
some share of the purse. Of those making the cut, the typical payout
structure upon the completion of the tournament is such that the first
place finisher receives 18% of the purse, the second place finisher
receives 10.8%, the third place receives 6.8%, the fourth place finisher
receives 4.8%, and so on. This nonlinear, convex payout structure (with
respect to rank-order finish) contributes to the skewness in PGA
earnings.
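As a rough illustration of this convexity, the following Python sketch applies the four payout shares quoted above to a purse. The purse of $5 million and the function names are assumptions made for the example; shares below fourth place are not given in the text and are therefore not filled in.

```python
# Payout shares for the top four finishers, as quoted in the text.
PAYOUT_SHARES = [0.18, 0.108, 0.068, 0.048]


def payouts(purse, shares=PAYOUT_SHARES):
    """Return the prize for each listed finishing rank, given a total purse."""
    return [purse * share for share in shares]


def rank_gaps(prizes):
    """Return the drop in prize money between consecutive ranks."""
    return [prizes[i] - prizes[i + 1] for i in range(len(prizes) - 1)]


# Hypothetical purse of $5 million (an assumption, not a figure from the paper).
prizes = payouts(5_000_000)
gaps = rank_gaps(prizes)

for rank, prize in enumerate(prizes, start=1):
    print(f"rank {rank}: ${prize:,.0f}")
print("gaps between consecutive ranks:", [round(g) for g in gaps])
```

The gap between consecutive ranks shrinks as we move down the leaderboard, which is what a convex payout schedule implies: moving up one place is worth far more near the top than further down.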
A second reason for the skewness in per event earnings is the
presence of some extraordinarily talented golfers. Even with the
non-linear payout structure noted above, per event earnings could still
be non-skewed across golfers if tournament wins are spread across a
large number of golfers. The fact is, however, that tournament wins are
clustered among a small group of highly talented golfers. For example,
during the 2004 to 2007 period there were a total of 135 first place
wins across all PGA tour events. Of these 135 first place wins, three
golfers (Tiger Woods, Vijay Singh, and Phil Mickelson) collectively took
49 (or 36%) of them. (7)
The clustering of winning, together with the non-linear payout
structure, leads to skewed per event earnings in professional golf. If we
proceed with a simple conditional mean regression estimation (such as
OLS) when the data are strongly skewed then the non-normality of the
errors will create difficulties with inference tests based on the usual
standard errors and t-statistics. Furthermore, the results may not
describe the experience of the "typical" golfer since the
regression coefficients may be strongly influenced by the skewness and
outlier effects (an illustration of this fact will be provided later).
As noted earlier, one common solution to reducing the skew in the
dependent variable is to transform the data into natural logs and then
estimate a conditional mean regression of the natural logs of the
dependent variable on the covariates. However, the normality tests
presented in Table 1 for the natural log of earnings also show evidence
of non-normality and thus estimating a semi-log form does not entirely
solve the problem of non-normality in the errors in this case. Quantile
regression, however, is well equipped to deal with such a problem.
Perhaps the greatest advantage of quantile regression is that it
allows us to consider non-central points on the conditional
distribution function of the dependent variable. That is, conditional
mean regression only gives us information about how various covariates,
such as driving distance, may affect the conditional mean earnings for
golfers. (8) Quantile regression allows us to explore the possibility
that, say, changes in driving distance may affect golfers differently at
different points on the conditional earnings distribution. Simply put,
quantile regression results may provide a more complete understanding of
the effects of various covariates on earnings.
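The idea can be illustrated in its simplest form, with no covariates. Quantile regression minimizes an asymmetric "check" loss, and when the model contains only a constant, the minimizer is a sample quantile rather than the mean. The Python sketch below shows this for a small, invented, positively skewed sample; the data and function names are assumptions for illustration and the sketch is not the estimator used for the results in this paper.

```python
def check_loss(residual, q):
    """Quantile-regression 'check' loss for a single residual."""
    return residual * (q - (1 if residual < 0 else 0))


def total_loss(values, candidate, q):
    """Sum of check losses when every value is predicted by `candidate`."""
    return sum(check_loss(v - candidate, q) for v in values)


def best_constant(values, q):
    """Return the sample value that minimizes total check loss at quantile q."""
    return min(sorted(values), key=lambda c: total_loss(values, c, q))


# Hypothetical, positively skewed earnings (thousands); invented for illustration.
earnings = [5, 8, 12, 15, 20, 26, 35, 50, 90, 150, 400]
mean = sum(earnings) / len(earnings)

for q in (0.10, 0.50, 0.90):
    print(f"q={q:.2f}: minimizer = {best_constant(earnings, q)}")
print(f"mean = {mean:.1f}")
```

In this sample the median minimizer is 26 while the mean is roughly 73.7, pulled upward by the single large value. Changing q moves the fitted point along the distribution, which is the property exploited when covariates are added.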
Model, Estimation Approach and Data
The model used in this paper to estimate earnings of PGA golfers is
similar to those used by others (e.g., Shmanske, 1992; Moy & Liaw,
1998; Alexander & Kern, 2005), and contains measures on various
skills, overall professional experience, and physical characteristics of
the golfer. Equation (1) presents the general form:
$$y_i = \alpha^{(q)} + \beta^{(q)} x_i + \varepsilon_i^{(q)} \qquad (1)$$
The dependent variable, $y_i$, is measured as real earnings per
PGA event (in thousands of 2007 dollars). (9) The vector $x_i$ contains the
covariates expected to explain golf earnings, $\beta$ is the vector of
coefficients to be estimated, and $\varepsilon_i$ is the error term.
Note that the superscript (q) denotes the specific quantile associated
with equation (1). (10) The vector $x_i$ contains five measures of
golfing skill that contribute to lower scores and ultimately greater
earnings. The definition of these variables and their expected impact on
earnings is as follows:
* greens in regulation: the percent of time a player was able to
hit the green in regulation (greens hit in regulation/holes played x
100). A green is considered hit in regulation when the ball reaches the
green with two putts remaining to make par. For example, on a par-5 hole, a green
is hit in regulation if the ball is on the putting green by the third shot,
leaving two putts to earn par. This measure represents a
golfer's skill in iron play. A positive coefficient is expected for
this variable. (11)
* putting average: the average number of putts needed to finish a
hole per green hit in regulation. Other things equal, the fewer the
putts, the lower the score and thus a negative coefficient is expected.
* save percentage: the percent of time a golfer was able to get the
ball in the hole in two shots or fewer after landing in a greenside sand
bunker (regardless of score). This skill captures the golfer's
ability to salvage his/her score by accurately chipping out of the sand
and as such we expect a positive coefficient.
* yards per drive: the average number of yards per measured drive.
(12) Other things equal, including accuracy, longer drives leave the
ball closer to the hole and should generally lead to reduced scores and
thus a positive coefficient is expected.
* driving accuracy: the percentage of time a tee shot comes to rest
in the fairway. All else equal, greater accuracy in driving should
result in lower scores and as such a positive coefficient is expected.
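The definitions above translate directly into simple ratios. The Python sketch below shows one way they might be computed from season totals. The function names and all of the input counts are hypothetical, chosen only so that the results land near the sample means; they are not the PGA Tour's own calculations.

```python
def percentage(successes, attempts):
    """Generic percentage, used for the rate-based skill measures."""
    return successes / attempts * 100


def regulation_stroke_limit(par):
    """Stroke by which the ball must be on the green, leaving two putts for par."""
    return par - 2


def putting_average(putts_on_greens_hit, greens_hit):
    """Average number of putts per green hit in regulation."""
    return putts_on_greens_hit / greens_hit


# Hypothetical season totals for one golfer (invented for illustration).
holes_played, greens_hit = 1440, 936
putts_on_greens_hit = 1666
sand_saves, bunker_visits = 49, 100
fairways_hit, tee_shots = 634, 1000

gir = percentage(greens_hit, holes_played)
putts = putting_average(putts_on_greens_hit, greens_hit)
saves = percentage(sand_saves, bunker_visits)
accuracy = percentage(fairways_hit, tee_shots)

print(f"greens in regulation: {gir:.2f}%")
print(f"putting average:      {putts:.2f}")
print(f"save percentage:      {saves:.2f}%")
print(f"driving accuracy:     {accuracy:.2f}%")
print("par-5 green must be reached by stroke", regulation_stroke_limit(5))
```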
In addition to the skill measures described above, vector $x_i$ in
equation (1) also contains controls for experience. The first is years
pro, which is the number of years that have passed since the player has
debuted as a professional golfer on the PGA Tour. The second is simply
the square of years pro. It is expected that earnings will increase with
greater experience on the PGA Tour, but with a diminishing effect. Thus
a positive coefficient is expected for years pro and a negative
coefficient is expected for its square. (13)
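With a positive coefficient on years pro and a negative coefficient on its square, the implied marginal effect of experience declines linearly and reaches zero at a turning point. A minimal Python sketch of that arithmetic follows. The two coefficient values are hypothetical placeholders chosen for illustration, not estimates from this paper.

```python
def experience_effect(years, b_years, b_years_sq):
    """Contribution of experience to predicted earnings."""
    return b_years * years + b_years_sq * years ** 2


def marginal_effect(years, b_years, b_years_sq):
    """Change in predicted earnings from one more year of experience."""
    return b_years + 2 * b_years_sq * years


def peak_years(b_years, b_years_sq):
    """Years of experience at which the marginal effect reaches zero."""
    return -b_years / (2 * b_years_sq)


# Hypothetical coefficients (thousands of dollars); not estimates from the paper.
B_YEARS, B_YEARS_SQ = 2.0, -0.05

peak = peak_years(B_YEARS, B_YEARS_SQ)
print(f"marginal effect at 5 years:  {marginal_effect(5, B_YEARS, B_YEARS_SQ):.2f}")
print(f"marginal effect at 30 years: {marginal_effect(30, B_YEARS, B_YEARS_SQ):.2f}")
print(f"turning point: {peak:.1f} years")
```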
Lastly, vector $x_i$ includes two measures of the physical
characteristics of golfers. The first is weight (measured in pounds) and
the second is height (measured in feet). These measures are included to
control for the possibility that physical characteristics of golfers may
affect performance and, hence, earnings. For example, it may be the case
that taller players require less effort when driving the ball and as
such may be more consistent with their drives (i.e., they may have less
variance in their driving distance and accuracy). Or it may be the case
that, other things equal, heavier players become more fatigued during a
tournament and this could affect their consistency as well.
Data on earnings and the above noted measures for players were
collected for the 2004 through 2007 PGA Tours; Table 2 provides summary
statistics. The median real earnings per event in thousands (not shown
in Table 2) is 34.71. The median is much less than the reported mean of
52.02, and is illustrative of the strong positive skew in the earnings
data.
Empirical Results
As was noted, one of the disadvantages of conditional mean
estimation methods, such as OLS, is that the regression results may be
strongly influenced by outlier effects. As a means of illustrating this
problem in the current context, Table 3 presents the top five most
influential observations for a simple levels OLS regression estimate of
equation (1). The table reports statistics for 'DFBETAS' for
each of the five skill measures included in the regression. (14) The
results reveal a very interesting pattern. Namely, Tiger Woods alone
strongly affects the estimated coefficients on four of the five skill
measures. As an example, including Tiger Woods' performance in
2006 raises the estimated coefficient for greens in regulation by
nearly one full standard error (a DFBETA of 0.999). Clearly, the use
of a simple conditional mean estimation method such as OLS would likely
produce misleading values for the estimated coefficients. Fortunately,
quantile regression is, by its nature, far less sensitive to outliers in
the dependent variable, and as such it is likely to give a more realistic
estimation of the returns to various golfing skills for the "typical" golfer.
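The leave-one-out logic behind DFBETA can be sketched in Python for the simplest case of a single covariate, following the formula given in endnote (14). The regressions in this paper include several covariates, so this is a simplified illustration rather than the computation behind Table 3. The data are invented, with one deliberately extreme observation, and the function names are assumptions.

```python
import math


def fit_slope(x, y):
    """OLS slope and its standard error for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, math.sqrt(rss / (n - 2) / sxx)


def dfbetas(x, y):
    """DFBETA of the slope for each observation, per the endnote (14) formula."""
    full_slope, _ = fit_slope(x, y)
    result = []
    for i in range(len(x)):
        # Re-estimate the slope with observation i left out.
        slope_without, se_without = fit_slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        result.append((full_slope - slope_without) / se_without)
    return result


# Hypothetical data: greens in regulation (%) and earnings (thousands).
# The last observation is an invented outlier.
gir = [60, 62, 63, 64, 65, 66, 67, 68, 70, 72]
earnings = [20, 28, 25, 33, 36, 34, 42, 45, 50, 400]

values = dfbetas(gir, earnings)
threshold = 2 / math.sqrt(len(gir))
for i, value in enumerate(values):
    flag = "influential" if abs(value) > threshold else ""
    print(f"obs {i}: DFBETA = {value:8.3f} {flag}")
```

With the sample size used in this paper (n = 778), the same cutoff rule gives 2 divided by the square root of 778, or about 0.072.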
Table 4 presents the quantile regression estimates for equation
(1). The first column contains, for comparison purposes, the simple OLS
estimation with robust standard errors reported in parentheses. Columns
2 through 6 report quantile regression estimates for the 10th, 25th,
50th, 75th, and 90th conditional quantiles. The standard errors for
these estimated coefficients are computed using a bootstrap method with
2,000 replications. (15) Finally, column 7 shows the Wald statistic for
a test of equivalence between the estimated coefficients for the five
quantiles. For all regressions, the covariates are centered at their
means.
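The bootstrap idea can be shown on a much simpler statistic than a regression coefficient. The Python sketch below resamples a small invented earnings sample with replacement and uses the spread of the resampled medians as a standard error. It bootstraps an unconditional sample quantile, not the quantile regression coefficients reported in Table 4; the data, the seed, and the function names are assumptions for illustration.

```python
import random
import statistics


def sample_quantile(values, q):
    """Return the q-th sample quantile as an order statistic (no interpolation)."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


def bootstrap_se(values, q, replications=2000, seed=0):
    """Bootstrap standard error of the q-th sample quantile."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    estimates = []
    for _ in range(replications):
        # Draw n observations with replacement and re-estimate the quantile.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        estimates.append(sample_quantile(resample, q))
    return statistics.stdev(estimates)


# Hypothetical per-event earnings (thousands); invented for illustration.
earnings = [5, 8, 12, 15, 20, 26, 35, 50, 90, 150, 400]

se_median = bootstrap_se(earnings, 0.50)
print(f"median = {sample_quantile(earnings, 0.50)}, bootstrap s.e. = {se_median:.2f}")
```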
Starting with the OLS results we see that, with the exception of
driving accuracy, all of the skill and experience measures have the
predicted signs. The negative sign on driving accuracy is interesting and
may reflect a tradeoff between driving accuracy and distance. (16) As
for statistical significance, four of the five coefficients on the skill
measures (greens in regulation, putting average, save percentage, and
yards per drive) are statistically significant at the 5% level or
better. The coefficients on driving accuracy and years pro achieve a
significance level of 10% and the coefficients on years pro squared,
weight, and height fail to achieve an acceptable level of significance.
Turning to the quantile regression estimates, the estimated coefficients
for greens in regulation, putting average, and save percentage have the
expected signs and are statistically significant at the 1% level in all
regressions. The variable yards per drive is statistically significant
at the 5% level or better for the 10th and 25th quantile regressions, but
tends to be less significant at higher quantiles. This
suggests that driving distance is important for those at the lower end
of the earnings distribution, but becomes less important for those at
the upper end. Or, in other words, the adage "drive for show, putt
for dough" seems to apply to the elite golfers, but not necessarily
for golfers at the lower end of the earnings distribution. Finally, the
Wald tests for statistical equivalence of the estimated coefficients
across the quantiles reject the null hypothesis at the 5% level of
significance for all of the skill measures with the exception of yards
per drive, for which the null is rejected at the 10% level of significance.
Rather than discussing the size of each coefficient separately
(there are a total of 60), I will discuss several prominent results.
Notice that the coefficient on greens in regulation for the OLS column
suggests that, other things equal, an increase in one percentage point
in this measure increases earnings by approximately $7,485. Comparing
this result to the median quantile estimation (column 4), we see that an
increase in one percentage point in greens in regulation is expected to
increase earnings by about $4,111. This is a considerable difference
between these two predictions and reflects the effects of the skewness
(and the effects of outliers) in the earnings measure. Even more
interesting is the fact that the estimated coefficient for greens in
regulation increases steadily as we move from the 10th quantile (1.486)
to the 90th quantile (6.377). This implies that the effect of an
increase in greens in regulation is quite different for golfers with low
earnings in comparison to golfers with high earnings, other things
equal. The Wald test in column 7 confirms that the difference is
statistically significant. Thus we can conclude that not only does an
increase in greens in regulation positively affect expected earnings
(i.e., producing positive "location shift" of the conditional
earnings distribution), but it also means that the increase in expected
earnings is greater at higher points on the conditional distribution
function and thus produces a widening in the expected earnings (i.e.,
producing a positive "scale shift").
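The location and scale effects can be made concrete with the two coefficients quoted above. The Python sketch below adds a one percentage point increase in greens in regulation to a pair of baseline conditional quantiles and reports how much the gap between them widens. The coefficients are those cited in the text; the baseline quantile values and function names are hypothetical and serve only to make the arithmetic visible.

```python
# Coefficients on greens in regulation quoted in the text (thousands of dollars).
GIR_COEFS = {0.10: 1.486, 0.90: 6.377}


def shift_quantiles(baseline, coefs, delta):
    """Conditional quantiles after the covariate rises by `delta`."""
    return {q: baseline[q] + coefs[q] * delta for q in coefs}


def spread(quantiles, low=0.10, high=0.90):
    """Distance between the upper and lower conditional quantiles."""
    return quantiles[high] - quantiles[low]


# Hypothetical baseline conditional quantiles (thousands); not from the paper.
baseline = {0.10: 10.0, 0.90: 120.0}

shifted = shift_quantiles(baseline, GIR_COEFS, delta=1.0)
widening = spread(shifted) - spread(baseline)

print("shifted quantiles:", {q: round(v, 3) for q, v in shifted.items()})
print(f"90-10 spread widens by {widening:.3f} thousand")
```

Both quantiles move up (the location shift), and the 90-10 gap grows by the difference between the two coefficients, 6.377 - 1.486 = 4.891 thousand dollars (the scale shift), regardless of the baseline chosen.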
We find similar results with respect to putting skill. The OLS
predicted effect of reducing putting average by one stroke ($700,082) is
nearly twice that for the median quantile regression result ($374,041).
In addition, notice the effects of this variable become greater for each
successive conditional quantile. These results imply, for example, that
a one stroke reduction in putting average would increase expected
earnings by about $182,000 for golfers at the 10th quantile in
earnings, while the same reduction would increase expected earnings by
more than $717,000 for golfers in the 90th quantile of earnings. In sum,
improvements in putting skill increase expected earnings and at the same
time tend to widen them; the latter result being in part due to the
nonlinear payout structure in PGA tournaments as noted earlier.
Figure 2 provides an alternative means of considering the effects
of the covariates on both the location and scale of earnings. Each graph
in the figure tracks how the estimated coefficient on each
covariate changes across quantiles. The shaded area shows the range for the
95% confidence envelope for the quantile regression estimate. Lastly,
the dashed line shows the estimated OLS coefficient for the variable.
Generally speaking, a plot for a quantile coefficient (and confidence
envelope) that is above the zero axis indicates that earnings increase
with an increase in the covariate (i.e., produces a positive location
shift of the conditional earnings distribution), while a plot lying
below the zero axis implies the opposite. (17) Furthermore, plots that
slope upward tend to widen the scale of the conditional distribution
function, whereas downward-sloping plots indicate that the scale tends
to narrow as the covariate increases. Thus, as an example, we see that
the plot for greens in regulation in Figure 2 is above the zero axis and
has a positive slope. This reflects our previous discussion of this
coefficient: earnings are increasing with greens in
regulation and the dispersion or scale of earnings is increasing as
well. We can see from Figure 2 that save percentage has a similar
qualitative effect to greens in regulation. The plot for putting average
is below the zero axis and has a negative slope. This tells us
that decreases in the average number of putts per hole have the effect of
increasing earnings and also tend to increase the scale of
earnings, all else equal.
Conclusion
The returns to skill in professional golf have been considered in a
number of academic research papers. All of the previous research,
however, has employed some version of a conditional mean estimation
procedure that may be inappropriate and misleading in light of the fact
that earnings in the PGA are strongly skewed. This paper has taken a
different approach--that of conditional quantile estimation. Our
findings from estimated quantile regressions indicate that not only do
the effects of various skills differ from those employing conditional
mean estimation, but that the impact of changes in several key skill
measures is more complex than what is implied from simple OLS-produced
estimates. As in previous studies, improvements in iron play, putting
ability, and sand saves all serve to increase expected earnings. The
results presented in this paper, however, go further and show that
improvements in these skills contribute to a widening of the conditional
earnings distribution. In addition, to the extent that professional
golfers practice various skills with an eye on increasing earnings, the
results presented in this study may provide better guidance as to how
much time to spend practicing various skills. For example, consider a
golfer in the 25th percentile of real earnings per event. According to the
above results, if this golfer could reduce his/her putting average by
one standard deviation (i.e., by 0.02) then his/her earnings would be
expected to increase by an estimated $5,739, which represents
approximately a 32% increase in per event earnings. However, a golfer in
the 75th percentile for earnings per event would witness an approximate
$9,973, or 16%, increase in his/her earnings for the same reduction in
putting average. (18) This kind of differential impact of improved
putting (and for other skills) may affect the time individual golfers
spend on developing specific skills.
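The arithmetic behind this comparison, detailed in endnote (18), can be reproduced with a short Python sketch. The coefficients and baseline earnings are the values reported in that endnote (in thousands of dollars); the function name is an assumption made for illustration.

```python
def relative_gain(coefficient, change, baseline_earnings):
    """Return (absolute gain, gain as a share of baseline earnings)."""
    gain = coefficient * change
    return gain, gain / baseline_earnings


# Values from endnote (18): putting average coefficients and baseline
# earnings per event, both in thousands of dollars.
PUTTING_CHANGE = -0.02  # one standard deviation reduction in putting average

gain_25, share_25 = relative_gain(-286.97, PUTTING_CHANGE, 17.963)
gain_75, share_75 = relative_gain(-498.63, PUTTING_CHANGE, 62.161)

print(f"25th percentile: gain = ${gain_25 * 1000:,.0f}, share = {share_25:.3f}")
print(f"75th percentile: gain = ${gain_75 * 1000:,.0f}, share = {share_75:.3f}")
```

The golfer at the 75th percentile gains more in dollars, but the golfer at the 25th percentile gains roughly twice as much relative to baseline earnings.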
[FIGURE 2 OMITTED]
There are several possible extensions to the above work. One is to
consider alternative skill measures in the earnings function. For
example, measures such as birdie conversions or hitting out of the rough
may be tested. In addition, the above estimates used pooled cross
sections of data for professional golfers. It may be desirable to
control for unobserved heterogeneity for individual golfers with a
random- or fixed-effects quantile regression. This type of estimation
procedure, however, is currently unavailable for quantile regression
(see Koenker, 2005). Perhaps future developments in the field will make
such an estimation approach possible.
References
Alexander, D. L., & Kern, W. (2005). Drive for show and putt
for dough? An analysis of the earnings of PGA Tour golfers. Journal of
Sports Economics, 6(1), 46-60.
Callan, S. J., & Thomas, J. M. (2007). Modeling the
determinants of a professional golfer's tournament earnings: A
multiequation approach. Journal of Sports Economics, 8(4), 394-411.
Ehrenberg, R. G., & Bognanno, M. L. (1990a). Do tournaments
have incentive effects? Journal of Political Economy, 98(6), 1307-1324.
Ehrenberg, R. G., & Bognanno, M. L. (1990b). The incentive
effects of tournaments revisited: Evidence from the European PGA Tour.
Industrial and Labor Relations Review, 43(3), S74-S88.
Hamilton, B. (1997). Racial discrimination and professional
basketball salaries in the 1990s. Applied Economics, 29(3), 287-296.
Hao, L., & Naiman, D. Q. (2007). Quantile regression. Thousand
Oaks, CA: Sage Publications, Inc.
Jones, J. C. H., & Walsh, W. D. (1988). Salary determination in
the National Hockey League: The effects of skills, franchise
characteristics, and discrimination. Industrial and Labor Relations
Review, 41(4), 592-604.
Kahn, L. M. (1992). The effects of race on professional football
players' compensation. Industrial and Labor Relations Review,
45(2), 295-310.
Kahn, L. M., & Sherer, P. D. (1988). Racial differences in
professional basketball players' compensation. Journal of Labor
Economics, 6(1), 40-61.
Koenker, R. (2005). Quantile regression. Cambridge, UK: Cambridge
University Press.
Koenker, R., & Hallock, K. F. (2001). Quantile regression.
Journal of Economic Perspectives, 15(4), 143-156.
Moy, R. L., & Liaw, T. (1998). Determinants of professional
golf tournament earnings. American Economist, 42(1), 65-70.
Nero, P. (2001). Relative salary efficiency of PGA Tour golfers.
American Economist, 45(2), 51-56.
Rishe, P. J. (2001). Differing rates of return to performance: A
comparison of the PGA and senior golf tours. Journal of Sports
Economics, 2(3), 285-296.
Scully, G. W. (1974). Pay and performance in Major League Baseball.
American Economic Review, 64(6), 915-930.
Scully, G. W. (2002). The distribution of performance and earnings
in a prize economy. Journal of Sports Economics, 3(3), 235-245.
Shmanske, S. (1992). Human capital formation in professional
sports: Evidence from the PGA Tour. Atlantic Economic Journal, 20(3),
66-80.
Shmanske, S. (1998). Price discrimination at the links.
Contemporary Economic Policy, 16(3), 368-378.
Shmanske, S. (2000). Gender, skill, and earnings in professional
golf. Journal of Sports Economics, 1(4), 385-400.
Shmanske, S. (2008). Skills, performance and earnings in the
tournament compensation model: Evidence from PGA Tour microdata. Journal
of Sports Economics, 9(6), 644-662.
Vincent, C., & Eastman, B. (2009). Determinants of pay in the
NHL: A quantile regression approach. Journal of Sports Economics, 10(3),
256-277.
Endnotes
(1) Other topics of research on golf have included such things as
the pricing for a round of golf (e.g., Shmanske, 1998) and the incentive
effects that tournament-style payoffs have on participants' effort
(e.g., Ehrenberg and Bognanno, 1990a, 1990b).
(2) See Shmanske (1992) for a brief explanation of the object of
playing golf and the kinds of skills needed to be successful in playing
the game.
(3) Shmanske (2008) also considers a structural equation approach,
but uses individual tournament data (as opposed to year-long tournament
averages). The use of individual tournament data allows him to take into
account the specific aspects of various tournaments (e.g., overall
course distance differences). In addition, the microdata employed allows
him to incorporate measures of variance and skew in player performance
measures in addition to the typical mean measures.
(4) The Callan and Thomas (2007) paper employs a two-stage
least-squares estimation method. Alexander and Kern (2005) employ a GLS
estimator with random effects.
(5) Others who have employed quantile regression in the context of
sports include Hamilton (1997) for professional basketball and Vincent
and Eastman (2009) for professional ice hockey. See Koenker and Hallock
(2001) for an introduction to the basics of quantile regression. For a
more comprehensive presentation of quantile regression see Koenker
(2005).
(6) The tests shown in Table 1 are computed using Stata's
"sktest."
(7) Tiger Woods had 22 first-place finishes, Vijay Singh had 16,
and Phil Mickelson had 11 (http://www.pgatour.com).
(8) And even this may be of little value when the mean value of
earnings is not centrally located due to the skewness in the data.
(9) All data for PGA Tour were gleaned from the ESPN website:
http://espn.go.com/golf/. Variable definitions were obtained from the
PGA Tour website: http://pgatour.com/r/stats/.
(10) In the case of a simple conditional mean estimation (e.g.,
OLS) of equation (1), the superscripts would not appear in the equation.
(11) Alexander and Kern (2005) note that this measure may not be a
'pure' measure of ability as a golfer's greens in
regulation value may be dependent on their driving ability. This is also
true for putting average and save percentage.
(12) As noted on the website: http://www.pgatour.com, measurements
are taken on two holes per round. The two holes that are selected face
in opposite directions to counteract the effect of wind. Distance is
measured to the point at which the ball comes to rest, regardless of
whether it is in the fairway or not.
(13) The age of the player (and its square) were also considered as
controls for experience. The results were essentially unchanged when
these measures were used in place of years pro (and its square).
Attempts to use both measures in the same regression led to clear signs
of multi-collinearity. This is expected as the simple linear correlation
between age and years pro is approximately 0.97.
(14) The DFBETA for golfer i for covariate j is computed as:
$DFBETA_{ij} = (b_j - b_{j(i)}) / se(b_{j(i)})$. That is,
it is the difference between the estimated coefficient $b_j$
including player i and the same coefficient estimated when player i
is excluded, $b_{j(i)}$, divided by the standard error of the
estimated coefficient when the player is excluded. The larger the
absolute value of the DFBETA, the greater the influence the observation
has on the estimated coefficient. Observations where the absolute value
of the DFBETA is greater than $2/\sqrt{n}$, equal to 0.072 in
this case, are considered highly influential observations.
(15) Using a bootstrap estimation for the standard errors reduces
the problems associated with inference using asymptotic-based standard
errors and t-statistics whose validity depends on the normality
assumption. See Hao and Naiman (2007) for discussion on this issue.
(16) Thanks are due to an anonymous referee for pointing out this
possibility. The idea here is that players who attempt to drive the ball
further may do so at the expense of accuracy. Indeed, the correlation
between yards per drive and driving accuracy for the sample used in this
study is -0.63.
(17) A plot that generally falls on the horizontal axis, such as
years pro squared, implies the covariate does not have a statistically
significant impact on earnings.
(18) Real earnings per event for golfers in the 25th and 75th
percentiles are approximately $17,963 and $62,161, respectively (17.963
and 62.161 in thousands, the units used in Table 4). Using the
coefficient for putting average shown in Table 4 for golfers in the
25th percentile, the expected relative gain from reducing their putting
average by 0.02 is: $(-0.02) \times (-286.97) / 17.963 = 0.319$.
Similarly, for golfers in the 75th percentile, we have:
$(-0.02) \times (-498.63) / 62.161 = 0.160$.
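The arithmetic in note (18) can be reproduced with a few lines of Python. The function name `relative_gain` is hypothetical; the coefficients and earnings figures are the ones quoted in the note, with earnings expressed in thousands of dollars.

```python
def relative_gain(coef, delta, earnings):
    """Predicted change in earnings (coef * delta) as a share of current
    earnings per event. All money amounts are in thousands of dollars."""
    return coef * delta / earnings

# Putting-average coefficients at the 25th and 75th percentiles (Table 4)
# and the matching real earnings per event from note (18).
gain_q25 = relative_gain(coef=-286.97, delta=-0.02, earnings=17.963)
gain_q75 = relative_gain(coef=-498.63, delta=-0.02, earnings=62.161)
print(f"q25: {gain_q25:.4f}, q75: {gain_q75:.4f}")
```

The two values come out at about 0.3195 and 0.1604, matching the note's 0.319 and 0.160 up to rounding: the dollar gain is larger at the 75th percentile, but the proportional gain is roughly twice as large at the 25th.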
Leo H. Kahane [1]
[1] Providence College
Leo H. Kahane is an associate professor of economics at Providence
College. His research interests include sport economics, international
trade, and political economy. He is also the editor of the Journal of
Sports Economics.
Author's Note
This paper was prepared for the Western Economic Association
International Conference in Vancouver, Canada, June 29-July 3.
Table 1: Normality Tests for Real and Log Earnings per Event
Variable                            Skewness      Kurtosis      Pr(Skewness)  Pr(Kurtosis)  Joint chi2(2) Prob > chi2
real earnings per event, thousands  4.698         37.769        0.000         0.000         791.4         0.000
Ln (real earnings per event)        -0.111        3.374         0.200         0.050         5.480         0.065
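The contrast in Table 1 between earnings in levels and in logs can be illustrated with a short Python sketch. The data are simulated lognormal draws, not the paper's earnings, and the helper names are hypothetical; the sketch computes only the sample skewness and kurtosis (a normal distribution has kurtosis 3), not the p-values reported in the table.

```python
import numpy as np

def skewness(x):
    """Third standardized sample moment."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

def kurtosis(x):
    """Fourth standardized sample moment (not excess kurtosis)."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4))

# Synthetic, positively skewed data (assumption): lognormal "earnings".
rng = np.random.default_rng(42)
levels = rng.lognormal(mean=3.5, sigma=0.9, size=778)
logs = np.log(levels)

for name, v in [("levels", levels), ("logs", logs)]:
    print(f"{name}: skewness={skewness(v):.3f}, kurtosis={kurtosis(v):.3f}")
```

In such data the levels show strong positive skew and heavy tails, while the logged series is close to symmetric with kurtosis near 3, which is the qualitative pattern Table 1 reports.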
Table 2: Summary Statistics (n = 778)
Variable                            Mean       Std. Dev.  Min        Max
real earnings per event, thousands  52.02      62.10      0.94       682.65
greens in regulation                64.55      2.92       54.30      74.10
putting average                     1.78       0.02       1.71       1.86
save percentage                     48.95      5.93       31.80      68.1
yards per drive                     288.57     8.78       258.70     319.6
driving accuracy                    63.40      5.31       41.90      78.4
years pro                           13.38      6.47       0.00       34.00
weight                              181.94     19.97      136.00     265.00
height, (in feet)                   5.96       0.19       5.25       6.42
Table 3: DFBETA Results for Top 5 Most Influential Observations.

Greens in Regulation
Player            Year    DFBETA
Tiger Woods       2006    0.999
Tiger Woods       2007    0.669
Vijay Singh       2004    0.473
Sergio Garcia     2004    0.218
Tiger Woods       2005    0.196

Save Percentage
Player            Year    DFBETA
Tiger Woods       2006    0.495
Tiger Woods       2005    0.236
Vijay Singh       2005    0.190
Phil Mickelson    2004    0.163
Adam Scott        2004    0.161

Driving Accuracy
Player            Year    DFBETA
Jose Coceres      2007    0.260
Zach Johnson      2007    0.194
Fred Funk         2007    0.187
Jim Furyk         2006    0.151
Geoff Ogilvy      2006    0.120

Putting Average
Player            Year    DFBETA
Tiger Woods       2007    -0.648
Tiger Woods       2005    -0.473
Ernie Els         2004    -0.315
Tiger Woods       2004    -0.247
Jim Furyk         2006    -0.228

Yards per Drive
Player            Year    DFBETA
Tiger Woods       2005    0.625
Tiger Woods       2006    0.277
Tiger Woods       2007    0.223
Jose Coceres      2007    0.148
Geoff Ogilvy      2006    0.136

* Based on OLS regressions in levels. The cutoff value for DFBETAs
that may present problems is |DFBETA| > 2/(N^0.5).
Table 4: OLS and Quantile Regression Results for Real Earnings per Event
VARIABLES                  (1)            (2)            (3)            (4)            (5)            (6)            (7)
                           OLS            q10            q25            q50            q75            q90            Ho: equivalent coefficients
greens in regulation       7.485 ***      1.486 ***      2.158 ***      4.111 ***      6.487 ***      6.377 ***      7.69 ***
                           (1.219)        (0.270)        (0.328)        (0.619)        (0.998)        (2.114)
putting average            -700.082 ***   -182.121 ***   -286.970 ***   -374.041 ***   -498.630 ***   -717.563 ***   5.71 ***
                           (97.118)       (23.157)       (37.512)       (47.718)       (95.964)       (204.680)
save percentage            2.148 ***      0.396 ***      0.662 ***      1.074 ***      2.276 ***      3.375 ***      6.30 ***
                           (0.332)        (0.090)        (0.102)        (0.210)        (0.415)        (0.997)
yards per drive            0.926 **       0.257 **       0.462 ***      0.407 *        0.070          1.602 *        1.99 *
                           (0.371)        (0.105)        (0.121)        (0.228)        (0.455)        (0.898)
driving accuracy           -0.972 *       0.257          0.463 **       -0.107         -1.593 **      -2.086 *       2.56 **
                           (0.501)        (0.166)        (0.186)        (0.364)        (0.761)        (1.242)
years pro                  1.460 *        -0.103         -0.847 *       -0.474         0.595          2.554          1.25
                           (0.873)        (0.331)        (0.507)        (0.684)        (1.140)        (2.316)
years pro squared          -0.036         0.005          0.031 *        0.022          -0.019         -0.064         1.1
                           (0.028)        (0.010)        (0.018)        (0.024)        (0.036)        (0.079)
weight                     -0.107         -0.062 **      -0.044         -0.131         -0.032         -0.115         0.57
                           (0.089)        (0.030)        (0.044)        (0.090)        (0.118)        (0.191)
height, (in feet)          15.093         0.983          0.587          4.824          4.775          5.992          0.1
                           (9.175)        (3.252)        (4.670)        (8.284)        (16.093)       (24.889)
Constant                   51.207 ***     15.527 ***     22.544 ***     38.391 ***     64.458 ***     102.736 ***
                           (1.802)        (0.565)        (0.852)        (1.465)        (3.079)        (6.359)
Observations               778            778            778            778            778            778
R-squared/Pseudo R-squared 0.299          0.130          0.128          0.139          0.165          0.229
*** p < 0.01, ** p < 0.05, * p < 0.1. Robust standard errors
in parentheses for OLS. Bootstrapped standard errors in
parentheses for quantile regressions (2000 replications).
Column (7) shows Wald test statistics for the equivalence of
estimated coefficients across quantiles.