Application of multivariate principal component analysis on dimensional reduction of milk composition variables
Abstract
Variable selection and dimension reduction are major prerequisites for reliable multivariate regression analysis. Often, many of the variables used as independent variables in a multiple regression are highly correlated with one another, a problem known as multicollinearity. The absence of multicollinearity is essential for multiple regression models because parameters estimated from multicollinear data are unstable: they can change substantially with slight changes in the data and are therefore unreliable for prediction. This paper presents the application of Principal Component Analysis (PCA) to the dimension reduction of milk composition variables. PCA successfully reduced the dimension of the data by grouping the 17 milk composition variables into five principal components (PCs) that were mutually uncorrelated and together explained 92.38% of the total variation in the milk composition variables.
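The dimension-reduction step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' SAS procedure. It assumes PCA on the correlation matrix (that is, on standardized variables) and a hypothetical cumulative-variance cut-off (`var_threshold`, set to 0.90 here) for deciding how many components to retain; the retention rule used in the paper is not stated in this abstract. The function name `pca_reduce` and the synthetic data (17 correlated variables generated from a few latent factors) are illustrative assumptions, not the milk composition records.

```python
import numpy as np


def pca_reduce(X, var_threshold=0.90):
    """PCA on the correlation matrix of X (rows = samples, columns = variables).

    Returns (k, eigvals, cumulative, loadings, scores), where k is the smallest
    number of components whose cumulative share of total variance reaches
    var_threshold (a hypothetical retention rule chosen for illustration).
    Assumes no column of X is constant.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Standardize each variable so variables measured in different units
    # contribute equally; this makes Z.T @ Z / (n - 1) the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = Z.T @ Z / (n - 1)

    # eigh is for symmetric matrices and returns eigenvalues in ascending order,
    # so reorder them (and their eigenvectors) from largest to smallest.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Share of total variance per component; trace(R) equals the number of variables.
    cumulative = np.cumsum(eigvals / eigvals.sum())
    k = min(int(np.searchsorted(cumulative, var_threshold)) + 1, len(eigvals))

    loadings = eigvecs[:, :k]   # coefficients defining each retained PC
    scores = Z @ loadings       # PC scores: new, mutually uncorrelated variables
    return k, eigvals, cumulative, loadings, scores


# Synthetic example: 17 highly correlated variables driven by 5 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 5))
X = latent @ rng.normal(size=(5, 17)) + 0.3 * rng.normal(size=(200, 17))

k, eigvals, cumulative, loadings, scores = pca_reduce(X, var_threshold=0.90)
print("components retained:", k)
print("cumulative variance explained:", round(float(cumulative[k - 1]), 4))

# The covariance matrix of the scores is diagonal (off-diagonal entries ~ 0),
# which is why the PCs can replace the original collinear regressors.
off_diag = np.cov(scores, rowvar=False) - np.diag(eigvals[:k])
print("max off-diagonal covariance:", float(np.abs(off_diag).max()))
```

Regressing a response on the retained scores instead of the original variables gives principal components regression. Other retention rules, such as keeping components with eigenvalue greater than 1 (the Kaiser criterion), can be substituted by changing how `k` is computed.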