Measuring Performance of Distance-based Regression for Skewed Data

Authors

  • Nor Hisham Haron
  • Nor Aishah Ahad
  • Nor Idayu Mahat

Abstract

Cuadras introduced distance-based regression (DBR) in 1990 as an unbiased regression model suitable for mixed-type independent variables. DBR is similar to classical linear regression (CLR), but it uses distance measures as independent variables instead of raw values. Earlier studies of DBR focused on its performance when the data are normally distributed, so its performance on skewed data remains in question. This study addresses that question by comparing DBR with bootstrap linear regression (BLR) on simulated data sets containing either continuous independent variables or mixed-type independent variables, with residuals set to follow a gamma distribution. The simulations vary the sample size, n, and the number of independent variables, p: small (n = 10), medium (n = 40) and large (n = 100) samples, each with p = 2 and p = 3. Performance is compared using the adjusted R-square (adjR2), the Bayesian information criterion (BIC) and power, where power is the percentage of model p-values that fall below the 5% significance level. The main objective of this study is to identify the circumstances in which DBR is suitable to use. The findings indicate that DBR performed better than BLR in all cases, for both numerical and mixed-type independent variables. We also found that DBR performed better across all tested sample sizes.
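
To make the idea of regressing on distances concrete, the following is a minimal sketch and not the authors' implementation. It assumes Euclidean distance between continuous predictors, converts the distance matrix into principal coordinates by classical scaling, and regresses the response on those coordinates; a mixed-type dissimilarity would be substituted for mixed data. The function names, the gamma shape and scale, and the regression coefficients are all hypothetical choices made for illustration.

```python
import numpy as np

def principal_coordinates(X, k):
    """Classical scaling: map squared Euclidean distances to k coordinates."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # squared distances
    n = sq.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ sq @ J                        # doubly centered inner products
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]           # indices of the k largest
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))

def adjusted_r2(y, Z):
    """Adjusted R-square of an ordinary least-squares fit of y on Z plus intercept."""
    n, k = Z.shape
    A = np.column_stack([np.ones(n), Z])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Illustrative data: n = 40, p = 2, with skewed (centered gamma) errors.
rng = np.random.default_rng(0)
n, p = 40, 2
X = rng.normal(size=(n, p))
errors = rng.gamma(shape=2.0, scale=1.0, size=n) - 2.0   # mean shape*scale removed
y = 1.0 + X @ np.array([1.0, -0.5]) + errors             # hypothetical coefficients

Z = principal_coordinates(X, k=p)    # distance-derived predictors
adj_dbr = adjusted_r2(y, Z)          # regression on principal coordinates
adj_clr = adjusted_r2(y, X)          # classical regression on raw values
print(adj_dbr, adj_clr)
```

With Euclidean distance and k = p, the principal coordinates span the same space as the centered raw predictors, so the two adjusted R-square values coincide. Differences between DBR and a raw-value regression arise only when another distance is used or a different number of coordinates is retained.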

Published

2019-11-25

Section

Articles