Applying multiple regression analysis to my data:

 

Is this valid?  Does the output tell me anything useful about the interrelationship between dyslexia and academic confidence?

 

Tops, W., Callens, M., Lammertyn, J., Van Hees, V., Brysbaert, M., 2012, Identifying students with dyslexia in higher education, Annals of Dyslexia, 62(3), 186-203.

 

The paper recently digested (above), which reports research at a Belgian university exploring better ways to identify students with dyslexia, caused me to reflect on the process of multiple regression.

Although I had been aware of the power of multiple regression as a prediction tool, I had not considered how it might help me gain a further understanding of my data.

Multiple regression requires one dependent variable to be explored in relation to multiple independent variables, with a view to generating a model that predicts the dependent variable from those inputs better than the mean model does (that is, better than simply predicting the mean value every time). I want to consider how the multi-dimensional model of Dyslexia Index that my project has generated might be linked, as input (i.e. independent) variables, to Academic Behavioural Confidence as the output.

Laerd Statistics has guided me through the process of setting up and running a multiple regression in SPSS and, to become acquainted with the procedure, I have run the regression on the complete datapool. The regression appears valid and meaningful and the results are reported below.

But my interest is in using the multiple regression process to add substance to my argument that students with previously unidentified apparently dyslexia-like study characteristics present a higher level of Academic Behavioural Confidence than their dyslexia-identified peers.

So my plan is to run the regression again but on the split-group datapool. In this way, distinct regression models will be produced for each of my main research subgroups, ND and DI: students with no disclosed or declared dyslexia and students with identified dyslexia respectively.

Then, using just the model generated from the subgroup of dyslexic students, I will predict the ABC scores of those students in the non-dyslexic subgroup who have presented a Dyslexia Index of Dx > 592.5, which I have set as the critical value for establishing the real research subgroup of interest: students with unidentified dyslexia-like attributes (research subgroup DNI).

I can then use the model to calculate the expected ABC values for each of the students in this research subgroup to see how this compares against the actual ABC value that has been measured using the eQNR. Then I can think about what this may be telling me :-/
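
The plan in the last three paragraphs can be sketched in Python. This is only an illustration under stated assumptions: the record layout, the field names (`group`, `dx`, `dims`, `abc`), the three sample respondents and the toy two-coefficient model are all hypothetical. Only the threshold of 592.5 comes from the text above.

```python
from statistics import mean

DX_THRESHOLD = 592.5  # critical Dyslexia Index value taken from the text


def predict_abc(intercept, coefs, dims):
    """Linear model: intercept plus the sum of coefficient * dimension score."""
    return intercept + sum(c * x for c, x in zip(coefs, dims))


def select_dni(respondents):
    """Keep ND respondents whose Dyslexia Index exceeds the threshold."""
    return [r for r in respondents
            if r["group"] == "ND" and r["dx"] > DX_THRESHOLD]


def compare(respondents, intercept, coefs):
    """Return (predicted, actual, actual - predicted) for each DNI respondent."""
    rows = []
    for r in select_dni(respondents):
        predicted = predict_abc(intercept, coefs, r["dims"])
        rows.append((predicted, r["abc"], r["abc"] - predicted))
    return rows


# Hypothetical respondents, not real data from the study.
sample = [
    {"group": "ND", "dx": 610.0, "dims": [60.0, 40.0], "abc": 62.0},
    {"group": "ND", "dx": 450.0, "dims": [30.0, 20.0], "abc": 70.0},
    {"group": "DI", "dx": 700.0, "dims": [80.0, 70.0], "abc": 55.0},
]

# Hypothetical stand-in for a model fitted on subgroup DI.
toy_intercept, toy_coefs = 50.0, [0.1, -0.05]

rows = compare(sample, toy_intercept, toy_coefs)
mean_gap = mean(gap for _, _, gap in rows)
print(rows, mean_gap)
```

A positive gap here would mean that a respondent's measured ABC is higher than the DI model expects for their dimension scores, which is the direction my argument anticipates.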

I am guided that these are the assumptions that must apply in order for a multiple regression analysis output to be appropriate and meaningful:

  1. I have a continuous dependent variable – I do, it is Academic Behavioural Confidence and I have measured it across a continuous scale 0 – 100;
  2. I have two or more continuous independent variables – I do, these are the 20 dimensions of dyslexia that together constitute my Dyslexia Index, Dx;
  3. I have independence of observations (i.e. independence of residuals) – SPSS checks this using the Durbin-Watson statistic and apparently an ‘ideal’ value is close to 2;
  4. There needs to be a linear relationship between a) the dependent variable and each independent variable AND ALSO b) the dependent variable and the independent variables collectively. This is checked in SPSS by plotting the studentized residuals against the (unstandardized) predicted values – exactly what these are is not important to understand at this stage but I will explore the theory behind this later;
  5. The data needs to show homoscedasticity of residuals – which means that the variance of the residuals remains broadly similar along the line of best fit;
  6. My data must NOT show multicollinearity – this is where two (or more) independent variables are highly correlated with each other. The test for this is to inspect the matrix of correlation coefficients for any values > 0.7 (these would indicate multicollinearity) and to inspect the Tolerance/VIF values, where a Tolerance of < 0.1 (equivalent to VIF > 10 – they are reciprocals of each other) signals an element of collinearity that needs dealing with;
  7. The data should present no significant outliers, high leverage points or highly influential points; these are all classifications of the effects that unusual points can have on the regression output. SPSS provides guidance about how to identify and deal with these cases should they occur;
  8. We need the residuals – that is, the errors – to be approximately normally distributed. SPSS helps with this by mapping out a histogram with a superimposed normal curve, but also presents a P-P plot (too complicated and boring to explain here).
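
Two of the checks in the list above (items 3 and 6) reduce to simple arithmetic, which the following minimal Python sketch illustrates. It is not what SPSS runs internally; the function names and the residual values are hypothetical, and only the 0.1 tolerance cutoff comes from the list.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: the sum of squared differences between
    successive residuals divided by the sum of squared residuals.
    It ranges from 0 to 4; values near 2 suggest no first-order
    autocorrelation in the residuals."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def vif_from_tolerance(tolerance):
    """VIF is the reciprocal of Tolerance."""
    return 1.0 / tolerance


def flags_collinearity(tolerance, cutoff=0.1):
    """True when Tolerance falls below the cutoff (equivalently VIF > 10)."""
    return tolerance < cutoff


# Hypothetical residuals, for illustration only.
example_residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2]
print(durbin_watson(example_residuals))
print(vif_from_tolerance(0.25), flags_collinearity(0.25))
```

Residuals that alternate in sign push the statistic towards 4, and residuals that drift slowly in one direction push it towards 0, which is why a value close to 2 is the one to hope for.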

So that’s it. What follows first is the output and reporting for the second multiple regression run, where I instructed SPSS to use the split-group data. Assumptions 1 & 2 both apply, so the output is reported for each subgroup, commencing with Assumption 3:

 

Determining how well the model fits the data:

The model summary table presented by SPSS indicated a moderate to strong correlation of 0.738 between the scores predicted by the regression model and the actual datapoints.

The corresponding coefficient of determination (R Square), which measures the proportion of variance in the dependent variable that is explained by the independent variables, was 0.545, indicating that 54.5% of the variance in Academic Behavioural Confidence in this sample is explained by the independent variables in the regression model. The Adjusted R Square value, which estimates the proportion of variance that the model would be expected to explain in the background population, was 0.427 (that is, 42.7% of the ABC variance), and this is also an indication of the EFFECT SIZE.

The model summary table presented by SPSS indicated a moderate to strong correlation of 0.721 between the scores predicted by the regression model and the actual datapoints.

The corresponding coefficient of determination (R Square), which measures the proportion of variance in the dependent variable that is explained by the independent variables, was 0.520, indicating that 52.0% of the variance in Academic Behavioural Confidence in this sample is explained by the independent variables in the regression model. The Adjusted R Square value, which estimates the proportion of variance that the model would be expected to explain in the background population, was 0.316 (that is, 31.6% of the ABC variance), and this is also an indication of the EFFECT SIZE.
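
The relationship between the three figures quoted in each model summary can be checked with a few lines of Python. This is a sketch using the standard adjusted R Square formula; the subgroup sizes are not stated in this post, so the sample size `n` below is a hypothetical placeholder, while `p = 20` reflects the 20 Dyslexia Index dimensions.

```python
def r_squared(r):
    """Coefficient of determination from the multiple correlation R."""
    return r * r


def adjusted_r_squared(r2, n, p):
    """Standard adjustment of R Square for n cases and p predictors:
    1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# R values quoted in the two model summaries above.
for r in (0.738, 0.721):
    print(r, round(r_squared(r), 3))

# Hypothetical sample size, for illustration only.
n_hypothetical = 100
print(adjusted_r_squared(0.545, n_hypothetical, 20))
```

With 20 predictors the adjustment is substantial for modest sample sizes, which is consistent with the noticeable drop from R Square to Adjusted R Square in both summaries.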

 

Thus the multiple regression I have conducted through SPSS on the split-group datapool appears to be strong and valid.

 

Prediction Model

These are the linear regression model equations which can be used to predict values of Academic Behavioural Confidence based on inputs from Dyslexia Index dimensions 1 to 20.

 

Research Subgroup DI:

ABC = 53.332 + 0.074(Dx01) – 0.080(Dx02) – 0.017(Dx03) + 0.043(Dx04) + 0.170(Dx05) – 0.144(Dx06) + 0.058(Dx07) – 0.018(Dx08) + 0.110(Dx09) + 0.122(Dx10) + 0.046(Dx11) + 0.001(Dx12) + 0.071(Dx13) – 0.046(Dx14) + 0.006(Dx15) – 0.050(Dx16) + 0.003(Dx17) + 0.005(Dx18) – 0.168(Dx19) – 0.028(Dx20)

 

Research Subgroup ND:

ABC = 58.432 – 0.037(Dx01) – 0.071(Dx02) + 0.036(Dx03) + 0.020(Dx04) + 0.162(Dx05) – 0.085(Dx06) + 0.095(Dx07) – 0.136(Dx08) – 0.013(Dx09) – 0.040(Dx10) + 0.015(Dx11) + 0.027(Dx12) + 0.066(Dx13) – 0.039(Dx14) + 0.029(Dx15) + 0.079(Dx16) + 0.077(Dx17) + 0.045(Dx18) – 0.107(Dx19) + 0.008(Dx20)
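
The two equations above can be evaluated programmatically. The sketch below copies the intercepts and coefficients from the equations; the function name `predict_abc`, the dictionary layout and the example respondent (every dimension scored at 50) are hypothetical choices made for illustration, and the scale of the dimension scores is an assumption.

```python
# Intercept and coefficients for Dx01..Dx20, copied from the equations above.
MODELS = {
    "DI": (53.332, [0.074, -0.080, -0.017, 0.043, 0.170, -0.144, 0.058,
                    -0.018, 0.110, 0.122, 0.046, 0.001, 0.071, -0.046,
                    0.006, -0.050, 0.003, 0.005, -0.168, -0.028]),
    "ND": (58.432, [-0.037, -0.071, 0.036, 0.020, 0.162, -0.085, 0.095,
                    -0.136, -0.013, -0.040, 0.015, 0.027, 0.066, -0.039,
                    0.029, 0.079, 0.077, 0.045, -0.107, 0.008]),
}


def predict_abc(subgroup, dims):
    """Predicted ABC for one respondent from their 20 dimension scores."""
    intercept, coefs = MODELS[subgroup]
    if len(dims) != len(coefs):
        raise ValueError("expected %d dimension scores" % len(coefs))
    return intercept + sum(c * x for c, x in zip(coefs, dims))


# Hypothetical respondent: every dimension scored at 50 (illustration only).
flat_profile = [50.0] * 20
print(predict_abc("DI", flat_profile))
print(predict_abc("ND", flat_profile))
```

Passing the same dimension scores through both models shows how far the two subgroup equations diverge for an identical profile.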

 

The next (exciting!) step is to apply the Regression Model for research subgroup DI to students in research subgroup DNI – that is, those students in research subgroup ND who presented a Dyslexia Index of Dx > 592.5 – and compare the outcomes with the ACTUAL Academic Behavioural Confidence values recorded by each of these respondents in the QNR returns. The table below presents the results:

 

 
