what are the 2 types of variables in this unit? | explanatory variable (ev): independent
response variable (rv): dependent |
what is association
and how is it displayed? | a link between two variables
displayed in a two-way frequency table (categorical data) |
segmented bar charts: use and composition | used when there are multiple categories on the x-axis
percentage on y-axis (0-100%), categories on x-axis
the segments in each bar add to 100% |
how to identify association
(scatterplot and two-way table) | scatterplot: no pattern = no association
pattern = association
table: use percentages; a difference of 5% or more = association |
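The table rule above can be sketched in Python. This is a minimal sketch assuming the explanatory variable is in the columns, so percentages are calculated within each column (like the bars of a segmented bar chart). The survey counts and the function names `column_percentages` and `shows_association` are hypothetical, made up for illustration.

```python
# Hypothetical two-way frequency table: explanatory variable in the columns,
# response variable in the rows (counts are made up for illustration).
survey = {
    "male":   {"football": 30, "tennis": 20},
    "female": {"football": 18, "tennis": 22},
}


def column_percentages(table):
    """Convert counts to percentages within each column (each column adds to 100%)."""
    result = {}
    for col, counts in table.items():
        total = sum(counts.values())
        result[col] = {row: 100 * n / total for row, n in counts.items()}
    return result


def shows_association(table, threshold=5.0):
    """True if any response category's percentage differs between columns
    by at least `threshold` percentage points (the 5%+ rule in these notes)."""
    pct = column_percentages(table)
    rows = next(iter(pct.values())).keys()
    for row in rows:
        values = [pct[col][row] for col in pct]
        if max(values) - min(values) >= threshold:
            return True
    return False


print(column_percentages(survey))
# {'male': {'football': 60.0, 'tennis': 40.0}, 'female': {'football': 45.0, 'tennis': 55.0}}
print(shows_association(survey))  # True (60% vs 45% is a 15% difference)
```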
if there is an association on a scatterplot, describe the 4 features of the pattern. what are they? | direction - positive, negative
strength - strong, moderate, weak
outliers - if present
form - linear, non-linear (curved) |
what is the correlation coefficient and what are its values? | a measure of the strength and direction of a linear relationship
represented by 'r', always between -1 and +1
the closer r is to -1 or +1, the more closely the data lies along a straight line
r = 0: no linear association
r = exactly +1 or -1: perfect linear association |
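To see where r comes from, here is a sketch of Pearson's formula in plain Python (in class a CAS calculator does this for you). The function name `correlation_coefficient` and the data are hypothetical. The last example has a clear curved pattern but r = 0, which is why r = 0 only means no linear association.

```python
import math


def correlation_coefficient(xs, ys):
    """Pearson's correlation coefficient r for paired numerical data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sxy / math.sqrt(sxx * syy)


print(correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8]))        # 1.0  (perfect positive)
print(correlation_coefficient([1, 2, 3, 4], [8, 6, 4, 2]))        # -1.0 (perfect negative)
print(correlation_coefficient([1, 2, 3, 4, 5], [4, 1, 0, 1, 4]))  # 0.0  (curved pattern, no linear association)
```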
strength classifications of linear associations (intervals of r indicating strong, moderate, weak) | remember 0.75, 0.5, 0.25
r of 0.75+ = strong, 0.5+ = moderate, 0.25+ = weak, less than 0.25 = no association (same cut-offs for negative r, e.g. r = -0.8 is strong negative) |
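The cut-offs above can be written as a small function. This is a sketch assuming the 0.75 / 0.5 / 0.25 boundaries from these notes (other textbooks may use slightly different ones); the name `classify_strength` is hypothetical. The sign of r gives the direction and its size gives the strength.

```python
def classify_strength(r):
    """Classify r using the 0.75 / 0.5 / 0.25 cut-offs from these notes."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    size = abs(r)  # strength ignores the sign
    if size >= 0.75:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    elif size >= 0.25:
        strength = "weak"
    else:
        return "no association"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


for r in (0.9, -0.6, 0.3, -0.1):
    print(r, classify_strength(r))
# 0.9 strong positive
# -0.6 moderate negative
# 0.3 weak positive
# -0.1 no association
```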
what 3 assumptions are made when using the correlation coefficient? | variables are numerical
association is linear
no outliers in data |
what is the coefficient of determination? | how accurately we can predict one variable using the other
represented by r^2 (always between 0 and 1)
if r = +1 or -1, the graph is perfectly linear, r^2 = 1, and we can predict the response variable with complete accuracy
e.g. given height, (r^2 × 100)% of the variation in weight can be explained |
when interpreting the coefficient of determination we use the phrase: | (r^2 × 100)% of the variation in the response variable can be explained by the variation in the explanatory variable (put on note sheet for part b) |
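The interpretation phrase can be generated from r. This is a sketch: the function name `interpret_r_squared` and the value r = 0.8 for height and weight are hypothetical. Note that squaring removes the sign, so r = -0.8 gives the same r^2.

```python
def interpret_r_squared(r, response, explanatory):
    """Build the standard r^2 interpretation sentence from r."""
    percent = round(r ** 2 * 100, 1)  # r^2 as a percentage, 1 decimal place
    return (f"{percent}% of the variation in {response} can be explained "
            f"by the variation in {explanatory}")


print(interpret_r_squared(0.8, "weight", "height"))
# 64.0% of the variation in weight can be explained by the variation in height
```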
not a question | correlation tells you about the strength of the association, but nothing about its source or cause.
an example is the association between sunscreen use and the presence of heatstroke: heatstroke and sunscreen do not cause each other; both become more common in hot, sunny weather.
therefore correlation does not imply causation |
linear regression | fitting a straight line to a data set |
least squares regression | a line where the sum of the squared residuals (differences between actual and predicted values) is as small as possible
equation is y = ax + b (a = slope, b = y-intercept) |
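Here is a sketch of how the least squares line can be calculated, using the notes' y = ax + b notation (a = slope, b = y-intercept). The data and the names `least_squares` and `sum_squared_residuals` are hypothetical. The last line checks the defining property: changing the line makes the sum of squared residuals bigger.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y = ax + b
    (a = slope, b = y-intercept, matching the notation in these notes)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx             # slope
    b = mean_y - a * mean_x   # the line passes through (mean of x, mean of y)
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """Sum of (actual y - predicted y)^2 for the line y = ax + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4]   # hypothetical explanatory values
ys = [2, 3, 5, 6]   # hypothetical response values
a, b = least_squares(xs, ys)
print(a, b)  # 1.4 0.5
# A different line (here, a steeper slope) gives a larger sum of squared residuals:
print(sum_squared_residuals(xs, ys, a, b) < sum_squared_residuals(xs, ys, a + 0.1, b))  # True
```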
interpolation | predictions within the data range |
extrapolation | predictions outside the data range |
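The difference between interpolation and extrapolation comes down to whether x lies inside the observed x range. This sketch assumes a line y = ax + b has already been found; the function name `predict` and the values a = 1.4, b = 0.5 and xs = [1, 2, 3, 4] are hypothetical.

```python
def predict(x, a, b, xs):
    """Predict y at x using y = ax + b, and label the prediction as
    interpolation (x inside the observed x range) or extrapolation (outside it)."""
    y_hat = a * x + b
    kind = "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"
    return y_hat, kind


xs = [1, 2, 3, 4]                      # hypothetical observed x values
print(predict(2.5, 1.4, 0.5, xs)[1])   # interpolation
print(predict(10, 1.4, 0.5, xs)[1])    # extrapolation (less reliable)
```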
residual | residual = actual value (y) - predicted value (ŷ) (can be +, - or 0)
the predicted value is found by substituting a known x value into the least squares regression equation |
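Residuals can be calculated directly from this definition. In this sketch the data and the function name `residuals` are hypothetical, and a = 1.4, b = 0.5 are assumed to be the least squares values for this data. A positive residual means the point lies above the line; for a least squares line the residuals add to (approximately) zero.

```python
def residuals(xs, ys, a, b):
    """Residual = actual y - predicted y-hat, where y-hat = a*x + b."""
    return [y - (a * x + b) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4]   # hypothetical data
ys = [2, 3, 5, 6]
res = residuals(xs, ys, 1.4, 0.5)   # 1.4 and 0.5 assumed to be the least squares a and b
print([round(r, 2) for r in res])   # [0.1, -0.3, 0.3, -0.1]
```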
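residual plot layout | residuals on y-axis (negative below, 0, positive above)
x-axis (explanatory variable) runs horizontally from residual = 0
a random scatter with no clear pattern supports the assumption of a linear association |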