


How To Plot Residuals On Ti 84

Scatterplot, Correlation, and Regression on TI-89

Copyright © 2007–2022 by Stan Brown, BrownMath.com

Summary: When you have a set of (x,y) data points and want to find the best equation to describe them, you are performing a regression. You will learn how to find the strength of the association between your two variables (correlation coefficient), and how to find the line of best fit (least squares regression line).

Usually you have some idea that your x variable can help predict your y variable, and then you call x the explanatory variable and y the response variable. (Other names are independent variable and dependent variable.)

Contents:

  • Step 0. Setup
  • Step 1. Make the Scatterplot
  • Step 2. Perform the Regression
    • Correlation Coefficient, r
    • Regression Line, ŷ = ax+b
    • Coefficient of Determination, R²
  • Step 3. Display the Regression Line
  • Appendix: Display the Residuals
    • Residual Plot Showing Problems
    • advanced: Residuals and R²
  • What's New?

Step 0. Setup

Set floating point mode, if you haven't already. [MODE] [] [] [] [ alpha makes E] [ENTER]

The calculator will remember this setting when you turn it off: next time you can start with Step 1.

Step 1. Make the Scatterplot

Before you even run a regression, you should first plot the points and see whether they seem to lie along a straight line. If the distribution is clearly not a straight line, don't do a linear regression. (Another form of regression might still be appropriate, but that is outside the scope of this course.)

Let's use this example from Sullivan 2011 [full citation at https://BrownMath.com/swt/sources.htm#so_Sullivan2011], page 179: the distance a golf ball travels versus the speed with which the club head hit it.

Club-head speed, mph (x)   100 102 103 101 105 100  99 105
Distance, yards (y)        257 264 274 266 277 263 258 275
Turn off other plots. [] [APPS] and select Stats/List Editor.

 [F2] [3] [F2] [4] turns off all plots and functions.

Enter the numbers in two statistics lists. You will use two named lists for the x's and y's. Any names are possible, but I'll use lx and ly because they're short. If those lists already exist, highlight the lx name and press [CLEAR] [ENTER] to erase previous entries. If lx isn't there yet, move to an empty list heading and press [L] [X]. (L is above the 4 key. When you press 4 while naming a list, it will change to L automatically.)

 Enter the x numbers, then clear list ly (or create it) and enter the y numbers.

 Note: You can hide an unwanted list by cursoring to the list name and pressing [ makes DEL]. The list remains in memory until you use [ 2nd makes VARLINK] to delete it.

Set up the scatterplot.
TI-89 setup for scatterplot
[F2] [1] [F1] opens a dialog box. You want these settings:
  • Plot type: Scatter
  • Mark: anything except dot (because a data dot looks just like a dot on the grid)
  • X: [alpha] [L] [X]
  • Y: [alpha] [L] [Y]
  • Use Freq and categories: NO
Press [ENTER] to complete the definition.
Plot the points. [F5] automatically adjusts the window frame to fit the data.
(optional) You can adjust the grid to look better.
TI-89 scatterplot, grid adjusted
[ F2 makes WINDOW], set Xscl=1 and Yscl=5, then [ F3 makes GRAPH] to redisplay it.

 Appropriate values of Xscl and Yscl may be different for other problems. Pick the values that make the graph look best to you.

Check your data entry by tracing the points.
TI-89 scatterplot with trace for checking data entry
[F3] shows you the first (x,y) pair, then [] shows you the others. They're shown in the order you entered them, not necessarily from left to right.

scatterplot for this data set, showing axis labels and titles
A scatterplot on paper needs labels (numbers) and titles on both axes; the x and y axes typically won't start at 0. Here's the plot for this data set. (The horizontal lines aren't needed when you plot on graph paper.)

When the same (x,y) pair occurs multiple times, plot the second one slightly offset. This is called jitter. An example will be shown in class.

If the data points don't seem to follow a straight line reasonably well, STOP! Your calculator will obey you if you tell it to perform a linear regression, but if the points don't really fit a straight line then it's a case of "garbage in, garbage out."

For example, consider this example from DeVeaux, Velleman and Bock 2009 [full citation at https://BrownMath.com/swt/sources.htm#so_DeVeaux2009], page 179. This is a table of recommended f/stops for various shutter speeds for a digital camera:

Shutter speed (x)   1/1000 1/500 1/250 1/125 1/60 1/30 1/15 1/8
f/stop (y)          2.8    4     5.6   8     11   16   22   32

scatterplot of the above numbers, showing non-linear trend
If you try plotting these numbers yourself, enter the shutter speeds as fractions for accuracy: don't convert them to decimals yourself. The calculator will show you only a few decimal places, but it maintains much greater precision internally.

You can see from the plot at right that these data don't fit a straight line. There is a distinct bend near the left. When you have anything with a curve or bend, linear regression is wrong. You can try other forms of regression in your calculator's menu, or you can transform the data as described in DeVeaux 2009 [full citation at https://BrownMath.com/swt/sources.htm#so_DeVeaux2009], Chapter 10, and other textbooks.
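Away from the calculator, the same check can be made numerically. The sketch below is a minimal Python illustration (an addition for this rewrite, not part of the TI-89 procedure): it computes r for the f/stop data as entered, then again after one possible re-expression, taking base-10 logarithms of both variables. The log-log choice is a hypothetical example only; the textbook chapter cited above discusses how to pick a re-expression.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r, from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Shutter speeds typed as fractions, so no hand-rounded decimals creep in
shutter = [1/1000, 1/500, 1/250, 1/125, 1/60, 1/30, 1/15, 1/8]
fstop = [2.8, 4, 5.6, 8, 11, 16, 22, 32]

r_raw = pearson_r(shutter, fstop)

# Hypothetical re-expression for illustration: log10 of both variables
r_log = pearson_r([math.log10(x) for x in shutter],
                  [math.log10(y) for y in fstop])

print(f"r, raw data:       {r_raw:.4f}")
print(f"r, log10 of both:  {r_log:.4f}")
```

Both values of r are high, which is why r alone can't tell you whether a straight line is the right model; the scatterplot has to do that.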

Step 2. Perform the Regression

Get ready to calculate statistics. [] [APPS] and select Stats/List Editor.

TI-89 regression dialog box
[F4] [3] [2] brings up the LinReg(ax+b) dialog box. You want these settings:
  • X list: [alpha] [L] [X]
  • Y list: [alpha] [L] [Y]
  • Store RegEqn to: [] and select y1(x)
  • Freq: 1
  • Category List: (leave blank)
  • Include Categories: (leave blank)

 Press [ENTER] to perform the regression and paste the regression equation into Y1.

TI-89 regression output screen
Show your work! Write LinReg(ax+b) plus the two lists and the y-variable that you're using. Just "LinReg" isn't enough.

Write down a (slope), b (y intercept), R² (coefficient of determination), and r (correlation coefficient). (Four decimal places for slope and intercept, and two for r and R², is a decent rule of thumb.)

a = 3.1661, b = −55.7966

R² = 0.88, r = 0.94

Correlation Coefficient, r

scatterplots for various correlation coefficients

"Several sets of (x,y) [pairs], with the correlation coefficient for each set. Note that correlation reflects the noisiness and direction of a linear relationship (top row), but not the slope of that relationship (middle), nor many aspects of nonlinear relationships (bottom)."
source: Wikipedia article

Look first at r, the coefficient of linear correlation. r can range from −1 to +1 and measures the strength of the association between x and y. A positive correlation or positive association means that y tends to increase as x increases, and a negative correlation or negative association means that y tends to decrease as x increases. The closer r is to 1 or −1, the stronger the association. We usually round r to two decimal places.

For real-world data, the 0.94 that we got is a pretty strong correlation. But you might wonder whether there's really an association between club-head speed and distance traveled, as opposed to just an apparent correlation in this sample. Decision Points for Correlation Coefficient shows you how to answer that question.

Be careful in your interpretation! No matter how strong your r might be, say that changes in the y variable are associated with changes in the x variable, not "caused by" it. Correlation is not causation is your mantra.

It's easy to think of associations where there is no cause. For example, if you make a scatterplot of US cities with x as number of books in the public library and y as number of murders, you'll see a positive association: number of murders tends to be higher in cities with more library books. Does that mean that reading causes people to commit murder, or that murderers read more than other people? Of course not! There is a lurking variable here: population of the city.

When you have a positive or negative association, there are four possibilities: x might cause changes in y, y might cause changes in x, lurking variables might cause changes in both, or it could just be coincidence, a random sample that happens to show a strong association even though the population does not.

correlation and causation cartoon
used by permission; source: https://imgs.xkcd.com/comics/correlation.png

Though nobody ever computes r by hand any more, the formula explains the properties of r. To compute r, find the z-scores of all the x's and y's, multiply z_x times z_y for each data point, add up all the products, and divide the total by n−1. The second formula is equivalent but a little easier: Find the means and standard deviations of the set of x's and the set of y's. For each data point, multiply (x−x̄) by (y−ȳ). Add up those products and divide by n−1 times the two standard deviations s_x and s_y.

$$r = \frac{\sum z_x z_y}{n-1} = \frac{\sum (x-\bar{x})(y-\bar{y})}{(n-1)\,s_x\,s_y}$$

z-scores are pure numbers without units, and therefore r also has no units. You can interchange the x's and y's in the formula without changing the result, and therefore r is the same regardless of which variable is x and which is y.

Why is r positive when data points trend upward to the right and negative when they trend downward to the right? The product (x−x̄)(y−ȳ) explains this. When points trend upward to the right, most are in the lower left and upper right quadrants of the plot. In the lower left, x and y are both below average, x−x̄ and y−ȳ are both negative, and the product is positive. In the upper right, x and y are both above average, x−x̄ and y−ȳ are both positive, and the product is positive. The product is positive for most points, and therefore r is positive when the trend is upward to the right.

On the other hand, if the data trend downward to the right, most points are in the upper left (where x is below average and y is above average, x−x̄ is negative, y−ȳ is positive, and the product is negative) and the lower right (where x−x̄ is positive, y−ȳ is negative, and the product is negative). Since the product is negative for most points, r is negative when data trend down to the right.
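To see that the two formulas described above agree, and that swapping x and y leaves r unchanged, here is a minimal Python sketch using the golf data from Step 1. It relies only on the standard `statistics` module, whose `stdev` is the sample standard deviation (divisor n−1), as the formulas assume; the function names are illustrative.

```python
import statistics

# Golf data from Step 1
speed = [100, 102, 103, 101, 105, 100, 99, 105]       # x: club-head speed, mph
distance = [257, 264, 274, 266, 277, 263, 258, 275]   # y: distance, yards


def r_from_z_scores(xs, ys):
    """Add up z_x * z_y for every point, then divide by n - 1."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)


def r_from_deviations(xs, ys):
    """Add up (x - mean x)(y - mean y), then divide by (n - 1) * s_x * s_y."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)


r1 = r_from_z_scores(speed, distance)
r2 = r_from_deviations(speed, distance)
r_swapped = r_from_deviations(distance, speed)   # x and y interchanged

print(f"{r1:.4f} {r2:.4f} {r_swapped:.4f}")
```

All three values agree and round to 0.94, the same r the calculator reports.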

Regression Line, ŷ = ax+b

Write the equation of the line using ŷ, not y, to indicate that this is a prediction, not actual measured data. b is the y intercept, and a is the slope. We'll round both of them to four decimal places, so write the equation of the line as

ŷ = 3.1661x − 55.7966

(Don't write 3.1661x + −55.7966.)

These numbers can be interpreted pretty easily. Business majors will recognize them as intercept = fixed cost and slope = variable cost, but you can interpret them in non-business contexts just as well.

The slope, a, tells how much ŷ increases or decreases for a one-unit increase in x. In this case, your interpretation is "the ball travels about an extra 3.17 yards when the club speed is 1 mph greater." The sign of a is always the same as the sign of r. (A negative slope would mean that ŷ decreases that many units for every one-unit increase in x.)

The intercept, b, says where the regression line crosses the y axis: it's the value of ŷ when x is 0. Be careful! The y intercept may or may not be meaningful. In this case, a club-head speed of zero is not meaningful. In general, when the measured x values don't include 0 or don't at least come pretty close to it, you can't assign a real-world interpretation to the intercept. In this example you'd say something like "the intercept of −55.7966 has no physical interpretation because a club-head speed of zero is meaningless for hitting a golf ball."

Here's an example where the y intercept does have a physical meaning. Suppose you measure the gross weight of a UPS truck (y) with various numbers of packages (x) in it, and you get the regression equation ŷ = 2.17x+2463. The slope, 2.17, is the average weight per package, and the y intercept, 2463, is the weight of the empty truck.

The slope (a or m or b₁) and y intercept (b or b₀) of the regression line can be calculated from formulas, if you have a lot of time on your hands:

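$$a = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = r\,\frac{s_y}{s_x} \qquad\qquad b = \bar{y} - a\,\bar{x}$$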

For the meaning of ∑, see ∑ Means Add 'em Up.

Traditionally, calculus is used to come up with those equations, but all that's really necessary is some algebra. See Least Squares – the Gory Details if you'd like to know more.

The second formula for the slope is kind of neat because it connects the slope, the correlation coefficient, and the SD of the two variables.
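A minimal Python sketch can confirm that the slope formulas agree on the golf data, and that b = ȳ − a x̄ reproduces the calculator's intercept. It assumes the common textbook sum form for the first slope formula; the function names are illustrative, not from any library.

```python
import statistics

speed = [100, 102, 103, 101, 105, 100, 99, 105]       # x, mph
distance = [257, 264, 274, 266, 277, 263, 258, 275]   # y, yards


def slope_from_sums(xs, ys):
    """a = (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - (Sum(x))^2)"""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


def correlation(xs, ys):
    """Sample correlation coefficient r."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / ((n - 1) * statistics.stdev(xs) * statistics.stdev(ys))


a_sums = slope_from_sums(speed, distance)
# Second form: slope = r * s_y / s_x
a_r = correlation(speed, distance) * statistics.stdev(distance) / statistics.stdev(speed)
# Intercept: the regression line passes through (mean x, mean y)
b = statistics.mean(distance) - a_sums * statistics.mean(speed)

print(f"a = {a_sums:.4f} (sums), {a_r:.4f} (r * s_y / s_x)")
print(f"b = {b:.4f}")
```

Rounded to four places, both slopes give 3.1661 and the intercept gives −55.7966, matching Step 2.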

Coefficient of Determination, R²

The last number we look at (third on the screen) is R², the coefficient of determination. (The calculator displays r², but the capital is standard notation.) R² measures the quality of the regression line as a means of predicting ŷ from x: the closer R² is to 1, the better the line. Another way to look at it is that R² measures how much of the total variation in y is predicted by the line.

In this case R² is about 0.88, so your interpretation is "about 88% of the variation in distance traveled is associated with variation in club-head speed." Statisticians say that R² tells you how much of the variation in y is "explained" by variation in x, but if you use that word remember that it means a numerical association, not necessarily a cause-and-effect explanation. It's best to stick with "associated" unless you have done an experiment to show that there is cause and effect.

There's a subtle difference between r and R², so keep your interpretations straight. r talks about the strength of the association between the variables; R² talks about what part of the variation in the y variable is associated with variation in the x variable. Your interpretation of R² should not use any form of the word "correlated".

Only linear regression will have a correlation coefficient r, but any type of regression (fitting any line or curve to a set of data points) will have a coefficient of determination R² that tells you how well the regression equation predicts y from the independent variable(s). Steve Simon gives an example for non-linear regression in R-squared.

Step 3. Display the Regression Line

Show line with original data points.
TI-89 plotted points and regression line
[ F3 makes GRAPH]

What is this line, exactly? It's the one unique line that fits the plotted points best. But what does "best" mean?

the same four points with bad and good regression lines

The same four points on left and right. The vertical distance from each measured data point to the line, y−ŷ, is called the residual for that x value. The line on the right is better because the residuals are smaller.
source: Dabes & Janik [full citation at https://BrownMath.com/swt/sources.htm#so_Dabes1999]

For each plotted point, there is a residual equal to y−ŷ, the difference between the actual measured y for that x and the value predicted by the line. Residuals are positive if the data point is above the line, or negative if the data point is below the line.

You can think of the residuals as measures of how bad the line is at prediction, so you want them small. For any possible line, there's a "total badness" equal to taking all the residuals, squaring them, and adding them up. The least squares regression line is the line that is best because it has less of this "total badness" than any other possible line. Obviously you're not going to try different lines and make those calculations, because the formulas built into your calculator guarantee that there's one best line and this is it.
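You won't do this on the calculator, but a short Python sketch makes the idea concrete. It computes the "total badness" (sum of squared residuals) for the least-squares line through the golf data, then for a few lines nudged away from it. The nudge amounts are arbitrary, chosen only for illustration, and `total_badness` is a hypothetical helper name.

```python
speed = [100, 102, 103, 101, 105, 100, 99, 105]       # x, mph
distance = [257, 264, 274, 266, 277, 263, 258, 275]   # y, yards


def total_badness(a, b, xs, ys):
    """Sum of squared residuals y - (a*x + b) for the line y-hat = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Least-squares slope and intercept from deviations about the means
n = len(speed)
mean_x = sum(speed) / n
mean_y = sum(distance) / n
a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(speed, distance))
     / sum((x - mean_x) ** 2 for x in speed))
b = mean_y - a * mean_x

best = total_badness(a, b, speed, distance)
print(f"least-squares line: {best:.4f}")

# A few nearby lines, nudged by arbitrary amounts: each one scores worse
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 1.0), (0.0, -1.0), (0.1, -10.0)]
others = [total_badness(a + da, b + db, speed, distance) for da, db in nudges]
for (da, db), score in zip(nudges, others):
    print(f"slope {da:+.1f}, intercept {db:+.1f}: {score:.4f}")
```

Every nudged line has a larger total than the least-squares line, which is what "least squares" promises.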

See also: Once you have the regression line, you can use the calculator to predict the y value for any x in the model.
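Outside the calculator, a prediction is just arithmetic with the rounded equation ŷ = 3.1661x − 55.7966. In this minimal Python sketch the speed 104 mph is a hypothetical input, chosen inside the range of measured speeds (99 to 105 mph) because the line describes only the speeds that were actually measured.

```python
A = 3.1661      # slope from Step 2, rounded to four decimal places
B = -55.7966    # y intercept from Step 2, rounded to four decimal places


def predict(x):
    """Predicted distance y-hat (yards) for club-head speed x (mph)."""
    return A * x + B


# 104 mph is a hypothetical speed inside the measured range
print(f"y-hat at 104 mph: {predict(104):.2f} yards")

# Each extra 1 mph adds the slope, about 3.17 yards, to the prediction
print(f"change from 102 to 103 mph: {predict(103) - predict(102):.4f} yards")
```

The second line shows the slope interpretation from Step 2: going up 1 mph raises ŷ by a.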

Appendix: Display the Residuals

Some profs want you to plot or compute residuals, and some don't. Even if your prof doesn't require this, it's good to plot the residuals anyway, because that's an important check on whether the linear model is really a good choice for your data set. If you need to calculate individual residuals, see the last section of How to Find ŷ from a Regression on TI-89.

"No regression analysis is complete without a display of the residuals to check that the linear model is reasonable."

DeVeaux 2009 [full citation at https://BrownMath.com/swt/sources.htm#so_DeVeaux2009], page 227

The residuals are automatically calculated during the regression and stored in a resid list in your Stats/List Editor. All you have to do is plot them on the y axis against your existing x data. This is an important final check on your model of the straight-line relationship.

You want the plot of residuals versus x to be "the most boring scatterplot you've ever seen", in De Veaux's words (page 203). "It shouldn't have any interesting features, like a direction or shape. It should stretch horizontally, with about the same amount of scatter throughout. It should show no bends, and it should have no outliers. If you see any of these features, find out what the regression model missed."

Don't worry about the size of the residuals, because [ZOOM] [9] adjusts the vertical scale so that they take up the full screen.

If the residuals are more or less evenly distributed above and below the axis and show no particular trend, you were probably right to choose linear regression. But if there is a trend, you have probably forced a linear regression on non-linear data. If your data points looked like they fit a straight line but the residuals show a trend, it probably means that you took data along a small part of a curve.

Here there is no bend and there are no outliers. The scatter is pretty consistent from left to right, so you conclude that distance traveled versus club-head speed really does fit the straight-line model.
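For readers who want to see what the resid list holds, here is a minimal Python sketch that recomputes the residuals y − ŷ for the golf data and prints them in order of x, a text stand-in for the residual plot. The list name `resid` simply mirrors the calculator's list; the computation follows the definition of a residual given in Step 3.

```python
speed = [100, 102, 103, 101, 105, 100, 99, 105]       # x, mph
distance = [257, 264, 274, 266, 277, 263, 258, 275]   # y, yards

n = len(speed)
mean_x = sum(speed) / n
mean_y = sum(distance) / n
a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(speed, distance))
     / sum((x - mean_x) ** 2 for x in speed))
b = mean_y - a * mean_x

# Residual = measured y minus predicted y-hat, one per data point
resid = [y - (a * x + b) for x, y in zip(speed, distance)]

# Crude text version of the residual plot: one row per point, sorted by x
for x, e in sorted(zip(speed, resid)):
    side = "above" if e > 0 else "below"
    print(f"x = {x:3d}  residual = {e:+7.3f}  ({side} the line)")
```

Least-squares residuals always add up to 0 (apart from rounding), so they balance above and below the axis; what you look for in the plot is a pattern, not their total.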

Residual Plot Showing Problems

residuals plot for digital camera data from Step 2
Refer back to the scatterplot of f/stop against shutter speed. I said then that it was not a straight line, so you could not do a linear regression. If you missed the bend in the scatterplot and did a regression anyway, you'd get a correlation coefficient of r = 0.98, which would encourage you to rely on the bad regression. But plotting the residuals (at right) makes it crystal clear that linear regression is the wrong type for this data set.

This is a textbook case (which is why it was in a textbook): there's a clear curve with a bend, variation on both sides of the x axis is not consistent, and there's even a likely outlier.
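The curved pattern also shows up numerically. This minimal Python sketch fits the (inappropriate) straight line to the f/stop data and prints only the sign of each residual, from left to right. It is an illustration, not a substitute for looking at the residual plot.

```python
shutter = [1/1000, 1/500, 1/250, 1/125, 1/60, 1/30, 1/15, 1/8]   # x, seconds
fstop = [2.8, 4, 5.6, 8, 11, 16, 22, 32]                         # y

n = len(shutter)
mean_x = sum(shutter) / n
mean_y = sum(fstop) / n
a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(shutter, fstop))
     / sum((x - mean_x) ** 2 for x in shutter))
b = mean_y - a * mean_x

resid = [y - (a * x + b) for x, y in zip(shutter, fstop)]

# The data are already in increasing order of x, so the signs read left to right
signs = "".join("+" if e > 0 else "-" for e in resid)
print(signs)
```

The output is a run of negatives, then a run of positives, then a negative again: the residuals trace a curve instead of scattering randomly around zero.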

advanced: Residuals and R

I said in Step 2 that the coefficient of determination measures the variation in the measured y associated with the measured x. Now that we have the residuals, we can make that statement more precise and maybe a little easier to understand.

The set of measured y values has a spread, which can be measured by the standard deviation or the variance. It turns out to be useful to consider the variation in y's as their variance. (You remember that the variance is the square of the standard deviation.)

The total variance of the measured y's has two components: the so-called "explained" variation, which is the variation along the regression line, and the "unexplained" variation, which is the variation away from the regression line. The "explained" variation is just the variance of the ŷ's, calculating ŷ for every x, and the "unexplained" variation is the variance of the residuals. Those two must add up to the total variance of the measured y's, which means that if we express them as percentages of the variation in y then the percentages must add to 100%. So R² is the percent of "explained" variation in the regression, and 100%−R² is the percent of "unexplained" variation.

$$\frac{s_{\hat{y}}^2}{s_y^2} = R^2 \qquad\qquad \frac{s_e^2}{s_y^2} = 1 - R^2$$

Now I can restate what you learned in Step 2. R² is 88% because 88% of the variance in y is associated with the regression line, and the other 12% must therefore be the variance in the residuals. This isn't hard to verify: do a 1-VarStats on the list of measured y's and square the standard deviation to get the total variance in y, s_y² = 59.93. Then do 1-VarStats on the residuals list and square the standard deviation to get the "unexplained" variance, s_e² = 7.12. The ratio of those is 7.12/59.93 = 0.12, which is 1−R². Expressing it as a percentage gives 100%−R² = 12%, so 12% of the variation in measured y's is "unexplained" (due to lurking variables, measurement error, etc.).
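The same bookkeeping can be done in a few lines of Python as a cross-check on the 1-VarStats arithmetic. This minimal sketch uses sample variances (divisor n−1) throughout, matching the sample standard deviation that 1-VarStats reports; the names `y_hat` and `resid` are only illustrative.

```python
import statistics

speed = [100, 102, 103, 101, 105, 100, 99, 105]       # x, mph
distance = [257, 264, 274, 266, 277, 263, 258, 275]   # y, yards

n = len(speed)
mean_x = sum(speed) / n
mean_y = sum(distance) / n
a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(speed, distance))
     / sum((x - mean_x) ** 2 for x in speed))
b = mean_y - a * mean_x

y_hat = [a * x + b for x in speed]
resid = [y - yh for y, yh in zip(distance, y_hat)]

var_y = statistics.variance(distance)   # total variance of measured y (n - 1 divisor)
var_yhat = statistics.variance(y_hat)   # "explained" variance, along the line
var_e = statistics.variance(resid)      # "unexplained" variance, the residuals

print(f"s_y^2 = {var_y:.2f}, s_yhat^2 = {var_yhat:.2f}, s_e^2 = {var_e:.2f}")
print(f"R^2 = {var_yhat / var_y:.2f}, 1 - R^2 = {var_e / var_y:.2f}")
```

The "explained" and "unexplained" variances add up to the total variance, and the two ratios come out to 0.88 and 0.12.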

What's New?

  • 19 Nov 2021: Updated links here and here.
  • 17 Nov 2020:

    Updated obsolete language about class requirements in the Appendix.

    Converted page from HTML 4.01 to HTML5, and italicized the variable names.

  • 30 Dec 2015:

    Rewrote the formula for the regression line's y intercept to a more computationally friendly form.

    Added an alternative formula for slope, linking it to the correlation coefficient and the SD of the two variables.

    Added a pointer to ∑ notation.

  • (intervening changes suppressed)
  • 16 June 2007: New article.

Source: https://brownmath.com/ti83/regres89.htm
