Sunday, October 30, 2016

Regression Analysis to Estimate Missing Values

The data available for any type of analysis are often incomplete, but fortunately regression analysis offers a way to estimate missing values.  One relatively simple method is to fit a regression line, which can be used to answer the question "what is the most likely value for Y, given a value for X?"  The assumption here is that the trend in the data also holds for the missing values, and that the relationship between X and Y remains constant.

Year | Station B (x) | Station A (y)
1931 1005.84 1131.97
1932 1148.08 1269.09
1933 691.39 828.84
1934 1328.25 1442.78
1935 1042.42 1167.23
1936 1502.41 1610.67
1937 1027.18 1152.54
1938 995.93 1122.42
1939 1323.59 1438.29
1940 946.19 1074.47
1941 989.58 1116.30
1942 1124.60 1246.45
1943 955.04 1083.00
1944 1215.64 1334.22
1945 1418.22 1529.50
1946 1323.34 1438.04
1947 1391.75 1503.98
1948 1338.97 1453.11
1949 1204.47 1323.45
  
The table above gives rainfall values for two weather stations for the years 1931 through 1949.  The Station A values for these years were estimated using the slope and Y-intercept of the relationship between the two stations, fitted to the values from 1949-2004.  The formula for the regression line is y = bx + a, where b is the slope, a is the Y-intercept, x is the known value for Station B for a given year, and y is the estimated value for Station A for that year.
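As a rough illustration of y = bx + a, here is a minimal Python sketch that fits a least-squares line to the pairs in the table and then applies the formula.  The function names fit_line and estimate are made up for this example, and the post does not state the actual slope and intercept, so anything the script prints is inferred from the table rather than taken from the original calculation.  Because the Station A column was itself produced from a regression line, the fit should simply recover that line, with a slope near 0.964 and an intercept near 162.3.

```python
# Station B (x) and Station A (y) rainfall values, copied from the table above.
station_b = [1005.84, 1148.08, 691.39, 1328.25, 1042.42, 1502.41, 1027.18,
             995.93, 1323.59, 946.19, 989.58, 1124.60, 955.04, 1215.64,
             1418.22, 1323.34, 1391.75, 1338.97, 1204.47]
station_a = [1131.97, 1269.09, 828.84, 1442.78, 1167.23, 1610.67, 1152.54,
             1122.42, 1438.29, 1074.47, 1116.30, 1246.45, 1083.00, 1334.22,
             1529.50, 1438.04, 1503.98, 1453.11, 1323.45]


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (b, a) for y = b*x + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return b, a


def estimate(x, b, a):
    """Apply the regression line y = b*x + a to a known x value."""
    return b * x + a


b, a = fit_line(station_b, station_a)
print("slope b =", round(b, 4))
print("intercept a =", round(a, 2))

# Estimate Station A for 1931 from the Station B value for that year.
print("1931 estimate =", round(estimate(1005.84, b, a), 2))
```

In practice the line would be fitted only to the years where both stations have real measurements (1949-2004 in this post), and estimate would then be called with the Station B value for each year that Station A is missing.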
