Viersen, T

__Introduction__
For my two-variable data analysis I chose to look at the correlation between fan attendance and winning percentage in Major League Baseball. Baseball is one of my favourite sports, and I chose it because it has by far the highest number of regular season games. I wanted to analyze whether there is a trend showing that bigger fan bases lead to more wins for the team.
**Background Information:**


 * MLB fan attendance dropped more than 6 percent from 2007/2008 to 2008/2009
 * in 2008/2009 MLB welcomed 73,364,441 fans into stadiums
 * the recession and over-priced seats are said to be the major factors
 * only 4 teams had a lower average fan attendance than the Blue Jays last season

**Stage 1: Initial Proposal**

**Does Fan Attendance in Major League Baseball Affect the Team's Winning Percentage?**


**Independent Variable** - Fan Attendance
**Dependent Variable** - Winning Percentage

**Hypothesis:** I believe that, in general, fan attendance at Major League Baseball games will indeed affect a team's winning percentage, because most good teams attract fans and players tend to be more motivated when there are fans supporting them. For these reasons I believe there will be a strong correlation between these two variables.



**Stage 2: Refinement of Stage One**

**Bias Present in Raw Data:**


**Sampling Bias** - the only type of bias that could be argued from the raw data is that the data represents only two years. Using more than two years would have made the graphs and data sets huge, which would have caused confusion when comparing and looking for trends. I do not believe that this sampling bias will affect the results, because in general the correlation between fan attendance and winning percentage should be fairly constant from year to year, so two years should represent most trends.

Other than this there are really no other types of bias. The data is simply statistics about win percentage and fan attendance, so these are facts that come from reliable sources and could not have been altered. The sample was the entire league, so every team is represented, together with any extraneous variables that affect the relationship between its fan attendance and win percentage. Finally, as far as response bias goes, this is not a survey but raw statistics, so no groups were over-represented or under-represented.


**Data Analysis Plan:**

**Variable One:** The first variable (the independent variable) is fan attendance, which simply means the average number of fans in attendance at each game throughout the course of the regular season (162 games).

**Variable Two:** The second variable (the dependent variable) is winning percentage, which is the percentage of games won by each team during the regular season.

**One Variable Statistics:** I plan to use this data to calculate key statistics such as the standard deviation and variance. With these values I will be able to compare the two years and analyze whether or not the values differ.

**Two Variable Statistics:** Using the raw data collected in Stage 1, I will first make graphs that compare win percentage and fan attendance for each year individually. Secondly, I will plot both years on the same graph and analyze whether or not fan attendance does indeed affect winning percentage.
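As a rough illustration of the two-variable step, the sketch below computes a Pearson correlation coefficient and a least-squares trend line for paired attendance and winning percentage values. The function names and the five sample pairs are made up for illustration only; they are not the league data collected in Stage 1, and the graphs in this project may well have been produced with a different tool.

```python
import math


def _sums_of_squares(xs, ys):
    """Return (mean_x, mean_y, sxx, syy, sxy) for two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return mean_x, mean_y, sxx, syy, sxy


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    _, _, sxx, syy, sxy = _sums_of_squares(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def trend_line(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    mean_x, mean_y, sxx, _, sxy = _sums_of_squares(xs, ys)
    if sxx == 0:
        raise ValueError("slope is undefined when x is constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical values for illustration only (NOT the project's MLB data).
attendance = [20000, 25000, 30000, 35000, 40000]
win_pct = [0.450, 0.480, 0.500, 0.530, 0.540]

r = pearson_r(attendance, win_pct)
slope, intercept = trend_line(attendance, win_pct)
print(f"r = {r:.3f}, slope = {slope:.2e}, intercept = {intercept:.3f}")
```

A value of r close to 1 would support the hypothesis that attendance and winning percentage rise together, although a correlation on its own does not show which variable drives the other.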

__Results and Analysis__

**One Variable Data**
 * the importance of the mean, median and mode is limited for winning percentage, because in a sports league every win for one team is a loss for another
 * something interesting is that in both years the mean and median are right at 0.500, which is good in sports (an even spread from good teams to bad teams)
 * Standard Deviation for 2008-2009 Fan Attendance: 8,492
 * Standard Deviation for 2009-2010 Fan Attendance: 8,680
 * this is very low; the number is usually larger in sports such as the NFL
 * it is fairly constant from year to year, and the fairly low standard deviation is good because it shows that fan attendance does not fluctuate hugely from team to team
 * the Standard Deviation for winning percentage is below 0.100 for both years, which shows that the range is reasonable (0.364-0.636 and 0.352-0.599)
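The one-variable values discussed above can be sketched with Python's standard `statistics` module. This is a minimal sketch under two assumptions: the function name `summarize` is hypothetical, and the population formulas (`pstdev`, `pvariance`) are used because the data covers the entire league rather than a sample. The project does not state which formula was actually used, and the five input values are made up rather than taken from the league data.

```python
import statistics


def summarize(values):
    """Return basic one-variable statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Population formulas: the whole league is measured, not a sample.
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
        "range": (min(values), max(values)),
    }


# Hypothetical winning percentages for illustration only (NOT the MLB data).
sample_win_pct = [0.400, 0.450, 0.500, 0.550, 0.600]

for name, value in summarize(sample_win_pct).items():
    print(name, value)
```

Running the same function on each season's attendance and winning percentage columns would give the values to compare from year to year.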



**Two Variable Data:**
 * there is a definite trend in the data: as fan attendance increases, so does winning percentage, for both years
 * this was expected because better teams normally draw more fans
 * the outliers were the most interesting here because often there were low win percentages but high fan attendance
 * Average Fan Attendance for 08/09: 30,317, Average Fan Attendance for 09/10: 30,034
 * larger markets always have fan support: Los Angeles, Chicago and New York all have win percentages below 0.500 but very high fan attendance
 * smaller markets often have little fan support regardless of their record: Tampa Bay and Florida have win percentages above 0.500

ex.
 * Los Angeles (09/10) -- Fan Attendance: 43,979, Winning Percentage: 0.494
 * Chicago Cubs (09/10) -- Fan Attendance: 37,814, Winning Percentage: 0.463
 * Tampa Bay Rays (09/10) -- Fan Attendance: 23,025, Winning Percentage: 0.593
 * New York Mets (08/09) -- Fan Attendance: 38,942, Winning Percentage: 0.432
 * Florida Marlins (08/09) -- Fan Attendance: 18,075, Winning Percentage: 0.537
 * most other large market teams attract fans and have good win percentages (Boston, New York Yankees)
 * most other small market teams don't attract fans and have sub-par win percentages
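One simple way to pick out the outliers described in this section is to compare each team with the league average attendance for its season and with a 0.500 record. The sketch below applies that rule to the five example teams and the two league averages quoted in this section. The rule itself, the function name `classify` and the labels are assumptions made for illustration, not necessarily how the outliers were identified on the graphs.

```python
# League average attendance per season, as quoted in this section.
LEAGUE_AVERAGE_ATTENDANCE = {"08/09": 30317, "09/10": 30034}

# (team, season, average attendance, winning percentage) from the examples.
EXAMPLE_TEAMS = [
    ("Los Angeles", "09/10", 43979, 0.494),
    ("Chicago Cubs", "09/10", 37814, 0.463),
    ("Tampa Bay Rays", "09/10", 23025, 0.593),
    ("New York Mets", "08/09", 38942, 0.432),
    ("Florida Marlins", "08/09", 18075, 0.537),
]


def classify(attendance, win_pct, league_average):
    """Label a team by whether it goes against the attendance/winning trend.

    Assumed rule: above-average attendance with a record at or below 0.500,
    or below-average attendance with a record above 0.500, counts as an outlier.
    """
    high_attendance = attendance > league_average
    winning = win_pct > 0.500
    if high_attendance and not winning:
        return "high attendance, losing record"
    if not high_attendance and winning:
        return "low attendance, winning record"
    return "fits the trend"


for team, season, attendance, win_pct in EXAMPLE_TEAMS:
    label = classify(attendance, win_pct, LEAGUE_AVERAGE_ATTENDANCE[season])
    print(f"{team} ({season}): {label}")
```

Under this rule all five example teams go against the trend, which is why they stand out from the rest of the league.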

__Conclusion__

 * based on the graphs and the two-variable data analysis, there is a medium to strong correlation between fan attendance and winning percentage. More importantly, there is a definite trend showing that as fan attendance increases, so does winning percentage.

**Interesting Points:**

**Longoria's take, via the St. Petersburg Times:**
> "We go out there and play hard for 162 games," Longoria said, "and for the fans to show the kind of support they're showing right now, you kind of wonder what else you have to do as a player."

**Price said:**
> "Had a chance to clinch a post season spot tonight with about 10,000 fans in the stands....embarrassing"
 * some may argue, based on the outliers, that fan attendance does not affect regular season winning percentage (because of market size)
 * in 2009/2010 the Tampa Bay Rays had the best AL record, yet lost in the first playoff series, losing all 3 home games
 * this, as well as the players' frustration, suggests that low fan attendance may take a toll on postseason performance
 * one other example is the San Francisco Giants winning the World Series last year: they were an average team, but their overwhelming fan support seemed to help them along in their run
 * [[image:SF-Giants-2010.PNG width="356" height="261"]]
 * so, although some outliers may indicate that fan attendance does not affect winning, these important playoff examples support the idea that fan attendance does indeed affect winning.