

**Commercial Software Price and Employment Trends:**

= How Employment in Software Development Affects the Price of Software in Canada =
= Robert Hogg =

Click [|bob hogg presentation upload.ppt] to view the PowerPoint.

= **Background:** =

Industry insiders have been saying for years that the cost of developing software is increasing dangerously quickly. They have posited that this is the biggest threat their industry faces, and that it will affect everyone quickly and painfully. Some of this is true. As the development costs of software rise, so too does its retail price. In time, this //will// make software unaffordable for the average user. However, business and commercial software need not necessarily follow this trend. The base technology that drives successful businesses is not complicated and has not really changed in many years. With this in mind, it stands to reason that typical business and commercial software should be retailing at a lower price than it has in the past. It does not require flashy effects and graphical flourishes to succeed; all it needs is an easy-to-use GUI (graphical user interface), compatibility with other popular software programs, and the ability to perform some relatively simple functions necessary for business administration.

For example, word processors must make business letters easy to write, spreadsheets must be able to display graphs of stock prices or sort employees by highest salary, and slideshow programs must make a company look attractive to investors. None of these tasks has been very difficult to program since the introduction of object-oriented programming and visual compilers; moreover, the base technology needed to do this has not changed much.

Thus, the biggest change has been in employment. Is the influx of new employees into the software industry helping it develop products at a quicker and less expensive pace, or is it dragging it down and leading to a collapse in the industry?

** Question: ** Are the changes in the retail price of commercial software over the last decade (2000 - 2009) related to the employment of I.T. professionals in the United States? If so, then to what degree are they related, what is the cause-and-effect or common-cause factor that links them (if applicable), and are U.S. employment statistics a valid indicator of price changes in software bought and sold (albeit not manufactured) in Canada?

If there is little or no correlation between the software price index and the employment statistics, is it simply because they do not impact each other, or are there better indicators available, such as changes in wages for workers, the popularity of open-source software, the number of degrees granted in the field, or some other unseen factor?

** Hypothesis: ** I predict that there will be a strong negative correlation between the number of workers in the software industry and the price of commercial software in Canada. Moreover, I predict that the price of commercial software has declined as a result of more people working quickly to develop new products without having to spend time developing new features, and that we can continue to use the U.S.A.'s computer science employment statistics to provide an indication of what software prices will be like in Canada.

Variables:

** Independent: ** Number of people employed in the software development industry, U.S.A., 2000 – 2009

This is the number of people working full- or part-time in the United States (but mostly full-time) as computer programmers, software engineers developing packaged software, or software engineers maintaining the software of corporations or governments. It does NOT include computer engineers (who design the hardware, like mice, keyboards and electronic components of computers), computing research scientists, or other such people.

** Dependent: ** Commercial Software Price Index, Canada, 2000 – 2009

The ** Commercial Software Price Index ** (CSPI) tracks “the change in the purchase price of pre-packaged software typically bought by businesses and governments,” as described by Statistics Canada, which compiles the series on a monthly basis. This study will use annual averages, as only annual data is available for U.S. employment.

Note that Statistics Canada measures the CSPI in Canadian dollars (a naturally fluctuating currency), rather than "constant 2000 dollars" or another static equivalent, so some change in the CSPI will be due to changes in currency value that this study cannot control for.


**Research:**

To conduct preliminary research, I sought out employment statistics for the software industry from the United States’ Bureau of Labor Statistics.

The Bureau divides software industry workers into three categories: Computer Programmers, Computer Software Engineers (Applications) and Computer Software Engineers (Systems Software). In general, applications engineers develop “general computer applications software or specialized utility programs” or “create or adapt customized applications for business and other organizations” (United States Bureau of Labor Statistics), while systems engineers organize, maintain and develop computer systems that businesses use to automate tasks such as payroll record keeping and parts ordering. Programmers usually write the code that makes the programs run after software engineers have laid down the fundamental plans.

In the United States, current trends in the industry show a decrease in the number of computer programmers since 2000, along with an increase in the number of software engineers. Overall, around 17 000 computer programmer jobs were lost, and about 24 000 software engineer positions have been filled. While it is entirely possible that this is partly or wholly a change in job title, with little actual loss of employment for computer programmers, the Bureau has stated that many programming jobs have been outsourced to foreign countries, where labour is cheaper but still of high quality. Overall, the industry is approximately 6% larger than at the start of the millennium.



I also performed research on the price of commercial software in Canada. Statistics Canada tracks this data as the ** Commercial Software Price Index. ** The CSPI tracks “the change in the purchase price of pre-packaged software typically bought by businesses and governments,” as described by Statistics Canada’s web page. However, Statistics Canada also states that systems software is included in the Index.

Analysis of the CSPI revealed that, as anticipated, the price of software in Canada has gone down considerably since 2000. On average, the price has dropped consistently year after year between 2000 and 2007. From 2007 onwards, a slight rebound in price has occurred, although the prices have not approached their previous peak. The average price has dropped a total of $32.68 between 2000 and 2009. The most significant decrease was between 2002 and 2003, when the average price dropped $12; this is possibly the result of the corporate “dot-com crash” of 2000-2002.

To determine whether the CSPI and the United States’ employment statistics are related, I measured the correlation coefficient between the two. It was apparent at first glance that while the CSPI is very strongly negatively correlated with the number of software engineers in the United States, it is also strongly positively correlated with the number of computer programmers. This positive correlation contradicts the hypothesis, but it is intuitive. It has been known for years that "[a]dding manpower to a late software project makes it later," a statement known as Brooks’ Law. (Fred Brooks is the author of //The Mythical Man-Month//, a recognized book on the human elements of computer programming.) Moreover, the later a project becomes, the more its budget balloons, which raises its retail price.

This explanation alone, however, does not account for the impact software engineers have had on the CSPI. Most likely, the added organization and project leadership that software engineers provide has partially offset the problems that putting multiple programmers on one project had created: overlapping code, a lack of cooperation among co-workers, increased frequency of bugs in programs, and general software development anarchy. Corporations that encountered these sorts of inconveniences during the early days of computing eventually started assigning one highly productive programmer to direct and control those below, and typically saw their costs go down as a result. These productive programmers were essentially the forerunners of modern software engineers.

Regardless, the positive correlation between the number of programmers and the CSPI weakens the correlation for the industry as a whole: the correlation coefficient between total U.S. software industry employment and the CSPI is only -0.78.


== Software Development Industry Employment Statistics, U.S.A. vs. Commercial Software Price Index ==
||= === Software Development Category === ||= === Linear Correlation Coefficient (to 2 decimal places) === ||
||= Computer Software Engineers, Applications ||= -0.97 ||
||= Computer Software Engineers, Systems Software ||= -0.92 ||
||= **Computer Programmers** ||= **+0.93** ||
||= Total of all Categories ||= -0.78 ||
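As an illustration of how a linear correlation coefficient like those in the table can be computed, here is a minimal Python sketch. The function name `pearson_r` and the two short series are assumptions made for the example only; they are placeholder numbers, not the study's employment or CSPI data.

```python
import math


def pearson_r(xs, ys):
    """Return the linear (Pearson) correlation coefficient of two series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (numerator of r)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (denominator of r)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder series (NOT the study's data):
# employment rises steadily while the price index falls steadily.
employment = [100, 110, 120, 130, 140]
price_index = [50, 45, 40, 35, 30]

r = pearson_r(employment, price_index)
print(f"r = {r:+.2f}")
```

A value near -1 means the two series move in opposite directions almost perfectly in step, which is the pattern reported above for the software engineer categories.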



By converting the linear correlation coefficients to coefficients of determination, it becomes clear that total employment in the U.S. software industry can explain only 61% of the variation in the CSPI. This result is an anomaly, as approximately 85% of the variation in the CSPI can be explained by the trends in systems software engineers or in programmers taken individually, and 94% can be explained by the employment of applications engineers. The standard error for the total category is about 9.64, indicating that predictions from the line of best fit are typically off by about $10.
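The conversion from a correlation coefficient to a coefficient of determination is simply a matter of squaring it. A short Python sketch using the coefficients from the correlation table (the variable names are my own):

```python
# Linear correlation coefficients taken from the table above
correlations = {
    "Computer Software Engineers, Applications": -0.97,
    "Computer Software Engineers, Systems Software": -0.92,
    "Computer Programmers": 0.93,
    "Total of all Categories": -0.78,
}

# The coefficient of determination is the square of r
determination = {name: r ** 2 for name, r in correlations.items()}

for name, r in correlations.items():
    print(f"{name}: r = {r:+.2f}, r^2 = {determination[name]:.2f}")
```

Because the sign disappears when squaring, the positively correlated programmers category explains about as much variation as the negatively correlated systems category.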

This made it necessary to ungroup the data before further analysis could be conducted. However, analyzing each category against the CSPI separately is cumbersome, and each category on its own gives only an incomplete picture of the overall state of the industry. It therefore became necessary to include all three categories of software developer employment as separate variables in a single model.

Overall Equation:
== CSPI = 113.6883192 + 0.000101746 × Programmers - 0.000215579 × Applications + 0.0000630455 × Systems [|ii] ==

Using a multiple-variable linear regression method, it becomes much simpler to predict change in the CSPI using American employment statistics. As shown, the coefficient of determination for the new model is higher than it was for any model previously considered. The new model can explain as much as 97% of the variation in the CSPI, a considerable improvement over the 61% explained by total employment alone. Moreover, the standard error, at $2.77, is far smaller as well. However, it is important to note that while this model suggests that software developer employment is an effective way to monitor the price of pre-packaged programs, it is not necessarily reliable when extrapolated to very low numbers of programmers, as may be necessary in the future.
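To show how the regression equation above is applied, here is a minimal Python sketch. The coefficients are those of the equation; the function name `predict_cspi` and the employment counts in the usage example are hypothetical and are not the study's data.

```python
# Coefficients from the overall regression equation
INTERCEPT = 113.6883192
COEF_PROGRAMMERS = 0.000101746
COEF_APPLICATIONS = -0.000215579
COEF_SYSTEMS = 6.30455e-05


def predict_cspi(programmers, applications, systems):
    """Predict the CSPI from the number of U.S. workers in each category."""
    return (
        INTERCEPT
        + COEF_PROGRAMMERS * programmers
        + COEF_APPLICATIONS * applications
        + COEF_SYSTEMS * systems
    )


# Hypothetical employment counts, for illustration only
example = predict_cspi(programmers=400_000, applications=500_000, systems=350_000)
print(f"Predicted CSPI: {example:.2f}")
```

As the note on the equation says, the signs of the coefficients are useful for prediction only and should not be read as causal effects. Inputs far outside the range observed between 2000 and 2009 (for example, very low numbers of programmers) are extrapolation and may give unreliable results.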

Click below to view regression statistics for this equation.

Note: If "confidence interval" is used, "confidence level" is meant.


Correlation does not by itself mean that causation exists. Here are several possible theories as to why the correlation between software industry employment and the CSPI exists.

1. ** Cause-and-Effect 1: ** The greater number of software engineers in the United States makes it easier to produce and sell computer programs at lower prices.

2. ** Cause-and-Effect 2: ** The smaller number of computer programmers in the United States has reduced the cost of software by making it quicker and cheaper to produce.

3. ** Reverse Cause-and-Effect: ** The lower price of commercial software has increased sales and thus attracted more workers to the software development industry.

4. ** Common-Cause Factor: ** Varying external variables are causing a change in both the number of people employed in the software industry and in the CSPI.

5. ** Accidental Relationship: ** The CSPI and the number of jobs in the I.T. industry have both changed linearly over the years despite no factor linking them causally.

6. It is also possible that software companies have been pushing the MSRPs of their products down even as the cost of developing them rises. If true, this could compromise the validity of the whole study, but it seems highly unlikely, as price cuts as reckless as those seen over the last decade would not be sustainable.

Theory One (which supports the hypothesis) seems likely for two reasons beyond the obvious correlation. First, the proven success of the “senior programmer” method mentioned earlier has shown that applying software engineering-like methods to program design is effective. Second, competent businesses are always out to improve their methods, and so if software engineers were not having some quantifiable effect on making software inexpensive (and thus competitive) to sell, companies would not be hiring them in such high numbers.

Theory Two (which does not support the hypothesis, yet is beginning to look more and more correct) is also possible. Programmers will always be needed among software developers to complete tasks, but paying one cheap yet disciplined foreign worker to code a solution based on a pre-defined plan is far less expensive than employing multiple domestic workers to fight over their own ideas of how best to solve a problem. This process speeds development and keeps budgets down.

The reverse cause-and-effect scenario, while conceivable, seems very unlikely, as other circumstances would be more likely to attract candidates. Whether the software sells and how much it costs will be irrelevant to the average worker, who will likely be far more concerned about hourly wages or the job market. A strong market for commercial software will always exist as long as a) rampant piracy does not become a problem and b) business continues to function much as it always has. Thus, young workers considering entering the software development industry will continue to do so because of the jobs available. Moreover, the software prices are // Canadian //, and the workers are // American. // American workers who even know what the CSPI is are most likely in the minority, so American price statistics would be better suited to predicting the job market. Even then, next to no software developers care about the price of the software they are making. Most likely, their highest priority at any given time is when they will get to stop working overtime.

The accidental relationship is possible, but highly unlikely, since the variables are closely related thematically. Unless a huge number of software companies have sprung up in the past decade or so, or revenue has skyrocketed, more workers (and older ones earning higher salaries) would doubtless strain the resources of companies and thus push up the price of software.

Common-Cause Factors:

** Key Findings: **
· By using a multi-variable linear regression model, it is possible to accurately predict values of the CSPI using American employment statistics. However, the model will likely be inaccurate when attempting to extrapolate for very low numbers of programmers, as may be necessary in coming decades. Additionally, as there are three independent variables in the model, using an inverse function to estimate the number of software developers for a certain value of the CSPI is impossible.

· The higher number of software engineers in the U.S. and the lower number of programmers are almost certainly a cause of the decrease in the Commercial Software Price Index.

· The number of doctoral computer science degrees granted in the U.S. per year is a possible common-cause factor in the relationship between the CSPI and the American job market. However, if it is one, its effect is very small, and thus negligible.


**Conclusions:**

I have found my hypothesis to be mostly correct in terms of the relationship between the price of software in Canada and the employment of computer professionals in the United States. Each employment category is strongly correlated with the CSPI individually, and, when a multiple-variable linear regression model is applied, the numbers of computer programmers and software engineers in the U.S. make an excellent model for the CSPI, although one that may become invalid if the number of programmers in the U.S. shrinks dramatically. Most evidence points to the theory that an increase in the number of software engineers reduced the price of software. However, it also indicates that the declining number of programmers reduced the CSPI as well, which opposes the hypothesis. I now find this element of my hypothesis to have been unfounded and based on a lack of proper background research. (I should have read //The Mythical Man-Month// before I started this study.) Overall, though, it appears the business of computing in the United States is safe, and will continue to flourish just as it has for years.

**Appendix:**
There are numerous implications of the finding that the number of software developers in the United States affects the CSPI. One is that private enterprise in Canada will continue to benefit from low-cost, high-quality software and will be able to succeed in a time that is very difficult for everyone. More concerning, however, is the future of programmers in Canada. If the CSPI is strongly affected by American software employment, this implies that either a) software companies in Canada are keeping their prices low to compete with well-known American brands or, more worrisome, b) Canadian businesses are simply buying all their products from said American brands. While in-house programmers and systems software engineers will always be needed to create custom solutions for companies that have unique needs, if this study proves correct, then the future of software development in Canada is at risk and we may see notable Canadian companies buckle and be liquidated. Of course, there is still some life left in the Canadian high-technology industry, such as the burgeoning success of Research in Motion. However, the possibility that software development could move entirely to the United States is not just a concern for those of us looking for employment in the industry. Considering how many jobs the U.S. itself has outsourced (or “globalized,” if you prefer the euphemism) to countries with low safety and workplace standards for workers, allowing jobs to leave for the U.S. is a tragedy of human dignity of epic proportions. The success of Canadian software companies is critical to ensuring that if foreign countries wish to become economic powerhouses, they will have to treat their workers with respect, just as software businesses in our country do.

**I- Data Collection and Bias:**

Data taken from the Bureau of Labor Statistics' Occupational Employment Statistics (OES) Survey are not based on population data. Rather, they come from a sample of select businesses in the United States of America (not counting the Virgin Islands, Guam or Puerto Rico). The Bureau of Labor Statistics compiled the report twice a year from 2003 to 2005 inclusive, once in May and once in November, from separate samples of 200,000 businesses. In all other years from which information was drawn, a single sample of 400,000 businesses was surveyed once per year. From 2006 to 2008 inclusive, the survey month was identified as May; for earlier years, it was not identified. For years in which the Survey was conducted semi-annually, data for this study is always taken from the May survey. Taking an average of May and November values was not attempted.

The population the Survey attempts to measure is drawn from State Unemployment Insurance claims. Response to the Survey was voluntary; response rates were typically in the mid-70% range. Response bias is possible, as employers may be somewhat ashamed of reporting low numbers of programmers (which could be seen as evidence that they are outsourcing jobs). The Survey is stratified by area, by industry, and by size of organization: companies with 250 employees or more are far more likely to be surveyed than small businesses with only six or seven employees. Also, the Survey notes that, because it used the Standard Occupational Classification (SOC) system early on and the North American Industry Classification System (NAICS) in later years, some data is not considered comparable with earlier values. This warning, unfortunately, had to be ignored so that I could obtain a larger sample size.

Data for the Commercial Software Price Index came from Statistics Canada, although much of the underlying data was in turn collected by the International Data Corporation. The price index is composed of a weighted average of "application development and deployment, collaborative applications, content applications, engineering applications, Enterprise Resource Management (ERM) applications, security software, storage software and system software" (Statistics Canada), with values for the weightings on occasion taken from the International Data Corporation as well. All prices taken are "street prices" and do not account for taxes, changing dollar value, or any shipping/handling charges. Identical products are identified and a price relative is calculated for each product. The geometric mean of these values is used to determine the index. A "representative sample" of software products is taken each year, and products that regularly experience unnatural price fluctuation are removed from the survey after successive years.
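The price-relative and geometric-mean step described above can be sketched in a few lines of Python. This is a simplified, unweighted illustration only: the function name `geometric_mean_index`, the base value of 100, and the prices are all assumptions, and the category weightings that Statistics Canada applies are left out.

```python
import math


def geometric_mean_index(base_prices, current_prices, base_value=100.0):
    """Unweighted index from the geometric mean of price relatives.

    Each price relative is current price / base price for the same product.
    """
    if len(base_prices) != len(current_prices) or not base_prices:
        raise ValueError("need matching, non-empty price lists")
    relatives = [cur / base for base, cur in zip(base_prices, current_prices)]
    # Geometric mean computed via logarithms
    log_mean = sum(math.log(rel) for rel in relatives) / len(relatives)
    return base_value * math.exp(log_mean)


# Hypothetical street prices for three identical products in two periods
base = [200.0, 500.0, 80.0]
current = [180.0, 450.0, 72.0]  # each product is 10% cheaper

index = geometric_mean_index(base, current)
print(f"Index: {index:.1f}")
```

With a geometric mean, a product whose price doubles and one whose price halves cancel each other out, which is not true of an arithmetic mean of price relatives.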


**II- Supplementary Files:**

Excel Tables
 * [[file:Proposal.txt]]
 * [[file:A Brainstorm Session.doc]]
 * [[file:Copy of commercial software price index.xls]]
 * [[file:new - employment usa by year.xls]]
 * [[file:Comparison.xls]]
 * [[file:degrees.xls]]
 * [[file:open source good.xls]]
 * For a detailed look into how Canada's Commercial Software Price Index works, visit Statistics Canada's webpage.
 * To understand what is meant by the terms "Computer Programmers" or "Software Engineers" visit the United States' Bureau of Labor Statistics.

[i] The number of software developers employed in America is inconsistently represented between the Bureau of Labor Statistics’ Occupational Outlook Handbook (OOH) and its Occupational Employment Statistics (OES) Survey, so predictions will be forecast using data from the OES to maintain consistency with the remainder of the report. The inconsistency is likely due to non-response bias in the OES; only approximately 78.2% of establishments responded to surveys. Moreover, some sampling bias may be present as well: large corporations were more likely to be sampled than smaller ones. Larger corporations may be more likely to offshore jobs than smaller ones, which could lead the OES to underestimate the number of programmers. The OOH actually states that the number of programmers in the United States will fall from 2008 values to 414,400 workers in 2018. However, this number is well above the number of programmers cited as working in 2008 in the OES (394,230).

[|ii] Note that the values for the slopes of the independent variables are chosen without regard to causality. That is, the slope for the number of programmers is positive and the slope for the software engineers (applications) category is negative merely because this is useful for prediction. It does NOT actually indicate that programmers keep the cost of software up and software engineers bring it down.