## Creating Stacked Barplot and Grouped Barplot in R using Base Graphics (no ggplot2)

Stacked Barplots, or graphs that depict conditional distributions of data, are great for being able to see a level-wise breakdown of the data. Unfortunately, R has no easily built in functions for generating a stacked barplot. In fact, the majority of R users suggest creating stacked barplots using ggplot2 since it is easier and also looks a bit better. However, for those not yet aboard the ggplot2 train or for those that prefer base R, fear not! For we are about to embark on an adventure that will bring stacked barplots to base R as well as the ability to group bar plots together!

## Examining the Data

For this example, we will use data concerning web domains in different journals. The dataset looks like the following:

internetrefs
Domain Journal Count
gov NEJM 41
gov JAMA 103
gov Science 111
org NEJM 37
org JAMA 46
org Science 162
com NEJM 6
com JAMA 17
com Science 14
edu NEJM 4
edu JAMA 8
edu Science 47
other NEJM 9
other JAMA 15
other Science 52

Notice, the data has consistent order such that all Domain entries are grouped together and the Journal variable has a cyclic ordering where: NEJM, then JAMA, and finally Science appears. We will exploit this feature of the data. I will also provide a method that isolates the data when such feature does not exist.

### Creating the Matrix

In order to create a stacked bar plot, we must first know a variable's levels(). Levels is synonymous with factors that are associated with categorical (string) variables. The levels() function returns the unique values of the strings that a variable takes on. So:

# Attach object so that we can reference by Domain, Journal, and Count # Instead of internetrefs$Count attach(internetrefs) levels(Journal) ## [1] "JAMA" "NEJM" "Science"  levels(Domain) ## [1] "com" "edu" "gov" "org" "other"  Then, using the levels() information, the data must be transformed into a matrix structure. The matrix structure takes on the following form: the rows are representative of the levels() of the Domain variable and the columns represent the levels() of the Journal variable. We induce this by: ### USING THE PRE-EXISTING ORDER FEATURE OF THE DATA ### # Load the count values data = Count # Place data in a matrix that will have 3 columns since the number levels() # for Journal is 3. Also, based on the ordering feature of the data, load # the matrix such that we fill the matrix by row. data = matrix(data, ncol = 3, byrow = T) # Label the columns and rows colnames(data) = levels(Journal) rownames(data) = levels(Domain) The matrix structure of data is then: id JAMA NEJM Science com 41 103 111 edu 37 46 162 gov 6 17 14 org 4 8 47 other 9 15 52 But, let's say that you lack that nice feature of the data that we discussed earlier. One way to obtain it is by going through and ordering the columns of the initial dataframe: internetrefs_ordered = internetrefs[with(internetrefs, order(Domain, Journal)), ] id Domain Journal Count 8 com JAMA 17 7 com NEJM 6 9 com Science 14 11 edu JAMA 8 10 edu NEJM 4 12 edu Science 47 2 gov JAMA 103 1 gov NEJM 41 3 gov Science 111 5 org JAMA 46 4 org NEJM 37 6 org Science 162 14 other JAMA 15 13 other NEJM 9 15 other Science 52 From here, the problem then simplifies to the code used for the ordered data: ### AFTER CREATING THE ORDER FEATURE OF THE DATA ### # Access and load the count values data_ordered = internetrefs_ordered$Count   # Place data in a matrix that will have 3 columns since the number levels() # for Journal is 3. Also, based on the ordering feature of the data, load # the matrix such that we fill the matrix by row. data_ordered = matrix(data_ordered, ncol = 3, byrow = T)   # Label the columns and rows colnames(data_ordered) = levels(internetrefs_ordered$Journal) rownames(data_ordered) = levels(internetrefs_ordered$Domain)
id JAMA NEJM Science
com 41 103 111
edu 37 46 162
gov 6 17 14
org 4 8 47
other 9 15 52

## Building the Stacked Barplot

There are two ways to build a stacked barplot: percentage-based and counts-based

However, special care needs to be taken when including a legend. By default, barplot()'s legend generating capabilities are pretty lacking. As a result, one needs to modify the margin space and how clipping is handled. This is achieved by setting par():

# mar is defined to receive: c(bottom, left, top, right) . The default # margin is: c(5, 4, 4, 2) + 0.1 . As a result, we have exploded the # right-hand side of the figure to hold legend.   # xpd=TRUE forces all plotting to be clipped to the figure region par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE)

To build percentage-based barplots, we must use prop.table() to generate each percentage for the columns:

# Here, margin represents whether it will be run on rows (1) or columns (2) # We&#39;ve selected to use prop.table() on columns since that was how we built # our data. prop = prop.table(data, margin = 2)

If we are looking to build a percentage-based vertical stacked barplot then:

par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE) barplot(prop, col = heat.colors(length(rownames(prop))), width = 2) legend("topright", inset = c(-0.25, 0), fill = heat.colors(length(rownames(prop))), legend = rownames(data))

If we are looking to build a counts-based vertical stacked barplot then:

par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE) barplot(data, col = heat.colors(length(rownames(data))), width = 2) legend("topright", inset = c(-0.25, 0), fill = heat.colors(length(rownames(data))), legend = rownames(data))

## Grouped Barplots

If we are looking to build a percentage-based horizontal grouped barplot then:

par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE) barplot(prop, col = heat.colors(length(rownames(prop))), width = 2, beside = TRUE) legend("topright", inset = c(-0.25, 0), fill = heat.colors(length(rownames(prop))), legend = rownames(data))

If we are looking to build a counts-based horizontal group barplot then:

par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE) barplot(data, col = heat.colors(length(rownames(data))), width = 2, beside = TRUE) legend("topright", inset = c(-0.25, 0), fill = heat.colors(length(rownames(data))), legend = rownames(data))

## In Summary

We've talked about a lot of different components required to pull off a stacked barplot and group bar plot in R. Below is the script that you will want to modify to suit your own data:

internetrefs = read.delim("~/Desktop/internetrefs.txt")   # Force Order data_ordered = internetrefs[with(internetrefs, order(Domain, Journal)), ]   # load the count values data = data_ordered$Count data = matrix(data, ncol = 3, byrow = T) colnames(data) = levels(data_ordered$Journal) rownames(data) = levels(data_ordered$Domain) prop = prop.table(data, margin = 2) par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE) # Percent-based vertically stacked barplot barplot(prop, col = heat.colors(length(rownames(prop))), width = 2) legend("topright", inset = c(-0.25, 0), fill = heat.colors(length(rownames(prop))), legend = rownames(data)) # Percent-based grouped bar barplot barplot(prop, col = heat.colors(length(rownames(prop))), width = 2, beside = TRUE) legend("topright", inset = c(-0.25, 0), fill = heat.colors(length(rownames(prop))), legend = rownames(data)) # Counts-based vertically stacked barplot barplot(data, col = heat.colors(length(rownames(data))), width = 2) legend("topright", inset = c(-0.25, 0), fill = heat.colors(length(rownames(data))), legend = rownames(data)) # Counts-based grouped barplot barplot(data, col = heat.colors(length(rownames(data))), width = 2, beside = TRUE) legend("topright", inset = c(-0.25, 0), fill = heat.colors(length(rownames(data))), legend = rownames(data)) ### Thanks! Special thanks go out to Weihong Huang! ## Tukey Studentized Range and You! # Definition of Tukey's Test Statistic Suppose $r$ independent observations denoted: Let Now, suppose that we have an estimate $s^2$ of the variance $\sigma^2$ which is based on $v$ degrees of freedom and is independent of the $Y_i$ ($i = 1,\cdots,r$). $v$ is usually derived from analysis of variance. Then, the Tukey's Test Statistic is: # Ranges: Internalized Let $W=X_{n}-X_{1}$ Internally Studentized Range: Population $\sigma^2$ is unknown where $s=\left( \frac{1}{\left( {n - 1} \right)}{\sum\limits_{i = 1}^n {{{\left( {{X_i} - \bar X} \right)}^2}} } \right)^{1/2}$ # Ranges: Externalized Let $W=X_{n}-X_{1}$ Externally Studentized Range: Population $\sigma^2$ is unknown AND an independent estimator $s_{v}^2$ of $\sigma^2$ is available with degrees of freedom $v$. The dependence of the distribution $W$ on unknown $\sigma$ can be removed by studentization. So, $S_{v}^{2}$ is changable to $vS_{v}^{2}/\sigma^2 \sim \chi_{v}^2$, independent of $W$. # Ranges: Both Let $W=X_{n}-X_{1}$ Externally and Internally Studentized Range: where $\tilde{s} = \left( \frac{1}{\left( {n - 1 + v} \right)}{\sum\limits_{i = 1}^n {{{\left( {{X_i} - \bar X} \right)}^2}} } +vs_{v}^2 \right)^{}$ # Our use We will be using the Externally Studentized Range…. Specifically, the first definition resembles…. # Tukey's Studentized Range Test Statistic ANOVA Let ${Y_{ij}} \mathop \sim\limits^{iid} N\left( {0 ,{\sigma ^2}} \right)$, where $j=1,\cdots,n$ and $i=1,\cdots,k$ be independent observations in a balanced one-way ANOVA with $k$ treatments. Then $\bar{Y}_{1},\cdots,\bar{Y}_{k}$ are the sample averages and $S^2$ is the independent and unbiased estimator of $\sigma^2$ based on $v=k\left({n-1}\right)$. Let $W$ be the range of $\bar{Y}_i$. # Example - Data Consider a dosage regiment with repeated measures: Sample 1 Sample 2 Sample 3 Sample 4 Sample 5 Row Total Dose 1 27.00 26.20 28.80 33.50 28.80 144.30 Dose 2 22.80 23.10 27.70 27.60 24.00 125.20 Dose 3 21.90 23.40 20.10 27.80 19.30 112.50 Dose 4 23.50 19.60 23.70 20.80 23.90 111.50 Col Total 95.20 92.30 100.30 109.70 96.00 493.50 # In R (dose_type = c(rep("1", 5), rep("2", 5), rep("3", 5), rep("4", 5))) ## [1] "1" "1" "1" "1" "1" "2" "2" "2" "2" "2" "3" "3" "3" "3" "3" "4" "4" ## [18] "4" "4" "4"  samples = c(data[1, ], data[2, ], data[3, ], data[4, ]) ## [1] 27.0 26.2 28.8 33.5 28.8 22.8 23.1 27.7 27.6 24.0 21.9 23.4 20.1 27.8 ## [15] 19.3 23.5 19.6 23.7 20.8 23.9  # ANOVA Table: model = glm(samples ~ factor(dose_type)) aov(model) ## Call: ## aov(formula = model) ## ## Terms: ## factor(dose_type) Residuals ## Sum of Squares 140.1 116.3 ## Deg. of Freedom 3 16 ## ## Residual standard error: 2.696 ## Estimated effects may be unbalanced # Tukey's 95% C.I. pairwise comparison process All pairwise differences $\mu_{i}-\mu_{j}$ are given by $100 \times \left({1-\alpha} \right)$% C.I. Note: Always start with the largest mean and smallest mean pair, if the result is not significant, then the result will hold for all means between the largest and smallest. # Largest vs. Smallest $q_{.05 ;4,20 - 4}=qtukey(0.95,4,20-4)=$ 4.0461 $6.56 \pm 4.8789$ # In R: TukeyHSD(aov(model)) ## Tukey multiple comparisons of means ## 95% family-wise confidence level ## ## Fit: aov(formula = model) ## ##$factor(dose_type) ## diff lwr upr p adj ## 2-1 -3.82 -8.699 1.059 0.1546 ## 3-1 -6.36 -11.239 -1.481 0.0089 ## 4-1 -6.56 -11.439 -1.681 0.0070 ## 3-2 -2.54 -7.419 2.339 0.4662 ## 4-2 -2.74 -7.619 2.139 0.4027 ## 4-3 -0.20 -5.079 4.679 0.9994

So, only 3-1 and 4-1 are significant.

That's All folks!

# References

• David, H. A.. “Studentized Range.” Encyclopedia of statistical sciences. New York: Wiley, 2006. 1-3. Print.
• Falk, Michael, and Frank Marohn. “The One-Way Analysis of Variance.” Foundations of statistical analyses and applications with SAS. Basel: Birkhauser Verlag, 2002. 193-194. Print.
• Harter, H. Leon, and N. Balakrishnan. “The Studentized Range of Samples from a Normal Population.” Tables for the use of range and studentized range in tests of hypotheses. Boca Raton, Fla.: CRC Press, 1998. 52-53. Print.
• Hochberg, Yosef. “Studentized Range.” Encyclopedia of Biostatistics. Hoboken, NJ: John Wiley & Sons, Ltd, 2005. 1-3. Print.
• Lowry, Richard. “Ch 14: One-Way Analysis of Variance for Independent Samples.” Concepts & Applications of Inferential Statistics. 2000. 1-3. Print.

# Alternative way to write the Tukey's Test Statistic

Let ${X_1}, \cdots ,{X_m} \mathop \sim\limits^{iid} N\left( {0 ,{\sigma ^2}} \right)$, where $n \ge 2$, and let $Z$ be $\chi^{2}$ with $n$ degrees of freedom.

In the case of $m=2$, $q$ closely resembles the two sample $t$ test statistic.

That is, $X_{1}$ and $X_{2}$ are are taken to be the standardized sample means of the two samples and $Z/n$ is the pooled sample variance, $S_p^2 = \frac{{S_1^2\left( {{n_1} - 1} \right) + S_2^2\left( {{n_2} - 1} \right)}}{{{n_1} + {n_2} - 2}},{\text{ }}{S_E} = {S_p}\sqrt {\frac{1}{{{n_1}}} + \frac{1}{{{n_2}}}}$

## Installing the Open Source Shiny Server on Dreamhost VPS / Debian

So, you've decided you want to make your shiny application available to the world? Good news! You can do that easily with a VPS or a dedicated server, since shared hosting will not allow you to have the necessary shell access to complete the installation. One day, RStudio may open up the Shiny cloud. Until then, I am using Dreamhost's VPS to house my shiny creations. These are the steps that I took:

## Special Install Instructions for Dreamhost VPS!

For those using Dreamhost VPS, note that the OS is Debian Squeeze (6.0.x) and NOT Debian Wheezy (7.~)! As a result, a modification to the /etc/apt/sources.list must be made:

#Make /etc/apt/sources.list writable sudo touch /etc/apt/sources.list sudo chmod a+w /etc/apt/sources.list #Opens VIM to edit the file vim /etc/apt/sources.list #Add to the last line of /etc/apt/sources.list deb http://cran.wustl.edu/bin/linux/debian squeeze-cran3/ #If you are far away from wustl, view the list of cran mirros to choose from here: http://cran.r-project.org/mirrors.html #To exit out of vim use :wq

Back in terminal execute the following:

#Add the key signed by Debian backports archives on CRAN apt-key adv --keyserver keys.gnupg.net --recv-key 381BA480

End Dreamhost Specific Install Instructions

## General Install Guidelines:

#Download newest information on the package lists sudo apt-get update #Install the latest and greatest r/r-dev. sudo apt-get install r-base r-base-dev #Make sure we are fully up-to date by fetching new versions of existing packages sudo apt-get upgrade
#Install Shiny package in R terminal sudo su - \ -c "R -e \"install.packages('shiny', repos='http://cran.rstudio.com/')\""   #Error came up? See below for specifics!   #Install gdebi sudo apt-get install gdebi-core #Download the prebuilt server image wget http://download3.rstudio.org/ubuntu-12.04/x86_64/shiny-server-1.0.0.42-amd64.deb #Install image sudo gdebi shiny-server-1.0.0.42-amd64.deb

Assuming there are no errors, you are done! But, alas, life isn't so simple. So, here are a few errors I ran into.

## Errors. Errors Where art thou?

Do NOT open R without sudo! Using regular R will give:

#In terminal, open r without sudo! R

Inside r:

install.packages('shiny',repos='http://cran.rstudio.com') Installing package into '/usr/local/lib/R/site-library' (as 'lib' is unspecified) Warning in install.packages("shiny", repos = "http://cran.rstudio.com") : 'lib = "/usr/local/lib/R/site-library"' is not writable Would you like to use a personal library instead? (y/n) y Would you like to create a personal library ~/R/x86_64-pc-linux-gnu-library/3.0 to install packages into? (y/n) n

This does NOT work well, since the packages are then installed to a specific library. They need to be installed into: "/usr/local/lib/R/site-library", which is the site wide library. Therefore, make sure you do:

#In terminal, open r WITH sudo! sudo R

The dreaded configure error courtesy of RJSONIO. The error will be given as:

* installing *source* package 'RJSONIO' ... ** package 'RJSONIO' successfully unpacked and MD5 sums checked ERROR: 'configure' exists but is not executable -- see the 'R Installation and Administration Manual' * removing '/usr/local/lib/R/site-library/RJSONIO' ERROR: dependency 'RJSONIO' is not available for package 'shiny' * removing '/usr/local/lib/R/site-library/shiny'   The downloaded source packages are in '/tmp/RtmpWWV73L/downloaded_packages' Warning messages: 1: In install.packages("shiny", repos = "http://cran.rstudio.com/") : installation of package 'RJSONIO' had non-zero exit status 2: In install.packages("shiny", repos = "http://cran.rstudio.com/") : installation of package 'shiny' had non-zero exit status

The solution?

#Option 1 #create directory mkdir ~/tmp #Make read/write. chmod 777 ~/tmp export TMPDIR=~/tmp   #Option 2 mkdir /home/YOUR-USERNAME-HERE/tmp chmod 777 /home/YOUR-USERNAME-HERE/tmp export TMPDIR=/home/YOUR-USERNAME-HERE/tmp

Within sudo R:

#First, check to see if TMPDIR was set: Sys.getenv("TMPDIR") #IF this responds with the above path, skip this #Set the TMPDIR variable to be the correct folder. Sys.setenv("TMPDIR"="/home/YOUR-USERNAME-HERE/tmp")   #install packages like normal: install.packages('shiny',repos='http://cran.rstudio.com')

## STAT 426

STAT 426 - Sampling and Categorical Data
Taught by Professor Jeffrey A. Douglas in Spring 2014

"Sampling: simple random, stratified, systematic, cluster, and multi-stage sampling. Categorical data: multiway contingency tables, maximum likelihood estimation, goodness-of-fit tests, model selection, logistic regression. Computer work is an integral part of the course."

Course work and notes for STAT 426
Assignment Files
Homework 1 Soon
Exam 1 Soon

## STAT 426 - Homework 2

Agresti (2013): 3.3, 3.6(a), 3.7, 3.11(a,b), 3.16, 3.19, 3.20(a), 3.22, 3.25, 3.31, 3.36(a,b), 3.44

Due: 03/07/14 by 5:00 PM

The homework solutions that I turned in are here:
STAT 426 - HW 2 (PDF)

The file is created using knitr, which allows R code to be intertwined by LaTeX.
STAT 426 - HW 2 (RNW)

## List

Below are the research projects that I have been associated with during my graduate studies.

Psychometrics
Bayesian Sequential Modeling

## List

The following are courses and associated works that I have taken, taught, or been apart of during my graduate studies at the University of Illinois.

Course work and notes for Grad School
Fall 2013 STAT 571 Multivariate Analysis Student
Fall 2013 STAT 511 Mathematical Statistics I Student
Fall 2013 STAT 425 Applied Regression and Design Student
Fall 2013 STAT 400 Statistics and Probability I Graduate Teaching Assistant
Spring 2014 STAT 525 Computational Statistics Student
Spring 2014 STAT 511 Mathematical Statistics II Student
Spring 2014 STAT 426 Sampling and Categorical Data Student
Spring 2014 STAT 424 Analysis of Variance Student
Spring 2014 Research Psychometric Modeling Graduate Research Assistant
Spring 2014 STAT 200 Statistical Analysis Graduate Teaching Assistant

## Understanding Formula Response's Sampling Technique

So, you've made the decision to have students create a formula and have it automatically checked? That shouldn't be much of a problem. I mean, we have LON-CAPA and its hooks into Maxima, a computer algebra system (CAS).  So, this shouldn't really be much of a challenge. In fact, this code right here will serve you just fine! Scout's honor!

Problem Context:
Let a and b denote the sample mean of observations from N and M, respectively. Let sx and sy denotes the sample variance of observations from N and M, respectively. Define the test statistic in terms of a, b, sx and sy.

Note: Don't include "n" or "m" in your answer. Use sqrt() to denote the square root. For example, use "sqrt(x)" to denote $\sqrt{x}$ instead of "x^(0.5)", the system may have problem in identifying decimals in this formula response.

1 2 3 4 <formularesponse answer="(a-b)/(sqrt((sx*15+12*sy)/27*(1/16+1/13)))" id="12"> <responseparam name="tol" type="tolerance" default="1%" description="Numerical Tolerance" /> <textline readonly="no" size="25" /> </formularesponse>

And then the student e-mails pile in... And then the professor e-mails... And then... And then... And then... The day becomes a b-budget horror movie as confusion and desperation sets in to fix the problem.

The issue?

Why isn't (a-b)/(sqrt((sx*15+12*sy)/27)*sqrt(1/16+1/13))) working?

Five... Hours... Later... Why gods of LON-CAPA!? Why?! (I may have slightly exaggerated the amount of time that elapsed, but rest assured that was how long it felt.)

To quote one of my favorite characters on the "Eureka" moment:
Holmes: Well my dear Watson, the reason is very elementary you see...
Watson: No, I do not see! That is why I am turning to you for an explanation of how these seemingly implausible events are linked.

My guess as to the reason why LON-CAPA rejected it:
LON-CAPA does NOT simplify expressions before evaluating them. Note, that in the denominator there is one square root in the answer. In the answer giving students trouble, there were two square roots. I believe that the domains of the square roots did not align creating a discontinuity between the two.

The solution: Control the values being inserted into the expression.
By default, formula response checks the formula against an internal range based on the strings detected. Using the sampling technique, we are able to control the range of values used. To enable the sampling technique, simply add:

1 samples="a,b,sx,sy@4,1,1,1"

to the formularesponse tag declaration.

What's happening here though?
To the left of the @ we are defining the variables used in the expression: a,b,sx,sy.
To the right of the @ we have points that are mapped by their position relative to the variables. That is: a=4,b=1,sx=1,sy=1

But, that is a silly way to test equality since a student may randomly guess a number that is equivalent to the expression evaluated at the given location thanks to the tolerance option.

A better way is to use:

1 samples="a,b,sx,sy@4,1,1,1:8,5,3,9#3"

Let's breakdown what is occurring:
We are again defining the variables to be checked to the LEFT of the @ sign.

On the RIGHT of the @ sign, we still declare points that variables should take. However, we encounter a colon (:) followed by more numbers and then a # and another number.

The colon (:) signifies the region problem should look for points. The region start area is defined to the left of the colon and the end bounds for the region are defined on the right. So, we have the following inequalities:
$4 \le a \le 8$, $1 \le b \le 5$, $1 \le sx \le 3$, and $1 \le sy \le 9$.

The # indicates how many points should be picked at random from that region to test the equation with. In this case, we are testing three points in the region since we have #3.

Therefore, the corrected piece of code is:

1 2 3 4 <startouttext /> With Sampling Modification <endouttext /> <formularesponse answer="(a-b)/(sqrt((sx*15+12*sy)/27*(1/16+1/13)))" samples="a,b,sx,sy@4,1,1,1:8,5,3,9#3" id="13"> <textline readonly="no" size="25" /> </formularesponse>

However, sometimes even sampling ranges of points do not allow the correct response to be tested. For example, say you wanted a student to factor an equation. You probably only want the factored form. Though, using the region bounded approach described above on an expression without any removable discontinuities will yield a correct answer to a factoring question with the original expression! To avoid such an outcome, mathresponse must be used with the following modifications:

1 2 3 4 5 6 7 8 9 10 <script type="loncapa/perl"> $fun = "(x^3 - 3*x^2 + 3*x - 9)/(x^4 - 81)";$answer = &cas("maxima","factor(ratsimp($fun));"); </script> <mathresponse cas="maxima" answerdisplay="$answer" args="\$answer"> <answer> is(RESPONSE[1]=LONCAPALIST[1]); </answer> <textline readonly="no" size="40" /> </mathresponse>

This method tests for direct equivalence to the students answer.

## The Trace of the Kronecker Product

One of the coolest ideas I've come across recently relates to the way patterns form in matrices. Moreover, the way we can induce a pattern in a matrix. Enter the Kronecker Product, the way pattern-based matrix multiplication occurs.

Definition: Kroncker product uses matrix $A$ as an $m \times n$ matrix and matrix $B$ as a $p \times q$ matrix so that the Kronecker product $A \otimes B$ is the $mp \times nq$ block matrix:

$\mathbf{A}\otimes\mathbf{B} = \begin{bmatrix} a_{11} \mathbf{B} & \cdots & a_{1n}\mathbf{B} \\ \vdots & \ddots & \vdots \\ a_{m1} \mathbf{B} & \cdots & a_{mn} \mathbf{B} \end{bmatrix}$

Substituting in $B$'s values then yields:

${\mathbf{A}\otimes\mathbf{B}} = \begin{bmatrix} a_{11} b_{11} & a_{11} b_{12} & \cdots & a_{11} b_{1q} & \cdots & \cdots & a_{1n} b_{11} & a_{1n} b_{12} & \cdots & a_{1n} b_{1q} \\ a_{11} b_{21} & a_{11} b_{22} & \cdots & a_{11} b_{2q} & \cdots & \cdots & a_{1n} b_{21} & a_{1n} b_{22} & \cdots & a_{1n} b_{2q} \\ \vdots & \vdots & \ddots & \vdots & & & \vdots & \vdots & \ddots & \vdots \\ a_{11} b_{p1} & a_{11} b_{p2} & \cdots & a_{11} b_{pq} & \cdots & \cdots & a_{1n} b_{p1} & a_{1n} b_{p2} & \cdots & a_{1n} b_{pq} \\ \vdots & \vdots & & \vdots & \ddots & & \vdots & \vdots & & \vdots \\ \vdots & \vdots & & \vdots & & \ddots & \vdots & \vdots & & \vdots \\ a_{m1} b_{11} & a_{m1} b_{12} & \cdots & a_{m1} b_{1q} & \cdots & \cdots & a_{mn} b_{11} & a_{mn} b_{12} & \cdots & a_{mn} b_{1q} \\ a_{m1} b_{21} & a_{m1} b_{22} & \cdots & a_{m1} b_{2q} & \cdots & \cdots & a_{mn} b_{21} & a_{mn} b_{22} & \cdots & a_{mn} b_{2q} \\ \vdots & \vdots & \ddots & \vdots & & & \vdots & \vdots & \ddots & \vdots \\ a_{m1} b_{p1} & a_{m1} b_{p2} & \cdots & a_{m1} b_{pq} & \cdots & \cdots & a_{mn} b_{p1} & a_{mn} b_{p2} & \cdots & a_{mn} b_{pq} \end{bmatrix}$

Within both of these forms, we are able to see a distinct placement of matrices within another matrix.

### Theory to More Theory

Prove that $\mathrm{tr}\left( {A \otimes B} \right) = \mathrm{tr}\left( A \right)\mathrm{tr}\left( B \right)$

In our case, we are particularly interested in the diagonal of two square matrices joined by the Kronecker product. Why? Well, the trace of a square $n \times n$ matrix is defined to be the sum of the elements on the main diagonal, which runs from the upper left corner of the matrix to the lower right corner of the matrix. Mathematically, we have:

$\mathrm{tr}(A) = a_{11} + a_{22} + \dots + a_{nn}=\sum\limits_{i=1}^{n} {a_{ii}}$

Returning back to our previous line of discussion regarding the Kronecker product, we'll amend the matrices presented in the definition slightly so that matrix $A$ now has square dimensions of $a \times a$ and matrix $B$ now has square dimensions of $b \times b$. Thus, we will have a Kronecker product in the following form:
${\bf{A}} \otimes {\bf{B}} = \left[ {\begin{array}{*{20}{c}}{{a_{11}}{\bf{B}}}& \cdots &{{a_{1a}}{\bf{B}}}\\ \vdots & \ddots & \vdots \\{{a_{a1}}{\bf{B}}}& \cdots &{{a_{aa}}{\bf{B}}}\end{array}} \right]$
The dimensions of this matrix are now $\left( {ab} \right) \times \left( {ab} \right)$.
Thus, from non-expand Kronecker product form, we can see that sum of the main diagonal will be: $\sum\limits_{i = 1}^a {{a_{ii}}B}$
So, taking the trace of $A \otimes B$ gives $\mathrm{tr}\left( {A \otimes B} \right) = \sum\limits_{i = 1}^a {\left( {{a_{ii}}\sum\limits_{j = 1}^b {{b_{jj}}} } \right)}$.
Note: The trace is also being applied to $B$.
Therefore, we get: $\mathrm{tr}\left( {A \otimes B} \right) = \sum\limits_{i = 1}^a {\left( {{a_{ii}}\sum\limits_{j = 1}^b {{b_{jj}}} } \right)} = \sum\limits_{i = 1}^a {\left( {{a_{ii}}\mathrm{tr}\left( B \right)} \right)} = \mathrm{tr}\left( B \right)\sum\limits_{i = 1}^a {\left( {{a_{ii}}} \right)} = \mathrm{tr}\left( B \right)\mathrm{tr}\left( A \right) = \mathrm{tr}\left( A \right)tr\left( B \right)$

## MSOS

Project Title: Multivariate Statistics - Old School (MSOS)
Project URL: http://cran.r-project.org/web/packages/msos/index.html
Project Version: 1.0.1
Project Status: Active
Project Description: MSOS is an R package that provides the functionality of R functions and data sets used in John Marden's Multivariate Statistics Old School textbook. Furthermore, the package provides detailed documentation of the functions and datasets within R as well as a slew of examples.