Homework 2 for Biostat 778

Due: December 4, 2013

Setup

To do this homework you must:

Fork the Biostat778_HW2 repository on GitHub.
Once you have forked the repository on GitHub, you can clone it to your local computer to actually do the work.
As you are working, make sure to commit changes at logical points via git add and git commit.
Once you have completed the assignment, you can push your changes back to your GitHub repository via git push.
Once you have pushed your changes, make a pull request on GitHub so that I can see that you're ready to submit your homework.
Your finished code should be submitted in the form of an R package. The R package should be named Homework2.
Your R package should pass R CMD check without any warnings, errors, or notes.
I will put unit tests in the tests directory of the R package for the master branch. Please DO NOT make any changes in the tests directory.
You can use these tests to check your code, since the expected results will also be in the tests directory.

Mixture Model

Consider data \(y_1,y_2,\dots,y_n\) which are iid from a mixture of 2 Normal distributions,

\[ y_i \sim \lambda\mathcal{N}(\mu_1,\sigma_1^2)+(1-\lambda)\mathcal{N}(\mu_2,\sigma_2^2) \]
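
To make the model concrete, here is a minimal R sketch that simulates data from the two-component mixture and evaluates its log-likelihood at a given parameter value. It is not part of the required interface: the helper names rmix and loglik and the parameter values are illustrative assumptions, and sigma1 and sigma2 are treated here as standard deviations.

```r
## Hypothetical helper: draw n observations from the two-component mixture.
## Each observation comes from component 1 with probability lambda.
rmix <- function(n, lambda, mu1, mu2, sigma1, sigma2) {
    z <- rbinom(n, size = 1, prob = lambda)   # 1 = component 1, 0 = component 2
    ifelse(z == 1, rnorm(n, mu1, sigma1), rnorm(n, mu2, sigma2))
}

## Hypothetical helper: log-likelihood of the mixture at the given parameters.
## sigma1 and sigma2 are standard deviations here (an assumption).
loglik <- function(y, lambda, mu1, mu2, sigma1, sigma2) {
    sum(log(lambda * dnorm(y, mu1, sigma1) +
            (1 - lambda) * dnorm(y, mu2, sigma2)))
}

set.seed(1)
y <- rmix(500, lambda = 0.3, mu1 = 0, mu2 = 4, sigma1 = 1, sigma2 = 2)
loglik(y, 0.3, 0, 4, 1, 2)
```

A function like loglik can serve as a sanity check while developing either fitting method, since the log-likelihood should not decrease from one EM iteration to the next.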

Write a function that estimates the unknown parameters \(\lambda\), \(\mu_1\), \(\mu_2\), \(\sigma_1^2\), and \(\sigma_2^2\) using either Newton's method or the EM algorithm.

Do not use the optim, nlm, nlminb, or optimize functions in your code.
There should be a method argument that takes the options "newton" or "EM" to allow the user to choose which fitting method is used.
For the “newton” method you may be interested in using the deriv or deriv3 functions.
Your function should return a list with elements mle containing the vector of maximum likelihood estimates and stderr containing the vector of corresponding asymptotic standard errors for the MLEs. The elements of both the mle and stderr vectors should be named with the following names: lambda, mu1, mu2, sigma1, sigma2.
There should be a param0 argument that allows users to specify the starting value for either the Newton or the EM algorithm. The default value for param0 should be NULL, in which case your function should choose the starting value.
Your function should check to see that the value specified for method is valid. The easiest way to do this is with the match.arg() function.
There should be a maxit argument specifying the maximum number of iterations for each method. It defaults to NULL, in which case maxit should be 100 for Newton's method and 500 for the EM algorithm.
There should be a tol argument that controls the tolerance for convergence and it should default to 1e-8.
Place your function in an R package with appropriate documentation.
You can test your function with the data provided in the Git repository.
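
As a hint for the "newton" option mentioned above, the sketch below shows how deriv3 returns a gradient and a Hessian for a simple expression. It deliberately uses a single Normal log-density (up to an additive constant) rather than the mixture, so it only illustrates the mechanics; the object names d and out and the input values are assumptions made for this example.

```r
## Symbolic gradient and Hessian of a single Normal log-density in (mu, sigma),
## up to an additive constant. The observations y are treated as fixed data.
d <- deriv3(~ -log(sigma) - (y - mu)^2 / (2 * sigma^2),
            c("mu", "sigma"),
            function.arg = c("mu", "sigma", "y"))

out <- d(mu = 0, sigma = 1, y = c(-1, 0.5, 2))

attr(out, "gradient")   # one row per observation, columns mu and sigma
attr(out, "hessian")    # array with one 2 x 2 slice per observation

## Summing over observations gives the score vector and Hessian of the
## full log-likelihood, which are the ingredients of a Newton step.
score <- colSums(attr(out, "gradient"))
hess  <- apply(attr(out, "hessian"), c(2, 3), sum)
```

The same pattern should carry over to the mixture log-density with five parameters, although the expression passed to deriv3 would be longer.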

Your function should match the following prototype:

mixture <- function(y, method, maxit = NULL, tol = 1e-08, param0 = NULL) {
    ## Your code goes here

    ## Return a list with elements `mle' for the maximum likelihood estimates and
    ## `stderr' for their standard errors.
}
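
The argument checks described in the requirements (validating method and filling in the method-specific default for maxit) might be handled along the following lines. This is only a sketch: check_args is a hypothetical helper that does no fitting, and you are free to organize these checks differently inside mixture.

```r
## Hypothetical helper illustrating the argument checks only.
check_args <- function(method, maxit = NULL, tol = 1e-08) {
    ## match.arg() signals an error unless `method' matches one of the choices
    method <- match.arg(method, c("newton", "EM"))

    ## Fill in the method-specific default when maxit is not supplied
    if (is.null(maxit))
        maxit <- switch(method, newton = 100, EM = 500)

    list(method = method, maxit = maxit, tol = tol)
}

check_args("EM")$maxit        # 500
check_args("newton")$maxit    # 100
```

Note that match.arg() allows partial matching, so "E" would be accepted as "EM", while an unrecognized value such as "bogus" produces an error.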