PS6

Problem Set 6 (60 points)

EECS492 problem sets are a collaborative activity amongst your staff-assigned team. You may also discuss the problem sets and problem solving approach with other teams, but the work you submit must be wholly that of your team. In particular, you may not derive your solutions from previous years' solutions, nor may you derive your solutions from code snippets found elsewhere (including online). Examples: studying pseudo-code as found on Wikipedia is acceptable; adapting code from an open source project is not.

Your submission will include a written portion (which must be typeset, including any equations, figures, or graphs, and must be in PDF format) and a programming portion. Your writeup should include a terse restatement of each question (copy/paste is fine) in a font face that clearly separates the question from your answer, and should be formatted in an easy-to-navigate fashion (i.e., with large headings for each task and smaller headings for each problem). Make sure that the uniqnames of all team members are listed at the top of every page as a running header. Deviations from this format will be penalized.

For the programming portions, you must include all of your code in a ready-to-compile state, along with a README that gives invocation instructions. Send your PDF and ZIPPED programming files by email. Your email must have the following form, where the challenge score is computed based on the scoring rules described in the problem set.

TO: eecs492-staff@april.eecs.umich.edu
SUBJECT: PS6: <uniqname1>, <uniqname2>, <uniqname3>. challenge=<challenge score>
CC: <all of your team members>

You may receive a "moderation pending" message; this is harmless.

Task 1. Classification [15 points]

You've gotten tired of seeing movies that you don't like, and have decided to figure out whether you can more accurately predict which movies you will enjoy. You know there are several criteria that influence whether or not you enjoy a movie. Specifically, you tend to enjoy movies that:

  • have a rating greater than 75 at rottentomatoes.com,
  • are either action or drama,
  • are longer than 110 minutes but shorter than 130 minutes, and
  • have earned more than $60 million at the box office.

None of these criteria is sufficient on its own to predict whether you'll enjoy a movie, and none of them always holds, but using any single one is better than nothing. You're curious as to whether you can use boosting to combine all four of these classifiers and get a more accurate prediction of movie enjoyment.

You've collected the following data on 20 movies you've watched:

Sample | Rating | Genre       | Runtime (minutes) | Box Office Earnings ($ millions) | Enjoyed
1      | 92     | Drama       | 150               | 120                              | Y
2      | 85     | Romance     | 82                | 25                               | Y
3      | 40     | Action      | 114               | 104                              | N
4      | 72     | Action      | 125               | 84                               | Y
5      | 65     | Drama       | 119               | 66                               | N
6      | 50     | Comedy      | 106               | 72                               | N
7      | 12     | Action      | 99                | 18                               | N
8      | 96     | Romance     | 110               | 42                               | N
9      | 70     | Drama       | 131               | 23                               | N
10     | 33     | Drama       | 119               | 68                               | N
11     | 57     | Comedy      | 124               | 57                               | N
12     | 87     | Romance     | 130               | 34                               | N
13     | 91     | Documentary | 140               | 8                                | Y
14     | 48     | Comedy      | 100               | 52                               | N
15     | 64     | Action      | 90                | 40                               | N
16     | 70     | Romance     | 102               | 81                               | N
17     | 77     | Documentary | 106               | 22                               | N
18     | 62     | Action      | 112               | 53                               | Y
19     | 42     | Action      | 85                | 70                               | Y
20     | 75     | Action      | 136               | 75                               | N

1.1 Compute the error rate of each individual criterion. Which one performs the best?

1.2 Use AdaBoost to create a new classifier. Run through the AdaBoost algorithm by hand, using the four classifiers given (a reminder of the standard update rules appears after the list below). For each classifier, show:

  • The classifier's total weighted error under the current sample weights
  • The updated weight for each sample
  • The weight for the hypothesis
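
For reference, one common form of the per-round AdaBoost bookkeeping is summarized below (a hedged reminder only; if lecture or the textbook presented an equivalent but differently normalized update, follow that version). Here the sample weights w_i start at 1/N and are renormalized each round, h_k is the k-th weak classifier, and labels and predictions are taken in {-1, +1}:

\epsilon_k = \sum_{i:\, h_k(x_i) \neq y_i} w_i, \qquad
\alpha_k = \tfrac{1}{2} \ln \frac{1 - \epsilon_k}{\epsilon_k}, \qquad
w_i \leftarrow \frac{w_i \, e^{-\alpha_k\, y_i\, h_k(x_i)}}{Z_k}

where Z_k renormalizes the weights so they sum to 1, and the combined classifier is H(x) = \mathrm{sign}\left(\sum_k \alpha_k h_k(x)\right).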

1.3 How many samples does the new classifier misclassify?

Task 2. Linear Separability [10 points]

Consider the 2D data below:

Z=[1 0 0; 1 1 0; 2 1 0; 3 1 1; 1 2 0; 4 1 1; 1 3 1; 2 3 0];

The matrix Z contains 8 training examples; each has an X and Y coordinate and a label (either 0 or 1). You can plot these in MATLAB with the following:


Z=[1 0 0; 1 1 0; 2 1 0; 3 1 1; 1 2 0; 4 1 1; 1 3 1; 2 3 0];
clf;
hold on;

z0=find(Z(:,3)==0);  % indices of the label-0 (blue) examples
z1=find(Z(:,3)==1);  % indices of the label-1 (red) examples

plot(Z(z0,1), Z(z0,2), 'bo');
plot(Z(z1,1), Z(z1,2), 'ro');
axis([-1 4 -1 4]);

We wish to find a linear separator for this data; however, none exists in the original 2D space. Show the best separating line you can, and report its error rate. (You can simply draw on top of the figure that results from the code above.)

We now add a new feature, the product of the x and y coordinates of each point. We can create a new feature vector and plot it like this:

Q=[Z(:,1:2), Z(:,1).*Z(:,2), Z(:,3)];  % columns: x, y, x*y, label
clf;
hold on;
% z0 and z1 are the index vectors computed in the previous snippet
plot3(Q(z0,1), Q(z0,2), Q(z0,3), 'bo');
plot3(Q(z1,1), Q(z1,2), Q(z1,3), 'ro');
axis([-1 4 -1 4 -1 8]);

Show that the points are linearly separable in the new 3D space by finding a rotation of the 3D display in which the separation is apparent. You can do this by hand; turn in a view of your rotated 3D display with your linear separator manually drawn in.

Extra credit: find an equation for the 3D plane that separates the points in the form Ax + By + Cz + D = 0. (It need not be the maximal separator.) Show that all of the blue points fall on one side of the hyperplane (the left hand side evaluates to < 0) and that the red points fall on the other (the LHS evaluates to > 0).
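
If you would like to sanity-check a candidate plane numerically rather than by inspection, a minimal sketch is below (Java, matching the language of the programming portion). The coefficients A, B, C, D are placeholders for whatever plane you derive, not a supplied answer.

public class PlaneCheck {
    public static void main(String[] args) {
        // Same data as the matrix Z above: each row is {x, y, label}; z = x*y is computed below.
        int[][] Z = { {1,0,0}, {1,1,0}, {2,1,0}, {3,1,1}, {1,2,0}, {4,1,1}, {1,3,1}, {2,3,0} };

        // Placeholder coefficients: substitute the plane you find.
        double A = 0, B = 0, C = 0, D = 0;

        for (int[] p : Z) {
            double x = p[0], y = p[1], z = x * y;
            double lhs = A * x + B * y + C * z + D;
            // Label 0 (blue) points should give lhs < 0; label 1 (red) points should give lhs > 0.
            System.out.printf("(%d, %d, %.0f)  label=%d  lhs=%+.2f%n", p[0], p[1], z, p[2], lhs);
        }
    }
}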

Task 3. Neural Networks [10 points]

Construct by hand two separate neural networks that compute the XNOR and NAND functions of two inputs. Make sure to specify what sort of units you are using. [6 + 4 points]
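
As a reminder of the general pattern (an illustrative example only, not one of the two required networks), a single threshold unit can compute OR; the weights below are one choice among many:

o(x_1, x_2) = \mathrm{step}(w_1 x_1 + w_2 x_2 + b), \qquad
\mathrm{step}(s) = \begin{cases} 1 & s \ge 0 \\ 0 & s < 0 \end{cases}

With w_1 = w_2 = 1 and b = -0.5, this gives o(0,0) = 0 and o(0,1) = o(1,0) = o(1,1) = 1, i.e., OR.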

Task 4. Netflix (Challenge problem) [25 points]

The goal of this task is to make better movie recommendations. Specifically, we will predict how one user will rate a movie using the ratings of other users. For companies like Netflix, making good suggestions helps retain customers, and is thus important for their profitability.

One of the central challenges is designing algorithms that scale to large amounts of data. The challenge dataset is very large, with 480189 users rating 17770 movies. We are providing two datasets: the challenge dataset, which is large and for which the users' actual ratings are not known to you, and a smaller dataset which you can use for more rapid testing; this smaller dataset includes the correct user ratings. Note that each dataset is split into two parts: a training dataset (where ratings are known) and a test dataset.

The challenge training dataset contains approximately 100M movie ratings, from 480189 users and 17770 movies. Your challenge score will be evaluated using the challenge-query.gz file, which contains approximately 109k queries. The sample training dataset contains about 5M movie ratings, and there are about 27k queries. The data in the sample training set covers only 1407 movies, and so the "density" of ratings per movie is comparable to the challenge data set.

Each data file is provided in gzipped format. Note that it is not necessary to decompress the data files; the provided run code will decompress them on the fly.

You can obtain the datasets and the sample code here:

http://april.eecs.umich.edu/courses/eecs492_w10/netflix

The metric used to evaluate your algorithm is mean squared error (MSE). Users rate movies on an integer scale from 1 to 5. Note that not all users rate all movies, which creates a "missing data" problem. Your challenge score will be the MSE of your algorithm on the challenge dataset alone; your performance on the sample dataset is irrelevant.
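
Concretely, if \hat{r}_j is your predicted rating for query j and r_j is the true rating, the score over N queries is the usual definition:

\mathrm{MSE} = \frac{1}{N} \sum_{j=1}^{N} \left( \hat{r}_j - r_j \right)^2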

You may NOT use any additional Netflix data from any other source, including the original Netflix data.

Getting started

You will write a Java class that implements the netflix.Agent interface. Example code has been provided; see netflix.ExampleAgent.

You can run your code using:

java -cp netflix.jar netflix.Grader --train=sample-train.gz --query=sample-query-sol.gz --agent=netflix.ExampleAgent

which will output:

MSE:         1.02040

After implementing your agent and testing it on the sample dataset, you need to run your algorithm on the challenge dataset. You can do this with:

 java -cp netflix.jar netflix.Grader --train=challenge-train.gz --query=challenge-query.gz --agent=netflix.ExampleAgent

This may take some time (several minutes for our simple ExampleAgent). Eventually, it will output:

Query data set did not contain ground truth ratings; no MSE computed.

No MSE is computed because the correct user ratings are not included in the challenge-query.gz data set. Instead, the Grader application has written a text file named "output.txt" which contains all the ratings generated by your agent. You can upload this file to the leaderboard, which will compute your MSE.

Important: The challenge data set is large, and processing it may take several hours even for relatively simple methods. Plan accordingly!

You should also be prepared to answer queries for brand new users, i.e., users that have never rated a movie before.

Possible approaches

  • Nearest neighbors: find the k users that are most similar to the query user, and predict the query user's preference based on the preferences of those k similar users (a minimal sketch appears after this list).
  • Clustering: Group similar users together and use a method like regression to predict what people in that group like.
  • Dimensionality reduction: Find a small set of "typical" user preferences, then describe the query user in terms of those preferences. For example, the query user could be 0.8 similar to "typical" user 7 and 0.2 similar to "typical" user 2. You could then use the data for those "typical" users to predict how the query user will like a new movie. There are many ways of generating these "typical" users (read the Eigenfaces paper for some ideas), but you could also just pick actual users that have rated many movies.
  • Exploit additional data. Movie titles (provided with the dataset) could be mined for keywords that might be strongly predictive. If a user liked one "star wars" movie, they might be more likely to like the others. The dates are another example: do people's voting histories change over time?
  • Exploit implicit data. Is someone who watches a lot of movies more likely to rate highly?
  • Build a predictive model for each movie. For example, construct a number of weak features (which could include ratings on other movies, keywords in movie titles, dates, etc.) and use a machine learning method to predict the scores for a single movie using those features. You could use AdaBoost, for example, either by using a multi-class variant or by building 5 classifiers (one for each rating) and combining the results.
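
As a concrete starting point for the nearest-neighbors idea above, here is a minimal sketch (not a tuned solution, and not part of the provided code). It assumes you have already loaded each user's ratings into a Map from movie id to rating; how you actually store and index the data, especially at challenge scale, is up to you.

import java.util.*;

public class SimilaritySketch {
    // Cosine similarity over the movies both users have rated. (This choice of
    // similarity measure is an assumption; Pearson correlation is equally reasonable.)
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double rb = b.get(e.getKey());
            if (rb != null) dot += e.getValue() * rb;
            na += e.getValue() * e.getValue();
        }
        for (double r : b.values()) nb += r * r;
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    // Predict a rating as the similarity-weighted average over the k most similar
    // users who rated the movie; fall back to a default if none did.
    static double predict(Map<Integer, Map<Integer, Double>> ratingsByUser,
                          int queryUser, int movie, int k, double fallback) {
        Map<Integer, Double> q = ratingsByUser.get(queryUser);
        if (q == null) q = Collections.emptyMap();

        List<double[]> candidates = new ArrayList<>();   // {similarity, rating} pairs
        for (Map.Entry<Integer, Map<Integer, Double>> e : ratingsByUser.entrySet()) {
            Double r = e.getValue().get(movie);
            if (r == null || e.getKey() == queryUser) continue;
            candidates.add(new double[] { cosine(q, e.getValue()), r });
        }
        candidates.sort((p1, p2) -> Double.compare(p2[0], p1[0]));   // most similar first

        double num = 0, den = 0;
        for (int i = 0; i < Math.min(k, candidates.size()); i++) {
            num += candidates.get(i)[0] * candidates.get(i)[1];
            den += candidates.get(i)[0];
        }
        return den > 0 ? num / den : fallback;
    }
}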

Note that normalizing your predictions with respect to the rating habits of a particular user can have a significant impact on your score. In other words, a "3" to one user may not mean the same thing as a "3" to another user; correcting for this can noticeably improve your score.
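
For example (a minimal sketch under the assumption that you precompute per-user means from the training data; the names here are hypothetical and not part of the provided netflix classes), you can predict each user's offset from their own mean and then shift back:

import java.util.*;

public class NormalizationSketch {
    // Per-user mean rating, computed once from the training data.
    static Map<Integer, Double> userMeans(Map<Integer, Map<Integer, Double>> ratingsByUser) {
        Map<Integer, Double> means = new HashMap<>();
        for (Map.Entry<Integer, Map<Integer, Double>> e : ratingsByUser.entrySet()) {
            double sum = 0;
            for (double r : e.getValue().values()) sum += r;
            means.put(e.getKey(), sum / e.getValue().size());
        }
        return means;
    }

    // Convert a predicted offset back into an absolute rating. Users with no ratings
    // (the brand-new users mentioned above) fall back to the global mean rating.
    static double denormalize(double predictedOffset, int user,
                              Map<Integer, Double> means, double globalMean) {
        Double mean = means.get(user);
        double base = (mean != null) ? mean : globalMean;
        return Math.max(1.0, Math.min(5.0, base + predictedOffset));   // clamp to the 1..5 scale
    }
}

Whatever model you build would then be trained on the offsets (rating minus the user's mean) rather than on the raw 1-to-5 ratings; this also gives a sensible default for users who have never rated anything.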

Note that you do not need to do your training from within your netflix.Agent; you can write a separate application to do the training and write intermediate data to disk. However, at least initially, it will probably be convenient for you to make use of the netflix.Agent.train() method. Naturally, you should turn in your training code as well.

How will we be graded?

A full credit solution will have the following properties:

  1. The method can be run on the challenge dataset and produces a reasonable MSE. (You should be able to beat 1.0 with a reasonable effort).
  2. The method is described clearly and succinctly in the write-up.
  3. The method makes use of specific user behavior or specific trends within a movie. (A method that produces "aggregate" estimates over the entire body of movie ratings, such as the ExampleAgent, is not acceptable.) Another way of saying this: your predictions should reflect information specific to the particular user or the particular movie (or both).
  4. The method makes use of at least one machine learning method described above or in class, possibly including (but not limited to) regression, boosting, SVMs, decision trees, Bayes nets, clustering, or nearest neighbors.

When submitting your solution, please report your MSE on the challenge dataset along with the runtime of your algorithm (including training time, even if it was done by a separate application). We are curious about both your training and query time, but neither will affect your challenge score.

The staff solution, which is based on describing query users as a blend of a set of "reference" users (and is careful about normalizing scores with respect to each user), scores 0.8583. This algorithm took about 2 hours to train and run on the challenge dataset, and used about 400MB of memory to hold the data.

Task 5. Certification and Peer Evaluation [required for credit, 0 points]

Print or write the following on a sheet of paper:

"I participated and contributed to team discussions on each problem, and I attest to the integrity of each solution. Our team met as a group on [DATE(S)]."

Each team member should sign the paper (physically, not digitally), and a photo of the paper should be included with your problem set. By doing so, you are asserting that you were materially involved in each answer, and that you certify that every solution complies with the collaboration policy of this course. In the event of a violation of the course's guidelines, your certification will be forwarded to the honor council.

If the statement requires qualification (i.e., it's not quite true), please describe the issue and the steps you took to mitigate it. If a team member did not materially contribute to a particular solution, then their score on that problem will generally be reduced. The staff will use their discretion based on the specific circumstances you identify, and we are happy to discuss potential problems prior to the problem set due date.

If your team works together, as required, this will be painless! :)

Each member of your team must also complete a peer evaluation, accessible from the apps page.

Your evaluations of your teammates will remain private. If problems develop with your team, it is up to your team to resolve them: we will not intercede except in the most extreme situations.