# PS5


**Problem Set 5** (60 points)

EECS492 problem sets are a collaborative activity amongst your staff-assigned team. You may also discuss the problem sets and problem solving approach with other teams, but the work you submit must be wholly that of your team. In particular, you may not derive your solutions from previous years' solutions, nor may you derive your solutions from code snippets found elsewhere (including online). Examples: studying pseudo-code as found on Wikipedia is acceptable; adapting code from an open source project is not.

Your submission will include a written portion (which must be typeset, including any equations, figures, or graphs, and must be in PDF format) and a programming portion. Your writeup should include a terse restatement of each question (copy/paste is fine) in a font face that clearly separates the question from your answer, and should be formatted in an easy-to-navigate fashion (i.e., with large headings for each task and smaller headings for each problem). Make sure that the uniqnames of all team members are listed at the top of every page as a running header. Deviations from this format will be penalized.

For the programming portions, you must include all of your code in a ready-to-compile state, along with a README that gives invocation instructions. Send your PDF and ZIPPED programming files by email. Your email must have the following form, where the challenge score is computed based on the scoring rules described in the problem set.

TO: eecs492-staff@april.eecs.umich.edu

SUBJECT: PS5: <uniqname1>, <uniqname2>, <uniqname3>. challenge=<challenge score>

CC: <all of your team members>

You may receive a "moderation pending" message; this is harmless.

## Task 1. Sequential Decision Process [15 points]

Consider an acrophobic (afraid of heights) tourist at the Grand Canyon. The tourist would like to obtain the best possible view of the canyon, which is obtained by walking close to the edge. Quite reasonably, he is also afraid of tumbling into the canyon.

The world has four states: S0, S1, S2, and S3. S0 is farthest from the canyon edge and has the worst view. S2 is right at the rim of the canyon and affords an excellent view, and S1 is between S0 and S2. In state S3, the tourist has fallen into the canyon and dies.

| State  | S0 | S1 | S2 | S3 |
|--------|----|----|----|----|
| Reward | 1  | 2  | 4  | -4 |

In any given state, the tourist has several actions available: move BACK, STAY in place, or move FORWARD. These actions are non-deterministic; in particular, the tourist may inadvertently move towards the canyon edge ("slip") when he is attempting to stay still.

| Action | Outcome |
|--------|---------|
| BACK | Moves away from the canyon with probability 1.0. (If at S0, stays at S0.) |
| STAY | Stays with Pr = 0.9; "slips" one position toward the canyon with Pr = 0.1. |
| FORW | Moves forward with Pr = 1.0. |

A) [3 points] Suppose that the discount rate (gamma) is 1.0. What is the optimal policy for each state? (This should require no computation.)

B) [12 points] Using a programming language of your choice, implement value iteration for this problem. Your implementation should make it easy to change the immediate rewards at each state, the discount rate, and the probability of slipping (which affects the STAY action). Further, your routine should output both the estimated utilities of each state and the action recommended by the policy at each state. Finally, you should update values in the order S3, S2, S1, S0 so that your numerical answers agree with our implementation. Attach your code to your submission. You should also initialize your expected utilities to 0.

Note that we define an iteration as serially updating each of the variables, using the most recently computed value of each utility, even if that utility was computed earlier during the same iteration. (I.e., the same way we did it in lecture.)
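The in-place update scheme described above can be sketched as follows. This is only an illustrative Python implementation (the problem allows any language), and it makes one modeling assumption you should verify against the staff implementation: S3 is treated as absorbing and keeps collecting its reward, so U(S3) converges to R(S3)/(1 - gamma).

```python
# Value iteration sketch for the Grand Canyon MDP. States 0..3 = S0..S3.
ACTIONS = ("BACK", "STAY", "FORW")

def transitions(s, a, p_slip):
    """Return a list of (probability, next_state) pairs."""
    if s == 3:                       # fell into the canyon: absorbing
        return [(1.0, 3)]
    if a == "BACK":
        return [(1.0, max(s - 1, 0))]
    if a == "STAY":
        return [(1.0 - p_slip, s), (p_slip, min(s + 1, 3))]
    return [(1.0, min(s + 1, 3))]    # FORW

def value_iteration(rewards=(1, 2, 4, -4), gamma=0.5, p_slip=0.1, tol=1e-3):
    U = [0.0] * 4                    # initialize expected utilities to 0
    iters = 0
    while True:
        iters += 1
        delta = 0.0
        for s in (3, 2, 1, 0):       # required update order
            best = max(sum(p * U[sp] for p, sp in transitions(s, a, p_slip))
                       for a in ACTIONS)
            new_u = rewards[s] + gamma * best
            delta = max(delta, abs(new_u - U[s]))
            U[s] = new_u             # in-place: reused immediately this sweep
        if delta < tol:
            break
    policy = [max(ACTIONS,
                  key=lambda a: sum(p * U[sp]
                                    for p, sp in transitions(s, a, p_slip)))
              for s in range(4)]
    return U, policy, iters
```

With this particular terminal-state model (and gamma = 0.5), the acrophobic tourist ends up shuttling between S1 and S2 rather than standing at the rim; your numbers may differ if you model S3 differently, which is exactly why the problem pins down the update order and initialization.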

B.1) With the discount rate set to 0.5, what are the estimated utilities and policies after one iteration?

B.2) What are the steady-state values of the estimated utilities, and what is the resulting policy?

B.3) About how many iterations did it take to achieve these utilities (i.e., until the utilities were constant to three decimal places)?

C) [5 points] Extra credit: Implement policy iteration as well, and show how the convergence rate compares to value iteration.

## Task 2. Manipulating Gaussian Distributions [10 points]

We once again want to help find Jill, who is standing somewhere on the X axis. Suppose our prior is P(x) ~ N(μ1, σ1²). Bob makes an observation z, where P(z | x) ~ N(x + μ2, σ2²). We wish to compute the posterior P(x | z).

**2.1.** [5 points] By manipulating the expression algebraically, show that the posterior can be written in the form of a Gaussian distribution (you can neglect the normalization constant). Identify the mean and variance of this distribution.

**2.2.** [5 points] Since we know that the maximum probability of a Gaussian occurs at its mean value, we can find the mean of P(x | z) another way. Find the value of x that maximizes the probability P(x | z) by differentiating it, setting to zero, and solving for x.
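As a reminder (stated in generic symbols $a$, $b$, $\sigma_a$, $\sigma_b$, not the problem's, and not a substitute for showing the algebra): the product of two Gaussian densities in $x$ is itself proportional to a Gaussian,

$$
\mathcal{N}(x;\, a, \sigma_a^2)\,\mathcal{N}(x;\, b, \sigma_b^2)
\;\propto\;
\mathcal{N}\!\left(x;\;
\frac{\sigma_b^2\, a + \sigma_a^2\, b}{\sigma_a^2 + \sigma_b^2},\;
\frac{\sigma_a^2\, \sigma_b^2}{\sigma_a^2 + \sigma_b^2}\right),
$$

which follows by summing the two quadratic exponents and completing the square in $x$. Note that the posterior precision is the sum of the individual precisions: $\frac{1}{\sigma^2} = \frac{1}{\sigma_a^2} + \frac{1}{\sigma_b^2}$.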

## Task 3. Classification [15 points]

MOVED TO PS6.

You've gotten tired of seeing movies that you don't like, and have decided to try to figure out whether you can more accurately predict if you will enjoy a movie. You know there are several criteria that influence whether or not you enjoy a movie. Specifically, you tend to enjoy movies that:

- have a rating greater than 75 at rottentomatoes.com,
- are either action or drama,
- are longer than 110 minutes but shorter than 130 minutes, and
- have earned more than $60 million at the box office.

None of these criteria is sufficient on its own to guarantee that you will enjoy a movie, and they don't always hold, but using any single one is better than nothing. You're curious as to whether you can use **boosting** to combine all of these classifiers and get a more accurate prediction of movie enjoyment.

You've collected the following data on 20 movies you've watched:

| Sample | Movie Rating | Genre | Runtime (minutes) | Box Office Earnings ($ millions) | Enjoyed |
|--------|--------------|-------|-------------------|----------------------------------|---------|
| 1 | 92 | Drama | 150 | 120 | Y |
| 2 | 85 | Romance | 82 | 25 | Y |
| 3 | 40 | Action | 114 | 104 | N |
| 4 | 72 | Action | 125 | 84 | Y |
| 5 | 65 | Drama | 119 | 66 | N |
| 6 | 50 | Comedy | 106 | 72 | N |
| 7 | 12 | Action | 99 | 18 | N |
| 8 | 96 | Romance | 110 | 42 | N |
| 9 | 70 | Drama | 131 | 23 | N |
| 10 | 33 | Drama | 119 | 68 | N |
| 11 | 57 | Comedy | 124 | 57 | N |
| 12 | 87 | Romance | 130 | 34 | N |
| 13 | 91 | Documentary | 140 | 8 | Y |
| 14 | 48 | Comedy | 100 | 52 | N |
| 15 | 64 | Action | 90 | 40 | N |
| 16 | 70 | Romance | 102 | 81 | N |
| 17 | 77 | Documentary | 106 | 22 | N |
| 18 | 62 | Action | 112 | 53 | Y |
| 19 | 42 | Action | 85 | 70 | Y |
| 20 | 75 | Action | 136 | 75 | N |

**3.1** Compute the classification rate of each individual criterion. Which one performs the best?

**3.2** Use AdaBoost to create a new classifier. Run through the AdaBoost algorithm by hand, using the four classifiers given. For each classifier, show:

- The current total error
- The updated weight for each sample
- The weight for the hypothesis

**3.3** How many samples does the new classifier misclassify?
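The task above asks you to run AdaBoost by hand, but a small generic implementation can be handy for checking your arithmetic. Below is an illustrative Python sketch (the function names `adaboost` and `predict` are our own); it is not tied to the movie data, and you supply your own weak classifiers as functions mapping a sample to {-1, +1}.

```python
import math

def adaboost(X, y, weak_classifiers, rounds):
    """AdaBoost sketch. Labels y are in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n                # start with uniform sample weights
    ensemble = []                    # list of (alpha, hypothesis) pairs
    for _ in range(rounds):
        # weighted error of each weak classifier under the current weights
        errs = [sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
                for h in weak_classifiers]
        best = min(range(len(weak_classifiers)), key=errs.__getitem__)
        h, err = weak_classifiers[best], errs[best]
        if err >= 0.5:
            break                    # no weak learner beats chance: stop
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-12))
        ensemble.append((alpha, h))
        # upweight misclassified samples, then renormalize
        w = [wi * math.exp(-alpha * yi * h(xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted-majority vote of the boosted ensemble."""
    return 1 if sum(alpha * h(x) for alpha, h in ensemble) >= 0 else -1
```

Note that this variant picks the lowest-weighted-error classifier each round; when working the problem by hand, follow whatever classifier ordering the problem set or lecture notes specify.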

## Task 4. Kalwar (Challenge problem) [20 points]

In this challenge problem, you'll be implementing an agent to play a simple "arcade" game. Your agent controls a tank, operating in a simple two-dimensional world. This world extends from [-1, 1] in both the X and Y axes. There are immovable and perfectly elastic obstacles forming a box around the world, and there's a circular obstacle in the center. Your tank has shields and missiles, and there is an opposing tank (which is controlled by other teams in 492!). Your goal is to maneuver your tank to shoot a missile at your opponent, while avoiding being hit yourself.

But there are some catches:

- Your tank can only accelerate at a relatively low rate (like the original "Asteroids" game). There's also a maximum attainable speed.
- Missiles travel at a fast, but finite, rate, so you must "lead" your shots to hit moving targets.
- You have an infinite amount of ammunition, but only a finite number (4) of "missile tubes". When loaded, missile tubes can be fired in rapid succession. However, once fired, a missile tube takes time to reload.
- Your tank has finite shields; impacts with other objects consume shields (in proportion to the change in momentum caused by the collision). Missiles cause significant additional damage.
- Bonus items (blue orbs) that can restore shields appear in the environment. You know the size and type of each bonus item, and you "collect" them by driving over them.
- Your shields slowly decay all by themselves. (This eliminates stalemates: someone always wins.)

And the big catch:

- While you know your own position and velocity, you do not know your opponent's. Rather, you receive noisy "radar" (range and bearing) measurements of your opponent.

From the 492 perspective, the key component of this game is implementing an Extended Kalman Filter to predict the location of the enemy tank. You'll have to handle observation steps (from "radar" measurements) as well as time-propagation steps (accounting for your best estimate of the enemy's velocity as well as the effects of their unknown future acceleration). Combining this with a reasonable agent (one which does something sensible, but does not necessarily plan ahead) will yield a full-credit solution. A "sensible" agent is one that moves with some purpose (perhaps accelerating towards bonus objects) and periodically fires missiles towards the enemy. A more elaborate agent may wish to consider the effects of bouncing off walls and the strategy of how and when to fire missiles. The simulation environment used by the game engine itself can be used by your agent.
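To make the predict/update cycle concrete, here is a deliberately simplified sketch: a 1-D *linear* Kalman filter tracking [position, velocity] from noisy position measurements, written in Python. The real task requires an *extended* KF in 2-D with a nonlinear radar measurement model, and the process-noise model here (just adding a constant q to the diagonal of P) is our simplification, not the game's dynamics.

```python
# One Kalman-filter cycle for state x = [position, velocity],
# covariance P (2x2 nested lists), scalar measurement z of position.
def kf_step(x, P, z, dt, q, r):
    # --- time propagation: x <- F x,  P <- F P F^T + Q,  F = [[1, dt], [0, 1]]
    x = [x[0] + dt * x[1], x[1]]
    P = [[P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q,
          P[0][1] + dt * P[1][1]],
         [P[1][0] + dt * P[1][1],
          P[1][1] + q]]                  # simplified process noise Q = q*I
    # --- measurement update: observe position z with variance r (H = [1, 0])
    S = P[0][0] + r                      # innovation covariance
    K = [P[0][0] / S, P[1][0] / S]       # Kalman gain
    innov = z - x[0]                     # measurement residual
    x = [x[0] + K[0] * innov, x[1] + K[1] * innov]
    P = [[(1 - K[0]) * P[0][0], (1 - K[0]) * P[0][1]],
         [P[1][0] - K[1] * P[0][0], P[1][1] - K[1] * P[0][1]]]
    return x, P
```

In the game you would write the equivalent in Java, replace the linear measurement H with the Jacobian of the radar function evaluated at your current estimate (that is the "extended" part), and choose the process noise to reflect the enemy's unknown acceleration.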

Your agent must be in a package in a globally-unique namespace (you can pick one of your team members' uniqnames, for example). E.g., you might submit a JAR file named "ebolson.jar". Your agent class must then be "ebolson.ChallengeAgent" (replacing "ebolson" with your package name). These extra requirements will allow us to take submitted code and run tournaments amongst all teams' submissions. In order to make the tournaments workable, your agent must not use an excessive amount of CPU time: no more than about one minute per game.

The initial version of the code is below. Please monitor the course email list for bug fixes and updates, as this is a brand-new problem set!

You can see the game in action using two sample agents by running:

java -jar kalwar.jar

To test the game using custom agents, try:

java -jar kalwar.jar -a agent1 -b agent2

We currently ship two example agents you can try: kalwar.SittingDuck and kalwar.DrunkenDuck

When submitting your code, we'd like your agent to be in a namespace matching your uniqname (choose any one of your team members!) and named ChallengeAgent. To make this a bit easier to set up, we've included a script that will generate everything for you. To set this up, try:

./generate-build-environment.sh <uniqname>

ant

java -cp kalwar.jar:<uniqname>.jar kalwar.KalwarGUI -a agent1 -b <uniqname>.ChallengeAgent

You can edit your newly-created agent in src/<uniqname>/ChallengeAgent.java

When submitting your email, please include the results of your agent against our provided agents, obtained by running:

java -cp kalwar.jar:ebolson.jar kalwar.ChallengeTournament ebolson.ChallengeAgent | tee submission.txt

## Task 5. Certification and Peer Evaluation [required for credit, 0 points]

Print or write the following on a sheet of paper:

**"I participated and contributed to team discussions on each problem, and I attest to the integrity of each solution. Our team met as a group on [DATE(S)]." **

Each team member should sign the paper (physically, not digitally), and a photo of the paper should be included with your problem set. By doing so, you are asserting that you were materially involved in each answer, and that you certify that every solution complies with the collaboration policy of this course. In the event of a violation of the course's guidelines, your certification will be forwarded to the honor council.

If the statement requires qualification (i.e., it's not quite true), please describe the issue and the steps you took to mitigate it. If a team member did not materially contribute to a particular solution, then their score on that problem will generally be reduced. The staff will use their discretion based on the specific circumstances you identify, and we are happy to discuss potential problems prior to the problem set due date.

If your team works together, as required, this will be painless! :)

Each member of your team must also complete a peer evaluation, accessible from the apps page.

Your evaluations of your teammates will remain private. If problems develop with your team, it is up to your team to resolve them: we will not intercede except in the most extreme situations.