Predicting Object Functionality Using Physical Simulations

Lauren Hinkle and Edwin Olson

The authors are with the Computer Science and Engineering Department, University of Michigan, Ann Arbor, MI 48104, USA. {lhinkle,ebolson}@umich.edu, http://april.eecs.umich.edu

Abstract— It is challenging for a robot acting in the world to interact with and use novel objects. While a person may be able to look past visual differences and recognize the intended function of an object, doing so is more difficult for robots, which tend to rely on visual similarity to recognize categories of objects. A robot that recognizes and classifies objects based on their functional properties and potential capabilities is better prepared to use unknown objects. We propose a technique for functionally classifying objects using features obtained through physical simulations. The described method simulates spheres falling onto an object from above. We show how a feature vector can be derived from the results of the physics-based simulation, and that this feature vector is informative for a variety of affordance classification tasks. This process allows a robot equipped with a 3D sensor to determine the functionality of objects in its environment given only a few training examples from various function classes. We show that this method is able to accurately learn membership of 3D models in three function classes: "drinking vessel", "table", and "sittable". We then show that this can be extended to 3D scans of objects using the models as training examples.

Fig. 1: The result of simulating spheres falling onto a cup, table, chair, and sofa (not shown to scale).

I. INTRODUCTION

The dominant approach in object classification is to use the appearance of objects: a chair might be recognized because particular visual features (e.g., SIFT features [1]) are associated with it. The challenge with this approach is that visual appearance within many classes, such as chairs, is highly diverse, making learning difficult. What all chairs have in common, however, is that they serve a specific physical function: their construction affords the specific interaction of "sitting". This shared functional property, or affordance, is what allows all chair-like objects to be grouped in a class. Functional properties can be used to classify objects and to give insight into how an object can be used and by whom.

In this paper, we approach the problem of object classification by considering the physical properties of objects. Our hypothesis is that these physical properties, which we predict by computing a 3D model of the object and subjecting it to a series of simulations using a simple physics engine, will be highly predictive in object classification tasks. We test this hypothesis using a limited set of physics-based simulations and show how machine learning features can be extracted from the results.

Functions and affordances are intrinsically linked to the physical properties of an object, but the exact mapping between them is unclear. Previous work approaches function recognition and classification in a variety of ways, including predefining characteristics of function classes [2], [3], analyzing images or video of people interacting with objects [4], [5], [6], and, more commonly, having a robot learn functional properties by either interacting with objects directly [7], [8] or in simulation [9], [10].
These techniques require a large time commitment and focus only on the functionality afforded to a given user, either the human being watched or the robot that is interacting. Learning the actions a given robot can perform with an object misses many of the possible functions and uses of the object. These more general properties cannot be discovered through an individual's interactions, but can be explored in simulation.

We introduce features for predicting object functionality obtained by physical simulation. In particular, we simulate the effect of dropping spheres onto an object from above, allowing them to roll and settle on the object. The resulting distribution of spheres is analyzed and used for function classification. We show that objects in the same function class have correspondingly similar distributions of equally sized simulated spheres.

In this paper we:
• Propose a class of object features based on physical properties determined through simulation
• Experimentally show the proposed framework can support learning functional classes like "sittable", "table-like", and "drinking vessel" using 3D object models
• Demonstrate successful function classification on examples of real-world objects using sensory data from a Kinect.

II. PRIOR WORK

Prior work on function classification can largely be divided into systems that reason about objects' physical properties and systems that learn functionality through interaction [11], [12].

A. Reasoning About Functionality

The initial work on distinguishing function classes emphasized measurable quantities of objects, such as their dimensions, with pre-defined parametric requirements for each class. Among the first function classes explored was "sittable." Stark et al. use a list of the faces and vertices defining an object to determine whether it can be appropriately used as a chair [3]. The list helps determine characteristics such as the object's dimensions, relationships between its faces, and its predicted center of gravity. These higher-level features are compared to a parameterized structural model defining a sittable object. The model is not learned, but rather is specified with hard-coded "knowledge primitives" that define which relationships must exist.

Rivlin et al. introduced a method to determine the overarching functional properties of an object by recognizing and relating the function of parts of the object [2]. By combining shape primitives such as the dimensions, spatial primitives describing the relationship between parts, and pre-defined functional primitives, they are able to recognize hammers in a scene.

Rather than using size and shape directly, Horton et al. use sensory feedback from an AIBO to discover visual symmetries and what they term "inverse symmetries". They use these symmetries to predict relationships between objects, allowing an AIBO to use a tool to push an object over rough terrain [13].

B. Learning Functionality Through Interaction

The majority of work on recognizing the functional properties of objects is framed in terms of affordances. The term "affordance" comes from ecological psychology [14], but has been adopted and adapted by other communities including human-computer interaction and artificial intelligence. An affordance is often interpreted as a perceivable attribute of an object that alerts the perceiver as to what functions he, she, or it can perform on or with the object.
Approaching function recognition through the lens of affordances has led to an emphasis on either analyzing images of people interacting with objects or having robots interact (or simulate interacting) with objects. Systems that analyze how humans interact with objects attempt to recognize where people grasp objects and the motions they perform with them. This is often done by analyzing a series of frames to determine when and where contact is made and tracking the ensuing motion [4], [5], [6].

Most methods that use robot interaction employ a "babbling" stage, in which the robot interacts freely in the environment without any goals, to determine the functions it can perform on nearby objects. Stoytchev created a system that learns to use sticks with a variety of end-effectors to move a puck around a table [8]. After hundreds of test trials, the robot learns to associate the puck's trajectories with the color-coding of the stick that caused them and is able to choose the correct stick to move the puck from a start position to a goal [10]. Similarly, Brown and Sammut created a simulated robot that employs babbling to learn to use a stick to push balls out of tunnels and to climb ramps to reach rewards. The robot learns numerical relationships between the tools it uses and the tasks they help perform (for example, that the stick must be thin enough to fit in the tunnel) [9].

Although tool use is the most common emphasis in affordance and function recognition, other affordances such as traversability are also explored through robot interaction and babbling. Erdemir et al. explore traversability using statistical information obtained in a babbling stage to perform internal rehearsals before acting in later stages [7]. Uğur et al. consider the traversability of a robot among objects that can or cannot be moved, such as balls or boxes respectively [15]. In a departure from both tool use and traversability, Griffith et al. use vision and sound to determine whether objects are containers. A robot accomplishes this both by attempting to manipulate a small toy inside the object [16], and by moving objects under running water in a sink [17].

These affordance-driven techniques allow the robot to learn what actions it is capable of performing with an object, but do not reveal wider functional properties of the object. For example, although an object might provide an excellent seat for a person, most robots are incapable of sitting and will therefore never discover this affordance. Simulation provides an arena for exploring properties a robot cannot discover through its own interactions but which may be useful to recognize when communicating with humans or other robots with differing capabilities. Bar-Aviv and Rivlin use simulated "examination agents" that interact with objects to determine the objects' functional properties [18]. Each function category is defined as a combination of an examination agent and a set of constraints on the location and orientation of the agent, both of which must be pre-defined. For example, in order to test whether an object can be sat upon, a human model is manipulated on the object and tested for "sitting positions", parameterized ranges within which the angles of the model's joints must lie. If the model can be positioned within these constraints, the object is functionally categorized as a chair. They are similarly able to recognize tables using a simulated person and bookshelves using a simulated book.
The method described in this paper also explores the use of simulation to predict the functional properties of objects without restricting functions to those performable by a particular robot or person. However, unlike Bar-Aviv and Rivlin's method, function classes do not have predefined "examination agents" or goal orientations.

III. SIMULATED INTERACTION FOR FUNCTION CLASSIFICATION

Our method is, in short, to simulate dropping spheres onto an object and use the resulting distribution to functionally classify it. We use a 3D point cloud of an object to predict how spheres dropped on the object from above will respond to it. The simulated spheres follow basic physical rules when colliding with either the object's model or other spheres before coming to rest in static locations. Each function class is associated with a sphere radius that provides the most informative distribution, i.e., the one allowing for the best classification accuracy, which is learned using initial training examples. We derive a feature vector from a histogram of the locations of the spheres on the object.

We define the functional properties of an object to be how the object is able to interact with other objects: specifically, how it affects them and how it is affected by them. Coarse simulation of these interactions predicts how an object might react in the real world without requiring any actual interaction to occur. The benefits of this are that it is often faster than initiating physical interaction and that one can explore interactions that are irreversible in the real world. Additionally, it may allow for the discovery of a more general set of functions and affordances. We hypothesize that even if the simulated interactions do not perfectly model the real-world interactions they are designed to imitate, they reveal useful information about the object. We choose spheres for our preliminary interaction simulation tests because their physics are straightforward and they are able to roll and fit the shape of objects.

A. Simulating an Object and Falling Spheres

Although an object mesh provides more information and results in more accurate simulations, we use only a point cloud for simulation to reflect the data obtained from 3D sensors. A robot exploring the world with a 3D sensor does not have the advantages afforded by an object mesh. While it can easily build a point cloud of its environment, transforming the point cloud into a set of faces is a computationally expensive and unnecessary task. Basic spatial knowledge of the point cloud is assumed by creating a ground plane along the lowest Z-plane of the bounding box; spheres fall toward this plane in a simulation of gravitational effects.

A more sophisticated physics simulation system like the Open Dynamics Engine [19] could be used for our approach. However, high-fidelity results are not necessary for good classification performance. Consequently, we use a low-fidelity simulation system that runs much more quickly.

In our system, simulated spheres drop one at a time from randomly chosen (x, y) positions above the object's point cloud. A sphere drops straight down until it collides with points in the point cloud or with another sphere. The sphere "rolls off" the points it collides with by estimating a plane through the points and rolling along the plane. Similarly, a falling sphere rolls off spheres that have already settled. A sphere continues to roll along the object until it either collides with an object that blocks its path, comes to rest on a flat surface, or rolls off the object entirely. Once a sphere comes to rest, it is static and does not move regardless of other spheres colliding with it. The histogram is calculated using only spheres that land on the object; spheres that roll off the object and strike the ground are ignored.

Spheres continue to fall onto the object until a user-defined number of sequential spheres either roll off or fail to be placed on the object. Together, these two conditions suggest that the object is covered by spheres and cannot hold more. Although this threshold could be varied for different object sizes, in our tests we set it to 50 for all objects.

Spheres may pile upon one another without direct support from the object as long as the pile fits within the object's bounding box. This compensates for the fact that spheres become static once they have landed. In the real world, a ball falling on top of or rolling up against another will generally cause the second ball to roll as well if it is not held in place; a pile of balls does not form without a structure to support them. The result of this constraint is that cups fill up with spheres, chairs have a slope growing from the front edge of the seat to the back, and tables have a flat surface of spheres. This can be seen in Fig. 1.
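To make the drop procedure concrete, the sketch below implements a heavily simplified version of this loop in Python. It is illustrative only: spheres settle at their first point of contact, the plane-fitting and rolling behavior described above is omitted, and the function names (drop_spheres, rest_height) are ours rather than part of the system.

```python
import numpy as np

def rest_height(cloud, settled, x, y, radius):
    """Height at which a sphere dropped over (x, y) first contacts the
    object's points or an already-settled sphere; None if it would reach
    the ground plane instead (i.e., it misses the object)."""
    contacts = []
    # Contact with cloud points whose horizontal distance is < radius:
    # the center rests at p_z + sqrt(r^2 - d_xy^2) for the highest contact.
    d2 = (cloud[:, 0] - x) ** 2 + (cloud[:, 1] - y) ** 2
    mask = d2 < radius ** 2
    if mask.any():
        contacts.append(np.max(cloud[mask, 2] + np.sqrt(radius ** 2 - d2[mask])))
    # Contact with settled spheres (center-to-center distance = 2r).
    for sx, sy, sz in settled:
        h2 = (sx - x) ** 2 + (sy - y) ** 2
        if h2 < (2 * radius) ** 2:
            contacts.append(sz + np.sqrt((2 * radius) ** 2 - h2))
    return max(contacts) if contacts else None

def drop_spheres(cloud, radius, max_misses=50):
    """Drop spheres from random (x, y) positions above the cloud until
    max_misses consecutive spheres fail to land on the object, mirroring
    the stopping criterion described above."""
    lo, hi = cloud[:, :2].min(axis=0), cloud[:, :2].max(axis=0)
    settled, misses = [], 0
    while misses < max_misses:
        x, y = np.random.uniform(lo, hi)
        z = rest_height(cloud, settled, x, y, radius)
        if z is None:
            misses += 1          # sphere struck the ground plane; ignore it
        else:
            settled.append((x, y, z))
            misses = 0
    return np.array(settled)
```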
Fig. 2: A simple 2D example of how features are extracted. Once the simulation is complete, the object is divided into equal bins. The number of sphere centers in each bin is counted, and these counts form the histogram and feature vector.

B. Feature Extraction

The final locations of the spheres are summarized by a histogram, where each element of the histogram represents the number of spheres that fell within a given box in space. A simplified, two-dimensional example of the histogram and features resulting from spheres being dropped on a chair is shown in Fig. 2.

Histograms are chosen to represent the results of the physical simulation because we predict that different objects will cause the spheres to land and group in different ways. The distribution of the spheres is affected by their radii as well as by the size and shape of the object. Prior work uses object size and shape as features for recognizing function classes, and although the proposed method does not use these properties explicitly, the resulting vectors are affected by them.

The distribution of the spheres is summarized using a histogram that divides the object into equally sized cubes. The initial feature vector for each object is constructed from the histograms derived from a variety of sphere sizes. It is reasonable to believe that different sphere sizes will perform better for different function types, which may themselves vary greatly in expected size. For example, large spheres may not be helpful for recognizing drinking vessels, because a vessel can support only one large sphere, while such spheres may be more relevant in determining which objects are sittable for a person. To explore this, a variety of radii are initially used. In practice, we have found that smaller spheres, on the scale of a few centimeters, lead to better classification for all function types because they provide richer histograms.

The accuracy of the histograms depends upon the discretization chosen. Bins that are too large will encapsulate whole objects, causing spatial relationships between spheres to be lost. Conversely, bins that are too small will contain only a single sphere center, making comparisons between the feature vectors of different objects harder because spheres are then required to land in exactly the same location rather than in the same general area.
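As an illustration of the binning step, the following sketch counts settled sphere centers in a grid of equal cubes and flattens the counts into a feature vector. How the grid is anchored between different objects is left open above; anchoring it at the minimum corner of the sphere positions, as done here, is our own assumption.

```python
import numpy as np

def sphere_histogram(centers, bin_size):
    """Count sphere centers in equally sized cubic bins and return the
    flattened counts as a feature vector. Per the discussion above,
    bin_size would be set to roughly 3-4x the sphere diameter."""
    if len(centers) == 0:
        return np.zeros(0)
    lo = centers.min(axis=0)
    idx = np.floor((centers - lo) / bin_size).astype(int)
    hist = np.zeros(idx.max(axis=0) + 1)   # grid sized to the occupied extent
    for i, j, k in idx:
        hist[i, j, k] += 1
    return hist.ravel()
```

Because the grid is sized to each object's extent, histograms for different objects can have different lengths; some alignment (e.g., zero-padding to a common grid) would be needed before the distance computation used by the classifier in the next section.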
After testing several alternatives, including a range of fixed bin sizes and sphere radius-to-bin ratios, we found that bins three to four times the diameter of the spheres perform best. This means the histograms resulting from different sphere radii have different numbers of bins and are not comparable. Additionally, for some of the larger sphere radii, a single histogram bin can fully contain some of the smaller objects. This is non-ideal, but was not explored further in this work.

Fig. 3: Performance of function classification using features derived from simulating falling spheres. (a) The average F-measure obtained with an increasing number of total training examples over 10 trials. Error bars denote variance. (b) Ultimate radius chosen for each function class and the F-measure and accuracy at that radius:

Class        Radius (cm)   F-Measure   Accuracy
Cup-like     0.5           0.80        0.96
Table-like   1             0.85        0.94
Sittable     0.5           0.74        0.92

C. Function Classification

Each function class is associated with a single radius that is found to produce sphere distributions that result in accurate classifications. In order to determine the best-performing radius for a given function class, initial examples of each class are divided 70%-30% into a training set and a validation set.

Function classification is performed using a binary, one-vs-all classifier for each class. In our evaluation we use a weighted nearest-neighbor classifier in which training examples are weighted by the inverse Euclidean distance between the example and the histogram being evaluated. A weighted nearest-neighbor classifier was chosen because we expect to have few initial training examples, and we have found that it gives good predictions under those conditions. Individual binary, one-vs-all classifiers are necessary because they do not restrict function class membership to a single class. Thus a sofa, which can be either sat or laid upon, will be classified positively by "sittable" and "layable" classifiers but negatively by classifiers such as "table".

The F-measure is chosen as our evaluation metric because the goal of function classification is to correctly identify as many objects exemplifying the function as possible while minimizing the number of false positives. The F-measure accomplishes this by heavily penalizing either low precision or low recall:

F-measure = (2 × precision × recall) / (precision + recall)

Each radius is evaluated by calculating the F-measure of its classification for each function class on the validation set. The radius with the highest F-measure for a given function is chosen as the representative for that function class and is used in the future when determining whether an object is a member of that class. When functionally classifying an object, only the chosen radius for a given class is tested on the object. A robot looking for sittable objects in its environment would only simulate dropping spheres of that specific radius to make a classification. Alternatively, if the robot wanted to determine which function classes an object exhibited, it would simulate dropping spheres with the representative radius for each function class and consider the resulting distributions individually. This radius-selection test can be performed either offline with 3D object models or online using the first several examples of the functional class the robot sees while exploring the world.
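A minimal sketch of one plausible reading of this classifier and of the radius-selection procedure follows. It assumes that all training examples cast inverse-distance-weighted votes, that labels are boolean arrays, and that the per-radius feature vectors have a consistent length across objects; none of these details are pinned down above.

```python
import numpy as np

def f_measure(pred, truth):
    """F-measure = (2 * precision * recall) / (precision + recall)."""
    tp = np.sum(pred & truth)
    precision = tp / max(np.sum(pred), 1)
    recall = tp / max(np.sum(truth), 1)
    return 2 * precision * recall / max(precision + recall, 1e-9)

def classify(feature, train_feats, train_labels):
    """Binary one-vs-all decision: every training example votes for its
    label with weight equal to the inverse Euclidean distance to the
    histogram being evaluated."""
    d = np.linalg.norm(train_feats - feature, axis=1)
    w = 1.0 / np.maximum(d, 1e-9)          # guard against zero distance
    return w[train_labels].sum() > w[~train_labels].sum()

def select_radius(feats_by_radius, labels, train_frac=0.7):
    """Pick the sphere radius whose features give the best F-measure on a
    held-out validation split (the 70%-30% split described above).
    feats_by_radius maps each radius to an (n_objects, n_bins) array."""
    n = len(labels)
    perm = np.random.permutation(n)
    tr, va = perm[: int(train_frac * n)], perm[int(train_frac * n):]
    best_radius, best_f = None, -1.0
    for radius, feats in feats_by_radius.items():
        pred = np.array([classify(feats[i], feats[tr], labels[tr]) for i in va])
        f = f_measure(pred, labels[va])
        if f > best_f:
            best_radius, best_f = radius, f
    return best_radius
```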
IV. EVALUATION

We evaluate how well our method is able to functionally classify three classes, "table", "drinking vessel", and "sittable", by testing the classification accuracy on a collection of 3D models. We are able to achieve high accuracy and recall in identifying objects. We further evaluate function recognition with real-world data from a Kinect, using the 3D models as training examples.

A. 3D Model Dataset

For testing and evaluation, we created our own database of 200 CAD object models that were individually downloaded from several free online databases (http://grabcad.com and http://archive3D.net). These models were downloaded as .stl or .3ds files and were converted into unordered point clouds by sampling at least one point every half centimeter along the faces. Models were rotated so that all objects have the XY plane as ground with the positive Z-axis pointing up, and were translated to the origin. Models with incorrect scales were manually adjusted to reasonable values. Of the 200 objects acquired, 40 were a type of sittable object (chairs, sofas, benches, etc.), 40 were table-like objects (dining room tables, desks, etc.), and 30 were drinking vessels (glasses, bowls, etc.). The others were household objects ranging from furniture such as baths, ovens, and bookshelves, to appliances such as televisions, vacuums, microwaves, and computers, to other knick-knacks such as books, shoes, and lamps.
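The face-sampling step can be sketched as uniform sampling on each triangle in proportion to its area. The code below assumes model units are centimeters, so a density of roughly four points per square centimeter yields about one point every half centimeter; the function name sample_mesh is illustrative, not part of our pipeline's API.

```python
import numpy as np

def sample_mesh(vertices, faces, density=4.0):
    """Convert a triangle mesh to an unordered point cloud by sampling
    points uniformly on each face. density is points per square cm
    (assuming the model is in cm); 4 points/cm^2 gives roughly one
    point every half centimeter."""
    tris = vertices[faces]                      # (n_faces, 3, 3)
    a, b, c = tris[:, 0], tris[:, 1], tris[:, 2]
    areas = 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)
    pts = []
    for i, area in enumerate(areas):
        n = max(1, int(np.ceil(area * density)))
        # Uniform sampling on a triangle via reflected barycentric coords.
        u, v = np.random.rand(n), np.random.rand(n)
        flip = u + v > 1
        u[flip], v[flip] = 1 - u[flip], 1 - v[flip]
        pts.append(a[i] + np.outer(u, b[i] - a[i]) + np.outer(v, c[i] - a[i]))
    return np.vstack(pts)
```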
Fig. 4: Confusion matrices showing the function classification of the same test set given differing numbers of training examples. Rows indicate actual membership while columns denote classified memberships. As more training examples are seen, function classification improves.

(a) 20 Training Examples
             cup-like   table-like   sittable   none
cup-like        10          0           1         2
table-like       0         20           0         8
sittable         0          0           6         0
none             6          2          15        33

(b) 45 Training Examples
             cup-like   table-like   sittable   none
cup-like        11          0           1         1
table-like       0         18           3         2
sittable         0          0          13         3
none             5          4           6        34

(c) 85 Training Examples
             cup-like   table-like   sittable   none
cup-like        12          0           1         2
table-like       0         21           2         3
sittable         0          0          18         6
none             4          1           1        29

B. Recognition on Model Dataset

We evaluate how many training examples are necessary for our system to accurately classify these functional categories. Of the three function classes, we expect sittable objects to be the most challenging because the objects in that class vary the most. For example, drinking vessels tend to be small and similarly sized, and most tables and desks are of similar height, although their widths and lengths vary. Sittable objects, on the other hand, range from desk chairs to sofas to bar stools, all of which have very different sizes and expected distributions.

For each test, the dataset was split randomly into equally sized training and testing sets. As described in Sec. III-C, roughly 30% of the training set is used for cross validation to select the best radii. This makes classification challenging until a sufficient number of examples of each function type have been seen to ensure a positive example appears in both the training and the cross-validation sets. The radius with the highest F-measure on the cross-validation set is used to evaluate the test set. The results can be seen in Fig. 3, where the total number of training examples of all classes is plotted against the F-measure for each function class. These results are an average obtained over 10 trials, which used the same 200 objects but varied the test set. The scores reflect higher recall and lower precision, as indicated by the confusion matrices in Fig. 4, where there are more false positives than false negatives. This means more objects are identified as having a given function than actually do, but most objects that have the function are correctly identified.

The confusion matrices show the membership classifications for the test data given 20, 45, and 85 total training examples, and are from one of the trials averaged in Fig. 3. The rows represent the function classes the objects actually belong to, and the columns represent the functions they are classified as. Although this dataset has no overlapping function classes, membership is not restricted to a single class. Despite not having this restriction, most objects were predicted to belong to only one class. The few exceptions are evident in the confusion matrices that sum to more than 100, as this indicates a single object was classified positively by more than one function classifier.

Fig. 5 shows a matrix of function classifications for every object in our dataset. Each object was functionally classified using the other 199 objects as training examples. Each row of the matrix represents predicted membership in a given function class; additionally, "no predicted memberships" is indicated in the bottom row. In order for an object to be classified as having no functions, it must be negatively classified by each function classifier, as there is no "none" classifier. Every column represents a different object from the dataset, grouped by its labeled functions. These labels are somewhat subjective as, for example, people sometimes sit on tables or desks. For this work we chose to label objects only with their primary functional property. Among the challenging objects to classify were bar stools, which have the height of a table and do not have backs like most chairs; mid-sized filing cabinets, bookshelves, and counters, which were confused for tables (and which are, indeed, large flat surfaces often put to similar uses as tables); and vases and shoes, which were often confused for drinking vessels.

These results show the proposed feature set can functionally classify objects that belong to a given class, though increased precision may be desired. The results were obtained using only the features derived from the physical simulation. One could imagine integrating other feature types, such as those based on appearance, into the classifier. Such features are largely orthogonal to our current feature set and would provide very different information. We hypothesize that a system that used both types of features would perform even better.

Fig. 5: Classification of each object in the 3D model dataset. This matrix shows the function classification for each object in our dataset, given the other 199 objects. If all objects were classified correctly, one would expect to see four unbroken lines. Note that there is no "none" classifier, so correctly identifying an "other" object requires all the other classifiers to correctly classify it as a negative example.

Fig. 6: Object function classification on real-world data. (a) 3D point cloud of a dining room. (b) Segmentation of the dining room. (c) Objects identified with functional properties. Using the point cloud of this room obtained by a Kinect, the table, bench, and one of the chairs were correctly identified with their functional properties. Objects identified with functional properties have color-coded boxes drawn around them: cyan boxes indicate objects that are sittable, while magenta indicates table-like objects.
C. Recognition with Real-World Data

We evaluated our method on real data, obtained with a Microsoft Kinect, in order to investigate the performance impact of imperfect models arising from occlusion and noise. We captured several scenes that contained sittable and table-like objects, including the dining room shown in Fig. 6 as well as several office environments. Drinking vessels were not tested because their bottoms would not be observed by the Kinect, and so their simulated counterparts would not be able to hold any spheres. Assuming that all objects have a flat bottom, even when the bottom cannot be seen by the robot, may help solve this.

The dining room shown in Fig. 6 is representative of how well functional properties were recognized in all the analyzed scenes, both in terms of how well objects are classified and in terms of occasional incorrect segmentations. Although an RGB-D scene is shown in the first image, only the depth data is used for segmentation and classification. 3D segmentation is a challenging problem on its own, and in this work we assume a correctly segmented scene. For this experiment we use scenes with few overlapping objects to maximize our simple segmentation algorithm's likelihood of segmenting correctly. A scene is segmented by identifying and removing the floor plane using RANSAC [20]. Points are then agglomerated into objects based on the Euclidean distance between them: points within 2 cm of one another are considered part of the same object. This somewhat large threshold is a result of the Kinect's noisy range data, whose error varies from a few millimeters up close to 4 cm at the edge of its range [21]. We believe better segmentation algorithms would yield at least equivalent function classification performance.

The point cloud for each segmented object is tested for membership in the three function categories. The full CAD model dataset provides training examples for each function type and is used to select the radius for each function class. Simulated spheres are dropped onto the point clouds using the method described above. The majority of the segmented objects in the scenes (which included several slices of wall and the legs of the front-most chair in Fig. 6) were not positively categorized by any of the function classifiers. Objects that were positively classified are indicated with a box. The dining room table was correctly classified as a table, and the bench as sittable. The closest chair was too reflective for the Kinect to obtain depth measurements, but the farther chair, while poorly segmented, was correctly identified as a sittable surface.
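A sketch of this segmentation pipeline, assuming the input is an N x 3 NumPy array of points in meters: RANSAC finds and removes the dominant (floor) plane, and the remaining points are agglomerated with the 2 cm threshold. This is our own minimal implementation, not the authors' code; in practice a k-d tree would replace the quadratic neighbor search, and the floor fit would usually also require a near-vertical plane normal.

```python
import numpy as np

def remove_floor(cloud, iters=200, inlier_dist=0.02):
    """RANSAC plane fit: repeatedly fit a plane to 3 random points, keep
    the plane with the most inliers, then drop those inlier points."""
    best_mask = np.zeros(len(cloud), dtype=bool)
    for _ in range(iters):
        p = cloud[np.random.choice(len(cloud), 3, replace=False)]
        n = np.cross(p[1] - p[0], p[2] - p[0])
        norm = np.linalg.norm(n)
        if norm < 1e-9:
            continue                     # degenerate (collinear) sample
        n /= norm
        mask = np.abs((cloud - p[0]) @ n) < inlier_dist
        if mask.sum() > best_mask.sum():
            best_mask = mask
    return cloud[~best_mask]

def cluster(cloud, dist=0.02):
    """Agglomerate points into objects: points within dist (2 cm, per the
    text) of one another share a cluster. Simple O(n^2) flood fill."""
    n = len(cloud)
    labels = -np.ones(n, dtype=int)
    cur = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        stack = [i]
        labels[i] = cur
        while stack:
            j = stack.pop()
            near = np.where((labels < 0) &
                            (np.linalg.norm(cloud - cloud[j], axis=1) < dist))[0]
            labels[near] = cur
            stack.extend(near.tolist())
        cur += 1
    return [cloud[labels == k] for k in range(cur)]
```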
V. CONCLUSION

This work is a step toward incorporating features discovered through physical simulation into object function recognition. Additional simulated interactions, such as "dropping" balls from all directions instead of just from above, or considering multiple objects in conjunction with each other, might create features with strong predictive capabilities. Such features could supplement other methods of function and affordance recognition. In this work we explored features for function classification and recognition that go beyond those commonly derived directly from camera and depth data.

We show that informative features can be discovered by simulating physical interactions with an object. We have begun exploring this concept by simulating bombarding an object with spheres of differing radii and calculating the distribution of the spheres that land on it. Using the distribution of spheres as a feature vector, a classifier is able to predict whether the object belongs to three different function classes. We are able to learn the appropriate radius for each function class that leads to high classification accuracy. We obtain reasonable classifications using a dataset of 3D models, and further show that real-world objects detected with a Kinect can be correctly functionally classified using the models as training examples.

ACKNOWLEDGMENTS

This research was supported by ONR grant #N00014-13-1-0217.

REFERENCES

[1] D. G. Lowe, "Object recognition from local scale-invariant features," in Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2. IEEE, 1999, pp. 1150–1157.
[2] E. Rivlin, S. J. Dickinson, and A. Rosenfeld, "Recognition by functional parts," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 1994, pp. 267–274.
[3] L. Stark and K. Bowyer, "Generic recognition through qualitative reasoning about 3-D shape and object function," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 1991, pp. 251–256.
[4] Z. Duric, J. A. Fayman, and E. Rivlin, "Function from motion," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 6, pp. 579–591, 1996.
[5] A. Gupta and L. S. Davis, "Objects in action: An approach for combining action understanding and object perception," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2007, pp. 1–8.
[6] M. Stark, P. Lies, M. Zillich, J. Wyatt, and B. Schiele, "Functional object class detection based on learned affordance cues," Computer Vision Systems, pp. 435–444, 2008.
[7] E. Erdemir, C. B. Frankel, K. Kawamura, S. M. Gordon, S. Thornton, and B. Ulutas, "Towards a cognitive robot that uses internal rehearsal to learn affordance relations," in Proceedings of the IEEE International Conference on Intelligent Robots and Systems. IEEE, 2008, pp. 2016–2021.
[8] A. Stoytchev, "Behavior-grounded representation of tool affordances," in Proceedings of the 2005 IEEE International Conference on Robotics and Automation. IEEE, 2005, pp. 3060–3065.
[9] S. Brown and C. Sammut, "An architecture for tool use and learning in robots," in Australian Conference on Robotics and Automation, 2007.
[10] J. Sinapov and A. Stoytchev, "Detecting the functional similarities between tools using a hierarchical representation of outcomes," in Proceedings of the IEEE International Conference on Development and Learning. IEEE, 2008, pp. 91–96.
[11] E. Biçici and R. St. Amant, "Reasoning about the functionality of tools and physical artifacts," Department of Computer Science, North Carolina State University, Tech. Rep. 22, 2003.
[12] T. E. Horton, A. Chakraborty, and R. St. Amant, "Affordances for robots: A brief survey," AVANT. Pismo Awangardy Filozoficzno-Naukowej, no. 2, pp. 70–84, 2012.
[13] T. E. Horton, L. Williams, W. Mu, and R. St. Amant, "Visual affordances and symmetries in canis habilis: A progress report," in AAAI Fall Symposium Technical Report, 2008.
[14] J. Gibson, "The ecological approach to visual perception," 1979.
[15] E. Uğur and E. Şahin, "Traversability: A case study for learning and perceiving affordances in robots," Adaptive Behavior, vol. 18, no. 3–4, pp. 258–284, 2010.
[16] S. Griffith, J. Sinapov, V. Sukhoy, and A. Stoytchev, "How to separate containers from non-containers? A behavior-grounded approach to acoustic object categorization," in Proceedings of the IEEE International Conference on Robotics and Automation. IEEE, 2010, pp. 1852–1859.
[17] S. Griffith, V. Sukhoy, T. Wegter, and A. Stoytchev, "Object categorization in the sink: Learning behavior-grounded object categories with water," in Proceedings of the ICRA Workshop on Semantic Perception, Mapping and Exploration, 2012.
[18] E. Bar-Aviv and E. Rivlin, "Functional 3D object classification using simulation of embodied agent," in Proceedings of the British Machine Vision Conference, 2008, pp. 32-1.
[19] R. Smith, "Open Dynamics Engine," accessed July 2013. [Online]. Available: http://www.ode.org/
[20] M. A. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.
[21] K. Khoshelham, "Accuracy analysis of Kinect depth data," in ISPRS Workshop on Laser Scanning, vol. 38, 2011, p. 1.