Object recognition

Our object recognition operates in two modalities: 3D object recognition on point clouds, and 2D object detection and recognition on images.

3D object recognition models

Our 3D object recognition node takes the segmented point clouds described in 3D object segmentation as input to the models. These segmented point clouds are published by the mir_object_recognition node.

The tutorial for training the model is described in Training models.

We use two models for the 3D object recognition, namely:

  • A random forest with radial density distribution and 3D modified Fisher vector (3DmFV) features, as described in our paper.

  • Dynamic Graph CNN (DGCNN): an end-to-end point cloud classification network. In addition to point coordinates, we also feed point colors as inputs.
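To give a feel for the radial-density idea behind the first model, the sketch below summarizes a point cloud by a histogram of point distances from its centroid. This is an illustrative toy, not the actual FVRDD implementation; the function name and binning scheme are assumptions.

```python
import math

def radial_density_feature(points, n_bins=8):
    """Histogram of point distances from the cloud centroid,
    normalized so the bins sum to 1. Illustrative only --
    not the FVRDD feature used by the recognizer."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    dists = [math.dist(p, (cx, cy, cz)) for p in points]
    r_max = max(dists) or 1.0
    bins = [0] * n_bins
    for d in dists:
        # Clamp the outermost point into the last bin.
        idx = min(int(d / r_max * n_bins), n_bins - 1)
        bins[idx] += 1
    return [b / n for b in bins]

# A unit cube's corners are all equidistant from the centroid,
# so the whole mass lands in the outermost bin.
cube = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
print(radial_density_feature(cube, n_bins=4))  # -> [0.0, 0.0, 0.0, 1.0]
```

A fixed-length feature like this can be fed directly to a random forest, which is what makes the feature-based pipeline cheap compared to an end-to-end network.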

You can change the classifier in the launch file:

 <?xml version="1.0"?>
 <launch>

   <arg name="camera_name" default="arm_cam3d" />
   <arg name="input_pointcloud_topic"  default="/$(arg camera_name)/depth_registered/points" />
   <arg name="target_frame" default="odom" />
   <arg name="model" default="cnn_based" />
   <arg name="model_id" default="dgcnn" />
   <arg name="dataset" default="all" />
   <arg name="model_dir" default="$(find mir_pointcloud_object_recognition_models)/common/models/$(arg model_id)/$(arg dataset)" />

   <group ns="mir_perception">
     <node pkg="mir_object_recognition" type="pc_object_recognizer_node" name="pc_object_recognizer_node" output="screen"
         respawn="false" ns="multimodal_object_recognition/recognizer/pc">
       <param name="model" value="$(arg model)" type="str" />
       <param name="model_id" value="$(arg model_id)" type="str" />
       <param name="model_dir" value="$(arg model_dir)" type="str" />
       <param name="dataset" value="$(arg dataset)" type="str" />
       <remap from="~input/object_list" to="/mir_perception/multimodal_object_recognition/recognizer/pc/input/object_list" />
       <remap from="~output/object_list" to="/mir_perception/multimodal_object_recognition/recognizer/pc/output/object_list"/>
     </node>
   </group>

 </launch>

Where:

  • model: whether the model is CNN based (cnn_based) or a traditional ML estimator (feature_based)

  • model_id: the actual name of the model; available model ids:

    • cnn_based: dgcnn

    • feature_based: fvrdd

  • dataset: the name of the dataset the model was trained on
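The model_dir argument in the launch file composes the package path with model_id and dataset. A plain-Python sketch of that substitution (the package path used in the example is a hypothetical placeholder for whatever `$(find mir_pointcloud_object_recognition_models)` resolves to on your system):

```python
def resolve_model_dir(package_path, model_id, dataset):
    """Mimics the launch file's
    $(find ...)/common/models/$(arg model_id)/$(arg dataset) substitution."""
    return f"{package_path}/common/models/{model_id}/{dataset}"

# With the launch file defaults (model_id=dgcnn, dataset=all):
print(resolve_model_dir("/ws/src/mir_pointcloud_object_recognition_models",
                        "dgcnn", "all"))
# -> /ws/src/mir_pointcloud_object_recognition_models/common/models/dgcnn/all
```

Switching to the feature-based model therefore only requires overriding model:=feature_based and model_id:=fvrdd; the directory is derived automatically.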

2D object recognition models

We use squeezeDet for our 2D object detection model: a lightweight, one-shot object detection and classification network. The model can be changed in rgb_object_recognition.launch:

 <?xml version="1.0"?>
 <launch>
   <arg name="net" default="detection" />
   <arg name="classifier" default="yolov7" />
   <arg name="dataset" default="ss22_local_competition" />
   <arg name="model_dir" default="$(find mir_rgb_object_recognition_models)/common/models/$(arg classifier)/$(arg dataset)" />

   <group ns="mir_perception">
     <node pkg="mir_object_recognition" type="rgb_object_recognizer_node" name="rgb_object_recognizer" output="screen"
         respawn="false" ns="multimodal_object_recognition/recognizer/rgb">
       <param name="net" value="$(arg net)" type="str" />
       <param name="classifier" value="$(arg classifier)" type="str" />
       <param name="dataset" value="$(arg dataset)" type="str" />
       <param name="model_dir" value="$(arg model_dir)" type="str" />
       <remap from="~input/images" to="/mir_perception/multimodal_object_recognition/recognizer/rgb/input/images" />
       <remap from="~output/object_list" to="/mir_perception/multimodal_object_recognition/recognizer/rgb/output/object_list"/>
     </node>
   </group>
 </launch>

Where:

  • classifier: the model used to detect and classify objects

  • dataset: the dataset used to train the model
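A one-shot detector emits a list of labeled, scored bounding boxes, and a downstream consumer typically keeps only those above a confidence threshold before estimating poses. A minimal sketch of that filtering step (the Detection structure, labels, and threshold are assumptions for illustration, not the node's actual message type):

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    score: float
    bbox: tuple  # (x_min, y_min, x_max, y_max) in pixels

def filter_detections(detections, min_score=0.5):
    """Drop low-confidence detections before further processing."""
    return [d for d in detections if d.score >= min_score]

dets = [
    Detection("M20", 0.91, (10, 10, 50, 60)),
    Detection("BEARING", 0.32, (80, 15, 120, 55)),
]
print([d.label for d in filter_detections(dets)])  # -> ['M20']
```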

Multimodal object recognition

The multimodal_object_recognition_node coordinates the whole perception pipeline through the following steps:

  • Subscribes to the rgb image and point cloud topics

  • Transforms the point cloud to the target frame

  • Finds 3D object clusters in the point cloud using mir_object_segmentation

  • Sends the 3D clusters to point cloud object recognizer (pc_object_recognizer_node)

  • Sends the image to the rgb object detection and recognition node (rgb_object_recognizer_node)

  • Waits until it gets results from both classifiers, or until the timeout is reached

  • Post-processes the recognized objects

    • Applies filters to the objects

  • Sends object_list to object_list_merger
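The wait-for-both-results-or-timeout step above can be sketched with two events standing in for the recognizer callbacks (a simplified illustration with hypothetical names, not the node's actual subscriber code):

```python
import threading
import time

def wait_for_recognizers(pc_done, rgb_done, timeout=5.0):
    """Block until both recognizer events are set, or the shared
    timeout elapses. Returns True only if both results arrived in time."""
    end = time.monotonic() + timeout
    for ev in (pc_done, rgb_done):
        # Each wait gets whatever is left of the overall budget.
        remaining = end - time.monotonic()
        if remaining <= 0 or not ev.wait(remaining):
            return False
    return True

pc_done, rgb_done = threading.Event(), threading.Event()
pc_done.set()   # pc recognizer result arrived
rgb_done.set()  # rgb recognizer result arrived
print(wait_for_recognizers(pc_done, rgb_done, timeout=0.1))  # -> True
```

Using a single deadline shared by both waits keeps the total latency bounded regardless of which recognizer answers first.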

Trigger multimodal_object_recognition

rostopic pub /mir_perception/multimodal_object_recognition/event_in std_msgs/String e_start

Outputs

/mcr_perception/object_detector/object_list
/mir_perception/multimodal_object_recognition/output/workspace_height

Visualization outputs

/mir_perception/multimodal_object_recognition/output/bounding_boxes
/mir_perception/multimodal_object_recognition/output/debug_cloud_plane
/mir_perception/multimodal_object_recognition/output/pc_labels
/mir_perception/multimodal_object_recognition/output/pc_object_pose_array
/mir_perception/multimodal_object_recognition/output/rgb_labels
/mir_perception/multimodal_object_recognition/output/rgb_object_pose_array
/mir_perception/multimodal_object_recognition/output/tabletop_cluster_pc
/mir_perception/multimodal_object_recognition/output/tabletop_cluster_rgb