Object recognition
Our object recognition combines two modalities: 3D-based object recognition and 2D object detection and recognition.
3D object recognition models
Our 3D object recognition node uses the segmented point clouds described in 3D object segmentation as input to the models. These segmented point clouds are published by the mir_object_recognition node.
The tutorial for training the model is described in Training models.
We use two models for the 3D object recognition, namely:
Random forest with radial density distribution and 3D modified Fisher vector (3DmFV) features, as described in our paper.
Dynamic Graph CNN (DGCNN): an end-to-end point cloud classification network. In addition to point coordinates, we also feed colors as inputs.
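Both models consume segmented clusters of colored points. As a rough sketch in plain Python (the function name is hypothetical, and centering on the centroid is a common normalization for point cloud classifiers, not necessarily the node's exact preprocessing), a cluster can be viewed as rows of (x, y, z, r, g, b):

```python
# Illustrative only: a segmented cluster as a list of (x, y, z, r, g, b)
# rows, centered on its centroid before classification. This mirrors a
# common point cloud normalization, not the node's actual code.
def center_cluster(points):
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    # Shift xyz so the cluster centroid sits at the origin; colors pass through.
    return [(x - cx, y - cy, z - cz, r, g, b) for (x, y, z, r, g, b) in points]

cluster = [(1.0, 2.0, 0.5, 200, 30, 30), (1.2, 2.2, 0.5, 210, 40, 35)]
print(center_cluster(cluster))
```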
You can change the classifier in the launch file:
<?xml version="1.0"?>
<launch>

  <arg name="camera_name" default="arm_cam3d" />
  <arg name="input_pointcloud_topic" default="/$(arg camera_name)/depth_registered/points" />
  <arg name="target_frame" default="odom" />
  <arg name="model" default="cnn_based" />
  <arg name="model_id" default="dgcnn" />
  <arg name="dataset" default="all" />
  <arg name="model_dir" default="$(find mir_pointcloud_object_recognition_models)/common/models/$(arg model_id)/$(arg dataset)" />

  <group ns="mir_perception">
    <node pkg="mir_object_recognition" type="pc_object_recognizer_node" name="pc_object_recognizer_node" output="screen"
          respawn="false" ns="multimodal_object_recognition/recognizer/pc">
      <param name="model" value="$(arg model)" type="str" />
      <param name="model_id" value="$(arg model_id)" type="str" />
      <param name="model_dir" value="$(arg model_dir)" type="str" />
      <param name="dataset" value="$(arg dataset)" type="str" />
      <remap from="~input/object_list" to="/mir_perception/multimodal_object_recognition/recognizer/pc/input/object_list" />
      <remap from="~output/object_list" to="/mir_perception/multimodal_object_recognition/recognizer/pc/output/object_list"/>
    </node>
  </group>

</launch>
Where:
model: whether the model is CNN based (cnn_based) or a traditional ML estimator (feature_based)
model_id: the actual name of the model; available model ids:
cnn_based: dgcnn
feature_based: fvrdd
dataset: the name of the dataset the model was trained on
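To illustrate how the model and model_id parameters pair up, here is a minimal, hypothetical dispatch in Python. The function and backend names are illustrative assumptions; the node's actual selection logic may differ:

```python
# Hypothetical sketch: map the launch parameters `model` and `model_id`
# to a classifier backend. Backend names are illustrative, not the
# actual mir_object_recognition implementation.
def select_classifier(model: str, model_id: str) -> str:
    backends = {
        ("cnn_based", "dgcnn"): "DGCNNClassifier",
        ("feature_based", "fvrdd"): "FVRDDClassifier",
    }
    try:
        return backends[(model, model_id)]
    except KeyError:
        raise ValueError(f"unknown model/model_id combination: {model}/{model_id}")

print(select_classifier("cnn_based", "dgcnn"))      # the launch file's defaults
print(select_classifier("feature_based", "fvrdd"))
```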
2D object recognition models
We use squeezeDet as our 2D object detection model. It is a lightweight, one-shot object detection and classification network. The model can be changed in rgb_object_recognition.launch:
<?xml version="1.0"?>
<launch>
  <arg name="net" default="detection" />
  <arg name="classifier" default="yolov7" />
  <arg name="dataset" default="ss22_local_competition" />
  <arg name="model_dir" default="$(find mir_rgb_object_recognition_models)/common/models/$(arg classifier)/$(arg dataset)" />

  <group ns="mir_perception">
    <node pkg="mir_object_recognition" type="rgb_object_recognizer_node" name="rgb_object_recognizer" output="screen"
          respawn="false" ns="multimodal_object_recognition/recognizer/rgb">
      <param name="net" value="$(arg net)" type="str" />
      <param name="classifier" value="$(arg classifier)" type="str" />
      <param name="dataset" value="$(arg dataset)" type="str" />
      <param name="model_dir" value="$(arg model_dir)" type="str" />
      <remap from="~input/images" to="/mir_perception/multimodal_object_recognition/recognizer/rgb/input/images" />
      <remap from="~output/object_list" to="/mir_perception/multimodal_object_recognition/recognizer/rgb/output/object_list"/>
    </node>
  </group>
</launch>
Where:
classifier: the model used to detect and classify objects
dataset: the dataset used to train the model
Multimodal object recognition
multimodal_object_recognition_node coordinates the whole perception pipeline through the following steps:
Subscribes to rgb and point cloud topics
Transforms the point cloud to the target frame
Finds 3D object clusters in the point cloud using mir_object_segmentation
Sends the 3D clusters to the point cloud object recognizer (pc_object_recognizer_node)
Sends the image to the rgb object detection and recognition node (rgb_object_recognizer_node)
Waits until it gets results from both classifiers or until the timeout is reached
Post-processes the recognized objects
Applies filters to the objects
Sends object_list to object_list_merger
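The "wait until both classifiers reply or the timeout is reached" step above can be sketched with plain Python threading primitives. The class and method names below are illustrative assumptions, not the node's actual ROS implementation:

```python
# Sketch of waiting for both recognizers with a shared deadline.
# RecognitionSync and its callbacks are hypothetical names.
import threading
import time

class RecognitionSync:
    def __init__(self):
        self._pc_done = threading.Event()
        self._rgb_done = threading.Event()
        self.pc_result = None
        self.rgb_result = None

    def on_pc_result(self, objects):
        """Callback for the point cloud recognizer's result."""
        self.pc_result = objects
        self._pc_done.set()

    def on_rgb_result(self, objects):
        """Callback for the RGB recognizer's result."""
        self.rgb_result = objects
        self._rgb_done.set()

    def wait(self, timeout_s: float) -> bool:
        """Return True if both recognizers replied before the shared deadline."""
        deadline = time.monotonic() + timeout_s
        for event in (self._pc_done, self._rgb_done):
            remaining = max(deadline - time.monotonic(), 0.0)
            if not event.wait(remaining):
                return False  # timed out; proceed with whatever arrived
        return True
```

Using one deadline for both events (rather than a fresh timeout per event) keeps the total wait bounded even when one recognizer never answers.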
Trigger multimodal_object_recognition
rostopic pub /mir_perception/multimodal_object_recognition/event_in std_msgs/String e_start
Outputs
/mcr_perception/object_detector/object_list
/mir_perception/multimodal_object_recognition/output/workspace_height
Visualization outputs
/mir_perception/multimodal_object_recognition/output/bounding_boxes
/mir_perception/multimodal_object_recognition/output/debug_cloud_plane
/mir_perception/multimodal_object_recognition/output/pc_labels
/mir_perception/multimodal_object_recognition/output/pc_object_pose_array
/mir_perception/multimodal_object_recognition/output/rgb_labels
/mir_perception/multimodal_object_recognition/output/rgb_object_pose_array
/mir_perception/multimodal_object_recognition/output/tabletop_cluster_pc
/mir_perception/multimodal_object_recognition/output/tabletop_cluster_rgb