REFERENCES
1. Ni J, Wang X, Gong T, Xie Y. An improved adaptive ORB-SLAM method for monocular vision robot under dynamic environments. Int J Mach Learn Cyber 2022;13:3821-36.
2. Li J, Xu Z, Zhu D, et al. Bio-inspired intelligence with applications to robotics: a survey. Intell Robot 2021;1:58-83.
3. Ni J, Tang M, Chen Y, Cao W. An improved cooperative control method for hybrid unmanned aerial-ground system in multitasks. Int J Aerosp Eng 2020; doi: 10.1155/2020/9429108.
4. Zhao ZQ, Zheng P, Xu ST, Wu X. Object detection with deep learning: a review. IEEE Trans Neural Netw Learn Syst 2019;30:3212-32.
5. Garcia-Garcia A, Orts-Escolano S, Oprea S, Villena-Martinez V, Martinez-Gonzalez P, Garcia-Rodriguez J. A survey on deep learning techniques for image and video semantic segmentation. Appl Soft Comput 2018;70:41-65.
6. Wang L, Huang Y. A survey of 3D point cloud and deep learning-based approaches for scene understanding in autonomous driving. IEEE Intell Transport Syst Mag 2022;14:135-54.
7. Naseer M, Khan S, Porikli F. Indoor scene understanding in 2.5/3D for autonomous agents: a survey. IEEE Access 2019;7:1859-87.
8. Zhu M, Ferstera A, Dinulescu S, et al. A peristaltic soft, wearable robot for compression therapy and massage. IEEE Robot Autom Lett 2023;8:4665-72.
9. Sun P, Shan R, Wang S. An intelligent rehabilitation robot with passive and active direct switching training: improving intelligence and security of human-robot interaction systems. IEEE Robot Autom Mag 2023;30:72-83.
10. Wang TM, Tao Y, Liu H. Current researches and future development trend of intelligent robot: a review. Int J Autom Comput 2018;15:525-46.
11. Lowe DG. Distinctive image features from scale-invariant keypoints. Int J Comput Vis 2004;60:91-110.
12. Zhou H, Yuan Y, Shi C. Object tracking using SIFT features and mean shift. Comput Vis Image Underst 2009;113:345-52.
13. Oliva A, Torralba A. Modeling the shape of the scene: a holistic representation of the spatial envelope. Int J Comput Vis 2001;42:145-75.
14. Hofmann T. Unsupervised learning by probabilistic latent semantic analysis. Mach Learn 2001;42:177-96.
15. Sarhan S, Nasr AA, Shams MY. Multipose face recognition-based combined adaptive deep learning vector quantization. Comput Intell Neurosci 2020;2020:8821868.
16. Liu B, Wu H, Su W, Zhang W, Sun J. Rotation-invariant object detection using Sector-ring HOG and boosted random ferns. Vis Comput 2018;34:707-19.
17. Wang X, Han TX, Yan S. An HOG-LBP human detector with partial occlusion handling. In: 2009 IEEE 12th International Conference on Computer Vision; 2009 Sep 29 - Oct 02; Kyoto, Japan. IEEE; 2010. p. 32-39.
18. Vailaya A, Figueiredo MA, Jain AK, Zhang HJ. Image classification for content-based indexing. IEEE Trans Image Process 2001;10:117-30.
19. Li LJ, Su H, Xing EP, Fei-Fei L. Object bank: a high-level image representation for scene classification & semantic feature sparsification. In: Proceedings of the 23rd International Conference on Neural Information Processing Systems; 2010. p. 1378-86.
20. Zhang L, Li W, Yu L, Sun L, Dong X, Ning X. GmFace: an explicit function for face image representation. Displays 2021;68:102022.
21. Ning X, Gong K, Li W, Zhang L, Bai X, et al. Feature refinement and filter network for person re-identification. IEEE Trans Circuits Syst Video Technol 2021;31:3391-402.
22. Ni J, Chen Y, Chen Y, Zhu J, Ali D, Cao W. A survey on theories and applications for self-driving cars based on deep learning methods. Appl Sci 2020;10:2749.
23. Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition; 2012 Jun 16-21; Providence, RI, USA. IEEE; 2012. p. 3354-61.
24. Caesar H, Bankiti V, Lang AH, et al. nuScenes: a multimodal dataset for autonomous driving. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020 Jun 13-19; Seattle, WA, USA. IEEE; 2020. p. 11618-28.
25. Hinterstoisser S, Lepetit V, Ilic S, et al. Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In: Lee KM, Matsushita Y, Rehg JM, Hu Z, editors. Computer Vision - ACCV; 2013. p. 548-62.
26. Piga NA, Onyshchuk Y, Pasquale G, Pattacini U, Natale L. ROFT: real-time optical flow-aided 6D object pose and velocity tracking. IEEE Robot Autom Lett 2022;7:159-66.
27. Everingham M, Gool LV, Williams CKI, Winn JM, Zisserman A. The pascal visual object classes (VOC) challenge. Int J Comput Vis 2010;88:303-38.
28. Cordts M, Omran M, Ramos S, et al. The cityscapes dataset for semantic urban scene understanding. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016 Jun 27-30; Las Vegas, NV, USA. IEEE; 2016. p. 3213-23.
29. Wang W, Shen J, Guo F, Cheng MM, Borji A. Revisiting video saliency: a large-scale benchmark and a new model. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2018 Jun 18-23; Salt Lake City, UT, USA. IEEE; 2018. p. 4894-903.
30. Kristan M, Leonardis A, Matas J, et al. The visual object tracking VOT2017 challenge results. In: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW); 2017 Oct 22-29; Venice, Italy. IEEE; 2017. p. 1949-72.
31. Ni J, Shen K, Chen Y, Yang SX. An improved SSD-like deep network-based object detection method for indoor scenes. IEEE Trans Instrum Meas 2023;72:1-15.
32. Qian R, Lai X, Li X. BADet: boundary-aware 3D object detection from point clouds. Pattern Recognit 2022;125:108524.
33. Shi S, Wang Z, Shi J, Wang X, Li H. From points to parts: 3D object detection from point cloud with part-aware and part-aggregation network. IEEE Trans Pattern Anal Mach Intell 2021;43:2647-64.
34. Li Y, Ma L, Zhong Z, Cao D, Li J. TGNet: geometric graph CNN on 3-D point cloud segmentation. IEEE Trans Geosci Remote Sens 2020;58:3588-600.
36. Qi CR, Liu W, Wu C, Su H, Guibas LJ. Frustum PointNets for 3D object detection from RGB-D data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018 Jun 18-23; Salt Lake City, UT, USA. IEEE; 2018. p. 918-27.
37. Wang Z, Jia K. Frustum ConvNet: sliding frustums to aggregate local point-wise features for amodal 3D object detection. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); 2019 Nov 03-08; Macau, China. IEEE; 2019. p. 1742-49.
38. Chen Y, Liu S, Shen X, Jia J. Fast point R-CNN. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV); 2019 Oct 27 - Nov 02; Seoul, Korea. IEEE; 2019. p. 9774-83.
39. He C, Zeng H, Huang J, Hua XS, Zhang L. Structure aware single-stage 3D object detection from point cloud. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020 Jun 13-19; Seattle, WA, USA. IEEE; 2020. p. 11870-9.
40. Liu Z, Zhao X, Huang T, Hu R, Zhou Y, Bai X. TANet: robust 3D object detection from point clouds with triple attention. In: 34th AAAI Conference on Artificial Intelligence (AAAI); 2020 Feb 7-12; New York, NY, USA. AAAI; 2020. p. 11677-84.
41. Yin T, Zhou X, Krahenbuhl P. Center-based 3D object detection and tracking. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2021 Jun 20-25; virtual. IEEE; 2021. p. 11779-88.
42. Wang H, Shi S, Yang Z, et al. RBGNet: ray-based grouping for 3D object detection. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2022 Jun 18-24; New Orleans, LA, USA. IEEE; 2022. p. 1100-09.
43. Chen Y, Ni J, Tang G, Cao W, Yang SX. An improved dense-to-sparse cross-modal fusion network for 3D object detection in RGB-D images. Multimed Tools Appl 2023; doi: 10.1007/s11042-023-15845-5.
44. Hoang DC, Stork JA, Stoyanov T. Voting and attention-based pose relation learning for object pose estimation from 3D point clouds. IEEE Robot Autom Lett 2022;7:8980-7.
45. Yue K, Sun M, Yuan Y, Zhou F, Ding E, Xu F. Compact generalized non-local network. arXiv. [Preprint.] November 1, 2018.
46. Chen H, Wang P, Wang F, Tian W, Xiong L, Li H. EPro-PnP: generalized end-to-end probabilistic perspective-n-points for monocular object pose estimation. arXiv. [Preprint.] August 11, 2022.
47. Moon G, Chang JY, Lee KM. V2V-PoseNet: voxel-to-voxel prediction network for accurate 3D hand and human pose estimation from a single depth map. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2018 Jun 18-23; Salt Lake City, UT, USA. IEEE; 2018. p. 5079-88.
48. Li Z, Wang G, Ji X. CDPN: coordinates-based disentangled pose network for real-time RGB-based 6-DoF object pose estimation. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV); 2019 Oct 27 - Nov 02; Seoul, Korea (South). IEEE; 2020. p. 7677-86.
49. Wang H, Sridhar S, Huang J, Valentin J, Song S, Guibas LJ. Normalized object coordinate space for category-level 6D object pose and size estimation. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019 Jun 15-20; Long Beach, CA, USA. IEEE; 2020. p. 2637-46.
50. Yu X, Zhuang Z, Koniusz P, Li H. 6DoF object pose estimation via differentiable proxy voting loss. arXiv. [Preprint.] February 11, 2020.
51. Chen W, Jia X, Chang HJ, Duan J, Leonardis A. G2L-net: global to local network for real-time 6D pose estimation with embedding vector features. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020 Jun 13-19; Seattle, WA, USA. IEEE; 2020. p. 4232-41.
52. He Y, Sun W, Huang H, Liu J, Fan H, Sun J. PVN3D: a deep point-wise 3D keypoints voting network for 6DoF pose estimation. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020 Jun 13-19; Seattle, WA, USA. IEEE; 2020. p. 11629-38.
53. He Y, Huang H, Fan H, Chen Q, Sun J. FFB6D: a full flow bidirectional fusion network for 6D pose estimation. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2021 Jun 20-25; Nashville, TN, USA. IEEE; 2021. p. 3002-12.
54. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. arXiv. [Preprint.] November 14, 2014.
55. Badrinarayanan V, Kendall A, Cipolla R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 2017;39:2481-95.
56. Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL. Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv. [Preprint.] December 22, 2014.
57. Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell 2017;40:834-48.
58. Lin G, Milan A, Shen C, Reid I. RefineNet: multi-path refinement networks for high-resolution semantic segmentation. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2017. p. 1925-34.
59. Zeng L, Zhang S, Wang P, Li Z, Hu Y, Xie T. Defect detection algorithm for magnetic particle inspection of aviation ferromagnetic parts based on improved DeepLabv3+. Meas Sci Technol 2023;34:065401.
60. Yin R, Cheng Y, Wu H, Song Y, Yu B, Niu R. Fusionlane: multi-sensor fusion for lane marking semantic segmentation using deep neural networks. IEEE Trans Intell Transport Syst 2022;23:1543-53.
61. Hu P, Perazzi F, Heilbron FC, et al. Real-time semantic segmentation with fast attention. IEEE Robot Autom Lett 2021;6:263-70.
62. Sun Y, Zuo W, Yun P, Wang H, Liu M. FuseSeg: semantic segmentation of urban scenes based on RGB and thermal data fusion. IEEE Trans Automat Sci Eng 2021;18:1000-11.
63. Yang M, Yu K, Zhang C, Li Z, Yang K. DenseASPP for semantic segmentation in street scenes. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2018 Jun 18-23; Salt Lake City, UT, USA. IEEE; 2018. p. 3684-92.
64. Zhang H, Dana K, Shi J, et al. Context encoding for semantic segmentation. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2018 Jun 18-23; Salt Lake City, UT, USA. IEEE; 2018. p. 7151-60.
65. Fu J, Liu J, Tian H, et al. Dual attention network for scene segmentation. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019 Jun 15-20; Long Beach, CA, USA. IEEE; 2020. p. 3141-9.
66. He J, Deng Z, Zhou L, Wang Y, Qiao Y. Adaptive pyramid context network for semantic segmentation. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019 Jun 15-20; Long Beach, CA, USA. IEEE; 2020. p. 7511-20.
67. Zhang C, Lin G, Liu F, Yao R, Shen C. CANet: class-agnostic segmentation networks with iterative refinement and attentive few-shot learning. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019 Jun 15-20; Long Beach, CA, USA. IEEE; 2020. p. 5212-21.
68. Liu J, He J, Zhang J, Ren JS, Li H. EfficientFCN: holistically-guided decoding for semantic segmentation. In: Vedaldi A, Bischof H, Brox T, Frahm JM, editors. Computer Vision – ECCV 2020. Cham: Springer; 2020. p. 1-17.
69. Ha Q, Watanabe K, Karasawa T, Ushiku Y, Harada T. MFNet: towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes. In: 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS); 2017 Sep 24-28; Vancouver, BC, Canada. IEEE; 2017. p. 5108-15.
70. Cheng B, Schwing AG, Kirillov A. Per-pixel classification is not all you need for semantic segmentation. In: Advances in Neural Information Processing Systems 34 (NeurIPS 2021); 2021. p. 17864-75.
71. Zhou W, Yue Y, Fang M, Qian X, Yang R, Yu L. BCINet: bilateral cross-modal interaction network for indoor scene understanding in RGB-D images. Inf Fusion 2023;94:32-42.
72. Lou J, Lin H, Marshall D, Saupe D, Liu H. TranSalNet: towards perceptually relevant visual saliency prediction. Neurocomputing 2022;494:455-67.
73. Judd T, Ehinger K, Durand F, Torralba A. Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision; 2009 Sep 29 - Oct 02; Kyoto, Japan. IEEE; 2010. p. 2106-13.
74. Ishikura K, Kurita N, Chandler DM, Ohashi G. Saliency detection based on multiscale extrema of local perceptual color differences. IEEE Trans Image Process 2018;27:703-17.
75. Zou W, Zhuo S, Tang Y, Tian S, Li X, Xu C. STA3D: spatiotemporally attentive 3D network for video saliency prediction. Pattern Recognit Lett 2021;147:78-84.
76. Wang W, Shen J, Dong X, Borji A. Salient object detection driven by fixation prediction. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2018 Jun 18-23; Salt Lake City, UT, USA. IEEE; 2018. p. 1711-20.
77. Huang R, Xing Y, Wang Z. RGB-D salient object detection by a CNN with multiple layers fusion. IEEE Signal Process Lett 2019;26:552-6.
78. Wang N, Gong X. Adaptive fusion for RGB-D salient object detection. IEEE Access 2019;7:55277-84.
79. Zhang J, Yu M, Jiang G, Qi Y. CMP-based saliency model for stereoscopic omnidirectional images. Digit Signal Process 2020;101:102708.
80. Fang Y, Zhang C, Min X, et al. DevsNet: deep video saliency network using short-term and long-term cues. Pattern Recognit 2020;103:107294.
81. Li F, Zheng J, Zhang Yf, Liu N, Jia W. AMDFNet: adaptive multi-level deformable fusion network for RGB-D saliency detection. Neurocomputing 2021;465:141-56.
82. Lee H, Kim S. SSPNet: learning spatiotemporal saliency prediction networks for visual tracking. Inf Sci 2021;575:399-416.
83. Xue H, Sun M, Liang Y. ECANet: explicit cyclic attention-based network for video saliency prediction. Neurocomputing 2022;468:233-44.
84. Zhang N, Nex F, Kerle N, Vosselman G. LISU: low-light indoor scene understanding with joint learning of reflectance restoration. ISPRS J Photogramm Remote Sens 2022;183:470-81.
85. Tang G, Ni J, Chen Y, Cao W, Yang SX. An improved cycleGAN based model for low-light image enhancement. IEEE Sensors J 2023; doi: 10.1109/JSEN.2023.3296167.
86. He J, Li M, Wang Y, Wang H. OVD-SLAM: an online visual SLAM for dynamic environments. IEEE Sensors J 2023;23:13210-9.
87. Lu X, Sun H, Zheng X. A feature aggregation convolutional neural network for remote sensing scene classification. IEEE Trans Geosci Remote Sens 2019;57:7894-906.
88. Ma D, Tang P, Zhao L. SiftingGAN: generating and sifting labeled samples to improve the remote sensing image scene classification baseline in vitro. IEEE Geosci Remote Sens Lett 2019;16:1046-50.
89. Zhang X, Qiao Y, Yang Y, Wang S. SMod: scene-specific-prior-based moving object detection for airport apron surveillance systems. IEEE Intell Transport Syst Mag 2023;15:58-69.
90. Tang G, Ni J, Shi P, Li Y, Zhu J. An improved ViBe-based approach for moving object detection. Intell Robot 2022;2:130-44.
91. Lee CY, Badrinarayanan V, Malisiewicz T, Rabinovich A. RoomNet: end-to-end room layout estimation. arXiv. [Preprint.] August 7, 2017.
92. Hsiao CW, Sun C, Sun M, Chen HT. Flat2Layout: flat representation for estimating layout of general room types. arXiv. [Preprint.] May 29, 2019.
93. Sarhan S, Nasr AA, Shams MY. Multipose face recognition-based combined adaptive deep learning vector quantization. Comput Intell Neurosci 2020;2020:8821868.
94. Rublee E, Rabaud V, Konolige K, Bradski G. ORB: an efficient alternative to SIFT or SURF. In: 2011 International Conference on Computer Vision (ICCV); 2011 Nov 06-13; Barcelona, Spain. IEEE; 2012. p. 2564-71.
95. Wang K, Ma S, Ren F, Lu J. SBAS: salient bundle adjustment for visual SLAM. IEEE Trans Instrum Meas 2021;70:1-9.
96. Ni J, Gong T, Gu Y, Zhu J, Fan X. An improved deep residual network-based semantic simultaneous localization and mapping method for monocular vision robot. Comput Intell Neurosci 2020;2020:7490840.
97. Fu Q, Yu H, Wang X, et al. Fast ORB-SLAM without keypoint descriptors. IEEE Trans Image Process 2022;31:1433-46.
98. Engel J, Schöps T, Cremers D. LSD-SLAM: large-scale direct monocular SLAM. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T, editors. Computer Vision - ECCV 2014. Cham: Springer; 2014. p. 834-49.
100. Wang Y, Zhang S, Wang J. Ceiling-view semi-direct monocular visual odometry with planar constraint. Remote Sens 2022;14:5447.
101. Forster C, Zhang Z, Gassner M, Werlberger M, Scaramuzza D. SVO: semidirect visual odometry for monocular and multicamera systems. IEEE Trans Robot 2017;33:249-65.
102. Chen Y, Ni J, Mutabazi E, Cao W, Yang SX. A variable radius side window direct SLAM method based on semantic information. Comput Intell Neurosci 2022;2022:4075910.
103. Liu L. Image classification in HTP test based on convolutional neural network model. Comput Intell Neurosci 2021;2021:6370509.
104. Zheng D, Li L, Zheng S, et al. A defect detection method for rail surface and fasteners based on deep convolutional neural network. Comput Intell Neurosci 2021;2021:2565500.
105. Gao X, Wang R, Demmel N, Cremers D. LDSO: direct sparse odometry with loop closure. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); 2018 Oct 01-05; Madrid, Spain. IEEE; 2019. p. 2198-204.
106. Tang C, Zheng X, Tang C. Adaptive discriminative regions learning network for remote sensing scene classification. Sensors 2023;23:1-5.
107. Song Y, Feng W, Dauphin G, Long Y, Quan Y, Xing M. Ensemble alignment subspace adaptation method for cross-scene classification. IEEE Geosci Remote Sens Lett 2023;20:1-5.
108. Zhu S, Wu C, Du B, Zhang L. Adversarial divergence training for universal cross-scene classification. IEEE Trans Geosci Remote Sens 2023;61:1-12.
109. Ni J, Shen K, Chen Y, Cao W, Yang SX. An improved deep network-based scene classification method for self-driving cars. IEEE Trans Instrum Meas 2022;71:1-14.
110. Mohapatra RK, Shaswat K, Kedia S. Offline handwritten signature verification using CNN inspired by inception V1 architecture. In: 2019 Fifth International Conference on Image Information Processing (ICIIP); 2019 Nov 15-17; Shimla, India. IEEE; 2020. p. 263-7.
111. McCall R, McGee F, Mirnig A, et al. A taxonomy of autonomous vehicle handover situations. Transp Res Part A Policy Pract 2019;124:507-22.
112. Wang L, Guo S, Huang W, Xiong Y, Qiao Y. Knowledge guided disambiguation for large-scale scene classification with multi-resolution CNNs. IEEE Trans Image Process 2017;26:2055-68.
113. Hosny KM, Kassem MA, Fouad MM. Classification of skin lesions into seven classes using transfer learning with AlexNet. J Digit Imaging 2020;33:1325-34.