HybridNet: Indoor Segmentation with Range and Voxel Fusion |
Sangsik Yeom, Jongeun Ha(Seoul National University of Science and Technology, Korea) |
In this paper, we propose HybridNet, which improves performance by fusing 2D and 3D features. A voxel-based method and a projection-based method are combined so that results are derived from a single scan. Our approach consists of two parallel networks that extract features in each dimension and merge them in a fusion network. In the fusion network, the voxel blocks and 2D feature maps extracted by each branch are fused onto the voxel grid and then trained through convolution. For effective training of the 2D network, we use a data augmentation technique based on coordinate-system rotation. |
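The abstract does not give implementation details for the rotation-based augmentation; a minimal sketch, assuming the common LiDAR convention of a random yaw rotation about the vertical (z) axis, might look like this (function names are illustrative, not from the paper):

```python
import math
import random

def rotate_z(points, angle_rad):
    """Rotate a list of (x, y, z) points about the z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def augment_scan(points, rng=random):
    """Apply a random yaw rotation to one scan, a typical
    coordinate-system rotation augmentation for point clouds."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return rotate_z(points, angle)

# A 90-degree rotation maps (1, 0, 0) to (0, 1, 0).
rotated = rotate_z([(1.0, 0.0, 0.0)], math.pi / 2)
```

Because the rotation is rigid, the same transform can be applied to the point cloud before both voxelization and range projection, keeping the two branches geometrically consistent.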
|
Visual Surveillance Transformer |
Keoung Hun Choi, Jong Eun Ha(Seoul National University of Science and Technology, Korea) |
In unmanned surveillance systems, the detection result for the same object can differ depending on the state of the object and the configuration of the surrounding environment. Therefore, artificial intelligence for unmanned surveillance needs to understand the environment in the image, the state of the objects within it, and the relationship between them. For this purpose, this study presents a modified transformer structure that can receive a single 2D image as input, unlike approaches that split one image into fixed-size patches, so that the effect between neighboring pixels is considered.
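The abstract contrasts its input scheme with patch splitting but does not specify the tokenization. A toy sketch of one plausible reading, where each pixel becomes a token carrying its local neighborhood so adjacent-pixel effects survive tokenization (the function and the neighborhood rule are assumptions, not the paper's method):

```python
def image_to_tokens(image, radius=1):
    """image: 2D list of scalar pixel values. Returns one token per pixel;
    each token is the flattened (2*radius+1)^2 neighborhood, zero-padded
    at the image border, so no patch splitting is needed."""
    h, w = len(image), len(image[0])
    tokens = []
    for y in range(h):
        for x in range(w):
            neigh = []
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < h and 0 <= nx < w
                    neigh.append(image[ny][nx] if inside else 0.0)
            tokens.append(neigh)
    return tokens

# A 2x2 image yields 4 tokens, each a length-9 (3x3) neighborhood.
tokens = image_to_tokens([[1.0, 2.0], [3.0, 4.0]])
```

The resulting token sequence can then feed a standard transformer encoder, with position indices recovering each pixel's (y, x) location.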
|
Segmentation Applying TAG Type Label Data and Transformer |
Keoung Hun Choi, Jong-Eun Ha(Seoul National University of Science and Technology, Korea) |
In this paper, a modified transformer structure is applied to improve segmentation performance, and label data in a format different from the existing label data is proposed. By using a single image as an input, there is no loss of location information, and a lighter model is presented that obtains a segmentation image without a separate post-processing step. At the same time, in order to improve generalization performance, the label data is composed by assigning one label to one characteristic rather than one label to one object, and the resulting difference in generalization ability is compared. |
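One-label-per-characteristic can be illustrated with a toy mapping (the classes, tags, and function here are invented for illustration; the paper's actual label set is not given): an object unseen at training time may still be fully described by tags already in the label space.

```python
OBJECT_LABELS = {"car": 0, "bus": 1}  # conventional: one label per object

TAG_LABELS = {"wheeled": 0, "metal": 1, "large": 2}  # one label per characteristic

def tags_for(obj):
    """Map an object name to its characteristic-tag labels (toy example)."""
    table = {
        "car": {"wheeled", "metal"},
        "bus": {"wheeled", "metal", "large"},
        "truck": {"wheeled", "metal", "large"},  # unseen object, known tags
    }
    return sorted(TAG_LABELS[t] for t in table[obj])

# "truck" has no object label of its own, yet reuses existing tag labels.
truck_tags = tags_for("truck")
```

This is the intuition behind the claimed generalization benefit: characteristics are shared across object categories, while per-object labels are not.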
|
Scene Text Recognition with Multi-decoders |
Wang Yao, Jong-Eun Ha(Seoul National University of Science and Technology, Korea) |
In this article, we focus on the scene text recognition problem, one of the challenging subfields of
computer vision because of the arbitrary appearance of scene text. Recently, scene text recognition has achieved state-of-the-art performance thanks to advances in deep learning. Specifically, at the decoder, connectionist temporal classification (CTC), the attention mechanism, and the transformer (self-attention) are the three main approaches used in recent research. A novel decoder mechanism is introduced in our study. |
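Of the three decoder families the abstract names, CTC has the simplest decoding rule and can be sketched exactly: best-path (greedy) decoding collapses repeated labels and then removes blanks. This is the standard CTC rule, not the paper's novel decoder:

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Best-path CTC decoding: collapse consecutive repeats, drop blanks."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

# Per-frame argmax labels: blank, 8, 8, blank, 5, 5 -> collapses to [8, 5]
decoded = ctc_greedy_decode([0, 8, 8, 0, 5, 5])
```

Attention and transformer decoders instead emit characters autoregressively, which is why they handle irregular text layouts that frame-synchronous CTC struggles with.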
|
Hyperspectral Imaging Labelling Tool using Preprocessing and Edge Detection |
Sangho Jo, Sungho Kim(Yeungnam University, Korea) |
Hyperspectral imaging data contains hundreds of values for each pixel. Since the data at each pixel is independent, the ground truth for deep learning training takes the form of a segmentation map; a class must be assigned to every pixel, so the labelling must be done precisely. We felt the need for a labelling tool suited to hyperspectral data because current segmentation labelling tools are designed around humans specifying regions directly. For this reason, in this paper, we consider the relationship of light reflection between light sources and objects and create a hyperspectral labelling tool based on preprocessing and edge detection. |
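A common way to exploit the light-source/object reflection relationship when grouping hyperspectral pixels is the spectral angle, which compares the shape of two spectra independently of illumination intensity. This is a generic sketch of that idea, not the tool's actual algorithm; the threshold is an arbitrary placeholder:

```python
import math

def spectral_angle(a, b):
    """Angle (radians) between two per-pixel spectra. A small angle means
    similar reflectance shape even under different illumination strength."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))

def same_material(a, b, threshold=0.1):
    """Propose the same label when spectra are nearly parallel (toy rule)."""
    return spectral_angle(a, b) < threshold

# A spectrum and a brighter copy of itself have angle ~0: same material.
bright = same_material([0.2, 0.5, 0.9], [0.4, 1.0, 1.8])
```

Combined with edge detection to delimit object boundaries, such a similarity rule lets a tool propagate one seed label across a region instead of requiring the annotator to paint every pixel.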
|