AprilTags Paper Walkthrough - Huaqing Yuanjian Education Technology Group


AprilTags Paper Walkthrough  Date: 2018-01-11  Source: unknown

1. AprilTag is an improvement on ARToolkit and ARTag.

1.1 Drawbacks of ARToolkit:

A major disadvantage of this approach is the computational cost associated with decoding tags, since each template required a separate, slow correlation operation. A second disadvantage is that it is difficult to generate templates that are approximately orthogonal to each other.

In short: the first drawback is the computational cost of decoding, since every template needs its own slow correlation operation; the second is that it is hard to generate templates that are approximately orthogonal to one another.

The tag detection scheme used by ARToolkit is based on a simple binarization of the input image based on a user-specified threshold.

That is, tags are detected simply by binarizing the input image with a threshold supplied by the user.

This scheme is very fast, but not robust to changes in illumination.

This approach is fast, but it is not robust to changes in illumination.

In general, ARToolkit’s detections can not handle even modest occlusions of the tag’s border.

In general, ARToolkit cannot handle even modest occlusion of the tag's border.

1.2 ARTag's improvements over ARToolkit:

the detection mechanism was based on the image gradient, making it robust to changes in lighting.

ARTag detects tags using the image gradient, which makes it robust to changes in lighting.

While the details of the detector algorithm are not public, ARTag’s detection mechanism is able to detect tags whose border is partially occluded.

The details of ARTag's detection algorithm are not public, but it can detect tags whose border is partially occluded.

ARTag also provided the first coding system based on forward error correction, which made tags easier to generate, faster to correlate, and provided greater orthogonality between tags.

ARTag was the first to offer a coding system based on forward error correction, which made tags easier to generate, faster to correlate, and more nearly orthogonal to one another.

2. Detecting tags (Detector)

2.1 Overview:

we describe the detector whose job is to estimate the position of possible tags in an image. Loosely speaking, the detector attempts to find four-sided regions (“quads”) that have a darker interior than their exterior. The tags themselves have black and white borders in order to facilitate this.

The detector looks for candidate tags in the scene, that is, four-sided regions that are dark inside and light outside; the tags themselves have black and white borders to make this easier, as in the figure below.

2.2 Detecting line segments

Our approach begins by detecting lines in the image. Our approach, similar in basic approach to the ARTag detector, computes the gradient direction and magnitude at every pixel and agglomeratively clusters the pixels into components with similar gradient directions and magnitudes.

Roughly speaking: as in the ARTag detector, the gradient direction and magnitude are computed at every pixel of the image, and pixels with similar gradient direction and magnitude are then clustered into components.
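The per-pixel gradient step described above can be sketched with NumPy. This is a minimal illustration, not the authors' implementation: the function name `gradient_magnitude_direction` is hypothetical, and central differences (via `np.gradient`) are an assumption, since the excerpt does not say which derivative operator is used.

```python
import numpy as np

def gradient_magnitude_direction(image):
    """Return per-pixel gradient magnitude and direction (radians)."""
    img = np.asarray(image, dtype=float)
    # np.gradient returns the derivative along axis 0 (rows, y) first,
    # then along axis 1 (columns, x). Central differences are an assumption.
    gy, gx = np.gradient(img)
    magnitude = np.hypot(gx, gy)
    direction = np.arctan2(gy, gx)  # points from dark towards light
    return magnitude, direction

# A vertical step edge: dark on the left, light on the right.
step = np.zeros((5, 6))
step[:, 3:] = 1.0
mag, ang = gradient_magnitude_direction(step)
```

Only the two columns next to the step have non-zero magnitude, and their direction is 0 radians (pointing right, towards the lighter side).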

2.3 Early processing steps

First: The tag detection algorithm begins by computing the gradient at every pixel, computing their magnitudes (the per-pixel gradient gives a magnitude image).

Second: gradient direction (the per-pixel gradient direction is obtained as well).

Third: similar gradient directions and magnitude are clustered into components (pixels with similar gradient direction and magnitude are grouped into one component).

Clustering algorithm:

The clustering algorithm is similar to the graph-based method of Felzenszwalb: a graph is created in which each node represents a pixel.

A clustering algorithm similar to Felzenszwalb's graph-based method is used, in which each node represents one pixel.

Algorithm description:

Edges are added between adjacent pixels with an edge weight equal to the pixels’ difference in gradient direction. These edges are then sorted and processed in terms of increasing edge weight: for each edge, we test whether the connected components that the pixels belong to should be joined together.

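Edges are added between adjacent pixels, weighted by the difference in the pixels' gradient directions. The edges are then sorted and processed in order of increasing weight: for each edge, we test whether the connected components containing its two pixels should be merged.

Limitations of the algorithm: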
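The edge-sorting and merging procedure can be sketched with a union-find structure. The excerpt does not state the actual merge test, so the one below is an assumption in the style of Felzenszwalb's criterion: two components are merged when the spread of gradient directions in the union is no larger than the smaller of the two spreads plus `k / size`. The name `cluster_by_direction`, the parameter `k`, the 4-connected neighbourhood, and the handling of angle spread without wrap-around are all simplifications made for illustration; gradient magnitude is ignored here.

```python
import math

def cluster_by_direction(direction, k=2.0):
    """direction: 2D list of gradient angles (radians). Returns a label grid."""
    h, w = len(direction), len(direction[0])
    flat = [direction[y][x] for y in range(h) for x in range(w)]
    parent = list(range(h * w))
    size = [1] * (h * w)
    lo = flat[:]  # smallest angle in each component (no wrap-around handling)
    hi = flat[:]  # largest angle in each component

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def angle_diff(a, b):
        d = abs(a - b) % (2 * math.pi)
        return min(d, 2 * math.pi - d)

    # One edge per pair of 4-connected neighbours, weighted by direction difference.
    edges = []
    for y in range(h):
        for x in range(w):
            i = y * w + x
            if x + 1 < w:
                edges.append((angle_diff(flat[i], flat[i + 1]), i, i + 1))
            if y + 1 < h:
                edges.append((angle_diff(flat[i], flat[i + w]), i, i + w))
    edges.sort()  # process in order of increasing edge weight

    for _, a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        new_lo, new_hi = min(lo[ra], lo[rb]), max(hi[ra], hi[rb])
        n = size[ra] + size[rb]
        # Assumed merge test: spread of the union vs. the tighter component.
        if new_hi - new_lo <= min(hi[ra] - lo[ra], hi[rb] - lo[rb]) + k / n:
            parent[rb] = ra
            size[ra], lo[ra], hi[ra] = n, new_lo, new_hi
    return [[find(y * w + x) for x in range(w)] for y in range(h)]

labels = cluster_by_direction([[0.0, 0.05, 1.5],
                               [0.0, 0.05, 1.55]])
```

In this toy grid the two left columns end up in one component and the right column in another, because joining them would widen the direction spread far beyond the allowance.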

This gradient-based clustering method is sensitive to noise in the image: even modest amounts of noise will cause local gradient directions to vary, inhibiting the growth of the components. The solution to this problem is to low-pass filter the image.

This gradient-based clustering is sensitive to image noise: even modest noise makes local gradient directions vary, which inhibits the growth of the components. The remedy is to low-pass filter the image.

Unlike other problem domains where this filtering can blur useful information in the image, the edges of a tag are intrinsically large-scale features (particularly in comparison to the data field), and so this filtering does not cause information loss. We recommend a value of σ = 0.8.

Unlike other problem domains, where this kind of filtering can blur useful information, the edges of a tag are intrinsically large-scale features, so filtering causes no information loss. The recommended value is σ = 0.8.

Fourth: Using weighted least squares, a line segment is then fit to the pixels in each component (that is, each component yields one fitted line segment).
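A low-pass filter with σ = 0.8 can be sketched as a separable Gaussian blur. The kernel radius of 2 pixels and the edge-replicating padding are assumptions chosen for the example; the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma=0.8, radius=2):
    """1D Gaussian kernel normalised to sum to 1 (radius is an assumption)."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def low_pass(image, sigma=0.8, radius=2):
    """Blur an image with a separable Gaussian; output has the input's shape."""
    img = np.asarray(image, dtype=float)
    k = gaussian_kernel(sigma, radius)
    padded = np.pad(img, radius, mode="edge")
    # Separable filter: blur along each row, then along each column.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

impulse = np.zeros((7, 7))
impulse[3, 3] = 1.0
blurred = low_pass(impulse)
```

A single bright pixel is spread over its neighbours while the total intensity is preserved, which is the smoothing effect that steadies the local gradient directions.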

The direction of the line segment is determined by the gradient direction, so that segments are dark on the left, light on the right. The direction of the lines are visualized by short perpendicular “notches” at their midpoint; note that these “notches” always point towards the lighter region.

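The direction of a line segment is determined by the gradient direction, so that the dark side is on its left and the light side on its right. The direction is visualized by a short perpendicular notch at the segment's midpoint; note that these notches always point towards the lighter region.

2.4 Summary of line segment detection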
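A weighted line fit for one component might look like the sketch below. It uses a weighted orthogonal (total) least-squares fit via the eigenvectors of the weighted covariance matrix, which is one possible reading of "weighted least squares" and copes with vertical edges; using gradient magnitude as the weight is an assumption. The function name `fit_segment` is hypothetical, and the dark-left/light-right orientation step is left out.

```python
import numpy as np

def fit_segment(points, weights):
    """Fit a segment to weighted (x, y) points; returns its two endpoints."""
    p = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    centroid = (p * w[:, None]).sum(axis=0) / w.sum()
    d = p - centroid
    cov = (d * w[:, None]).T @ d / w.sum()
    # eigh returns eigenvalues in ascending order, so the last eigenvector
    # is the direction of greatest weighted spread, i.e. the line direction.
    _, eigvecs = np.linalg.eigh(cov)
    direction = eigvecs[:, -1]
    # Project the points onto the line to find the extent of the segment.
    t = d @ direction
    return centroid + t.min() * direction, centroid + t.max() * direction

p0, p1 = fit_segment([(0, 0), (1, 1), (2, 2), (3, 3)], [1, 2, 2, 1])
```

For these collinear points the endpoints come out as (0, 0) and (3, 3), in either order depending on the sign of the eigenvector.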

The segmentation algorithm is the slowest phase in our detection scheme. As an option, this segmentation can be performed at half the image resolution with a 4x improvement in speed. The sub-sampling operation can be efficiently combined with the recommended low-pass filter. The consequence of this optimization is a modestly reduced detection range, since very small quads may no longer be detected.

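Segmentation is the slowest phase of the detection scheme. Optionally, it can be run at half the image resolution for a 4x speedup. The sub-sampling can be combined efficiently with the recommended low-pass filter. The cost of this optimization is a modestly reduced detection range, since very small quads may no longer be detected.

2.5 Quad detection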
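The 4x figure follows from the pixel count: halving both width and height leaves a quarter of the pixels to segment. A minimal sketch, assuming the image has already been low-pass filtered (the function name and the 640x480 size are illustrative):

```python
import numpy as np

def half_resolution(filtered):
    """Keep every second row and column of an already low-pass-filtered image."""
    return np.asarray(filtered)[::2, ::2]

full = np.zeros((480, 640))
half = half_resolution(full)
ratio = full.size / half.size  # number of pixels before / after
```

Here `half` is 240x320 and `ratio` is 4.0, matching the quoted speedup if segmentation cost scales with the number of pixels.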

Our approach is based on a recursive depth-first search with a depth of four: each level of the search tree adds an edge to the quad. At depth one, we consider all line segments. At depths two through four, we consider all of the line segments that begin “close enough” to where the previous line segment ended and which obey a counter-clockwise winding order.

The approach is a recursive depth-first search with a depth of four: each level of the search tree adds one edge to the quad. At depth one, all line segments are considered. At depths two through four, only line segments that begin "close enough" to where the previous segment ended, and that obey a counter-clockwise winding order, are considered.

Robustness to occlusions and segmentation errors is handled by adjusting the “close enough” threshold: by making the threshold large, significant gaps around the edges can be handled. Our threshold for “close enough” is twice the length of the line plus five additional pixels. This is a large threshold which leads to a low false negative rate, but also results in a high false positive rate.

Robustness to occlusion and segmentation errors is obtained by adjusting the "close enough" threshold: a large threshold lets significant gaps around the edges be bridged. The threshold is twice the length of the line plus five additional pixels. This large threshold gives a low false negative rate, but also a high false positive rate.
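The depth-four search and the "close enough" test can be sketched as follows. Several details are assumptions made for the example: segments are directed pairs of endpoints, the threshold uses the length of the previous segment, the winding test uses a y-up coordinate convention (the sign flips for image coordinates with y pointing down), and no check is made that the fourth edge closes back onto the first. All function names are hypothetical.

```python
import math

def length(seg):
    (x0, y0), (x1, y1) = seg
    return math.hypot(x1 - x0, y1 - y0)

def close_enough(prev, nxt):
    """True if nxt starts within 2 * len(prev) + 5 pixels of where prev ends."""
    (ex, ey), (sx, sy) = prev[1], nxt[0]
    return math.hypot(sx - ex, sy - ey) <= 2 * length(prev) + 5

def turns_ccw(prev, nxt):
    """Positive cross product of the two directions (y-up convention assumed)."""
    ax, ay = prev[1][0] - prev[0][0], prev[1][1] - prev[0][1]
    bx, by = nxt[1][0] - nxt[0][0], nxt[1][1] - nxt[0][1]
    return ax * by - ay * bx > 0

def find_quads(segments):
    """Return every chain of four segment indices that satisfies both tests."""
    quads = []

    def search(path):
        if len(path) == 4:
            quads.append(tuple(path))
            return
        prev = segments[path[-1]]
        for i, seg in enumerate(segments):
            if i not in path and close_enough(prev, seg) and turns_ccw(prev, seg):
                search(path + [i])

    for i in range(len(segments)):  # depth one: every segment is a candidate
        search([i])
    return quads

square = [((0, 0), (10, 0)), ((10, 0), (10, 10)),
          ((10, 10), (0, 10)), ((0, 10), (0, 0))]
quads = find_quads(square)
```

For the square, the threshold (25 pixels) accepts every pair of segments, so it is the winding test that prunes the search; the same quad is found once per starting edge, and removing such duplicates is left out of the sketch.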

We populate a two-dimensional lookup table to accelerate queries for line segments that begin near a point in space.

A two-dimensional lookup table is populated to speed up queries for line segments that begin near a given point.

3. Computing the tag's distance and angle relative to the camera

3.1 Homography and extrinsics estimation

3.1.1 Obtaining the homography matrix with the DLT
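One way to build such a lookup table is a uniform grid keyed by the cell that contains each segment's start point. This is a sketch under that assumption; the class name `SegmentGrid`, the cell size, and the square query region are illustrative choices, not details from the paper.

```python
from collections import defaultdict

class SegmentGrid:
    """Buckets segments by the grid cell containing their start point."""

    def __init__(self, cell=10):
        self.cell = cell
        self.cells = defaultdict(list)

    def _key(self, x, y):
        return (int(x // self.cell), int(y // self.cell))

    def add(self, seg):
        (x0, y0), _ = seg
        self.cells[self._key(x0, y0)].append(seg)

    def starting_near(self, x, y, radius):
        """Segments whose start lies in a cell overlapping the query square.

        The result is a superset of the true neighbours; callers still apply
        the exact distance test.
        """
        kx0, ky0 = self._key(x - radius, y - radius)
        kx1, ky1 = self._key(x + radius, y + radius)
        found = []
        for kx in range(kx0, kx1 + 1):
            for ky in range(ky0, ky1 + 1):
                found.extend(self.cells.get((kx, ky), []))
        return found

grid = SegmentGrid(cell=10)
grid.add(((12, 3), (40, 3)))
grid.add(((95, 95), (99, 99)))
near = grid.starting_near(10, 0, radius=5)
```

Only the cells overlapping the query square are visited, so the second segment, which starts far away, is never examined.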

We compute the 3×3 homography matrix that projects 2D points in homogeneous coordinates from the tag's coordinate system (in which [0 0 1]^T is at the center of the tag and the tag extends one unit in the x̂ and ŷ directions) to the 2D image coordinate system. The homography is computed using the Direct Linear Transform (DLT) algorithm. Note that since the homography projects points in homogeneous coordinates, it is defined only up to scale.

That is, a 3×3 homography maps 2D homogeneous points from the tag's coordinate system (where [0 0 1]^T is the tag's center and the tag extends one unit along x̂ and ŷ) to the 2D image coordinate system. It is computed with the Direct Linear Transform (DLT) algorithm. Because it acts on homogeneous coordinates, the homography is defined only up to scale.

3.1.2 Computation method
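The DLT step can be sketched with NumPy's SVD. This is the textbook form of the algorithm without point normalization; the function names and the pixel coordinates of the example corners are made up for illustration. The tag corners sit at (±1, ±1) because the tag extends one unit from its center in each direction.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate H (3x3, up to scale) with dst ~ H @ src from >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence gives two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]  # fix the free scale (assumes h[2, 2] != 0)

def project(h, x, y):
    """Apply H to a 2D point and divide out the homogeneous coordinate."""
    u, v, w = h @ np.array([x, y, 1.0])
    return u / w, v / w

tag_corners = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
image_corners = [(100, 200), (180, 205), (175, 290), (95, 280)]
H = homography_dlt(tag_corners, image_corners)
```

Dividing by `h[2, 2]` is just one way to pick a representative, since any non-zero multiple of `H` describes the same mapping.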

Computation of the tag’s position and orientation requires additional information: the camera’s focal length and the physical size of the tag.

Computing the tag's position and orientation requires additional information: the camera's focal length and the physical size of the tag.

The 3 × 3 homography matrix (computed by the DLT) can be written as the product of the 3 × 4 camera projection matrix P (which we assume is known) and the 4 × 3 truncated extrinsics matrix E.

The 3 × 3 homography matrix (computed by the DLT) can be written as the product of the 3 × 4 camera projection matrix P (assumed known) and the 4 × 3 truncated extrinsics matrix E.

The truncated extrinsics matrix E:

Extrinsics matrices are typically 4 × 4, but every position on the tag is at z = 0 in the tag's coordinate system. Thus, we can rewrite every tag coordinate as a 2D homogeneous point with z implicitly zero, and remove the third column of the extrinsics matrix, forming the truncated extrinsics matrix.

An extrinsics matrix is normally 4 × 4, but every point on the tag has z = 0 in the tag's coordinate system. Each tag coordinate can therefore be rewritten as a 2D homogeneous point with z implicitly zero, and the third column of the extrinsics matrix can be dropped.

We represent the rotation components of P as Rij and the translation components as Tk. We also represent the unknown scale factor as s.

The rotation components of P are written Rij, the translation components Tk, and the unknown scale factor s.

Note that we cannot directly solve for E because P is rank deficient. We can expand the right hand side of Eqn. 2, and write the expression for each hij as a set of simultaneous equations.

Note that E cannot be solved for directly because P is rank deficient. The right-hand side of Eqn. 2 can be expanded, and the expression for each hij written out as a set of simultaneous equations.
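Eqn. 2 itself is not reproduced in this post. Assuming a pinhole projection matrix P with focal lengths f_x and f_y and the principal point at the origin (an assumption, not something the excerpt states), the factorization and the resulting simultaneous equations would take the following form:

```latex
\[
\begin{bmatrix} h_{00} & h_{01} & h_{02} \\ h_{10} & h_{11} & h_{12} \\ h_{20} & h_{21} & h_{22} \end{bmatrix}
= s\,P\,E
= s \begin{bmatrix} f_x & 0 & 0 & 0 \\ 0 & f_y & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
    \begin{bmatrix} R_{00} & R_{01} & T_x \\ R_{10} & R_{11} & T_y \\ R_{20} & R_{21} & T_z \\ 0 & 0 & 1 \end{bmatrix}
\]
\[
\begin{aligned}
h_{00} &= s f_x R_{00}, & h_{01} &= s f_x R_{01}, & h_{02} &= s f_x T_x, \\
h_{10} &= s f_y R_{10}, & h_{11} &= s f_y R_{11}, & h_{12} &= s f_y T_y, \\
h_{20} &= s R_{20},     & h_{21} &= s R_{21},     & h_{22} &= s T_z.
\end{aligned}
\]
```

Each equation involves a single unknown element of E multiplied by s, which is why everything except the scale factor can be read off directly.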

These are all easily solved for the elements of Rij and Tk except for the unknown scale factor s. However, since the columns of a rotation matrix must all be of unit magnitude, we can constrain the magnitude of s. We have two columns of the rotation matrix, so we compute s as the geometric average of their magnitudes. The sign of s can be recovered by requiring that the tag appear in front of the camera, i.e., that Tz < 0. The third column of the rotation matrix can be recovered by computing the cross product of the two known columns, since the columns of a rotation matrix must be orthonormal.

All of these are easily solved for the elements Rij and Tk, apart from the unknown scale factor s. However, because every column of a rotation matrix must have unit magnitude, the magnitude of s can be constrained. Two columns of the rotation matrix are available, so s is computed as the geometric mean of their magnitudes. The sign of s is recovered by requiring that the tag appear in front of the camera, i.e., that Tz < 0. The third column of the rotation matrix is recovered as the cross product of the two known columns, since the columns of a rotation matrix must be orthonormal.
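The recovery of s, the sign, and the third rotation column can be sketched as below. The form of P (focal lengths `fx`, `fy`, principal point at the origin) is an assumption, and the function name is hypothetical; the Tz < 0 convention is taken from the text.

```python
import numpy as np

def extrinsics_from_homography(h, fx, fy):
    """Recover R (3x3) and T (3,) from H = s * P * E.

    Assumes P = [[fx, 0, 0, 0], [0, fy, 0, 0], [0, 0, 1, 0]].
    """
    h = np.asarray(h, dtype=float)
    # Undo the focal lengths: row i of m is s * [R_i0, R_i1, T_i].
    m = h / np.array([[fx], [fy], [1.0]])
    # Rotation columns have unit length, so |s| is the geometric mean
    # of the norms of the two known columns.
    s = np.sqrt(np.linalg.norm(m[:, 0]) * np.linalg.norm(m[:, 1]))
    # Sign convention from the text: tag in front of the camera means Tz < 0.
    if m[2, 2] / s > 0:
        s = -s
    m = m / s
    r0, r1, t = m[:, 0], m[:, 1], m[:, 2]
    r2 = np.cross(r0, r1)  # third column from orthonormality
    return np.column_stack([r0, r1, r2]), t

# Synthetic check: build H from a known pose and recover it.
c, sn = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -sn, 0.0], [sn, c, 0.0], [0.0, 0.0, 1.0]])
T_true = np.array([0.1, -0.2, -3.0])
P = np.array([[500.0, 0, 0, 0], [0, 500.0, 0, 0], [0, 0, 1.0, 0]])
E = np.vstack([np.column_stack([R_true[:, :2], T_true]), [0, 0, 1]])
R, T = extrinsics_from_homography(-2.5 * P @ E, 500.0, 500.0)
```

The synthetic homography is scaled by -2.5 on purpose, to exercise both the magnitude and the sign recovery.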

The DLT procedure and the normalization procedure above do not guarantee that the rotation matrix is strictly orthonormal. To correct this, we compute the polar decomposition of R, which yields a proper rotation matrix while minimizing the Frobenius matrix norm of the error.

Neither the DLT procedure nor the normalization above guarantees that the rotation matrix is strictly orthonormal. To correct this, the polar decomposition of R is computed, which yields a proper rotation matrix while minimizing the Frobenius norm of the error.

3.2 Payload decoding

3.2.1 Overview
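The orthonormal factor of the polar decomposition can be obtained from the SVD: if R = U Σ Vᵀ, the closest orthogonal matrix in the Frobenius norm is U Vᵀ. The sketch below adds a determinant check so that the result is a proper rotation rather than a reflection; the function name is hypothetical.

```python
import numpy as np

def nearest_rotation(r):
    """Orthonormal factor of the polar decomposition of r, computed via SVD."""
    u, _, vt = np.linalg.svd(np.asarray(r, dtype=float))
    q = u @ vt
    if np.linalg.det(q) < 0:  # guard against returning a reflection
        u[:, -1] *= -1
        q = u @ vt
    return q

# A slightly non-orthonormal matrix, as the DLT and normalization might produce.
noisy = np.array([[0.96, -0.29, 0.01],
                  [0.30, 0.95, -0.02],
                  [0.00, 0.03, 1.01]])
Q = nearest_rotation(noisy)
```

After the correction, `Q @ Q.T` is the identity up to rounding and `det(Q)` is 1.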

The final task is to read the bits from the payload field. We do this by computing the tag-relative coordinates of each bit field, transforming them into image coordinates using the homography, and then thresholding the resulting pixels. In order to be robust to lighting (which can vary not only from tag to tag, but also within a tag), we use a spatially-varying threshold.

The final task is to read the bits from the payload field. The tag-relative coordinates of each bit field are computed, transformed into image coordinates with the homography, and the resulting pixels are thresholded. To be robust to lighting (which can vary from tag to tag and also within a single tag), a spatially-varying threshold is used.

We build a spatially-varying model of the intensity of “black” pixels, and a second model for the intensity of “white” pixels. We use the border of the tag, which contains known examples of both white and black pixels.

A spatially-varying model is built for the intensity of "black" pixels, and a second one for the intensity of "white" pixels. Both are fitted on the tag's border, which contains known examples of white and black pixels.

A fourth quad is detected around one of the payload bits of the larger tag. These two extraneous detections are eventually discarded because their payload is invalid. The white dots correspond to samples around the tags border which are used to fit a linear model of intensity of “white” pixels; a model is similarly fit for the black pixels. These two models are used to threshold the data payload bits, shown as yellow dots.

A fourth quad is detected around one of the payload bits of the larger tag. The two extraneous detections are eventually discarded because their payloads are invalid. The white dots are samples around the tag's border used to fit a linear model of the intensity of "white" pixels; a model is fitted to the black pixels in the same way. The two models are used to threshold the data payload bits, shown as yellow dots.

This model has four parameters which are easily computed using least squares regression. We build two such models, one for black, the other for white. The threshold used when decoding data bits is then just the average of the predicted intensity values of the black and white models.

The model has four parameters, which are easily computed by least-squares regression. Two such models are built, one for black and one for white. The threshold used when decoding data bits is simply the average of the intensities predicted by the black and white models.

3.2.2 Coding system (decides whether a detected quad is valid)
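The model equation is not reproduced in this post. The sketch below assumes a four-parameter form I(x, y) = A·x + B·x·y + C·y + D fitted with `np.linalg.lstsq`; that form, the function names, and the choice that a bright pixel decodes to 1 are assumptions made for illustration.

```python
import numpy as np

def fit_intensity_model(xs, ys, intensities):
    """Least-squares fit of I(x, y) = A*x + B*x*y + C*y + D (assumed form)."""
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    design = np.column_stack([xs, xs * ys, ys, np.ones_like(xs)])
    params, *_ = np.linalg.lstsq(design, np.asarray(intensities, dtype=float), rcond=None)
    return params  # (A, B, C, D)

def predict(params, x, y):
    a, b, c, d = params
    return a * x + b * x * y + c * y + d

def decode_bit(pixel, x, y, black, white):
    """1 if the pixel is brighter than the midpoint of the two models at (x, y)."""
    threshold = 0.5 * (predict(black, x, y) + predict(white, x, y))
    return int(pixel > threshold)

# Border samples in tag coordinates; the scene is brighter towards +x.
bx = [-1, 1, 1, -1, 0, 0]
by = [-1, -1, 1, 1, -1, 1]
white = fit_intensity_model(bx, by, [200 + 20 * x for x in bx])
black = fit_intensity_model(bx, by, [50 + 10 * x for x in bx])
bit = decode_bit(140, 0.5, 0.0, black, white)
```

At (0.5, 0) the two models predict 210 and 55, so the local threshold is 132.5 and a pixel of intensity 140 decodes as 1; the same pixel value on the brighter side of the tag could decode as 0.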

The goals of a coding system are to:

• Maximize the number of distinguishable codes

• Maximize the number of bit errors that can be detected or corrected

• Minimize the false positive/inter-tag confusion rate

• Minimize the total number of bits per tag (and thus the size of the tag)

These goals are often in conflict, and so a given code represents a trade-off.

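In other words, a coding system aims to:

· maximize the number of distinguishable codes

· maximize the number of bit errors that can be detected or corrected

· minimize the false positive/inter-tag confusion rate

· minimize the total number of bits per tag (and hence the tag's size)

These goals often conflict, so any given code is a trade-off.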

We describe a new coding system based on lexicodes that provides significant advantages over previous methods. Our procedure can generate lexicodes with a variety of properties, allowing the user to use a code that best fits their needs.

The paper describes a new coding system based on lexicodes that offers significant advantages over previous methods. The procedure can generate lexicodes with a variety of properties, so users can choose the code that best fits their needs.

We use a lexicode system that can generate codes for any arbitrary tag size (e.g., 3x3, 4x4, 5x5, 6x6) and minimum Hamming distance. Our approach explicitly guarantees the minimum Hamming distance for all four rotations of each tag and eliminates tags which are of low geometric complexity. Computing the tags can be an expensive operation, but is done offline. Small tags (5x5) can be easily computed in seconds or minutes, but larger tags (6x6) can take several days of CPU time.

The lexicode system can generate codes for any tag size (e.g., 3x3, 4x4, 5x5, 6x6) and any minimum Hamming distance. The approach explicitly guarantees the minimum Hamming distance across all four rotations of each tag and eliminates tags of low geometric complexity. Computing the tags is expensive, but it is done offline. Small tags (5x5) can be computed in seconds or minutes, but larger tags (6x6) can take several days of CPU time.
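A greedy lexicode generator with the rotation guarantee can be sketched as follows. Candidates are visited in increasing numeric order and accepted only if every rotation keeps the minimum Hamming distance from all previously accepted codes. The requirement that a tag also differ from its own rotations is one reading of the rotation guarantee, and the geometric-complexity filter is left out; the function names and the row-major bit layout are assumptions.

```python
def rotate90(code, d):
    """Rotate a d x d bit grid (row-major, cell (r, c) is bit r*d + c) by 90 degrees."""
    out = 0
    for r in range(d):
        for c in range(d):
            if (code >> ((d - 1 - c) * d + r)) & 1:
                out |= 1 << (r * d + c)
    return out

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def rotations(code, d):
    """The code at 0, 90, 180 and 270 degrees."""
    result = [code]
    for _ in range(3):
        result.append(rotate90(result[-1], d))
    return result

def generate_lexicode(d, min_dist):
    """Greedily collect d x d codes that keep min_dist under all rotations."""
    accepted = []
    for cand in range(1 << (d * d)):  # lexicographic (numeric) order
        rots = rotations(cand, d)
        # A tag must be distinguishable from its own rotations.
        if any(hamming(cand, r) < min_dist for r in rots[1:]):
            continue
        if all(hamming(r, old) >= min_dist for old in accepted for r in rots):
            accepted.append(cand)
    return accepted

codes = generate_lexicode(3, 4)
```

A 3x3 grid has only 512 candidates, so this runs instantly; the search space grows as 2^(d·d), which is consistent with the remark that larger tags take far longer to compute.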
