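Dear all,
I recently came across a very useful, fast, and efficient function for matching the points in one data frame to their nearest neighbours in another data frame. Naturally, I'd like to share my newly acquired pearls of wisdom with you.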
I’d like to outline the problem with a specific application: I wanted to match numerous GPS tracks (about 200 GPX files recording bike routes in Austria) to an underlying road graph covering all roads in Austria. Having read the track points from the GPX files, I had two simple feature collections of geometry type POINT that I wanted to match:
- object road_graph, which is an sf object containing a graph of the Austrian road network (frc between 000 and 005) in intervals of 50 meters (coordinates refer to the mean of each segment)
- object bike_graph, which is an sf object containing the GPS waypoints of the bike tracks
In principle, one could use the function snapPointsToLines() from the package maptools or the rgeos implementation of gDistance() to perform this task. However, these functions are extremely inefficient for large data sets, since you have to calculate the distances between all possible pairs of points and then select the nearest point based on the minimum distance.
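
To make the cost of the all-pairs approach concrete, here is a minimal base R sketch of a brute-force nearest-neighbour search. It is not how maptools or rgeos work internally; the function name brute_force_nn and the toy matrices data and query are made up for illustration. The point is only that the full distance matrix has nrow(query) * nrow(data) entries.

```r
# Brute-force nearest-neighbour matching in base R (illustration only).
# `query` and `data` are hypothetical two-column coordinate matrices.
brute_force_nn <- function(query, data) {
  # Full distance matrix: one row per query point, one column per data point.
  # Memory and run time grow with nrow(query) * nrow(data).
  d <- sqrt(outer(query[, 1], data[, 1], "-")^2 +
            outer(query[, 2], data[, 2], "-")^2)
  idx <- apply(d, 1, which.min)            # column of the minimum in each row
  list(nn.idx   = idx,
       nn.dists = d[cbind(seq_len(nrow(d)), idx)])
}

data  <- rbind(c(0, 0), c(1, 1), c(5, 5))
query <- rbind(c(0.1, 0), c(4, 4.5))
res   <- brute_force_nn(query, data)
res$nn.idx   # index of the closest row of `data` for each row of `query`
```

With a few million road graph points and a few hundred thousand waypoints, a matrix of this size no longer fits in memory, which is why a tree-based search is attractive.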
This is where the function nn2() from the package RANN comes into play.
# libraries
library(dplyr)
library(sf)
library(RANN)

# get coordinate matrices (one row per point: lon, lat)
bike_coords <- do.call(rbind, st_geometry(bike_graph))
bike_coords <- cbind(bike_coords, 1:nrow(bike_coords))
graph_coords <- do.call(rbind, st_geometry(road_graph))

# fast nearest neighbour search: for every road graph point, look for the
# closest bike waypoint within the given radius
closest <- nn2(bike_coords[, 1:2], graph_coords, k = 1, searchtype = "radius", radius = 0.001)
closest <- sapply(closest, cbind) %>% as_tibble

# create logical vector indicating bike routes and add it to the road graph
# (nn.idx == 0 means no waypoint was found within the radius)
road_graph$bikeroute <- ifelse(closest$nn.idx == 0, FALSE, TRUE)

# define smoother function via run length encoding
track_smoother <- function(route, smooth_length = 100) {
  r <- rle(route)
  # runs shorter than smooth_length are set to 1, i.e. flagged as bike route
  index <- r$lengths < smooth_length
  r$values[index] <- 1
  return(inverse.rle(r))
}

# apply smoother on all tracks across all roads
road_graph <- road_graph %>%
  group_by(road) %>%
  mutate(bikeroute_smooth = track_smoother(bikeroute))
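
To see what the run-length smoother does, here is a small base R sketch on a toy vector. The function smooth_short_runs is a hypothetical variant of track_smoother that writes TRUE instead of 1 so the result stays logical, and the threshold of 3 is chosen only to keep the example short.

```r
# Toy illustration of the run-length smoother (base R only).
smooth_short_runs <- function(route, smooth_length = 100) {
  r <- rle(route)                       # runs of identical consecutive values
  short <- r$lengths < smooth_length    # runs shorter than the threshold
  r$values[short] <- TRUE               # treat every short run as bike route
  inverse.rle(r)                        # expand back to the original length
}

# Four matched points, a gap of two, four matched points, five unmatched.
flags <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE,
           FALSE, FALSE, FALSE, FALSE, FALSE)
smoothed <- smooth_short_runs(flags, smooth_length = 3)
smoothed
```

The gap of two FALSE values is shorter than the threshold and gets filled in, while the run of five FALSE values at the end is left alone. Note that short runs of TRUE are also kept as TRUE, so the smoother only ever adds flagged segments and never removes them.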
Some explanatory remarks on the nn2() function:
- The function uses a kd-tree to find the k nearest neighbours for each point. Specifying k = 1 yields only the ID of the nearest neighbour.
- Since I simply wanted to flag bike routes, I used searchtype = "radius" to search only for neighbours within a specified radius of the point. If no waypoints (i.e. bike routes) lie within this radius, nn.idx will contain 0 and nn.dists will contain 1.340781e+154 for that point. I used this information to build a logical vector indicating bike routes in the subsequent ifelse statement.
- Please note that the radius is a distance in decimal degrees between lon/lat coordinates. Looking at the Wikipedia page on decimal degrees (more precisely: the table about degree precision versus length), we can see that 3 decimal places (0.001 degrees) correspond to 111.32 m in N/S and 78.71 m E/W at 45N/S. Thus, radius = 0.001 will search for the nearest point within approx. 110 meters in N/S direction and approx. 75 meters in W/E direction in Austria.
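
The conversion from degrees to meters can be checked with a few lines of base R. This is a rough sketch that assumes a spherical Earth with 111.32 km per degree, the value behind the figures quoted above; the helper name deg_to_m and the latitude of 47.5 degrees (roughly central Austria) are my own assumptions for illustration.

```r
# Approximate ground distance covered by a radius given in decimal degrees.
# Assumes a spherical Earth with 111.32 km per degree; `lat` is in degrees.
deg_to_m <- function(radius_deg, lat) {
  m_per_deg <- 111320
  c(north_south = radius_deg * m_per_deg,
    # one degree of longitude shrinks with the cosine of the latitude
    east_west   = radius_deg * m_per_deg * cos(lat * pi / 180))
}

at_45   <- deg_to_m(0.001, lat = 45)     # the 45N/S case quoted above
at_47_5 <- deg_to_m(0.001, lat = 47.5)   # roughly central Austria
round(at_45, 2)
round(at_47_5, 2)
```

Because the search is Euclidean in lon/lat space, the search region on the ground is an ellipse, not a circle. If that matters for your application, projecting both point sets to a metric coordinate system before calling nn2() would let you specify the radius directly in meters.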
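This is the result of a simple visualization in QGIS. The grey points are the underlying road graph, the blue asterisks are the GPS waypoints, the red points are the matched road segments lying within the radius of the GPS track, and the green points are the missing segments filled in by the smoother.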
Best wishes,
Matthias