I’d like to outline the problem with a specific application example: I wanted to match numerous GPS tracks (about 200 GPX files describing bike routes in Austria) to an underlying road graph covering all roads in Austria. Having read the track points from the GPX files, I had two simple feature collections of geometry type POINT that I wanted to match:

  • object road_graph, which is an sf object containing a graph of the Austrian road network (frc between 000 and 005) in intervals of 50 meters (coordinates refer to the mean of each segment)
  • object bike_graph, which is an sf object containing the GPS waypoints of the bike tracks

Basically, one could use the function snapPointsToLines() from the package maptools or the rgeos implementation of gDistance() to perform this task. However, these functions become extremely inefficient for large data sets, since you have to calculate the distances between all possible pairs of points and subsequently select the nearest point based on the minimum distance.
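To make the cost of the all-pairs approach concrete, here is a minimal base R sketch. The toy matrices track_pts and graph_pts are hypothetical stand-ins for the two coordinate sets, not the Austrian data; the point is only that the full distance matrix has one entry per pair of points.

```r
set.seed(1)

# hypothetical toy data (stand-ins for the real coordinate matrices)
track_pts <- matrix(runif(2000), ncol = 2)  # 1000 "waypoints"
graph_pts <- matrix(runif(400), ncol = 2)   # 200 "graph points"

# brute force: full 200 x 1000 matrix of squared Euclidean distances
d2 <- outer(graph_pts[, 1], track_pts[, 1], "-")^2 +
      outer(graph_pts[, 2], track_pts[, 2], "-")^2

# nearest waypoint (row-wise minimum) for every graph point
nearest_idx  <- apply(d2, 1, which.min)
nearest_dist <- sqrt(d2[cbind(seq_len(nrow(d2)), nearest_idx)])

# number of pairwise distances that had to be evaluated
length(d2)
```

The matrix grows with the product of the two point counts, so this stops being practical long before the data reach the size of a national road graph.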

This is where the function nn2() from the package RANN comes into play.

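# libraries
library(tidyverse)  # %>%, as_tibble(), group_by(), mutate()
library(sf)         # st_geometry()
library(RANN)       # nn2()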

# get coordinate matrices
bike_coords <- do.call(rbind, st_geometry(bike_graph))
bike_coords <- cbind(bike_coords, 1:nrow(bike_coords))
graph_coords <- do.call(rbind, st_geometry(road_graph))

# fast nearest neighbour search
closest <- nn2(bike_coords[,1:2], graph_coords, k = 1, searchtype = "radius", radius = 0.001)
closest <- sapply(closest, cbind) %>% as_tibble

# create logical vector indicating bike routes and add it to the road graph
road_graph$bikeroute <- ifelse(closest$nn.idx == 0, FALSE, TRUE)

# define smoother function via run length encoding
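track_smoother <- function(route, smooth_length = 100){
  r <- rle(route)
  # runs shorter than smooth_length are flagged as bike route
  index <- r$lengths < smooth_length
  r$values[index] <- 1
  # expand the run lengths back to one value per point
  inverse.rle(r)
}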

# apply smoother on all tracks across all roads
road_graph <- road_graph %>%
  group_by(road) %>% 
  mutate(bikeroute_smooth = track_smoother(bikeroute))
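To see what the run length smoother does, here is a small base R sketch on a made-up flag vector. It assumes that the smoother ends by returning inverse.rle(r); the vector route and the value smooth_length = 3 are hypothetical and chosen only to keep the example short.

```r
# smoother repeated here so the example runs on its own
# (assumption: the function returns inverse.rle(r))
track_smoother <- function(route, smooth_length = 100) {
  r <- rle(route)
  index <- r$lengths < smooth_length
  r$values[index] <- 1
  inverse.rle(r)
}

# hypothetical flags along one road: a 2-point gap inside a matched stretch
route <- c(rep(FALSE, 5), rep(TRUE, 4), rep(FALSE, 2), rep(TRUE, 4), rep(FALSE, 5))

# runs shorter than 3 points are set to 1, which closes the gap
smoothed <- track_smoother(route, smooth_length = 3)

rle(route)$lengths     # run lengths before smoothing
rle(smoothed)$lengths  # run lengths after smoothing
```

Note that assigning 1 turns the logical flags into numeric 0/1 values, and that every short run is set to 1, so short unmatched stretches at the start or end of a road are flagged as well.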

Some explanatory remarks on the nn2() function:

  • The function uses a kd-tree to find the k nearest neighbours for each point. Specifying k = 1 yields only the ID of the nearest neighbour.
  • Since I simply wanted to flag bike routes, I used searchtype = "radius" to only search for neighbours within a specified radius of the point. If no waypoints (i.e. bike routes) lie within this radius, nn.idx will contain 0 and nn.dists will contain 1.340781e+154 for that point. I used this information to establish a logical vector indicating bike routes in the subsequent ifelse-statement.
  • Please note that the radius is a distance in decimal degrees, since the coordinates are lon/lat. Having a look at the Wikipedia page on decimal degrees (more precisely: the table about degree precision versus length), we can see that 3 decimal places (0.001 degrees) correspond to 111.32 m in N/S and 78.71 m E/W at 45N/S. Thus, radius = 0.001 will search for the nearest point within approx. 110 meters in N/S direction and approx. 75 meters in W/E direction in Austria.
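The remarks above can be checked with a short sketch. The first part converts 0.001 degrees into meters under stated assumptions (a spherical Earth with a radius of 6371 km and 47.5N as a representative latitude for Austria, both my choices). The second part runs a radius search on two hypothetical waypoints and two hypothetical graph points to show the nn.idx == 0 convention.

```r
library(RANN)

# --- 0.001 degrees in meters (assumed: spherical Earth, radius 6371 km) ---
earth_radius_m <- 6371000
deg <- 0.001
lat <- 47.5  # assumed representative latitude for Austria

m_ns <- deg * pi / 180 * earth_radius_m  # north/south extent
m_ew <- m_ns * cos(lat * pi / 180)       # east/west extent shrinks with latitude
round(c(north_south = m_ns, east_west = m_ew), 1)

# --- radius search on hypothetical lon/lat points ---
waypoints <- rbind(c(16.3700, 48.2000),
                   c(16.3800, 48.2100))
graph_pts <- rbind(c(16.3702, 48.2001),  # well within 0.001 degrees of waypoint 1
                   c(16.5000, 48.3000))  # far away from both waypoints

res <- nn2(data = waypoints, query = graph_pts, k = 1,
           searchtype = "radius", radius = 0.001)

# nn.idx is 0 where no waypoint lies within the radius
on_route <- res$nn.idx[, 1] != 0
on_route
```

Under these assumptions the conversion gives roughly 111 m in N/S and 75 m in E/W direction, in line with the figures from the table. Because the search radius is a plain Euclidean distance in degrees, it is effectively narrower in E/W than in N/S direction.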




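Matthias studied Environmental Information Management at the University of Natural Resources and Life Sciences, Vienna and holds a PhD in environmental statistics. The focus of his thesis was the statistical modelling of rare (extreme) events as a basis for vulnerability assessment of critical infrastructure. He currently works at the Austrian national weather and geophysical service (ZAMG) and at the Institute of Mountain Risk Engineering at BOKU University. His current focus is the (statistical) assessment of adverse weather events and natural hazards, and disaster risk reduction. His main interests are the statistical modelling of environmental phenomena as well as open source tools for data science, geoinformation and remote sensing.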



  • This is very helpful! Thank you!

    Robbie Roemer, 3 years ago