Abstract: This paper assumes prior detections of multiple targets at each time instant, and uses a graph-based approach to connect those detections across time, based on their position and appearance estimates. In contrast to most earlier works in the field, our framework has been designed to exploit the appearance features even when they are only sporadically available, or affected by non-stationary noise, along the sequence of detections. To this end, an iterative hypothesis testing strategy progressively aggregates the detections into short trajectories, named tracklets. Specifically, each iteration considers a node, named the key-node, and investigates how to link this key-node with other nodes in its neighbourhood, under the assumption that the target appearance is defined by the key-node appearance estimate. The linking is obtained by shortest-path computation in a temporal neighbourhood of the key-node. The approach is conservative in that it only aggregates a shortest path when it is sufficiently better than the alternative paths. It is also multi-scale in that the size of the investigated neighbourhood increases proportionally to the number of detections already aggregated into the key-node. The multi-scale and iterative nature of the process makes it both computationally efficient and effective. Extensive experimental validations are performed on a 15-minute real-life basketball dataset captured by 7 cameras, and on the PETS'09 dataset.
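The key-node linking step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the names `Detection`, `best_path`, and `link_key_node` are hypothetical, appearance is reduced to a single scalar, one detection per frame is assumed along a path (no missed detections), and the "alternative path" is taken to be the cheapest path that avoids the first successor of the best path. The acceptance `margin` is likewise an assumed parameter.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Detection:
    t: int                              # frame index
    x: float
    y: float
    appearance: Optional[float] = None  # None when the feature is unavailable


def edge_cost(key: Detection, a: Detection, b: Detection) -> float:
    """Displacement from a to b, plus b's appearance mismatch with the key-node."""
    cost = math.hypot(b.x - a.x, b.y - a.y)
    if key.appearance is not None and b.appearance is not None:
        # Appearance is only penalised where it was actually observed.
        cost += abs(b.appearance - key.appearance)
    return cost


def best_path(key, detections, horizon, banned=frozenset()):
    """Cheapest chain from key through frames key.t+1 .. key.t+horizon.

    Returns (cost, path); (inf, []) if some frame has no usable detection.
    """
    frames = [
        [d for d in detections if d.t == key.t + k and d not in banned]
        for k in range(1, horizon + 1)
    ]
    if any(not frame for frame in frames):
        return math.inf, []
    # Dynamic programming over time-ordered layers (shortest path in a DAG).
    layer = {key: (0.0, [key])}
    for frame in frames:
        nxt = {}
        for b in frame:
            cost, path = min(
                ((c + edge_cost(key, a, b), p) for a, (c, p) in layer.items()),
                key=lambda cp: cp[0],
            )
            nxt[b] = (cost, path + [b])
        layer = nxt
    return min(layer.values(), key=lambda cp: cp[0])


def link_key_node(key, detections, horizon, margin):
    """Return the best path only if it beats the alternative by at least margin."""
    cost, path = best_path(key, detections, horizon)
    if not path:
        return [key]
    # Assumed alternative: the best path that avoids the chosen first successor.
    alt_cost, _ = best_path(key, detections, horizon, banned=frozenset({path[1]}))
    return path if alt_cost - cost >= margin else [key]
```

In a multi-scale variant, `horizon` would grow with the number of detections already aggregated into the key-node, for instance `horizon = len(tracklet)`; that scaling rule is again only an assumption made for illustration.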