By Jorge Vergara, MAS and DIE UChile postdoctoral researcher

The Large Synoptic Survey Telescope (LSST), the world’s largest wide-angle telescope, will be located at Cerro Pachón, IV Region, Chile, and is expected to begin operation in 2022. The launch of the LSST will open a new era in astrophysics, as real-time image and data processing will push current data processing systems to the limits of their capabilities. The LSST aims to survey the visible sky 1000 times over a period of 10 years, producing 100 petabytes of image files and 20 petabytes of databases for research purposes. The automatic analysis of this volume of data poses challenges in many areas, including machine learning algorithms, and these challenges must be solved before the LSST begins operation.

Currently, one of the modules developed for the LSST is the Moving Object Pipeline System (MOPS), which aims to detect and identify asteroid trajectories. MOPS analyzes the trajectory of each possible asteroid candidate; however, the inherent noise in the images causes a large number of false positives, which makes the number of candidates very high, increases the computational cost of MOPS, and complicates real-time analysis.

At MAS, together with the LSST team at the University of Washington, we study new strategies to optimize the MOPS process in real time. Among the strategies to reduce the amount of data in the process, we work with unsupervised analysis of images and light curves. In our study we proposed a methodology that not only reduces the amount of data through clustering, but also identifies the dynamics generated by the event under study. Among the strategies to create and select the most relevant features for discriminating between asteroids and non-asteroids, we work on the detection and identification of synergistic feature groups using information theory. In our current research on this strategy, we have found that identifying synergistic feature groups improves classification performance between asteroids and non-asteroids and, at the same time, provides knowledge of which features dominate in each class.
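To make the idea of a synergistic feature group concrete, here is a minimal sketch in Python. It is not the method used in this research. The article does not specify which information-theoretic measure is applied, so the sketch assumes one common choice: the joint mutual information of a feature pair with the class label, minus the sum of the individual mutual informations. The function names and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def synergy(f1, f2, labels):
    """Assumed synergy score: I(F1,F2;Y) - I(F1;Y) - I(F2;Y).

    Positive values mean the pair tells us more about the label jointly
    than the two features do separately; negative values mean redundancy.
    """
    pair = list(zip(f1, f2))
    return (
        mutual_information(pair, labels)
        - mutual_information(f1, labels)
        - mutual_information(f2, labels)
    )


# Toy case 1 (hypothetical): the label is the XOR of two binary features.
# Neither feature is informative alone, but together they determine the label.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
xor_labels = [a ^ b for a, b in zip(f1, f2)]
xor_synergy = synergy(f1, f2, xor_labels)

# Toy case 2 (hypothetical): the second feature duplicates the first,
# so the pair adds nothing beyond what each feature already provides.
redundant_synergy = synergy(f1, f1, f1)

print(f"XOR pair synergy:       {xor_synergy:+.2f} bits")
print(f"Duplicate pair synergy: {redundant_synergy:+.2f} bits")
```

In this toy setting the XOR pair scores positive and the duplicated pair scores negative. A ranking based on individual feature relevance alone would discard both XOR features, which is the kind of case a search over feature groups is meant to catch.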

Visualization of the 20 most relevant features (out of a total of 200) obtained from 20×20-pixel asteroid / non-asteroid images. The image was synthesized using a technique called t-distributed Stochastic Neighbor Embedding (t-SNE).