Data

This project combines astrometric and spectroscopic data to enable a detailed chemo-dynamical analysis of the Milky Way’s stellar populations.

The full data used for the project can be found here: Research Project - Google Drive Data. Although these files are provided for completeness, they are not required for the basic functionality of the pipeline. The contents are outlined below:

| Directory | Description |
| --- | --- |
| `data/raw/` | The full survey datasets and Value Added Catalogues (VACs) before any cuts have been applied. These are mainly required in Notebook 0, which constructs the datasets used throughout the analysis. They are also used when plotting GMM results to provide background context on the overall data distribution (this can be disabled by omitting the `full_survey_file` input path). |
| `data/filtered/` | Holds the resulting APOGEE and GALAH datasets after quality and scientific cuts have been applied (built in Notebook 0). These are also included directly in the GitHub repository. |
| `XD_Results/` | Stores intermediate outputs from all initialisations of the clustering pipelines, including Gaussian parameters (means and covariances), model selection scores (e.g., AIC/BIC), and assignment probabilities. This allows the user to recreate the results without rerunning the full pipeline (which is computationally expensive). |
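
As a quick sanity check after downloading, the directory names listed above can be verified with a few lines of Python. This is only a minimal sketch: the helper `missing_data_dirs` is hypothetical (it is not part of the pipeline), and it assumes all three directories sit under a single root folder.

```python
from pathlib import Path

# Directory names taken from the table above. Treating them as relative to one
# common root folder is an assumption, not something the pipeline guarantees.
EXPECTED_DIRS = ("data/raw", "data/filtered", "XD_Results")


def missing_data_dirs(root):
    """Return the expected directories that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected directories are present.")
```

Since `data/raw/` is described as optional for basic functionality, a missing entry there is not necessarily a problem.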

Additionally, the repository contains:

| Directory | Description |
| --- | --- |
| `XD_Results/` | Stores intermediate outputs from all initialisations of the clustering pipelines, including Gaussian parameters (means and covariances), model selection scores (e.g., AIC/BIC), and assignment probabilities. This allows the user to recreate the results without rerunning the full pipeline (which is computationally expensive). |
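
For readers unfamiliar with the model selection scores mentioned above, the sketch below shows the textbook AIC and BIC definitions for a Gaussian mixture with full covariance matrices. The function names are illustrative only, and it is an assumption that the pipeline counts parameters and uses the total log-likelihood in this way; the stored scores may follow a different convention.

```python
import math


def gmm_free_parameters(n_components, n_dims):
    """Number of free parameters in a full-covariance Gaussian mixture."""
    weights = n_components - 1  # mixture weights sum to one
    means = n_components * n_dims
    # Each covariance matrix is symmetric: d(d+1)/2 independent entries.
    covariances = n_components * n_dims * (n_dims + 1) // 2
    return weights + means + covariances


def aic(log_likelihood, n_params):
    """Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_samples) - 2 * log_likelihood
```

Both scores penalise extra components, and BIC does so more strongly once the sample contains more than about eight stars (where ln n exceeds 2), which is why it tends to favour simpler models.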