Talk: building a Feature Store with Dagster and Ray
While working as an MLOps Engineer at Sanas I designed and developed the Feature Store used for model training and other workloads. The Feature Store had around 10 deep learning and statistical features (with cross-features dependencies), every feature having 10-40M files.
Dagster was used for orchestration, and Ray (KubeRay
) — for scaling the jobs. I’ve written a custom RayClusterResource
which handled automatic KubeRay
cluster provisioning for Dagster’s ops and assets. It was possible to write something like this:
...
to automatically run the @asset
body in an auto-scaling KubeRay
cluster on Kubernetes.
This talk at the Dagster Community Meetup explains the solution and how I’ve arrived at it in more details.