What is Shakudo

Shakudo is an end-to-end data platform designed for flexibility in data tooling. Data teams can mix and match best-of-breed tools and try out emerging ones without the DevOps overhead. The workflow is built around the following components:

  • Session is the unified development environment. It comes with a pre-configured environment, mounted credentials, and network and database connections, so you can start building immediately.

  • Jobs is the batch job deployment orchestration. You can use any Git repository, whether it was developed and pushed from a session or from anywhere else. You can also deploy a pre-built Docker image. Jobs can be triggered on a schedule or by any KEDA scaler.

  • Services is the service deployment orchestration. As with Jobs, you can use Git repositories or pre-built Docker images. A service exposes an endpoint, which can be a dashboard, a website, or an API.

  • Shakudo Stack Components is a pre-configured, fully connected, and continually evolving data stack that supports end-to-end use cases for data and machine learning applications.
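
To make the idea of a service endpoint concrete, here is a minimal sketch of the kind of application a Git repository or Docker image deployed as a service might contain. It uses only the Python standard library and is not Shakudo's API: the `/health` path, the `HealthHandler` class, and the `start_server` and `fetch_health` helpers are hypothetical names chosen for illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health with a small JSON document (hypothetical endpoint)."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404, "Not found")

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def start_server(port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_health(server):
    """Call the /health endpoint of a running server and decode the reply."""
    host, port = server.server_address
    with urllib.request.urlopen(f"http://{host}:{port}/health") as resp:
        return resp.status, json.loads(resp.read())


if __name__ == "__main__":
    srv = start_server()
    print(fetch_health(srv))
    srv.shutdown()
```

The sketch binds to 127.0.0.1 so it can be tried locally. A deployed service would typically listen on all interfaces and on whatever port the platform expects, and in practice a framework such as FastAPI or Flask would usually replace the hand-written handler.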

Shakudo adds new integrations every day. Visit our integration page to see the latest list. If you can't find the tool you are looking for, please send us an integration request.

Data Warehouse

  • Snowflake
  • Google BigQuery
  • Amazon Redshift
  • Dremio
  • Apache Hudi
  • SingleStore

Blob Storage

  • Azure Blob Storage
  • AWS S3
  • Google Cloud Storage
  • Oracle blob storage
  • Cloudflare R2
  • Wasabi

Data Ingestion and streaming

  • Airbyte
  • Amazon EventBridge
  • Apache Kafka

IDE

  • Jupyter notebooks
  • VS Code
  • Code-Server
  • PyCharm

Data Transformation

  • dbt
  • DuckDB
  • Trino

Pipeline Orchestration

  • Airflow
  • Prefect
  • Dagster
  • Jenkins

Distributed computing

  • Apache Spark
  • Dask
  • Ray
  • Fugue

Data Visualization

  • Apache Superset
  • Cube
  • Streamlit
  • Metabase
  • Power BI
  • QuickSight
  • Looker

Data Catalog

  • DataHub
  • Amundsen

Model training

  • Transformers
  • PyTorch
  • TensorFlow
  • JAX
  • MXNet
  • NVIDIA RAPIDS
  • Ray Tune
  • PostgresML

Model and application serving

  • Triton
  • TensorFlow Serving
  • TorchServe
  • Django
  • FastAPI
  • Flask

Model monitoring and governance

  • MLflow
  • whylogs
  • Weights & Biases
  • Evidently
  • Great Expectations

Monitoring and Alerting

  • Prometheus
  • Grafana
  • PagerDuty
  • Slack

Data source

  • OpenBB

Geospatial

  • Xclim
  • Xarray
  • cdo
  • GeoPandas
  • GDAL
  • ESMF
  • Zarr

When to use Shakudo

  • Data engineering, including data transformation development and deployment
  • Distributed computing for data larger than memory
  • Data analytics and visualization
  • Deployment of batch jobs
  • Serving data applications and pipelines
  • Machine learning model training
  • Machine learning model serving