Utilities
A collection of utilities that seem interesting to me (haven't used them all):
- Rich CMS: https://github.com/jzombie/rich-cms
- Pipper: https://github.com/jzombie/pipper
- Sitemap validator: https://www.xml-sitemaps.com/validate-xml-sitemap.html
- Docker Lynx: https://github.com/jzombie/docker-lynx
- Harlequin (The SQL IDE for Your Terminal [TUI]): https://github.com/tconbeer/harlequin?tab=readme-ov-file
- Textual library: https://textual.textualize.io/#what-is-textual (which Harlequin uses; apps over SSH)
- GitUI (Rust-based TUI): https://github.com/extrawurst/gitui?tab=readme-ov-file#installation
- Dive (Docker image explorer TUI): https://github.com/wagoodman/dive
- Pandaral·lel (Parallelize Pandas operations on all CPUs, by changing only one line of code): https://github.com/nalepae/pandarallel ( https://github.com/jzombie/pandarallel )
- Ibis (lightweight, universal interface for data wrangling; Pandas compatible [mostly, according to what I've been led to believe]): https://github.com/ibis-project/ibis
- Python time distribution as a heatmap: https://github.com/csurfer/pyheat
- Pip requirements.txt generator based on imports in project: https://pypi.org/project/pipreqs/
- MLflow: A Machine Learning Lifecycle Platform: https://github.com/mlflow/mlflow/
- Galactic: cleaning and curation tools for massive unstructured text datasets: https://github.com/taylorai/galactic
- Radicle: Open-Source, P2P GitHub Alternative: https://news.ycombinator.com/item?id=39600810
- Borg Backup: Deduplicating backup (compatible w/ rsync.net ): https://github.com/borgbackup/borg
- Lightweight plotting to the terminal (4x resolution via Unicode): https://github.com/olavolav/uniplot
- node-red-contrib-machine-learning: https://flows.nodered.org/node/node-red-contrib-machine-learning
- Monolith: bundle any web page into a single HTML file: https://github.com/Y2Z/monolith
- Jampack: Optimizes static websites for best user experience and best Core Web Vitals scores: https://github.com/divriots/jampack
- QuantStats: Portfolio analytics for quants: https://github.com/ranaroussi/quantstats
- Markmap (Visualize Markdown as mindmaps): https://github.com/markmap/markmap (demo: https://markmap.js.org/repl )
- zerve (Data Science & AI Workbench): https://www.zerve.ai/
- miceforest: Fast, Memory Efficient Imputation with LightGBM (fill in missing values in datasets): https://github.com/AnotherSamWilson/miceforest (LinkedIn post: https://www.linkedin.com/posts/khuyen-tran-1401_miceforest-is-a-python-library-for-imputing-activity-7187108646344916994-Tqo7 ). Variable importance may be of additional interest: https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#variable-importance
- Neural Forecast (User friendly state-of-the-art neural forecasting models): https://github.com/Nixtla/neuralforecast (LinkedIn thread: https://www.linkedin.com/posts/khuyen-tran-1401_timeseries-activity-7192169580264337411-BB6q )
- Itertools (vs. index slicing in Python): https://www.linkedin.com/posts/khuyen-tran-1401_python-activity-7193615521097920512-YwWK (itertools.islice() offers a more efficient approach by enabling the processing of only a portion of the data stream at a time, without the need to load the entire dataset into memory.)
- Insanely Fast Whisper (STT): https://github.com/Vaibhavs10/insanely-fast-whisper
- TimeGPT-1: The first foundation model for forecasting and anomaly detection: https://github.com/Nixtla/nixtla ( https://www.linkedin.com/posts/khuyen-tran-1401_timegpt-is-a-powerful-generative-pre-trained-activity-7200865032140673026-uUhg )
- Modin (drop-in Pandas replacement which uses all CPU cores): https://github.com/modin-project/modin (related LinkedIn thread: https://www.linkedin.com/posts/yukikakegawa_python-datascience-dataengineering-activity-7200118431524818946-2-LM/ )
- Dask (Python library for parallel and distributed computing): https://docs.dask.org/en/stable/
- PyOD (Python library for detecting anomalies in multivariate data): https://github.com/yzhao062/pyod (LinkedIn thread: https://www.linkedin.com/posts/eric-vyacheslav-156273169_amazing-python-library-pyod-use-it-to-detect-activity-7212107779673661443-hEt5?utm_source=share&utm_medium=member_desktop )
- Hyperfine (Compare the Speed of Two Commands): https://codecut.ai/hyperfine-compare-the-speed-of-two-commands/
- GPU.js (GPU accelerated JavaScript): https://gpu.rocks/
- Lunr (site search with vector support): https://lunrjs.com/ (Getting started: https://lunrjs.com/guides/getting_started.html )
- Text to icon: https://text2icon.app/
- Image to vector SVG: https://vectormaker.co/
- WASM-based image to vector SVG (lower quality, but much faster): https://igutechung.github.io/
- Datamuse (word-finding query engine for developers): https://www.datamuse.com/api/
- Cloudflare Tunnels (reverse-proxy tunnels):
- https://www.cloudflare.com/products/tunnel/
- https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/
- https://www.reddit.com/r/selfhosted/comments/ync1zd/cloudflare_tunnels_are_so_awesome/
- react-ts-tradingview-widgets docs: https://tradingview-widgets.jorrinkievit.xyz/docs/intro
- Record and share terminal sessions: https://asciinema.org/
- svg-term-cli: https://github.com/marionebl/svg-term-cli
- Hermes JS Engine [Facebook / React Native] (video: https://www.youtube.com/watch?v=ipYQpxAyunc): https://github.com/facebook/hermes
- "World's Fastest Voice Bot Demo": https://github.com/CerebriumAI/examples/tree/master/18-realtime-voice-agent
- Pipecat (framework for building [streaming] voice [and multimodal] conversational agents): https://github.com/pipecat-ai/pipecat
- mlforecast (Scalable machine learning for time series forecasting): https://github.com/Nixtla/mlforecast
- SDL (Simple DirectMedia Layer; cross-platform development library to provide low-level access to audio, keyboard, mouse, joystick and graphics hardware via OpenGL and Direct3D): https://www.libsdl.org/
- Rayon (data-parallelism library for Rust [can even be made to work via WASM]): https://crates.io/crates/rayon (usage with WebAssembly: https://github.com/rayon-rs/rayon?tab=readme-ov-file#usage-with-webassembly ) (LinkedIn discussion: https://www.linkedin.com/feed/update/urn:li:activity:7220084077826031617 )
- Warp (Web server in Rust: "A super-easy, composable, web server framework for warp speeds."): https://github.com/seanmonstar/warp (WebSocket chat example: https://github.com/seanmonstar/warp/blob/master/examples/websockets_chat.rs )
- Cursor ("The AI Code Editor"): https://www.cursor.com/
- Git GUI clients: https://git-scm.com/download/gui/linux
- "Favicon Generator. For real.": https://realfavicongenerator.net/ (includes a nice checker ["Check your favicon"] which analyzes the site for what can be improved)
- Nostr relays (API): https://api.nostr.watch/
- Lazygit (git TUI): https://github.com/jesseduffield/lazygit
- Rust Simple Virtual DOM: https://github.com/richardanaya/rust-simple-virtual-dom
- SVGO ("SVG Optimizer" - A Node.js library and command-line application for optimizing SVG files): https://github.com/svg/svgo
- cargo-selector (Cargo subcommand [TUI] to select and execute binary/example targets): https://github.com/lusingander/cargo-selector
- Ratatui (An open source Rust library that's all about cooking up terminal user interfaces (TUIs)): https://www.linkedin.com/company/ratatui-rs/
- GitHub Desktop (Linux fork): https://github.com/shiftkey/desktop
- Linear ("purpose-built tool for planning and building products"): https://linear.app/
- node-red-node-pglite (PGlite is a WASM build of Postgres, packaged into a TypeScript/JavaScript client library): https://github.com/conoro/node-red-pglite ( https://conoroneill.net/2024/08/18/running-postgres-inside-node-red-via-wasm-and-pglite/; https://news.ycombinator.com/item?id=41287478 )
- DuckDB WASM: https://duckdb.org/docs/api/wasm/overview.html
- websocat (Netcat, curl and socat for WebSockets; also written in Rust and has a Dockerfile): https://github.com/vi/websocat
- ETF Matcher (match ETFs using potential fractional shares): https://etfmatcher.com/
- Rust-based Electron alternative ("Build an optimized, secure, and frontend-independent application for multi-platform deployment."): https://tauri.app/
- sec-edgar ("download all of a company’s periodic reports, filings and forms from the EDGAR database with a single command"): https://github.com/sec-edgar/sec-edgar (docs: https://sec-edgar.github.io/sec-edgar/ )
- handcalcs ("Python calculations in Jupyter, as though you wrote them by hand"): https://github.com/connorferster/handcalcs
- Eget ("easy pre-built binary installation"): https://github.com/zyedidia/eget/
- DBpedia Spotlight (open-source tool that automatically annotates text with DBpedia resources, enabling entity recognition and linking of text to structured data within the DBpedia knowledge base; primarily trained on data extracted from Wikipedia): https://www.dbpedia-spotlight.org/
- WordLlama ("fast, lightweight NLP toolkit that handles tasks like fuzzy-deduplication, similarity and ranking with minimal inference-time dependencies and optimized for CPU hardware"): https://github.com/dleemiller/WordLlama
- CommandDash - AI Assist for Libraries: https://commanddash.io/
- SAQ (Simple Async Queue [for Python]): https://github.com/tobymao/saq
- Crawl4AI ("Crawl4AI simplifies web crawling and data extraction, making it accessible for large language models (LLMs) and AI applications"): https://github.com/unclecode/crawl4ai
- Workalendar (Python module that offers classes able to handle calendars, list legal / religious holidays and gives working-day-related computation functions): https://github.com/workalendar/workalendar
- Andi ("Search for the next generation with an AI chat assistant"): https://andisearch.com/
- OpenBB ("Investment research made easy with AI"): https://openbb.co/
- Stock intrinsic value calculator: https://www.alphaspread.com/dashboard/watchlists
- Google's AlphaChip ("open-source framework for generating chip floorplans with distributed deep reinforcement learning"): https://github.com/google-research/circuit_training/?tab=readme-ov-file#PreTrainedModelCheckpoint (related Ars Technica article: https://arstechnica.com/information-technology/2024/09/major-ai-updates-from-meta-and-google-and-a-new-era-for-ai-designed-chips/ )
- Playwright Test Generator (automatically generates test scripts by recording user interactions with the browser): https://playwright.dev/docs/codegen
- git-of-theseus (graphical tools to analyze git repos): https://github.com/erikbern/git-of-theseus
- Dockview ("Zero dependency layout manager supporting tabs, groups, grids and splitviews. Supports React, Vue and Vanilla TypeScript"): https://github.com/mathuo/dockview (demo: https://dockview.dev/)
- Wasmer (WebAssembly runtime; run WebAssembly modules on servers, desktops, and embedded devices, not just in browsers): https://wasmer.io/
- LiteRT (short for Lite Runtime; formerly TensorFlow Lite): https://ai.google.dev/edge/lite (related: https://www.npmjs.com/package/@tensorflow/tfjs-tflite )
- splink ("Fast, accurate and scalable data linkage and deduplication"): https://github.com/moj-analytical-services/splink
- httpfs (Create virtual filesystems using any HTTP framework): https://github.com/progrium/httpfs
- supertree (Interactive Decision Tree Visualization): https://github.com/mljar/supertree
- TensorFlow Quantum (hybrid quantum-classical machine learning): https://www.tensorflow.org/quantum
- Chart.xkcd (JS chart library that plots “sketchy”, “cartoony” or “hand-drawn” styled charts): https://github.com/timqian/chart.xkcd
- CuteCharts (Python chart library that plots “sketchy”, “cartoony” or “hand-drawn” styled charts): https://github.com/cutecharts/cutecharts.py
- Jupyter Rust ("A prototype Docker container for Jupyter Lab with Rust"): https://github.com/davideuler/jupyter-rust
- Tree Sitter (code parser generator tool used in neovim): https://tree-sitter.github.io/tree-sitter/
- Ocrs (ocrs is a Rust library and CLI tool for extracting text from images): https://github.com/robertknight/ocrs
- Microsoft Clarity ("Clarity is a free product that captures how people use your site"): https://clarity.microsoft.com (discovered via: https://www.linkedin.com/posts/mahomedalid_microsoft-clarity-is-like-google-analytics-activity-7275208241263878144-cZZY )
- Genesis ("A Generative and Universal Physics Engine for Robotics and Beyond"): https://genesis-embodied-ai.github.io/ ( https://github.com/Genesis-Embodied-AI/Genesis; LinkedIn discussion: https://www.linkedin.com/posts/drjimfan_if-an-ai-can-control-1000-robots-to-perform-activity-7275562526661062657-EavX )
- DXOS (open-source framework for real-time, collaborative web apps which operate entirely on the client-side): https://www.dxos.org/
- ipyvizzu-story: Create animated, interactive data stories in Jupyter Notebooks; exports to HTML files: https://github.com/vizzuhq/ipyvizzu-story
- yellowbrick (extends scikit-learn API to allow human steering of the model selection process): https://github.com/DistrictDataLabs/yellowbrick
- sweetviz ("In-depth EDA (target analysis, comparison, feature analysis, correlation) in two lines of code!"): https://github.com/fbdesignpro/sweetviz
- ydata-profiling (formerly "pandas-profiling"; one-line EDA for pandas DataFrames; can be exported to HTML and JSON): https://github.com/ydataai/ydata-profiling
- Feature-engine ("Unlike Scikit-learn, Feature-engine is designed to work with dataframes. No column order or name changes. A dataframe comes in, same dataframe comes out, with the transformed variables"): https://feature-engine.trainindata.com/
- Raspberry Pi Pico FreeRTOS Shell ("add an interactive shell with custom commands to your application"): https://github.com/JZimnol/pico_freertos_shell (LinkedIn thread: https://www.linkedin.com/posts/jakub-zimnol-309395221_what-do-programmers-do-in-their-free-time-activity-7281308306353082368-RoHB )
- Plotters ("A Rust drawing library focusing on data plotting for both WASM and native applications"): https://github.com/plotters-rs/plotters (Note: It includes demo project links for WebAssembly, minifb [frame buffer], and interactive GTK usage)
- uutils coreutils ("Cross-platform Rust rewrite of the GNU coreutils"): https://github.com/uutils/coreutils
- py2many ("Python to many CLike languages transpiler"): https://github.com/py2many/py2many
- Excalidraw (hand-drawn inspired schematics; "all your data is saved locally in your browser"): https://excalidraw.com/
- Open R1 ("A fully open reproduction of DeepSeek-R1"): https://github.com/huggingface/open-r1
- Pandera (Data validation schemas for Pandas & Polars [similar to Pydantic for dataframes?]): https://github.com/unionai-oss/pandera
- Tokio - Bridging with sync code (for use cases where a majority of the app is synchronous and parts of it are asynchronous): https://tokio.rs/tokio/topics/bridging
- Twiggy ("[Rust-based] code size profiler for Wasm"): https://github.com/rustwasm/twiggy
- Reticulum ("the cryptography-based networking stack for building local and wide-area networks with readily available hardware"): https://github.com/markqvist/Reticulum
- Yew ("a modern Rust framework for creating multi-threaded front-end web apps with WebAssembly" [looks similar to React]): https://github.com/yewstack/yew
- Helium ("Helium is a Python library for automating browsers such as Chrome and Firefox. The name Helium was chosen because it is also a chemical element like Selenium, but it is lighter."): https://github.com/mherrmann/helium
- nes-rust (Nintendo emulator written in Rust; runs in WASM): https://github.com/takahirox/nes-rust
- Gitingest ("Turn any Git repository into a simple text digest of its codebase; This is useful for feeding a codebase into any LLM"; hint: can replace "github" in the repo URL with "gitingest" to view directly): https://gitingest.com/
- Luckyshot (Rust-based "finding the most relevant files in your codebase for AI-assisted programming."): https://github.com/richardanaya/luckyshot
- GDELT Full Text Search API: https://blog.gdeltproject.org/gdelt-doc-2-0-api-debuts/
- Lapce (Rust-powered native IDE; supports remote development; plugins; terminal): https://lap.dev/lapce/
- Helix (Rust-powered terminal IDE): https://helix-editor.com/
- Rune programming language (dynamic programming language built in Rust): https://rune-rs.github.io/
- ConnectorX (Fast dataframe loader): https://github.com/sfu-db/connector-x
- Apache Arrow: https://arrow.apache.org/
- Dragonfly DB: https://www.dragonflydb.io/
- IronRDP ("A collection of Rust crates providing an implementation of the Microsoft Remote Desktop Protocol [RDP], with a focus on security"): https://github.com/Devolutions/IronRDP
- OpenBB ("The first financial Platform that is free and fully open source"): https://github.com/OpenBB-finance/OpenBB
- heretek (GDB TUI Dashboard): https://github.com/wcampbell0x2a/heretek
- Apache ECharts (Open-source JavaScript Visualization Library): https://echarts.apache.org/en/index.html (HN Thread: https://news.ycombinator.com/item?id=43624220 )
- Code Server (VS Code in browser): https://github.com/coder/code-server
- JS-inspired mini-language to WebAssembly: https://github.com/mgechev/yac (LinkedIn thread: https://www.linkedin.com/posts/mgechev_javascript-wasm-webassembly-activity-7317550961365864449-uhkT )
- fast-dotproduct ("Fast dot product calculations for the web platform. Speeds up your dot product calculations by up to 103457%" [seriously?]): https://github.com/kyr0/fast-dotproduct
- Axum Websocket Example (built by Tokio team): https://github.com/tokio-rs/axum/blob/main/examples/websockets/src/main.rs
- Act ("Run GitHub Actions locally"): https://github.com/nektos/act (LinkedIn discussion: https://www.linkedin.com/posts/tarasowski_dont-waste-time-when-testing-github-actions-activity-7321790143965642753-dPh_ )
- bitnet.cpp ("the official inference framework for 1-bit LLMs (e.g., BitNet b1.58)."): https://github.com/microsoft/BitNet (LinkedIn Thread: https://www.linkedin.com/posts/akshay-pachaar_microsoft-just-changed-the-game-theyve-activity-7327676567461990401-pmzD )
- tscircuit ("build electronics with code, AI, and drag'n'drop tools."): https://tscircuit.com/
- RMCP (Rust SDK for MCP): https://github.com/modelcontextprotocol/rust-sdk
- BusyBox ("The Swiss Army Knife of Embedded Linux"): https://busybox.net/about.html
- python-build-standalone (for embedding): https://github.com/astral-sh/python-build-standalone/releases
- Foam (A personal knowledge management and sharing system for VSCode): https://github.com/foambubble/foam [Personal note: The thought would be to perhaps use this combined with mdbook to replace this current CMS]
- Cargo bundle (Wrap Rust executables in OS-specific app bundles [OS-specific app installers]): https://crates.io/crates/cargo-bundle
- Apple container (container is a tool that you can use to create and run Linux containers as lightweight virtual machines on your Mac. It's written in Swift, and optimized for Apple silicon.): https://github.com/apple/container
- Radix ([React] "An open source component library optimized for fast development, easy maintenance, and accessibility. Just import and go—no configuration required"): https://www.radix-ui.com/ [Notes: Best looking system menus I've seen; related YouTube video: https://www.youtube.com/watch?v=lY-RQjWeweo]
- wgpu ([Rust] "A cross-platform, safe, pure-rust graphics API. It runs natively on Vulkan, Metal, D3D12, and OpenGL; and on top of WebGL2 and WebGPU on wasm. The API is based on the WebGPU standard. It serves as the core of the WebGPU integration in Firefox, Servo, and Deno."): https://github.com/gfx-rs/wgpu
- servo ([Rust] "A web browser rendering engine written in Rust, with WebGL and WebGPU support, and adaptable to desktop, mobile, and embedded applications."): https://servo.org/
- Hydra ([Python] "A framework for elegantly configuring complex applications."): https://github.com/facebookresearch/hydra
- ATAC ([Rust] "Arguably a Terminal API Client" [Like Postman, but inside a Terminal]): https://github.com/Julien-cpsn/ATAC
- Tuono ([Rust] Web development framework inspired by Next.js [Personal note: At first I got this confused with Tauri]): https://tuono.dev/
- Rust UI ([Rust] "A Blazzingly Fast [Web] UI Library"): https://www.rust-ui.com/
- Leptos ([Rust] "Leptos is a full-stack, isomorphic Rust web framework leveraging fine-grained reactivity to build declarative user interfaces."): https://github.com/leptos-rs/leptos/
- SearXNG Docker ("Create a new SearXNG instance in five minutes using Docker"): https://github.com/searxng/searxng-docker
- TigerBeetle ("The Financial Transactions Database"): https://tigerbeetle.com/
- Docling ([Python] "Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem."): https://github.com/docling-project/docling
- sledgehammer bindgen ([Rust] "Sledgehammer bindgen provides faster rust batched bindings for js code."): https://github.com/ealmloff/sledgehammer_bindgen
- Dioxus ([Rust] "One codebase, every platform. Dioxus is the Rust framework for building fullstack web, desktop, and mobile apps."): https://dioxuslabs.com/
- Dioxus TUI ([Rust] "Leverage React-like patterns, CSS, HTML, and Rust to build beautiful, portable, terminal user interfaces with Dioxus."): https://docs.rs/dioxus-tui/latest/dioxus_tui/
- Chord (distributed hash table): https://en.wikipedia.org/wiki/Chord_(peer-to-peer)
- ort ([Rust]: open-source Rust binding for ONNX Runtime): https://ort.pyke.io/ [used in https://crates.io/crates/fastembed]
- PyTorch ONNX converter: https://docs.pytorch.org/docs/stable/onnx.html
- Zed ([Rust] "Zed is a next-generation code editor designed for high-performance collaboration with humans and AI."): https://zed.dev/
- gpui ([Rust] "A fast, productive UI framework for Rust from the creators of Zed."): https://www.gpui.rs/
- Miri ([Rust] "An Undefined Behavior detection tool for Rust. It can run binaries and test suites of cargo projects and detect unsafe code that fails to uphold its safety requirements."): https://github.com/rust-lang/miri
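The itertools entry in the list above contrasts itertools.islice() with index slicing. A minimal sketch of that difference follows; read_records and in_chunks are hypothetical names made up for illustration, and the generator merely stands in for a large data stream:

```python
from itertools import islice


def read_records():
    """Hypothetical stand-in for a large data stream (e.g. lines of a big file)."""
    for i in range(1_000_000):
        yield i * i


# Index slicing needs a fully materialized sequence first:
#   first_five = list(read_records())[:5]   # builds all 1,000,000 items
# islice pulls only the items it needs from the iterator:
first_five = list(islice(read_records(), 5))
print(first_five)


def in_chunks(iterable, size):
    """Yield successive lists of up to `size` items, one chunk at a time."""
    iterator = iter(iterable)
    while chunk := list(islice(iterator, size)):
        yield chunk


chunks = list(in_chunks(range(7), 3))
print(chunks)
```

Because islice consumes the iterator lazily, only the requested items (or one chunk at a time) are held in memory. The walrus operator used in in_chunks requires Python 3.8 or newer.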
SQL Parsers & Testers
- [Rust] https://github.com/apache/datafusion-sqlparser-rs
- [Rust] https://github.com/risinglightdb/sqllogictest-rs
Interesting Research Prototypes
While some of the above could likely go in here, these definitely go in here.
- Signals from scratch: https://github.com/tigerabrodi/signals-from-scratch/
- Jotai from scratch: https://github.com/tigerabrodi/jotai-from-scratch
Services
- RichCMS Git Actions Monitoring: https://github.com/jzombie/rich-cms/actions/
- Rsync.net (cloud backup): https://www.rsync.net/cloudstorage.html
Python Lists on Disks
- DiskList: A Python list implementation that uses the disk to handle very large collections of pickle-able objects. https://github.com/Belval/disklist
- mmaparray: Disk-backed arrays with a structure similar to Python's built-in array module. https://pypi.org/project/mmaparray/
- Darr: Python library designed for working with large, disk-based NumPy arrays. https://github.com/gbeckers/darr
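To make the general idea behind these libraries concrete, here is a rough sketch of a disk-backed list built only on the standard library. It is not the API of DiskList, mmaparray, or Darr; the TinyDiskList class and its methods are hypothetical and exist only for illustration:

```python
import pickle
import tempfile


class TinyDiskList:
    """Hypothetical append-only list that keeps pickled items in a temp file."""

    def __init__(self):
        self._file = tempfile.TemporaryFile()  # binary mode; removed on close
        self._offsets = []                     # byte offset of each stored item

    def append(self, item):
        self._file.seek(0, 2)                  # move to the end of the file
        self._offsets.append(self._file.tell())
        pickle.dump(item, self._file)

    def __getitem__(self, index):
        self._file.seek(self._offsets[index])  # jump straight to the item
        return pickle.load(self._file)

    def __len__(self):
        return len(self._offsets)

    def close(self):
        self._file.close()


items = TinyDiskList()
for n in range(3):
    items.append({"id": n, "payload": "x" * n})

second = items[1]
count = len(items)
items.close()
print(second, count)
```

Only the byte offsets stay in memory; the objects themselves live in the temporary file and are unpickled on access. A real library would also handle deletion, slicing, and iteration, which this sketch leaves out.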