Utilities
A collection of utilities that seem interesting to me (haven't used them all):
- Rich CMS: https://github.com/jzombie/rich-cms
- Pipper: https://github.com/jzombie/pipper
- Sitemap validator: https://www.xml-sitemaps.com/validate-xml-sitemap.html
- Docker Lynx: https://github.com/jzombie/docker-lynx
- Harlequin (The SQL IDE for Your Terminal [TUI]): https://github.com/tconbeer/harlequin?tab=readme-ov-file
- Textual library: https://textual.textualize.io/#what-is-textual (which Harlequin uses; apps over SSH)
- GitUI (Rust-based TUI): https://github.com/extrawurst/gitui?tab=readme-ov-file#installation
- Dive (Docker image explorer TUI): https://github.com/wagoodman/dive
- Pandaral·lel (Parallelize Pandas operations on all CPUs, by changing only one line of code): https://github.com/nalepae/pandarallel ( https://github.com/jzombie/pandarallel )
- Ibis (lightweight, universal interface for data wrangling; Pandas compatible [mostly, according to what I've been led to believe]): https://github.com/ibis-project/ibis
- Python time distribution as a heatmap: https://github.com/csurfer/pyheat
- Pip requirements.txt generator based on imports in project: https://pypi.org/project/pipreqs/
- MLflow: A Machine Learning Lifecycle Platform: https://github.com/mlflow/mlflow/
- Galactic: cleaning and curation tools for massive unstructured text datasets: https://github.com/taylorai/galactic
- Radicle: Open-Source, P2P GitHub Alternative: https://news.ycombinator.com/item?id=39600810
- Borg Backup: Deduplicating backup (compatible w/ rsync.net ): https://github.com/borgbackup/borg
- Lightweight plotting to the terminal (4x resolution via Unicode): https://github.com/olavolav/uniplot
- node-red-contrib-machine-learning: https://flows.nodered.org/node/node-red-contrib-machine-learning
- Monolith: bundle any web page into a single HTML file: https://github.com/Y2Z/monolith
- Jampack: Optimizes static websites for best user experience and best Core Web Vitals scores: https://github.com/divriots/jampack
- QuantStats: Portfolio analytics for quants: https://github.com/ranaroussi/quantstats
- Markmap (Visualize Markdown as mindmaps): https://github.com/markmap/markmap (demo: https://markmap.js.org/repl )
- zerve (Data Science & AI Workbench): https://www.zerve.ai/
- miceforest: Fast, Memory Efficient Imputation with LightGBM (fill in missing values in datasets) https://github.com/AnotherSamWilson/miceforest (LinkedIn post: https://www.linkedin.com/posts/khuyen-tran-1401_miceforest-is-a-python-library-for-imputing-activity-7187108646344916994-Tqo7 ). Variable importance may be of additional interest: https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#variable-importance
- Neural Forecast (User friendly state-of-the-art neural forecasting models): https://github.com/Nixtla/neuralforecast (LinkedIn thread: https://www.linkedin.com/posts/khuyen-tran-1401_timeseries-activity-7192169580264337411-BB6q )
- Itertools (vs. index slicing in Python): https://www.linkedin.com/posts/khuyen-tran-1401_python-activity-7193615521097920512-YwWK (itertools.islice() offers a more efficient approach by enabling the processing of only a portion of the data stream at a time, without the need to load the entire dataset into memory.)
- Insanely Fast Whisper (STT): https://github.com/Vaibhavs10/insanely-fast-whisper
- TimeGPT-1: The first foundation model for forecasting and anomaly detection: https://github.com/Nixtla/nixtla ( https://www.linkedin.com/posts/khuyen-tran-1401_timegpt-is-a-powerful-generative-pre-trained-activity-7200865032140673026-uUhg )
- Modin (drop-in Pandas replacement which uses all CPU cores): https://github.com/modin-project/modin (related LinkedIn thread: https://www.linkedin.com/posts/yukikakegawa_python-datascience-dataengineering-activity-7200118431524818946-2-LM/ )
- Dask (Python library for parallel and distributed computing): https://docs.dask.org/en/stable/
- PyOD (Python library for detecting anomalies in multivariate data): https://github.com/yzhao062/pyod (LinkedIn thread: https://www.linkedin.com/posts/eric-vyacheslav-156273169_amazing-python-library-pyod-use-it-to-detect-activity-7212107779673661443-hEt5?utm_source=share&utm_medium=member_desktop )
- Hyperfine (Compare the Speed of Two Commands): https://codecut.ai/hyperfine-compare-the-speed-of-two-commands/
- GPU.js (GPU accelerated JavaScript): https://gpu.rocks/
- Lunr (site search with vector support): https://lunrjs.com/ (Getting started: https://lunrjs.com/guides/getting_started.html )
- Text to icon: https://text2icon.app/
- Image to vector SVG: https://vectormaker.co/
- WASM-based image to vector SVG (lower quality, but much faster): https://igutechung.github.io/
- Datamuse (word-finding query engine for developers): https://www.datamuse.com/api/
- Cloudflare Tunnels (reverse-proxy tunnels):
  - https://www.cloudflare.com/products/tunnel/
  - https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/
  - https://www.reddit.com/r/selfhosted/comments/ync1zd/cloudflare_tunnels_are_so_awesome/
- react-ts-tradingview-widgets docs: https://tradingview-widgets.jorrinkievit.xyz/docs/intro
- Record and share terminal sessions: https://asciinema.org/
- svg-term-cli: https://github.com/marionebl/svg-term-cli
- Hermes JS Engine [Facebook / React Native] (video: https://www.youtube.com/watch?v=ipYQpxAyunc): https://github.com/facebook/hermes
- "World's Fastest Voice Bot Demo": https://github.com/CerebriumAI/examples/tree/master/18-realtime-voice-agent
- Pipecat (framework for building [streaming] voice [and multimodal] conversational agents): https://github.com/pipecat-ai/pipecat
- mlforecast (Scalable machine learning for time series forecasting): https://github.com/Nixtla/mlforecast
- SDL (Simple DirectMedia Layer; cross-platform development library to provide low-level access to audio, keyboard, mouse, joystick and graphics hardware via OpenGL and Direct3D): https://www.libsdl.org/
- Rayon (data-parallelism library for Rust [can even be made to work via WASM]): https://crates.io/crates/rayon (usage with WebAssembly: https://github.com/rayon-rs/rayon?tab=readme-ov-file#usage-with-webassembly ) (LinkedIn discussion: https://www.linkedin.com/feed/update/urn:li:activity:7220084077826031617 )
- Warp (Web server in Rust: "A super-easy, composable, web server framework for warp speeds."): https://github.com/seanmonstar/warp (WebSocket chat example: https://github.com/seanmonstar/warp/blob/master/examples/websockets_chat.rs )
- Cursor ("The AI Code Editor"): https://www.cursor.com/
- Git GUI clients: https://git-scm.com/download/gui/linux
- "Favicon Generator. For real.": https://realfavicongenerator.net/ (includes a nice checker ["Check your favicon"] which analyzes the site for what can be improved)
- Nostr relays (API): https://api.nostr.watch/
- Lazygit (git TUI): https://github.com/jesseduffield/lazygit
- Rust Simple Virtual DOM: https://github.com/richardanaya/rust-simple-virtual-dom
- SVGO ("SVG Optimizer" - A Node.js library and command-line application for optimizing SVG files): https://github.com/svg/svgo
- cargo-selector (Cargo subcommand [TUI] to select and execute binary/example targets): https://github.com/lusingander/cargo-selector
- Ratatui (An open source Rust library that's all about cooking up terminal user interfaces (TUIs)): https://www.linkedin.com/company/ratatui-rs/
- GitHub Desktop (Linux fork): https://github.com/shiftkey/desktop
- Linear ("purpose-built tool for planning and building products"): https://linear.app/
- node-red-node-pglite (PGlite is a WASM build of Postgres, packaged into a TypeScript/JavaScript client library): https://github.com/conoro/node-red-pglite ( https://conoroneill.net/2024/08/18/running-postgres-inside-node-red-via-wasm-and-pglite/; https://news.ycombinator.com/item?id=41287478 )
- DuckDB WASM: https://duckdb.org/docs/api/wasm/overview.html
- websocat (Netcat, curl and socat for WebSockets; also written in Rust and has a Dockerfile): https://github.com/vi/websocat
- ETF Matcher (match ETFs using potential fractional shares): https://etfmatcher.com/
- Rust-based Electron alternative ("Build an optimized, secure, and frontend-independent application for multi-platform deployment."): https://tauri.app/
- sec-edgar ("download all of a company’s periodic reports, filings and forms from the EDGAR database with a single command"): https://github.com/sec-edgar/sec-edgar (docs: https://sec-edgar.github.io/sec-edgar/ )
- handcalcs ("Python calculations in Jupyter, as though you wrote them by hand"): https://github.com/connorferster/handcalcs
- Eget ("easy pre-built binary installation"): https://github.com/zyedidia/eget/
- DBpedia Spotlight (open-source tool that automatically annotates text with DBpedia resources, enabling entity recognition and linking of text to structured data within the DBpedia knowledge base; primarily trained on data extracted from Wikipedia): https://www.dbpedia-spotlight.org/
- WordLlama ("fast, lightweight NLP toolkit that handles tasks like fuzzy-deduplication, similarity and ranking with minimal inference-time dependencies and optimized for CPU hardware"): https://github.com/dleemiller/WordLlama
- CommandDash - AI Assist for Libraries: https://commanddash.io/
- SAQ (Simple Async Queue [for Python]): https://github.com/tobymao/saq
- Crawl4AI ("Crawl4AI simplifies web crawling and data extraction, making it accessible for large language models (LLMs) and AI applications"): https://github.com/unclecode/crawl4ai
- Workalendar (Python module that offers classes able to handle calendars, list legal / religious holidays and gives working-day-related computation functions): https://github.com/workalendar/workalendar
- Andi ("Search for the next generation with an AI chat assistant"): https://andisearch.com/
- OpenBB ("Investment research made easy with AI"): https://openbb.co/
- Stock intrinsic value calculator: https://www.alphaspread.com/dashboard/watchlists
- Google's AlphaChip ("open-source framework for generating chip floorplans with distributed deep reinforcement learning"): https://github.com/google-research/circuit_training/?tab=readme-ov-file#PreTrainedModelCheckpoint (related Ars Technica article: https://arstechnica.com/information-technology/2024/09/major-ai-updates-from-meta-and-google-and-a-new-era-for-ai-designed-chips/ )
- Playwright Test Generator (automatically generates test scripts by recording user interactions with the browser): https://playwright.dev/docs/codegen
- git-of-theseus (graphical tools to analyze git repos): https://github.com/erikbern/git-of-theseus
- Dockview ("Zero dependency layout manager supporting tabs, groups, grids and splitviews. Supports React, Vue and Vanilla TypeScript"): https://github.com/mathuo/dockview (demo: https://dockview.dev/)
- Wasmer (WebAssembly runtime; run WebAssembly modules on servers, desktops, and embedded devices, not just in browsers): https://wasmer.io/
- LiteRT (short for Lite Runtime; formerly TensorFlow Lite): https://ai.google.dev/edge/lite (related: https://www.npmjs.com/package/@tensorflow/tfjs-tflite )
- splink ("Fast, accurate and scalable data linkage and deduplication"): https://github.com/moj-analytical-services/splink
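
The itertools entry in the list above notes that itertools.islice() can process part of a stream without loading everything into memory. A minimal sketch of that idea follows, using only the standard library. The `numbers` generator and the `in_batches` helper are hypothetical names made up for illustration; they are not taken from the linked post.

```python
from itertools import islice


def numbers():
    """Hypothetical endless data stream; values are produced one at a time."""
    n = 0
    while True:
        yield n
        n += 1


def in_batches(iterable, size):
    """Hypothetical helper: yield lists of up to `size` items from any iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))  # pulls at most `size` items from the stream
        if not batch:
            return
        yield batch


# Index slicing (numbers()[:5]) does not work on a generator, and converting an
# endless stream to a list first would never finish. islice consumes only what it needs.
first_five = list(islice(numbers(), 5))

# islice also accepts start, stop and step, like a slice without negative indices.
window = list(islice(numbers(), 10, 20, 5))

# Only one batch is held in memory at a time while iterating.
batches = list(in_batches(range(7), 3))
```

Note that islice still has to step through skipped items one by one, so for an in-memory list a plain slice is usually simpler; the benefit shows up with generators, file handles and other lazy sources.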
Services
- RichCMS Git Actions Monitoring: https://github.com/jzombie/rich-cms/actions/
- Rsync.net (cloud backup): https://www.rsync.net/cloudstorage.html
Python Lists on Disks
- DiskList: A python list implementation that uses the disk to handle very large collections of pickle-able objects. https://github.com/Belval/disklist
- mmaparray: Disk-backed arrays with a structure similar to Python's built-in array module. https://pypi.org/project/mmaparray/
- Darr: Python library designed for working with large, disk-based Numpy arrays. https://github.com/gbeckers/darr