Links of Interest

“No inventions; no innovations”: A History of US Steel: https://www.construction-physics.com/p/no-inventions-no-innovations-a-history
Why are Apple silicon VMs so different?: https://eclecticlight.co/2023/12/29/why-are-apple-silicon-vms-so-different/
How Arm conquered the chip market without making a single chip: https://www.theverge.com/23373371/arm-chips-chip-shortage-ceo-rene-haas-tech-intel-apple-decoder
The Random Transformer: https://osanseviero.github.io/hackerllama/blog/posts/random_transformer/
You don't need analytics on your blog: https://blog.yossarian.net/2023/12/24/You-dont-need-analytics-on-your-blog
On building a semantic search: https://vickiboykis.com/2024/01/05/retro-on-viberary/
Some good pointers on lasting web pages: https://jeffhuang.com/designed_to_last/
Simple lasts longer: https://newsletter.pnote.eu/p/simple-lasts-longer
Guide to Self-Attention: https://twiecki.io/blog/2024/01/04/
Building a Container from Scratch in Rust: https://brianshih1.github.io/mini-container/
Nice-looking blog w/ contents: https://matthewsanabria.dev/posts/no-shell-for-you-container/#difficulties-of-minimal-container-images
Setting up Windows 11 w/o Microsoft Account: https://www.tomshardware.com/how-to/install-windows-11-without-microsoft-account
Follow RSS across the web: https://openrss.org/
Machine Learning Engineering Open Book: https://github.com/stas00/ml-engineering
Why is machine learning hard?: https://ai.stanford.edu/~zayd/why-is-machine-learning-hard.html
Machine learning is still too hard for software engineers: https://news.ycombinator.com/item?id=39109469
Build an LLM (from Scratch): https://github.com/rasbt/LLMs-from-scratch
Goodhart's law: https://en.wikipedia.org/wiki/Goodhart%27s_law ("When a measure becomes a target, it ceases to be a good measure.")
Technology is the problem (includes mention of "doped silicon user interfaces"): https://www.shyamsankar.com/p/technology-is-the-problem
(I just like the heading fonts): https://joshstrange.com/2019/09/26/my-mac-apps/
This site's auto-generated sitemap: https://zenosmosis.com/sitemap.xml
Running Local LLMs and VLMs on the Raspberry Pi: https://towardsdatascience.com/running-local-llms-and-vlms-on-the-raspberry-pi-57bd0059c41a
Adding full-text search to a static site: https://www.markusdosch.com/2022/05/adding-full-text-search-to-a-static-site-no-backend-needed/
AI-driven search engine: https://www.perplexity.ai/
Building container images using no tools: https://ravichaganti.com/blog/2022-11-28-building-container-images-using-no-tools/
(A search engine in 80 lines of Python [give or take])
- https://news.ycombinator.com/item?id=39293050
- https://www.alexmolas.com/2024/02/05/a-search-engine-in-80-lines.html
- Related [found in same Hacker News discussion]: https://github.com/softwaredoug/searcharray
Aligning an LLM with human preferences (I like the "trainer" API design, at first glance): https://datadreamer.dev/docs/latest/pages/get_started/quick_tour/aligning.html
Self-balancing cube: https://willempennings.nl/balancing-cube/
StatQuest: An epic journey through statistics and machine learning: https://statquest.org/
Fractals in Neural Network Hyperparameter Tuning: https://www.linkedin.com/posts/liorsinclair_this-is-incredible-fractal-patterns-were-ugcPost-7162848575834501125-TuDV?utm_source=share&utm_medium=member_desktop (related paper: https://arxiv.org/abs/2402.06184 )
AdaBelief Optimizer: https://arxiv.org/abs/2010.07468 (source code: https://github.com/juntang-zhuang/Adabelief-Optimizer )
[Mostly non-optimized] Machine Learning Algorithms in Python: https://github.com/rushter/MLAlgorithms
Fix loose git object corruption: https://accio.github.io/programming/2021/06/16/fix-loose-objects-in-git.html
GPT in 500 lines of SQL: https://explainextended.com/2023/12/31/happy-new-year-15/
GPT in 60 lines of NumPy (Hacker News): https://news.ycombinator.com/item?id=34726115
Emergent Mind (monitors social media for discussions about recently published arXiv papers): https://www.emergentmind.com/
Accelerating Generative AI with PyTorch II: GPT, Fast: https://pytorch.org/blog/accelerating-generative-ai-2/
Keras (MLX backend): https://github.com/keras-team/keras/pull/18962
Kaggle Finance Dataset (search): https://www.kaggle.com/search?q=finance+dataset
Eloquent JavaScript 4th Edition (2024): https://news.ycombinator.com/item?id=39629044
Famous algorithmic patterns and their everyday usage: https://www.linkedin.com/posts/arslanahmad_systemdesign-softwarearchitecture-softwaredevelopment-activity-7170661917621936128-qwFf?utm_source=share&utm_medium=member_desktop
Big-O cheat sheet: https://www.linkedin.com/posts/arslanahmad_codinginterview-timecomplexity-activity-7167956367599763456-4H4t?utm_source=share&utm_medium=member_desktop
Data structures worth knowing: https://www.linkedin.com/posts/arslanahmad_systemdesign-coding-interviewtips-activity-7171064213992402944-0qSv?utm_source=share&utm_medium=member_desktop
Exponential smoothing animation tricks (with code examples): https://lisyarus.github.io/blog/programming/2023/02/21/exponential-smoothing.html
I Was a Statistics Professor. I Used Sports Betting to Retire at 43: https://www.newsweek.com/i-was-statistics-professor-i-used-sports-betting-retire-43-1858974
Linear warmup with cosine annealing (machine learning): https://github.com/rasbt/LLMs-from-scratch/blob/main/appendix-D/01_main-chapter-code/appendix-D.ipynb
An invention to silence reggaeton with artificial intelligence: https://english.elpais.com/technology/2024-03-17/an-invention-to-silence-reggaeton-with-artificial-intelligence.html
Sam's Journey (new NES game): https://news.ycombinator.com/item?id=39730787
Grok-1 (xAI model): https://github.com/xai-org/grok-1
Spleeter (audio source separation library with pretrained models; stem separation; vocals / drums / bass / piano / other separation): https://github.com/deezer/spleeter/wiki/2.-Getting-started#using-docker-image
Free Spark Resources: https://www.linkedin.com/posts/shubhamwadekar_spark-data-dataengineering-activity-7178246825999556609-CjSI/?utm_source=share&utm_medium=member_android
AssemblyScript: A TypeScript-like language for WebAssembly: https://www.assemblyscript.org/
Radios, how do they work? (A brief introduction to antennas, superheterodyne receivers, and signal modulation schemes): https://lcamtuf.substack.com/p/radios-how-do-they- (Hacker News discussion: https://news.ycombinator.com/item?id=39813679 )
Modern AI Discourse is Talentless Business Students Trying to Give Young Engineers PTSD: https://www.timokats.xyz/?content=writings/aiblog
Reading and Writing WAV Files in Python: https://realpython.com/python-wav-files/
Conversation as an Interface (2016): https://annjose.com/post/conversation-as-interface/
Svelte parses HTML all wrong: https://github.com/sveltejs/svelte/issues/11052
Find Median from Data Stream: https://leetcode.com/problems/find-median-from-data-stream/ (maybe it could be useful for implementation of RobustScaler partial fit [open-source the solution?]; related StackOverflow discussion: https://stackoverflow.com/questions/57291876/robustscaler-partial-fit-similar-to-minmaxscaler-or-standardscaler )
LeetCode (25 questions to cover the most important patterns): https://www.linkedin.com/posts/alexandre-zajac_softwareengineering-coding-programming-activity-7181538559999238144-lwW8?utm_source=share&utm_medium=member_desktop
JetMoE: Reaching LLaMA2 Performance with 0.1M Dollars: https://research.myshell.ai/jetmoe (Hacker News discussion: https://news.ycombinator.com/item?id=39933076 )
Books for Learning Math for Machine Learning: https://mltechniques.com/2022/06/13/math-for-machine-learning-12-must-read-books/
llm.c (Karpathy; LLM training in simple, pure C/CUDA): https://github.com/karpathy/llm.c
ETF Dictionary: https://etfdb.com/etfs/
Pick great stocks if you are a developer: https://medium.com/@Sanji_vals/pick-great-stocks-if-you-are-a-developer-detailed-guidance-on-stock-selection-for-developers-c1629b3a8eed
ML big or small data and how it affects model architecture decisions: https://www.linkedin.com/posts/damienbenveniste_your-data-can-tell-you-a-lot-about-the-type-activity-7186390432434577409-xYI5
JSR (The open-source package registry for modern JavaScript and TypeScript): https://jsr.io/
How Vector Databases Work: https://www.linkedin.com/pulse/understanding-how-vector-databases-work-damien-benveniste-g6htc/
Multihead Attention Layer with a Kolmogorov–Arnold Network (KAN): https://www.linkedin.com/posts/damienbenveniste_here-is-my-implementation-of-a-multihead-activity-7192940670192394240-xq05
Sparse Multihead Attention (implementation): https://www.linkedin.com/posts/damienbenveniste_here-is-how-you-can-create-a-multihead-self-attention-activity-7195828872121102337-OsPg/
AI-powered robots are finding the flaws in ‘D’ grade U.S. infrastructure, from commuter bridges to military hardware: https://www.cnbc.com/2024/05/15/these-wall-climbing-robots-are-finding-flaws-in-d-grade-infrastructure.html
MoE multi-head attention (PyTorch example): https://www.linkedin.com/posts/damienbenveniste_can-we-mix-the-concepts-of-multihead-attentions-activity-7199089073381072897-nNfF
Here’s what’s really going on inside an LLM’s neural network: https://arstechnica.com/ai/2024/05/heres-whats-really-going-on-inside-an-llms-neural-network/
Economic theory: https://en.wikipedia.org/wiki/Economics#Theoretical_research
GuruFocus (investing): https://www.gurufocus.com/
Financial Statement Analysis with Large Language Models: https://www.newsletter.datadrivenvc.io/p/financial-statement-analysis-with (LinkedIn thread: https://www.linkedin.com/posts/andreretterath_financial-statement-analysis-with-large-language-activity-7199289952910667776-14At )
Vector Indexing all of Wikipedia, on a laptop: https://foojay.io/today/indexing-all-of-wikipedia-on-a-laptop/ (Hacker News discussion: https://news.ycombinator.com/item?id=40514266 )
De-Googling: https://blog.nradk.com/posts/degoogling/
Don't be stupid about trading: https://www.linkedin.com/posts/alfonso-peccatiello-72156a6a_trading-is-hard-and-the-rule-1-not-to-activity-7203750640576086017-KDKA
Interesting CSS-separator generator (different shapes + demo): https://github.com/wwebdev/separator-generator (demo: https://wweb.dev/resources/css-separator-generator)
Google Search Console: https://search.google.com/search-console
Apple MusicKit ("what am I listening to?"): https://developer.apple.com/documentation/musickitjs
Immersive Linear Algebra (interactive book): https://immersivemath.com/ila/index.html
Extracting Concepts from GPT-4: https://openai.com/index/extracting-concepts-from-gpt-4/
Polars-Cookbook (Jupyter notebook examples for Python Polars): https://github.com/PacktPublishing/Polars-Cookbook
Naked JSX: https://nakedjsx.org/
Why Google Sheets ported its calculation worker from JavaScript to WasmGC: https://web.dev/case-studies/google-sheets-wasmgc
Rust learning resources:
- Small exercises: https://github.com/rust-lang/rustlings
- Book: https://doc.rust-lang.org/book/
Active Strategies Are Looking Good, But Don’t Abandon Your ETFs: https://www.barrons.com/articles/active-strategies-are-looking-good-but-dont-abandon-your-etfs-2057c60d
Eureka Labs (Andrej Karpathy education project): https://eurekalabs.ai/
GitHub Status: https://www.githubstatus.com/
CrowdStrike Technical Details on Today’s [July 19, 2024] Outage: https://www.crowdstrike.com/blog/technical-details-on-todays-outage/
Branchless Programming: https://www.linkedin.com/posts/heriklima_cplusplus-branchlessprogramming-optimization-activity-7218766293506674688-RQu5
HATEOAS (Hypermedia as the engine of application state): https://en.wikipedia.org/wiki/HATEOAS (Roy Fielding: https://en.wikipedia.org/wiki/Roy_Fielding )
Argo tunnels that live forever: https://blog.cloudflare.com/argo-tunnels-that-live-forever
The Standard (A collection of decades of experience in the engineering industry): https://github.com/hassanhabib/The-Standard
How Postgres stores data on disk: https://drew.silcock.dev/blog/how-postgres-stores-data-on-disk/ (Hacker News thread: https://news.ycombinator.com/ )
This Week in Rust (Aug. 7, 2024): https://this-week-in-rust.org/blog/2024/08/07/this-week-in-rust-559/
Distributed Filesystem written in Rust: https://medium.com/@xorio42/list/distributed-filesystem-written-in-rust-317d40f38304
Thoughts on vector-database searches: https://www.linkedin.com/posts/damienbenveniste_we-have-recently-seen-a-surge-in-vector-databases-activity-7226984916687708160-K-1Z/
e18e (initiative to improve JavaScript ecosystem performance): https://e18e.dev/
Information bottleneck method: https://en.wikipedia.org/wiki/Information_bottleneck_method
Winsorizing (transformation of statistics by limiting extreme values in the statistical data to reduce the effect of possibly spurious outliers): https://en.wikipedia.org/wiki/Winsorizing
How to Make Your Machine Learning Models Robust to Outliers: https://www.comet.com/site/blog/how-to-make-your-machine-learning-models-robust-to-outliers/
Apache Parquet vs. CSV: https://www.databricks.com/glossary/what-is-parquet
Ask HN: What are your favorite index ETFs for Investing?: https://news.ycombinator.com/item?id=40947524
DNS as a filesystem (use DNS for a filesystem): https://www.linkedin.com/posts/laurie-kirk_dns-can-be-used-as-a-filesystem-yes-you-activity-7234610957002399744-Lqjv
"Microsoft's Phi 3.5 vision model is really good at OCR/text extraction": https://x.com/dylfreed/status/1828132226523131931
ETF Research Center: https://www.etfrc.com/portfolios/builder.php
Information about passive investing (inspired by the philosophy of Vanguard founder John C. Bogle): https://bogleheads.org/
Multithreading in Node.js: Using Atomics for Safe Shared Memory Operations: https://pavel-romanov.com/multithreading-in-nodejs-using-atomics-for-safe-shared-memory-operations
Windows NT vs. Unix - A design comparison: https://blogsystem5.substack.com/p/windows-nt-vs-unix-design
DARPA Translating All C to Rust (TRACTOR): https://www.darpa.mil/program/translating-all-c-to-rust (related Hacker News thread: https://news.ycombinator.com/item?id=41110269 )
Moodist - Ambient sounds for focus and calm: https://moodist.app/
Apple Mobile Processors Are Now Made in America (By TSMC): https://timculpan.substack.com/p/apple-mobile-processors-are-now-made
The 2 ETFs That Track Congressional Stock Trades: https://www.morningstar.com/funds/2-etfs-that-track-congressional-stock-trades
MLOps-Basics ("Understand the basics of MLOps like model building, monitoring, configurations, testing, packaging, deployment, cicd, etc."): https://github.com/graviraja/MLOps-Basics
LLMs in Finance: https://github.com/hananedupouy/LLMs-in-Finance/tree/main (Related LinkedIn post: https://www.linkedin.com/posts/hanane-d-algo-trader_anthropic-agent-with-llamaindex-financial-activity-7241080939131392000-5sJa )
Google Illuminate ("Transform your content into engaging AI‑generated audio discussions"): https://illuminate.google.com/
ML - How to optimally sample Imbalanced Data: https://www.linkedin.com/feed/update/urn:li:activity:7247530677594718209
JavaScript Structs proposal: https://github.com/tc39/proposal-structs
"Whenever I lack motivation, I watch this": https://www.linkedin.com/posts/linasbeliunas_whenever-i-lack-motivation-i-watch-this-activity-7253121679277641728-koQo
The pros and cons for investors of nonstop trading as NYSE looks to go 22 hours a day: https://www.cnbc.com/2024/10/28/the-pros-and-cons-for-investors-of-nonstop-trading-as-nyse-looks-to-go-22-hours-a-day-.html
Send Web Push notifications from a Node.js server: https://www.bocoup.com/blog/full-stack-web-push-api-guide (also: https://www.npmjs.com/package/web-push )
The Rust eBookshelf (The Rust Language & Ecosystem): https://dieterplex.github.io/rust-ebookshelf/ ( https://github.com/dieterplex/rust-ebookshelf )
Your Hacker News (personalized, newspaper-like Hacker News): https://yourhackernews.com/
You can work at McDonald’s and still become a millionaire: https://www.cnbc.com/2024/11/09/how-to-work-at-mcdonalds-and-still-become-a-millionaire.html

Papers of Interest

Fractals in Neural Network Hyperparameter Tuning: https://arxiv.org/abs/2402.06184
Deep Reinforcement Learning for Quantitative Trading: https://arxiv.org/abs/2312.15730v1
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping: https://arxiv.org/html/2402.14083v1
[Apple's multi-modal, foundational model] MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training: https://arxiv.org/abs/2403.09611 (relevant Hacker News discussion: https://news.ycombinator.com/item?id=39722498 )
Can Large Language Models Reason and Plan? (PDF): https://arxiv.org/pdf/2403.04121.pdf

Books of Interest

Build a Large Language Model (From Scratch): https://www.manning.com/books/build-a-large-language-model-from-scratch?utm_source=raschka&utm_medium=affiliate&utm_campaign=book_raschka_build_12_12_23&a_aid=raschka&a_bid=4c2437a0&chan=mm_github
WebGPU Unleashed: A Practical Tutorial (free book on WebGPU programming): https://shi-yan.github.io/webgpuunleashed/ (Hacker News thread: https://news.ycombinator.com/item?id=41156872 )

Videos of Interest

Jack Bogle on Index Funds, Vanguard, and Investing Advice: https://www.youtube.com/watch?v=MLgn_kVKjCE&t=871s
Warren Buffett reveals his investment strategy for mastering the market: https://www.youtube.com/watch?v=SEZwkbliJr8
Introducing Temporal Similarity Search for Vector Databases: https://www.youtube.com/watch?v=5XRAIcVAMFU
How many kernel system calls do runtimes make?: https://www.youtube.com/watch?v=ERaGORGfLF4
Nvidia CEO on doing great work: https://www.linkedin.com/posts/alvinfsc_nvidia-ceo-doing-great-work-is-not-about-activity-7210139644032692224-pYvo
Let's reproduce GPT-2 (124M): https://www.youtube.com/watch?v=l8pRSuU81PU
Building transformers from scratch: https://www.youtube.com/watch?v=kCc8FmEb1nY
AI beats multiple World Records in Trackmania (reinforcement learning): https://www.youtube.com/watch?v=kojH8a7BW04
PyTorch for Deep Learning & Machine Learning (long, ~24-hour video): https://www.youtube.com/watch?v=V_xro1bcAuA
But what is a GPT? Visual intro to Transformers: https://www.youtube.com/watch?v=wjZofJX0v4M
Playing Classic Pink Floyd Solos with the Black Gilmour Squier Strat: https://www.youtube.com/watch?v=77dnGLYJHlA
Fax in Your Code: https://www.youtube.com/watch?v=pJ-25-pRhpY