microgpt
This is a brief guide to my new art project microgpt, a single file of 200 lines of pure Python with no dependencies that trains and runs inference on a GPT. This file contains the full algorithmic content of what is needed: a dataset of documents, a tokenizer, an autograd engine, a GPT-2-like neural network architecture, the Adam optimizer, a training loop, and an inference loop. Everything else is just efficiency. I cannot simplify it any further.
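to get a feel for what "autograd engine in a few lines" means, here is a minimal scalar reverse-mode autodiff sketch in the spirit of micrograd. this is my own illustration, not the project's actual code:

```python
import math

class Value:
    """Scalar with reverse-mode autodiff (illustrative sketch, not microgpt's code)."""
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._backward = lambda: None

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological sort, then apply the chain rule from the output back
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# d(x*y + x)/dx = y + 1
x, y = Value(3.0), Value(2.0)
z = x * y + x
z.backward()
print(x.grad)  # 3.0
```

add a few more operators plus matrix-shaped bookkeeping and you have enough machinery to backpropagate through a small transformer.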
The real cost of random I/O The random_page_cost parameter was introduced ~25 years ago, and it has been set to 4.0 by default from the very beginning. Storage has changed a lot since then, and so has the Postgres code, so it’s likely the default no longer matches reality. But what value should you use instead? Flash storage is much better at handling random I/O, so maybe you should reduce the default? Some places go as far as recommending setting it to 1.0, the same as seq_page_cost. Is this intuition right?
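the intuition behind the parameter can be shown with a deliberately simplified break-even calculation (my toy model, not the real Postgres planner formulas, which also include CPU costs and caching effects): the cheaper random pages are claimed to be, the longer an index scan stays attractive relative to a sequential scan.

```python
# Toy break-even model: how random_page_cost shifts the planner's preference.
# This is NOT the actual Postgres cost formula, just the core intuition.
SEQ_PAGE_COST = 1.0

def seq_scan_cost(total_pages):
    # read every page sequentially
    return total_pages * SEQ_PAGE_COST

def index_scan_cost(pages_fetched, random_page_cost):
    # simplification: each matching row costs one random page read
    return pages_fetched * random_page_cost

total_pages = 10_000
for rpc in (4.0, 1.1):
    # largest number of random page fetches that still beats a full scan
    break_even = int(seq_scan_cost(total_pages) / rpc)
    print(f"random_page_cost={rpc}: index scan wins up to ~{break_even} fetches")
```

with the default 4.0 the index scan only wins for fairly selective queries; drop the knob toward 1.0 and the planner will pick index scans far more often, which is exactly the behavior change the article measures.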
https://vondra.me/posts/the-real-cost-of-random-io/ HN
an interesting article about legacy default values vs. updated real-world measurements, and the resulting consequences.
CI should fail on your machine first When you think of CI, you probably picture a remote server somewhere: GitHub Actions, GitLab CI, Jenkins. You push your code, you wait, and eventually you get a green checkmark or a red X. ... This is the feedback loop we've all accepted as normal. What if CI could fail on your machine, before you even push?
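one low-tech way to get this today is a git pre-push hook that runs your checks locally and blocks the push if they fail. a hypothetical sketch (the check commands are assumptions, substitute your own):

```python
#!/usr/bin/env python3
# Hypothetical .git/hooks/pre-push script: run checks locally,
# and make git refuse the push if any of them fail.
import subprocess
import sys

def run_checks(commands):
    """Run each command in sequence; True only if all exit with status 0."""
    for cmd in commands:
        if subprocess.run(cmd).returncode != 0:
            print(f"check failed: {' '.join(cmd)}", file=sys.stderr)
            return False
    return True

# in a real hook you would end the script with something like:
#   sys.exit(0 if run_checks([["pytest", "-q"], ["ruff", "check", "."]]) else 1)
```

git runs `.git/hooks/pre-push` before contacting the remote, so a nonzero exit aborts the push and the red X never leaves your machine.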
https://blog.nix-ci.com/post/2026-03-09_ci-should-fail-on-your-machine-first
We’ve all heard of those network-effect laws: the value of a network goes up with the square of the number of members. Or the cost of communication goes up with the square of the number of members, or maybe it was n log n, or something like that, depending on how you arrange the members. Anyway, doubling a team doesn’t double its speed; there’s coordination overhead. Exactly how much overhead depends on how badly you botch the org design.
But there’s one rule of thumb that someone showed me decades ago, that has stuck with me ever since, because of how annoyingly true it is. The rule is annoying because it doesn’t seem like it should be true. There’s no theoretical basis for this claim that I’ve ever heard. And yet, every time I look for it, there it is.
Here we go: Every layer of approval makes a process 10x slower
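taken literally, both claims compound fast. the quadratic law is the classic Brooks-style pairwise-channels count, and the rule of thumb above is a plain power of ten per layer:

```python
def channels(n):
    # pairwise communication paths in a team of n: n*(n-1)/2
    return n * (n - 1) // 2

def slowdown(approval_layers):
    # the rule of thumb above, taken literally: 10x per layer of approval
    return 10 ** approval_layers

print(channels(5), channels(10))  # 10 45 — doubling the team ~4x the paths
print(slowdown(3))                # 1000 — three sign-offs, three orders of magnitude
```

which is why "just add one more approval step" is never the small change it looks like.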
Rob_Pike's 5 Rules of Programming
- Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is.
- Rule 2. Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest.
- Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. (Even if n does get big, use Rule 2 first.)
- Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures.
- Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.
Pike's rules 1 and 2 restate Donald_Knuth's famous maxim (often misattributed to Tony_Hoare): "Premature optimization is the root of all evil."
Ken_Thompson rephrased Pike's rules 3 and 4 as "When in doubt, use brute force."
Rules 3 and 4 are instances of the design philosophy KISS.
Rule 5 was previously stated by Fred_Brooks in The_Mythical_Man-Month. Rule 5 is often shortened to "write stupid code that uses smart objects".
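a miniature illustration of rule 5 (my example, not Pike's): move the knowledge into a data structure, and the algorithm collapses into a lookup.

```python
# "Smart data, stupid code": a dispatch table instead of an if/elif chain.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def apply_op(op, a, b):
    # the table carries the logic; the code just looks it up
    return OPS[op](a, b)

print(apply_op("*", 6, 7))  # 42
```

adding a new operation means adding a data entry, not another branch.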
i have no problems falling asleep anymore, but i used to. the reason it's not a problem anymore is probably a mixture of many things: financial security, a stable relationship and, somewhat surprisingly, a TODO-list app i actually use. the last one sounds a bit banal, but i used to jolt up and worry about having forgotten some important task.
other techniques and factors:
today i read about cognitive_shuffling: https://www.bbc.com/future/article/20260311-cognitive-shuffling-the-micro-dreaming-technique-that-helps-your-brain-to-rest
song of the day:
in case you don't know Genny_Harrison's substack yet: https://substack.com/@surfnukumoi
Mostly Tolkien, sometimes other books and films. I write about Gandalf, Elrond, Éomer, Faramir. Middle-earth shows how power, collapse, and hope work, and why these stories still matter now.
e.g. https://surfnukumoi.substack.com/p/when-the-ring-came-to-faramir-he
she's great.
A Compiler Writing Journey
In this Github repository, I'm documenting my journey to write a self-compiling compiler for a subset of the C language. I'm also writing out the details so that, if you want to follow along, there will be an explanation of what I did, why, and with some references back to the theory of compilers.
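the first stop on such a journey is almost always the lexer. a small regex-based sketch of what that step looks like for a tiny C-ish subset (my illustration, not the repository's code):

```python
import re

# Illustrative token set for a tiny C subset (not the repo's actual code)
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=;()]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(src):
    """Yield (kind, text) pairs; raise on anything unrecognized."""
    pos = 0
    while pos < len(src):
        m = MASTER.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character {src[pos]!r}")
        pos = m.end()
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(tokenize("x = 3 + 42;")))
```

a real compiler front end then feeds this token stream into a parser; keywords, comments, and string literals are left out here for brevity.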
the situation on facebook is becoming more and more insufferable every single day. apart from the poisoning of the public discourse by bot armies, i'm getting drowned in friend requests from fake profiles.
if we were to go back to creating a social network for people we have actual social connections with, how could we design a system that ensures the people on it are real, while keeping the effort low (i.e. no manual intervention by admins, no data gathering like license upload)?
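one low-effort idea along these lines would be a vouching scheme: a newcomer only becomes a member after K existing members vouch for them, so every account is anchored in real-world relationships. a toy sketch of that idea (my speculation, not an existing system):

```python
# Hypothetical vouching scheme: join only after K existing members vouch.
K = 2

class Network:
    def __init__(self, founders):
        self.members = set(founders)
        self.vouches = {}  # candidate -> set of members who vouched

    def vouch(self, member, candidate):
        """Record a vouch; admit the candidate once K distinct members agree."""
        if member not in self.members or candidate in self.members:
            return False
        self.vouches.setdefault(candidate, set()).add(member)
        if len(self.vouches[candidate]) >= K:
            self.members.add(candidate)
            del self.vouches[candidate]
        return True

net = Network({"alice", "bob"})
net.vouch("alice", "carol")
print("carol" in net.members)  # False: one vouch is not enough
net.vouch("bob", "carol")
print("carol" in net.members)  # True: two distinct members vouched
```

a bot army would then need to compromise real members before it could seed fake ones, and no admin has to review anything by hand.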
a few days ago i linked apenwarr's post about the rule "Every layer of approval makes a process 10x slower".
yesterday i read the_guardian's "AI got the blame for the Iran school bombing. The truth is far more worrying."
they fit together beautifully and terrifyingly. palantir's maven led people to skip the review stage in favor of faster targeting times, which resulted in the bombing of an iranian girls' school.