Scroll down a lot to “Is your website training AI?” (direct anchor link), and you can enter a domain (e.g. yours!) to see how much that site has contributed.
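| Rank    | Domain                | Tokens | Percent of all Tokens |
|---------|-----------------------|--------|-----------------------|
| 11      | washingtonpost.com    | 55M    | 0.04%                 |
| 106,350 | forum.zettelkasten.de | 190k   | 0.0001%               |
| 328,137 | christiantietze.de    | 71k    | 0.00005%              |
| 718,385 | zettelkasten.de       | 32k    | 0.00002%              |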
Can’t beat a community effort like a forum, no matter how much or how long I post :)
Got the link and the idea to check my sites from jwz.org.
Update 2023-04-20: I should’ve checked the Post’s claims first; they say it’s Google’s dataset, but that seems to be wrong. Here’s a short summary of what I could find about the C4 dataset:
Hugging Face’s “Models trained or fine-tuned on c4” sidebar contains Google-owned repositories, but Common Crawl doesn’t appear to be affiliated with Google directly. Wikipedia indicates that the Common Crawl foundation was founded by Gil Elbaz, whose company “Applied Semantics” was acquired by Google. But the Common Crawl foundation itself wasn’t, as far as I can tell.
The dataset is likely used by AI, including Google’s, but that’s different.
I stumbled upon http://ratfactor.com/cards/um, where Dave Gauer describes a shell script, um, that he uses as a man replacement to help him remember how to use a command. Dave’s implementation uses the cards from his own wiki, because the um pages there are “consolidated, I won’t forget about them, it’s easy to list, create, and update pages.” (To be honest, though, I can’t figure out where his um cards actually are, or what they look like.)
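To make the idea concrete, here is a minimal sketch of what an um-style lookup could look like. It is not Dave’s implementation (his is a shell script backed by his wiki cards). It assumes a hypothetical layout with one plain-text note per command in `~/.um/`, and the names `NOTES_DIR`, `find_note`, `list_notes`, and `main` are made up for illustration.

```python
import sys
from pathlib import Path
from typing import List, Optional

# Assumption: one plain-text note per command, e.g. ~/.um/tar.txt
NOTES_DIR = Path.home() / ".um"


def find_note(command: str, notes_dir: Path = NOTES_DIR) -> Optional[Path]:
    """Return the note file for a command, or None if there is none."""
    # Keep only the final path component so "../x" cannot leave notes_dir.
    safe_name = Path(command).name
    candidate = notes_dir / f"{safe_name}.txt"
    return candidate if candidate.is_file() else None


def list_notes(notes_dir: Path = NOTES_DIR) -> List[str]:
    """Return the sorted names of all commands that have a note."""
    if not notes_dir.is_dir():
        return []
    return sorted(path.stem for path in notes_dir.glob("*.txt"))


def main(argv: List[str], notes_dir: Path = NOTES_DIR) -> int:
    """Print the note for argv[1]; return a shell-style exit code."""
    if len(argv) != 2:
        print("usage: um <command>")
        topics = list_notes(notes_dir)
        if topics:
            print("available: " + ", ".join(topics))
        return 1
    note = find_note(argv[1], notes_dir)
    if note is None:
        print(f"No um page for {argv[1]!r} in {notes_dir}")
        return 1
    print(note.read_text(encoding="utf-8"), end="")
    return 0
```

To use it as a command, you would end the script with `sys.exit(main(sys.argv))` and put it on your `PATH` as `um`. Then `um tar` prints whatever you wrote down in `~/.um/tar.txt`, and plain `um` lists the notes you have.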
I’m a happy FastMail user. If you want to be a happy user, too, use my referral code for 10% off of your first year (and I’ll get a discount, too!) → https://ref.fm/u21056816
I never used their Masked Email feature, though, because it’s so cumbersome to create these addresses from the web UI. I had all but forgotten about this feature until today, when I was looking for something else in my settings.
The ‘wiki’ pages I added some time ago are the best places to summarize topics and embed a list of related posts. So I added a page about VIPER and briefly had a look at my old posts.
On a side note, it’s funny how this approach looked kind of popular for a while, but never really caught on. Another example that programming is a pop culture. VIPER never made the Top 10. While it’s still an approach that does what it set out to do, the shorter ‘VIP’ tried to supplant it later, but the much less opinionated [view] coordinators really took the stage. (Crazy that Soroush’s post is from January 2015, which is 8 years ago.)
I’ve never touched GitHub Copilot in all these years, but everyone seems to be very happy with it. People recommend Copilot for all kinds of refactorings and repetitive tasks. So I figured I might give it a try and see how it works. Just yesterday, I used the Copilot Xcode plugin to write a lot of boilerplate for me. I can confirm it does its job.