26012021

4chan Storage Stats

One of the main problems with cheaper VPSes is that they often offer really limited storage, which doesn't go well with imageboards, as they host many... guess what, images, and some even full-on videos.
There are two ways to calculate how much storage you need for your imageboard using 4chan stats: the right way and the dirty way. Let's start with the dirty way:
At the bottom of 4chan's front page, the stats section currently says "Active Content: 1258 GB". With 74 boards in total, each holding 100 threads, that works out to 1258/74 = 17GB per board.
This number is inflated by a few boards: /hr/, which has a higher file size cap of 6MB and encourages higher quality (and thus bigger) images; /gif/ and /wsg/, which both only allow webm files, and where many users encode their longer webms as close to 4MB as possible to get the highest quality; and /f/, which allows file sizes up to 10MB, with files that tend to get close to that cap.
This method is really simple and doesn't take many smaller details into account, but it gives you an overall picture of how much storage your imageboard might need. Assuming a 4MB cap and images only (no webm), 15GB per board should be plenty. Keep in mind that if you raise the size cap to, let's say, 8MB, you don't actually need 30GB; 20GB should be fine, as most users don't get even close to the default 4MB cap.
If once in a while you do start running out of space, you can just delete one or two low quality threads; out of a hundred per board there should be plenty, and users might not even notice. Another tip I already mentioned on this blog is to add a rule against image dump threads. I think such a rule is dumb given the nature of imageboards, but some consider these threads low quality, and they can use lots of storage.
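For anyone who wants to redo the dirty estimate with fresher numbers, here is a minimal sketch in Python. The function names are made up for this post, and the inputs are just the figures quoted above (1258 GB, 74 boards, 15GB per board); plug in whatever the front page says when you read this.

```python
def per_board_gb(active_content_gb, board_count):
    """Average storage per board: total active content divided by board count."""
    return active_content_gb / board_count


def total_budget_gb(board_count, budget_per_board_gb):
    """Storage to provision if every board gets the same per-board budget."""
    return board_count * budget_per_board_gb


if __name__ == "__main__":
    # Figures from 4chan's front page at the time of writing.
    print(per_board_gb(1258, 74))   # 17.0
    # Hypothetical imageboard with 10 boards at 15GB each.
    print(total_budget_gb(10, 15))  # 150
```

The 10-board example is only an illustration, not a recommendation for any particular site.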

You should also know that the worst case scenario is 4 (MB) * 100 (threads) * 300 (posts) = 120GB per board. While very unlikely, this is possible, and I am sure 90% of imageboards aren't prepared for it (the numbers depend on your imageboard configuration).
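Since the worst case depends entirely on your configuration, a tiny helper makes it easy to try other settings. This is only a sketch: the function name and parameters are my own, it assumes every post carries one file at exactly the size cap, and it uses 1GB = 1000MB to match the arithmetic above.

```python
def worst_case_gb(file_cap_mb, threads_per_board, posts_per_thread):
    """Upper bound per board: every post in every thread has a max-size file."""
    total_mb = file_cap_mb * threads_per_board * posts_per_thread
    return total_mb / 1000  # 1GB = 1000MB, same rounding as the text


if __name__ == "__main__":
    print(worst_case_gb(4, 100, 300))  # 120.0
    print(worst_case_gb(8, 100, 300))  # 240.0
```

Doubling the cap doubles the worst case, even though (as said above) real usage grows far less than that.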

And the right way is a whole other story...
4chan has a JSON API, which means you can easily extract the data it offers about each thread, post and board. All you need to do is write a script that scrapes specific data over some time period (let's say a week), and with that data you can work out more accurate numbers. Since you scrape each board individually, you can ignore specific boards like /hr/, /gif/, /wsg/ and /f/, or, if you want to see how much storage a single webm-enabled board would use, scrape only /gif/. This way is far more accurate and lets you calculate things the dirty way doesn't.
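A rough sketch of what such a script could look like, taking a single snapshot of one board. It assumes the read-only API at a.4cdn.org, where /{board}/threads.json lists thread numbers grouped by page and /{board}/thread/{no}.json returns a "posts" list whose entries carry an "fsize" field (file size in bytes) when a file is attached. The function names are mine, thumbnails are not counted, and the one second delay is there to respect the API's rate limit; check the API documentation before relying on any of this.

```python
import json
import time
import urllib.error
import urllib.request

API_ROOT = "https://a.4cdn.org"  # assumed API host, see lead-in


def fetch_json(url):
    """Download a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def thread_numbers(threads_index):
    """Flatten the paged thread index into a plain list of thread numbers."""
    return [t["no"] for page in threads_index for t in page["threads"]]


def thread_file_bytes(thread):
    """Sum the sizes of all files in one thread; posts without a file count as 0."""
    return sum(post.get("fsize", 0) for post in thread["posts"])


def board_file_bytes(board, delay=1.0):
    """Total bytes of full-size files currently live on one board."""
    index = fetch_json(f"{API_ROOT}/{board}/threads.json")
    total = 0
    for no in thread_numbers(index):
        time.sleep(delay)  # stay at or below one request per second
        try:
            thread = fetch_json(f"{API_ROOT}/{board}/thread/{no}.json")
        except urllib.error.HTTPError:
            continue  # thread was pruned between the two requests
        total += thread_file_bytes(thread)
    return total
```

Calling board_file_bytes("gif") / 1e9 would give the board's current size in GB; run it on a schedule for a week and keep the results to get the averages and peaks mentioned above.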

Conclusion? 28chan needs a host with 225GB of storage, or fewer boards...
This entry might not seem that interesting, but I had never thought about how big 28chan would be with all boards full. Its current server is not prepared for that at all, even though it should last some time before I need to migrate to buyvm.net.