@Gargron (note that my question is kinda ignorant, as I could have researched your past ideas myself. But sometimes, starting a conversation is easier. Sorry. ;))
@denschub Relays are opt-in, so it's easier to control whether you want a lot of public messages shoved into your database. Also scaling: it's easier for a small Mastodon node to deliver an item to one relay instead of 4,000 servers. Relays can scale separately / use different, more lightweight tech
@Gargron This is true at the scale we are currently living in, but this won't hold up in a future where we are "popular".
Opening connections and delivering the payload is much more expensive compared to receiving an item, so eventually, the relays will simply be unable to deliver their backlog. Now, you can scale the relays by throwing a lot of hardware and a lot of bandwidth at them.
@Gargron According to my experiments and tests, it would actually be more efficient to have the nodes themselves do the work. Yes, they will be kinda busy sometimes, but if the nodes are small (which in a nice world, they would be) there wouldn't be much traffic to federate outbound, so this might actually work.
@Gargron To be more specific: There is a huge difference between a relay delivering 10 posts from each of 100k nodes (the relay delivering 1 mil. posts) vs. having each node deliver its own 10 posts.
You'd have to do a lot of work to get the relay-cluster performant enough to handle that load, while delivering 10 posts within a short period of time is a perfectly reasonable workload, even for small nodes on slow hardware.
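The comparison above can be written out as a few lines of arithmetic. This is only a sketch of the numbers quoted in the thread (100k nodes, 10 posts each); the variable names are made up for illustration, and it counts posts handled, not individual deliveries to subscribers.

```python
# Numbers taken from the thread: 100k nodes, each producing 10 posts.
NODES = 100_000
POSTS_PER_NODE = 10

# A central relay has to take in and pass on every post from every node.
relay_posts = NODES * POSTS_PER_NODE

# Without a relay, each node only has to handle its own posts.
node_posts = POSTS_PER_NODE

print(f"relay handles {relay_posts:,} posts")
print(f"each node handles {node_posts} posts")
print(f"relay load is {relay_posts // node_posts:,}x that of a single node")
```

The point is the ratio: the relay's workload grows with the size of the whole network, while a single node's workload only grows with its own activity.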
@denschub a mastodon server has to do far more than just deliver posts to remote servers, though. and as you said, opening connections is quite expensive - increasing the amount of deliveries will delay other forms of processing which will directly impact end-user experience
@Gargron But it doesn't have to impact UX. Put outbound jobs into their own queue, and make sure they don't consume all available sockets so other jobs can still run just fine.
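One way to picture "own queue, limited sockets" is a delivery loop with a cap on concurrent connections. The following is a minimal sketch in Python, not how Mastodon or diaspora* actually implement it; `deliver_all`, `send`, and `max_concurrent` are hypothetical names, and `send` stands in for whatever coroutine opens the connection and posts the payload.

```python
import asyncio


async def deliver_all(targets, send, max_concurrent=20):
    """Deliver to every target with at most max_concurrent sends in flight.

    `send` is a caller-supplied coroutine function (hypothetical here) that
    performs one delivery. The cap keeps outbound federation from using up
    every socket, so other jobs keep running.
    """
    limit = asyncio.Semaphore(max_concurrent)
    results = {}

    async def deliver_one(target):
        # Wait for a free slot before opening a connection.
        async with limit:
            try:
                results[target] = await send(target)
            except Exception as exc:
                # Record the failure instead of aborting the whole batch.
                results[target] = exc

    await asyncio.gather(*(deliver_one(t) for t in targets))
    return results
```

Usage would look like `asyncio.run(deliver_all(node_urls, send_payload, max_concurrent=50))`, where `node_urls` and `send_payload` are placeholders. Failed targets come back as exception objects in the result, so a retry pass could pick them up later.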
We (diaspora*, that is) would be able to deliver a post to 100k nodes within ~9 minutes on average server hardware (assuming the response times I measured across our actual network), while in the same scenario, we would not be able to handle the same load on a central relay, as the TLS handshakes alone would consume more than 4 hours.
@Gargron What I'm concerned about, at least in diaspora*'s case (and yes, I am obviously biased), is that we are trying to get around an actual issue by implementing a workaround that will not scale if we become somewhat popular.
I have not yet found a nice solution, but I always try to talk to people who might also have spent some time thinking about that...