It is all about the bandwidth.
For each Snowblossom hash, there are 6 reads from different locations in the snowfield. The reads are only 16 bytes each, but because of how block devices work, the OS and hardware perform each one as a 4 KB read. That means each hash costs 24 KB in reads.
So if you have an NVMe drive that can in theory do 2400 MB/s of reads:
2400 * 1024 / 24 = 102400 hashes/sec.
That gives a theoretical max of about 102 kh/s. In practice, on such an NVMe I see about 1800 MB/s moved (check via dstat) and a hash rate around 88 kh/s. At that hash rate I should be seeing a bit more transfer (88000 * 24 KB is roughly 2060 MB/s), but the difference is probably OS cache hits.
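The bandwidth arithmetic above can be written as a small Python sketch. The function names are made up for this example, and the defaults (6 reads per hash, 4 KB per read) are the figures from the text. It ignores cache hits and any other overhead.

```python
def max_hashrate(read_mb_per_s: float, reads_per_hash: int = 6, block_kb: int = 4) -> float:
    """Upper bound on hashes/sec when every PoW read costs one full block."""
    kb_per_hash = reads_per_hash * block_kb  # 6 * 4 = 24 KB per hash
    return read_mb_per_s * 1024 / kb_per_hash


def implied_read_mb_per_s(hashrate: float, reads_per_hash: int = 6, block_kb: int = 4) -> float:
    """Disk traffic in MB/s that a hash rate would need with no cache hits."""
    return hashrate * reads_per_hash * block_kb / 1024


print(max_hashrate(2400))            # 102400.0
print(implied_read_mb_per_s(88000))  # 2062.5
```

Plugging in your own drive's measured read rate (for example from dstat) gives a quick ceiling to compare your observed hash rate against.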
Also, the 6 Snowblossom PoW reads are sequential: each read location depends on the result of the previous read. What it is really doing is:
hash -> read -> hash -> read -> ...
So each thread has to wait for IO to return, then do some hashing to find out what the next read location is. To really saturate a fast IO bus you need a bunch of threads, way more than CPU cores. Play around with it.
For example, to saturate my Intel 750 NVMe I had to go to 128 threads.
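A rough way to see why the thread count ends up so high: if each thread has only one read in flight and hashing time is ignored, a thread completes one hash every 6 read latencies. The Python sketch below is a back-of-envelope estimate under those assumptions. The function name is hypothetical, and the 200 microsecond latency is an illustrative guess, not a measurement of any particular drive.

```python
import math


def threads_to_saturate(target_hashes_per_s: float, read_latency_s: float,
                        reads_per_hash: int = 6) -> int:
    """Threads needed if each thread waits on one read at a time (hashing time ignored)."""
    # One thread finishes a hash every reads_per_hash * read_latency_s seconds.
    hashes_per_thread_per_s = 1.0 / (reads_per_hash * read_latency_s)
    return math.ceil(target_hashes_per_s / hashes_per_thread_per_s)


# Assumed latency under load; measure your own device before trusting the result.
print(threads_to_saturate(102400, 200e-6))  # 123
```

Real latency changes with queue depth, so treat the result as a starting point for experimenting rather than an answer.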
- 1 Tuning Parameters
- 2 Arktika
- 2.1 Principles Of Operation
- 2.2 Things you should do
- 2.3 Why is remote faster than SSD
- 2.4 Interesting Configurations
- 2.5 But how many threads and stuff should I use?
- 2.6 Should I run threads on my memory servers or leave them alone?
- 2.7 When I start up some nodes talking to other nodes it just streams errors until the other node is up?
Tuning Parameters
- memfield=(true|false) Cache the entire field in memory. It caches as needed, so it takes some time to warm up.
- threads=(some number)
- memfield_precache_gb=(some number) Caches this many GB in memory at startup.
- min_depth_to_disk=(0 to 6) Only goes to disk rather than memory if we are already N passes into the PoW. So 6 means memory mining only, even with a partial field. A lower number means only go to disk if we got that far without disk. It gets weird. Give it a try. Only in the fastfail branch.
Arktika
Arktika is our next-generation advanced mining agent. It has some complicated tuning options and is not simple.
It is not included in the release zip files; you'll have to build it from source:
bazel build :Arktika_deploy.jar
Principles Of Operation
The configuration of a node consists of an ordered list of layers. Each layer provides access to some or all of the 1 GB chunk files (see the chunked torrents on https://snowblossom.org/snowfields/index.html). The first layer to provide any given chunk is the one that is used (with the exception of doing the PoW proof for actual shares, which skips remote sources).
When threads work on parts of PoWs, they do what work they can and then put the work on the queue of whichever layer serves the next read the PoW needs. Each layer has a thread pool that handles work on that layer.
Each Arktika node also acts as a gRPC server for remote requests.
Types of Layers
- file - regular file access. Can be SSD, HDD, ramfs, zram or whatever. Anything accessed by a file path.
- mem - memory. Read into memory on startup.
- remote - these chunks are read from a remote Arktika node via gRPC.
- fake - a fake random source. Do not use for real mining. Helps find out what your CPU could do if your IO was not a limitation at all.
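The "first layer that provides the chunk wins" rule, including the exception for share proofs, can be sketched in Python. This is only an illustration of the lookup order described above, not Arktika's actual code. The `Layer` class, the `pick_layer` function and the layer names are all made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    name: str          # label used only in this sketch
    kind: str          # "file", "mem", "remote" or "fake"
    chunks: frozenset  # chunk numbers this layer can serve


def pick_layer(layers: List[Layer], chunk: int, for_share_proof: bool = False) -> Optional[Layer]:
    """Return the first layer, in configured order, that provides the chunk."""
    for layer in layers:
        if for_share_proof and layer.kind == "remote":
            continue  # share PoW proofs skip remote sources
        if chunk in layer.chunks:
            return layer
    return None


layers = [
    Layer("ram", "mem", frozenset({0, 1})),
    Layer("peer", "remote", frozenset({2, 3})),
    Layer("ssd", "file", frozenset({0, 1, 2, 3})),
]
print(pick_layer(layers, 1).name)                        # ram
print(pick_layer(layers, 2).name)                        # peer
print(pick_layer(layers, 2, for_share_proof=True).name)  # ssd
```

In this made-up setup the slow file layer sits last, so it is only used for share proofs on chunks 2 and 3 and never for ordinary mining reads.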
Things you should do
- Each node that is to do hashing should have access to all chunks and deck files without relying on remote sources. This is needed for share PoW proof generation.
- Don't put threads on a layer that doesn't get action. They will just fill all the queues on the other layers and waste CPU.
Why is remote faster than SSD
Seriously, network is slow, like 100 MB/s, while even a slow SSD is like 500 MB/s. What gives? The key is that for a local block device (or in fact memory), even though Snowblossom only needs 16 bytes at a time, these data sources read 4 KB blocks at a time. Over the network, we can ask for a bunch of 16-byte "words" and the server can pack them all into a few blocks. So with the client using a queue for the layer and batching up requests, we can get some pretty amazing performance over the network.
For example, let's say we have a memory server on a regular gigabit Ethernet network and a CPU worker that can hash at 1 MH/s. The worker asks for 1e6 * 6 * 16 bytes of data every second, which is 96 MB/s. That just about maxes out the network. However, to serve that, the server side has to read 1e6 * 6 * 4096 bytes every second, which is 24576 MB/s.
So this way even our modest network can help saturate the memory bus with IO even if we can't cram all the memory and CPUs on one motherboard.
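The same numbers as a short Python sketch. The constant and function names are made up for this example. It counts payload bytes only and ignores request and protocol overhead.

```python
WORD_BYTES = 16      # bytes the PoW actually needs per read
BLOCK_BYTES = 4096   # bytes a local source reads to supply them
READS_PER_HASH = 6


def network_bytes_per_s(hashrate: int) -> int:
    """Payload the worker pulls over the network each second."""
    return hashrate * READS_PER_HASH * WORD_BYTES


def server_bytes_per_s(hashrate: int) -> int:
    """Bytes the serving side reads locally each second to answer those requests."""
    return hashrate * READS_PER_HASH * BLOCK_BYTES


hashrate = 1_000_000  # 1 MH/s worker
print(network_bytes_per_s(hashrate) / 1e6)  # 96.0 (MB/s on the wire)
print(server_bytes_per_s(hashrate) / 1e6)   # 24576.0 (MB/s read by the server)
print(BLOCK_BYTES // WORD_BYTES)            # 256
```

The last line is the amplification factor: every byte sent over the network stands in for 256 bytes of local reads on the server.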
Interesting Configurations
- Nodes that can't each fit the whole field in RAM, each hosting part of it and using each other for the rest. Maybe also some workers that use them. See https://github.com/snowblossomcoin/snowblossom/tree/master/example/arktika with node1-3 and worker.
But how many threads and stuff should I use?
No one knows.
Should I run threads on my memory servers or leave them alone?
No one knows. Happy discovery.
When I start up some nodes talking to other nodes it just streams errors until the other node is up?
Ride the snake! It should settle out once everyone is up. Should be no errors then.