Transferring 35TB of data from one system to another doesn’t happen overnight. There are practical limits on transfer speeds. I know where the bottlenecks occur, but it’s good to have real experience seeing how data moves and figuring out better ways to do it.
From the benchmarking of the parity arrays I did last week, I already determined that the best sequential write speeds hit around 120MB/s, dropping to about 13MB/s for small files. I have two arrays to restore, and each can maintain about 70MB/s on average for the typical file mix. When it comes to pulling files from the internet, my connection speed is 1.5 Gbps, but my computer itself is connected via the 1 Gbps network, so the limit is 1000 Mbps. However, I am dependent on the server at my parents’ place, which is limited by their upload speed of 160 Mbps. (Shaw advertises 150 Mbps upstream, so they actually get a bit more.) This is about 20MB/s at best, so transferring 35TB of data would take about 24 days.
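As a rough sanity check on those numbers, here is a small Python sketch of the arithmetic. It assumes decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes) and ignores protocol overhead, and the function names are just illustrative. At the full 20MB/s the transfer works out to a little over 20 days, so the 24-day estimate corresponds to averaging closer to 17MB/s.

```python
# Back-of-the-envelope transfer time estimate.
# Assumptions: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes)
# and no protocol overhead. Function names are illustrative only.

TB = 10**12
MB = 10**6
SECONDS_PER_DAY = 86_400


def mbps_to_mbytes_per_s(mbps: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8


def transfer_days(total_bytes: float, rate_mbytes_per_s: float) -> float:
    """Days needed to move total_bytes at a constant rate in MB/s."""
    return total_bytes / (rate_mbytes_per_s * MB) / SECONDS_PER_DAY


def implied_rate_mbytes_per_s(total_bytes: float, days: float) -> float:
    """Average rate in MB/s implied by finishing total_bytes in the given days."""
    return total_bytes / (days * SECONDS_PER_DAY) / MB


upload_rate = mbps_to_mbytes_per_s(160)            # 160 Mbps upstream
best_case_days = transfer_days(35 * TB, upload_rate)
rate_for_24_days = implied_rate_mbytes_per_s(35 * TB, 24)

print(f"Upload ceiling: {upload_rate:.1f} MB/s")
print(f"Best case for 35TB: {best_case_days:.1f} days")
print(f"Average rate implied by 24 days: {rate_for_24_days:.1f} MB/s")
```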
Well, I figured out a few ways to improve the transfer which should cut down the time it will take for the full restoration.
- Use a HDD and physically shuttle it back and forth. – I have a 16TB drive, and I am copying just one set of data of about 7.3TB. I didn’t have eSATA, and I didn’t want to take apart my computer, so I settled for USB. I’m using an external USB 3.2 drive dock rated at 5 Gbps, which is in the same range as the 6 Gbps (about 600MB/s) maximum of SATA3. A HDD would max out at around half that speed anyway. However, the server doesn’t have USB 3.2, and I didn’t want to fall back to USB 2.0, as that would choke the speed significantly. I realized that I could plug the drive into a newer computer on the network, and then I would be limited by the 1000 Mbps network instead. This, I felt, was acceptable. During sustained writes the copy would peak at around 1000 Mbps, but while copying small files the drive often dipped below 10MB/s, so the network was no longer the bottleneck. It’s the tiny files that are the culprit. I then came up with the idea of copying just the large files (50MB and up) to the HDD. These make up 92% of the data, and they average much higher speeds of over 80MB/s. Even so, the smaller 7.3TB data set would still take well over a day to copy.
- A second source of the data over the internet. – My sister also has the same data on her system. She has an identical internet connection, which is capped at 160 Mbps. Having both sync to my computer achieves over 300 Mbps, so I’m getting over 30MB/s.
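The large-file filter from the first option can be sketched in Python. This is a minimal illustration and not the exact tool used for this restore: the 50MB threshold comes from the text, while the function names and the choice to skip symlinks are assumptions of mine.

```python
import os
import shutil
from pathlib import Path

# 50MB threshold from the text (decimal megabytes assumed).
MIN_SIZE = 50 * 10**6


def split_by_size(root, min_bytes=MIN_SIZE):
    """Walk root and return (large, small) lists of paths relative to root."""
    root = Path(root)
    large, small = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # assumption: symlinks are left for the sync tool
            rel = path.relative_to(root)
            if path.stat().st_size >= min_bytes:
                large.append(rel)
            else:
                small.append(rel)
    return large, small


def copy_large_files(src, dst, min_bytes=MIN_SIZE):
    """Copy only files of at least min_bytes from src to dst, keeping the tree layout."""
    src, dst = Path(src), Path(dst)
    large, _small = split_by_size(src, min_bytes)
    for rel in large:
        target = dst / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src / rel, target)  # copy2 also preserves timestamps
    return large
```

Calling `copy_large_files("/path/to/source", "/path/to/shuttle-drive")` (both paths hypothetical) would fill the drive with the big sequential files only, leaving everything under 50MB for the internet sync.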
The HDD approach is actually limited if you only have one drive. You need the same amount of time again to copy the data from the drive onto the target computer, so the effective speed is halved, to about 40MB/s. As mentioned, I moved only large files of 50MB or more. What about the small files? Syncthing will handle bringing in the small files, which will go through the slower internet link.
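The halving is easy to check: with a single drive, the copy onto the drive and the copy off it happen one after the other, so the times per byte add up. A minimal sketch, assuming both legs run at the roughly 80MB/s mentioned earlier and ignoring the travel time between locations (the function name is just illustrative):

```python
def shuttle_effective_rate(copy_out_rate: float, copy_in_rate: float) -> float:
    """Effective MB/s when data is written to a drive, then read back off it.

    The two legs run one after the other, so the time per megabyte is
    1/copy_out_rate + 1/copy_in_rate. Travel time is ignored (assumption).
    """
    return 1 / (1 / copy_out_rate + 1 / copy_in_rate)


# Both legs at about 80MB/s gives roughly 40MB/s end to end.
effective = shuttle_effective_rate(80, 80)
print(f"Effective shuttle rate: {effective:.0f} MB/s")
```

With two drives the legs could overlap, one filling at the source while the other unloads at the target, which is why a single drive is the limiting case.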
It’s good that I have three copies of the data, because otherwise, over the next couple of weeks, I would be vulnerable to permanent data loss if the one server broke down.