CentOS 7.x Owncloud upload files in parallel via WebDAV
From Notes_Wiki
Uploading files to owncloud one at a time with rsync or cp over a davfs2 (WebDAV) mount can be slow. This is especially noticeable when thousands of small files need to be uploaded. To upload many small files in parallel:
- Install GNU parallel, which is provided by the EPEL repository:
- yum -y install epel-release
- yum -y install parallel
- Create a list of files to be copied, comparing only file sizes (--size-only). This is required because owncloud assigns its own timestamps, and the timestamps davfs2 shows on the command line are not copied to the backend.
- rsync -nvr --size-only /mnt/source/ /mnt/owncloud-dest/ > /root/copy-list.txt 2>/root/error-list.txt &
- and wait for the file list to be created.
- Remove the first line of /root/copy-list.txt, which is similar to:
- sending incremental file list
- and the last 3 lines: an empty line followed by lines similar to:
- sent 65 bytes received 19 bytes 168.00 bytes/sec
- total size is 0 speedup is 0.00 (DRY RUN)
- Use parallel to copy the files in the above list to owncloud, several at a time:
- cd /mnt/source ##Very important: the paths in /root/copy-list.txt are relative to /mnt/source
- cat /root/copy-list.txt | parallel --will-cite -j 5 cp -v --parents {} /mnt/owncloud-dest/ > /root/cp-output.txt 2>&1 &
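- where -j 5 runs at most 5 copies at a time, --parents recreates the source directory structure under /mnt/owncloud-dest/, and --will-cite suppresses the GNU parallel citation notice.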
- To see the running copy processes at any time, use:
- ps aux | grep "cp -v"
- Also,
- ps aux | grep "cp -v" | wc -l
- will show 7 (2 more than the -j value), because the grep command itself and the parallel command line also contain "cp -v".
- To monitor uploads continuously, use:
- watch "ifconfig br0; echo -n 'No of copy processes: '; ps aux | grep 'cp -v' | wc -l; echo -n 'No of files copied: '; grep -v 'cannot\|omitting' /root/cp-output.txt | wc -l; echo; df -h /; echo; du -sh /var/cache/davfs2; echo; tail /root/cp-output.txt"
- where br0 should be replaced with the name of the network interface. This shows:
- Interface statistics, to get an idea of the upload traffic
- Number of processes matching "cp -v" (the count includes a few non-cp processes such as grep and parallel)
- Number of files copied, i.e. lines in /root/cp-output.txt that are not 'cannot' or 'omitting' messages
- Free space in "/". This must be monitored to ensure that the davfs2 cache does not grow larger than the "/" filesystem can accommodate.
- Disk space used by the davfs2 cache folder
- Last 10 lines of /root/cp-output.txt, i.e. the most recently copied files
- If the davfs2 cache keeps growing, parallel can be paused and resumed periodically. First find the PID of parallel using:
- ps aux | grep parallel
- then run:
- while true; do sleep 7200; kill -19 <parallel-pid>; sleep 3600; kill -18 <parallel-pid>; done
- where <parallel-pid> is the PID of the parallel process from the ps output. On x86 Linux, signal 19 is SIGSTOP (pause) and signal 18 is SIGCONT (resume).
- This lets parallel start new copies for 2 hours, pauses it for 1 hour (copies that are already running still finish), resumes it for another 2 hours, and so on.
- A better option is the pause-unpause.sh Erlang script given below, which automatically pauses parallel when /var/cache/davfs2/ exceeds 1000000 KB (approx. 1 GB) and resumes it when the size drops below 100000 KB (approx. 100 MB).
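The manual step of removing the first line and the last 3 lines of /root/copy-list.txt can also be scripted. Below is a minimal sketch, assuming GNU coreutils (whose head accepts a negative count for -n, as on CentOS 7) and the rsync -v output layout shown above; trim_rsync_list is a hypothetical helper name, not an rsync feature.

```shell
#!/bin/bash
# trim_rsync_list (hypothetical helper): remove the rsync dry-run header
# ("sending incremental file list") and the 3-line trailer (empty line,
# "sent ... bytes/sec", "total size ... (DRY RUN)") from a file list, in place.
# Assumes GNU head, which accepts a negative count for "-n".
trim_rsync_list() {
    local list=$1 tmp
    [ -r "$list" ] || { echo "cannot read $list" >&2; return 1; }
    tmp=$(mktemp) || return 1
    # sed drops line 1; head -n -3 prints everything except the last 3 lines.
    if sed '1d' "$list" | head -n -3 > "$tmp"; then
        mv "$tmp" "$list"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Once the rsync dry run has finished, it would be called as 'trim_rsync_list /root/copy-list.txt'. Directory entries (lines ending in '/') stay in the list; cp skips them with an 'omitting directory' message, which the grep -v 'cannot\|omitting' filter in the monitoring command excludes from the copied-file count.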
If you instead need to upload a small number (<10) of large files (>1 GB), have a look at https://unix.stackexchange.com/questions/354026/disable-davfs2-caching
Refer:
- http://www.yourownlinux.com/2015/04/speed-up-file-transfers-using-rsync-with-gnu-parallel.html
- https://bash.cyberciti.biz/guide/Sending_signal_to_Processes
Pause/unpause script to keep /var/cache/davfs2 size within limits
The parallel process can be paused with 'kill -19 <pid>' and resumed with 'kill -18 <pid>' depending on the space occupied by the /var/cache/davfs2 folder. This can be automated with the following Erlang script (run by escript), which checks the folder size every 60 seconds:
#!/usr/bin/env escript
-define(High, 1000000). %1 GB (du -s reports sizes in KB)
-define(Low, 100000).   %100 MB

%% Assumes GNU parallel shows up as 'perl' in ps output, as on the original setup.
main(_) ->
    %% tl/1 drops the "PID" header line printed by ps.
    Output1 = tl(string:tokens(os:cmd("ps -C perl -o pid"), "\n")),
    case Output1 of
        [] ->
            io:format("Can't get pid of parallel process. Exiting.~n");
        _ ->
            io:format("Got pid of parallel process as ~p~n", [Output1]),
            %% Space-separate the PIDs so that kill receives each one separately.
            pause(string:join(Output1, " "))
    end.

%% Size of /var/cache/davfs2 in KB, taken from the first field of du -s.
cache_size() ->
    list_to_integer(hd(string:tokens(os:cmd("du -s /var/cache/davfs2"), "\n\t"))).

%% Check every 60 s; send signal 19 (SIGSTOP) once the cache exceeds High.
pause(Pid1) ->
    Space1 = cache_size(),
    if
        Space1 > ?High ->
            io:format("Got space as ~p which is higher than ~p. Pausing~n", [Space1, ?High]),
            Command1 = lists:flatten(io_lib:format("kill -19 ~s", [Pid1])),
            io:format("Will pause with command ~p~n", [Command1]),
            os:cmd(Command1),
            unpause(Pid1);
        true ->
            sleep(60),
            pause(Pid1)
    end.

%% Check every 60 s; send signal 18 (SIGCONT) once the cache drops below Low.
unpause(Pid1) ->
    Space1 = cache_size(),
    if
        Space1 < ?Low ->
            io:format("Got space as ~p which is lower than ~p. Unpausing~n", [Space1, ?Low]),
            Command1 = lists:flatten(io_lib:format("kill -18 ~s", [Pid1])),
            io:format("Will unpause with command ~p~n", [Command1]),
            os:cmd(Command1),
            pause(Pid1);
        true ->
            sleep(60),
            unpause(Pid1)
    end.

sleep(N) -> receive after N*1000 -> ok end.
To use the above script:
- Enable the EPEL repository using 'yum -y install epel-release'
- Install erlang and byobu using 'yum -y install erlang byobu'
- Save the above script as file 'pause-unpause.sh'
- Make the file executable using 'chmod +x pause-unpause.sh'
- Start a byobu shell using 'byobu'
- Execute the script using './pause-unpause.sh' while the parallel copy is running, since the script looks up the PID of parallel only once, at startup
- (Optional) Detach from byobu using the 'F6' key, leaving the script running in the background
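If installing Erlang is not desirable, the same two-threshold logic can be sketched in plain shell. This is an alternative re-implementation, not the Erlang script above: it assumes the PID of parallel is passed as the first argument (taken from 'ps aux | grep parallel'), keeps the same 1000000 KB / 100000 KB thresholds and 60-second poll, uses the signal names STOP and CONT instead of the x86 numbers 19 and 18, and stops once the parallel process has exited. The file name pause-unpause-shell.sh and the functions decide and watch_cache are hypothetical.

```shell
#!/bin/bash
# Sketch: pause GNU parallel while the davfs2 cache is large, resume when small.
# Thresholds are in KB, as reported by "du -s" (same values as the Erlang script).
HIGH=1000000   # ~1 GB: pause above this
LOW=100000     # ~100 MB: resume below this
CACHE=/var/cache/davfs2
INTERVAL=60    # seconds between checks

# decide SIZE_KB STATE -> prints "pause", "resume" or "keep".
# Two thresholds avoid toggling on every check when the size hovers near one limit.
decide() {
    local size=$1 state=$2
    if [ "$state" = running ] && [ "$size" -gt "$HIGH" ]; then
        echo pause
    elif [ "$state" = paused ] && [ "$size" -lt "$LOW" ]; then
        echo resume
    else
        echo keep
    fi
}

# watch_cache PID: poll the cache size until process PID no longer exists.
watch_cache() {
    local pid=$1 state=running size
    # If this script is interrupted, make sure parallel is not left stopped.
    trap "kill -CONT $pid 2>/dev/null; exit" INT TERM
    while kill -0 "$pid" 2>/dev/null; do
        # If du fails, size defaults to 0 (which resumes a paused parallel).
        size=$(du -s "$CACHE" 2>/dev/null | cut -f1)
        case $(decide "${size:-0}" "$state") in
            pause)  echo "cache ${size} KB > ${HIGH} KB, pausing $pid"
                    kill -STOP "$pid"; state=paused ;;
            resume) echo "cache ${size:-0} KB < ${LOW} KB, resuming $pid"
                    kill -CONT "$pid"; state=running ;;
        esac
        sleep "$INTERVAL"
    done
}

# Start monitoring only when a PID is given, so the functions can be sourced.
if [ -n "${1:-}" ]; then
    watch_cache "$1"
fi
```

Inside byobu it would be started as './pause-unpause-shell.sh <parallel-pid>'. Unlike the Erlang version, it sleeps 60 seconds after every check, including right after pausing or resuming.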