CentOS 7.x Owncloud upload files in parallel via webdav


New approach by uploading files to the data folder and running occ files:scan

To upload files to owncloud use the following steps:

  1. Copy the files to the data folder of the corresponding user at the desired location. For example, to upload files to the projects/new/ folder of the admin user, copy them to '/opt/owncloud-<version>/apps/owncloud/data/admin/files/projects/new/', such as with:
    rsync -vtrp ./new/ /opt/owncloud-<version>/apps/owncloud/data/admin/files/projects/new/
  2. Scan the particular folder for the corresponding changes:
    cd /opt/owncloud-10.0.10-4/apps/owncloud/htdocs
    sudo -u daemon /opt/owncloud-10.0.10-4/php/bin/php occ files:scan --path "admin/files/projects/new/"
    This will add any new files in the projects/new folder to the owncloud database.
    If the installation is small you can also consider scanning all folders using:
    sudo -u daemon /opt/owncloud-10.0.10-4/php/bin/php occ files:scan --all
    Both steps can also be combined into a small wrapper script; see the sketch after this list.
    Refer CentOS 7.x Owncloud file cache and sharing
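
The two steps above can be combined into a small wrapper script. The following is a minimal sketch, assuming the same /opt/owncloud-10.0.10-4 layout and the example admin/files/projects/new/ destination used above; adjust the paths and the source folder for your installation:

#!/bin/bash
# Sketch: copy files into the owncloud data folder and register them in the
# owncloud database.  OC_BASE, SRC and DEST_REL are example values.
OC_BASE=/opt/owncloud-10.0.10-4
SRC=./new/
DEST_REL="admin/files/projects/new"

# Copy files preserving timestamps and permissions
rsync -vtrp "$SRC" "$OC_BASE/apps/owncloud/data/$DEST_REL/"

# Make the new files visible to owncloud by updating its file cache
cd "$OC_BASE/apps/owncloud/htdocs"
sudo -u daemon "$OC_BASE/php/bin/php" occ files:scan --path "$DEST_REL/"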


Old approach using parallel and webdav

This is the older approach and is perhaps not required. It takes considerable time and expertise and is still slower than the approach explained above.

Uploading files to owncloud sequentially using rsync or cp can be slow. This can especially be an issue if you need to upload thousands of small files. To upload multiple small files in parallel use:

  1. yum -y install parallel
  2. Create a list of files to be copied by comparing only size. This is required as owncloud creates its own timestamps and the davfs2 timestamps shown on the command-line are not copied to the backend.
    rsync -nvr --size-only /mnt/source/ /mnt/owncloud-dest/ > /root/copy-list.txt 2>/root/error-list.txt &
    and wait for the file list to be created. This can be very slow if /mnt/owncloud-dest is mounted using davfs2. To speed this up, build the list directly by comparing /mnt/source with the contents of /opt/owncloud-<version>/apps/owncloud/data/<user>/files/<path> on the remote owncloud machine over ssh or sshfs (see the ssh-based sketch after this list).
  3. Remove the first line, similar to:
    sending incremental file list
    and the last 3 lines (the blank line before the summary plus the two summary lines), similar to:
    sent 65 bytes received 19 bytes 168.00 bytes/sec
    total size is 0 speedup is 0.00 (DRY RUN)
    from the created /root/copy-list.txt file (see the sed/head sketch after this list).
  4. Use parallel to copy the files from the above list to owncloud in parallel:
    cd /mnt/source ##Very important: the copy list contains paths relative to /mnt/source
    cat /root/copy-list.txt | parallel --will-cite -j 5 cp -v --parents {} /mnt/owncloud-dest/ > /root/cp-output.txt 2>&1 &
    where -j 5 indicates 5 parallel copies at any given time.
  5. At any time you can see the 5 copy processes running using:
    ps aux | grep "cp -v"
    Also
    ps aux | grep "cp -v" | wc -l
    will show 7 (2 more than the -j value) because the grep and parallel commands also get matched.
  6. To continuously monitor uploads use:
    watch "ifconfig br0; echo -n \"No of copy processes:\"; ps aux | grep 'cp -v' | wc -l; echo -n \"No of files copied: \"; grep -v 'cannot\|omitting' /root/cp-output.txt | wc -l; echo; df -h /; echo; du -sh /var/cache/davfs2; echo; tail /root/cp-output.txt"
    where br0 should be replaced with the name of the network interface. This is monitoring:
    1. Interface statistics to get an idea of upload throughput
    2. No. of parallel cp processes running
    3. No. of files copied, based on the no. of lines in the /root/cp-output.txt file
    4. Free space in "/". This is necessary to ensure that the davfs2 cache does not grow too large for the "/" filesystem.
    5. Disk space usage of the davfs2 cache folder
    6. Last 10 copied files
  7. If the davfs2 cache keeps growing in size, then to automatically pause and resume the above processes use:
    ps aux | grep parallel
    while true; do sleep 7200; kill -19 <parallel-pid>; sleep 3600; kill -18 <parallel-pid>; done
    where <parallel-pid> is the PID of the parallel process as seen in the output of the ps command.
    This will allow parallel to spawn processes for 2 hours, then pause it for 1 hour, then continue it for another 2 hours, and so on.
    A better option is to use the pause-unpause.sh Erlang script given below, which will automatically pause parallel when /var/cache/davfs2/ is more than 1000000 KB (approx 1 GB) in size and automatically unpause it when the size goes below 100000 KB (approx 100 MB).
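
For step 2 above, the ssh-based comparison can be done with a dry-run rsync directly against the backend data folder. The following is a minimal sketch, where <owncloud-host>, <version>, <user> and <path> are placeholders for your remote owncloud machine and destination:

    rsync -nvr --size-only /mnt/source/ \
        root@<owncloud-host>:/opt/owncloud-<version>/apps/owncloud/data/<user>/files/<path>/ \
        > /root/copy-list.txt 2>/root/error-list.txt

This produces the same kind of dry-run file list as the davfs2-based command, but the size comparison happens directly against the backend data folder and is therefore usually much faster.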
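For step 3 above, the header and footer lines can be stripped non-interactively. The following is a sketch; verify the resulting file before feeding it to parallel, as the exact number of trailing lines can differ between rsync versions:

    #Drop the first line ("sending incremental file list")
    sed -i '1d' /root/copy-list.txt
    #Drop the last 3 lines (the blank line plus the two summary lines);
    #copy-list.tmp is just a temporary file name used here
    head -n -3 /root/copy-list.txt > /root/copy-list.tmp && mv /root/copy-list.tmp /root/copy-list.txt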


If you are instead trying to upload a small number (<10) of large files (>1 GB) then perhaps have a look at https://unix.stackexchange.com/questions/354026/disable-davfs2-caching




Pause-unpause script to ensure /var/cache/davfs2 size stays under limits

It is possible to pause the parallel process using 'kill -19 <pid>' (SIGSTOP) and then unpause it using 'kill -18 <pid>' (SIGCONT) based on the space occupied by the /var/cache/davfs2 folder. This can be done using the following Erlang (escript) script:

#!/usr/bin/env escript

-define(High, 1000000).  %1 GB
-define(Low,   100000).  %100 MB

% Find the PID of the running parallel process and start monitoring.
% GNU parallel is a Perl script; it is located here via 'ps -C perl', which
% assumes parallel is the only matching perl process on the system.
main(_) ->
    Output1 = tl(string:tokens(os:cmd("ps -C perl -o pid"), "\n")),
    case Output1 of
        [] ->
            io:format("Can't get pid of parallel process.  Exiting.~n");
        _ ->
            io:format("Got pid of parallel process as ~p~n", [Output1]),
            pause(Output1)
    end.

% Wait until the davfs2 cache grows beyond the High limit, then pause parallel
% with SIGSTOP (kill -19) and start waiting for the cache to shrink.
pause(Pid1) ->
    Space1 = list_to_integer(hd(string:tokens(os:cmd("du -s /var/cache/davfs2"), "\n\t"))),
    if
        Space1 > ?High ->
            io:format("Got space as ~p which is higher than ~p.  Pausing~n", [Space1, ?High]),
            Command1 = lists:flatten(io_lib:format("kill -19 ~s", [Pid1])),
            io:format("Will pause with command ~p~n", [Command1]),
            os:cmd(Command1),
            unpause(Pid1);
        true ->
            sleep(60),
            pause(Pid1)
    end.

% Wait until the davfs2 cache shrinks below the Low limit, then resume parallel
% with SIGCONT (kill -18) and go back to waiting for the cache to grow.
unpause(Pid1) ->
    Space1 = list_to_integer(hd(string:tokens(os:cmd("du -s /var/cache/davfs2"), "\n\t"))),
    if
        Space1 < ?Low ->
            io:format("Got space as ~p which is lower than ~p.  Unpausing~n", [Space1, ?Low]),
            Command1 = lists:flatten(io_lib:format("kill -18 ~s", [Pid1])),
            io:format("Will unpause with command ~p~n", [Command1]),
            os:cmd(Command1),
            pause(Pid1);
        true ->
            sleep(60),
            unpause(Pid1)
    end.

% Sleep for N seconds.
sleep(N) ->
    receive
    after N*1000 ->
        ok
    end.

To use the above script:

  1. Enable the EPEL repository using 'yum -y install epel-release'
  2. Install erlang and byobu using 'yum -y install erlang byobu'
  3. Save the above script as a file named 'pause-unpause.sh'
  4. Give execute permission to the file using 'chmod +x pause-unpause.sh'
  5. Start a byobu shell using 'byobu'
  6. Execute pause-unpause.sh using './pause-unpause.sh'
  7. (Optionally) Detach from the byobu shell, leaving the script running in the background, by pressing the 'F6' key


