RRC-SCS: Data Transfer

Data transfer options

We currently provide multiple transfer options and the one to use depends on each RRC facility:

  • If our facility collects the data for you, then your results will be available for download via HTTP.
  • If you reserve time to use one of our instruments, then you have the option of transferring your results via FTP.


Please note that although both FTP and HTTP data transfer services are hosted on our MyData Server, they are generally not interchangeable (e.g. if you FTP your data to our server, you cannot also download the same data via the web interface). This limitation is practical rather than technical: most users find it more convenient to transfer large files in bulk with an FTP client than with a web browser, where files are normally transferred one at a time.


Also note that the data transfer options provided by our Proteomics Services are independently operated, so the user accounts are also independently maintained (if you have an FTP account with our Proteomics Service, it cannot be used to access MyData Server and vice versa). Please contact MMPF staff for further assistance.


FTP / FTPS

Due to the software options on our server, we are only able to support FTP at this time. We may switch to FTPS as we upgrade our servers and network infrastructure. For now, FTPS has two drawbacks: it is more CPU-intensive for both the sender and the receiver, which can increase transfer time, and suitable free client software may not be available for all computers.

Access

User access to our FTP service must be requested by an RRC facility. This is normally done only after instrument training is completed.

Software

There are literally hundreds of FTP clients available and if you already have a favorite one, feel free to continue using it. Otherwise, we recommend FileZilla and the FireFTP add-on for Mozilla Firefox. Both programs are free, cross-platform (Windows, Mac, Linux) and easy to use. Take a look at the FileZilla Client Tutorial and FireFTP help pages for detailed step-by-step usage instructions.

To connect to our server, most FTP clients require only three pieces of information: the host/server address, a username, and a password.

Host/Server: mydata.rrc.uic.edu
Username: (your e-mail address)
Password: (your RRC password)

Some older FTP clients may need two additional bits of information: network port and transfer mode.

Port: 21
Transfer mode: passive
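
If you prefer to script your transfers instead of using a graphical client, the same settings can be passed to Python's standard ftplib module. The following is only a minimal sketch: the function name list_remote_files is our own for illustration, and the username and password are whatever credentials you were issued.

```python
from ftplib import FTP

# Connection settings taken from the values listed above.
HOST = "mydata.rrc.uic.edu"
PORT = 21


def list_remote_files(username, password, host=HOST, port=PORT, timeout=30):
    """Log in and return the names in the top-level remote directory.

    Illustrative helper only; it is not part of any RRC tooling.
    """
    ftp = FTP()
    ftp.connect(host, port, timeout=timeout)
    try:
        ftp.login(user=username, passwd=password)
        ftp.set_pasv(True)  # passive mode, as recommended above
        return ftp.nlst()
    finally:
        ftp.close()
```

Calling list_remote_files("you@example.edu", "your-password") would attempt a real connection, so run it only with your own account details.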

[Screenshot: FileZilla after logging in]

Storage limit

For most users the initial storage limit is 10 GB (10,000 MB) or 10,000 files, whichever comes first. Each time you log on, your current disk usage is displayed so you know how much space you have left. When you reach your storage limit, simply delete older files you have already downloaded to free up space. For frequent customers, we will increase storage limits on a case-by-case basis.

Sample FTP login message output:

230-Your bandwidth usage is restricted
230-User kevinflynn@tron.net has group access to:  50
230-OK. Current directory is /
230-24 files used (0%) - authorized: 10000 files
230 125662 Kbytes used (1%) - authorized: 10240000 Kb
Remote system type is UNIX.
Using binary mode to transfer files.
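
If you script your transfers, the quota lines in a login message like the one above can be read programmatically. The sketch below assumes the message keeps exactly the wording shown in the sample; the function name parse_quota and the dictionary keys are our own. With Python's ftplib, the text to parse would typically be the string returned by login().

```python
import re

# Copy of the sample login message shown above.
SAMPLE_BANNER = """\
230-Your bandwidth usage is restricted
230-User kevinflynn@tron.net has group access to:  50
230-OK. Current directory is /
230-24 files used (0%) - authorized: 10000 files
230 125662 Kbytes used (1%) - authorized: 10240000 Kb
"""

FILES_RE = re.compile(r"(\d+) files used .*authorized: (\d+) files")
KB_RE = re.compile(r"(\d+) Kbytes used .*authorized: (\d+) Kb")


def parse_quota(banner):
    """Extract file-count and disk-space usage from a login message."""
    files = FILES_RE.search(banner)
    kb = KB_RE.search(banner)
    if not files or not kb:
        raise ValueError("quota lines not found in login message")
    files_used, files_limit = map(int, files.groups())
    kb_used, kb_limit = map(int, kb.groups())
    return {
        "files_used": files_used,
        "files_limit": files_limit,
        "kb_used": kb_used,
        "kb_limit": kb_limit,
        "kb_free": kb_limit - kb_used,
    }


quota = parse_quota(SAMPLE_BANNER)
print(quota["kb_free"])  # 10114338
```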

Transfer speed

File transfers over FTP tend to involve large data sets, which generate a lot of network traffic and can take a while to complete. To balance the load on our servers, we cap the maximum network speed of FTP connections, so please do not worry if your file transfers take longer than expected.

HTTP

We offer a web-based results download service that is especially useful for our facilities that offer data collection services. Customers submit samples for processing and, once their results are ready, an e-mail notification is sent along with download instructions.

Sample e-mail notification:

Dear RRC user,

Your results are available for download from the RRC MyData Server:

        http://mydata.rrc.uic.edu/

If you have any questions about your results, please reply to this e-mail.

We appreciate your business.

Thank you,
Research Resources Center

Access

All RRC customers automatically have access to our web-based download tool: http://mydata.rrc.uic.edu/mydata

Software

We try to follow open standards in our web applications so you can use just about any web browser of your choice. In the RRC we use Mozilla Firefox, Google Chrome, Opera, Safari, Internet Explorer and even some web browsers you may have never heard of, so compatibility issues do not go unnoticed for very long.

Storage limit

Currently limited only by the storage capacity of our servers.

Transfer speed

The majority of our users who download results via our website have smaller data sets that have been heavily compressed in zip archives. The compressed files save us considerable storage space and network bandwidth so data transfers over HTTP run at full speed using the available network bandwidth.

rsync

The rsync protocol is well suited to large files whose transfers might be interrupted, because it can resume and update files in ways that FTP and HTTP cannot.

Access

Although we use rsync internally, we currently do not offer it for general use by all RRC users. However, we are more than happy to support customers who are interested in using it.

Software

Wikipedia has a nice overview on rsync and pointers to freeware and commercial software. Contact us for recommendations.

Storage limit

Currently limited only by the storage capacity of our servers.

Transfer speed

Raw rsync transfer speeds fall between those of FTP and HTTP, and are close to FTP most of the time. Where rsync really shines is in transferring files that have been transferred before but have since been updated, or in recovering a downloaded file that is damaged. Unlike transfers over FTP and HTTP, rsync only needs to transfer the differences.

For example, downloading a 100 MB file over a 10 Mbps network connection might take 2-3 minutes under good conditions. If that same file was updated and is now 101 MB, transferring the updated file over FTP or HTTP would take the same amount of time as the first transfer, plus a little more for the changes. Using rsync, the same update might take only 10-15 seconds. With on-the-fly compression enabled, rsync can transfer compressible data even faster than FTP and HTTP. With even larger files, the time difference can be very significant.
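
As a rough check on figures of this kind, the theoretical minimum transfer time is simply the file size in bits divided by the link speed. The sketch below uses decimal units (1 MB = 1,000,000 bytes) and ignores protocol overhead, speed caps, and the time rsync spends comparing checksums, so real transfers will be slower. The function and constant names are our own.

```python
MB = 1_000_000        # bytes, decimal megabyte
GB = 1_000_000_000    # bytes, decimal gigabyte
MBPS = 1_000_000      # bits per second


def ideal_transfer_seconds(size_bytes, link_bits_per_second):
    """Lower bound on transfer time, ignoring all overhead."""
    return size_bytes * 8 / link_bits_per_second


full_copy = ideal_transfer_seconds(100 * MB, 10 * MBPS)   # whole 100 MB file
changes = ideal_transfer_seconds(1 * MB, 10 * MBPS)       # only 1 MB of differences
large_file = ideal_transfer_seconds(100 * GB, 10 * MBPS)  # a 100 GB file, for scale

print(full_copy)          # 80.0 seconds
print(changes)            # 0.8 seconds
print(large_file / 3600)  # roughly 22 hours
```

The gap between sending the whole file and sending only the differences grows with file size, which is why rsync pays off most on very large data sets.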

Another advantage of rsync is that it can "repair" files damaged by interrupted transfers (e.g. a dropped network connection or a computer or program crash). Only the damaged sections of a file need to be transferred; rsync on the destination computer uses that information to patch the damaged file.
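
For users who arrange rsync access with us, a transfer that can be safely rerun after an interruption might be scripted as follows. This is a sketch under assumptions: it requires the rsync program to be installed locally, the helper names are our own, and the source and destination paths are placeholders rather than real server locations. The options used (--archive, --partial, --progress, --compress) are standard rsync options.

```python
import subprocess


def build_rsync_command(source, destination, compress=True):
    """Build an rsync command line that tolerates interrupted transfers."""
    cmd = [
        "rsync",
        "--archive",   # preserve timestamps and permissions, recurse into directories
        "--partial",   # keep partially transferred files so a rerun can reuse them
        "--progress",  # show per-file progress
    ]
    if compress:
        cmd.append("--compress")  # on-the-fly compression
    cmd += [source, destination]
    return cmd


def run_rsync(source, destination):
    """Run the transfer; raises CalledProcessError if rsync reports failure."""
    return subprocess.run(build_rsync_command(source, destination), check=True)


print(build_rsync_command("user@host:/remote/results/", "./results/"))
```

If a transfer is cut off, running the same command again lets rsync pick up from the files already present at the destination instead of starting over.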

Data retention

We currently have enough storage space that there is no need to delete older files after a preset time period. However, our needs and those of our users may require a change in the future. Should that ever be the case, we will provide plenty of advance notice before applying any new file deletion policy.

Please also keep in mind that although we will do our best to keep your data safe, we cannot guarantee that it will always be available. We recommend keeping your own backups (and just as importantly, checking those backups to make sure they are accessible). Good data archiving procedures require at least two backups that are not physically stored in the same building. Think of our copy only as your second backup set and not your one and only source.
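
One simple way to check that a backup copy is readable and intact is to compare its checksum with that of the original file. The sketch below uses SHA-256 from Python's standard library; the function names are our own, and the file paths are whatever locations you use for your data and backups.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original_path, backup_path):
    """True if the backup is readable and byte-for-byte identical."""
    return sha256_of(original_path) == sha256_of(backup_path)
```

Reading in chunks keeps memory use low even for very large result files. A backup that cannot be opened raises an error, which is itself a useful warning.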

Backup options

Below is a brief list of some backup services that we use ourselves:

  • UIC ACCC offers a free service for students/staff/faculty: ADSM/TSM Network File Backup for Personal Workstations
  • CrashPlan - Unique in that CrashPlan encourages users to use their own external storage devices and other computers to hold backups before spending money on their cloud storage offerings. If you have more than one computer, you can run the CrashPlan program on multiple computers and allow each computer to back up to one or more other computers. It makes it very easy to maintain multiple backups at minimal cost (CrashPlan's Java-based backup software is cross-platform and free along with a free basic user account).
  • rsync.net - One of the best in terms of cross-platform support (Windows, Mac OSX, Linux) and flexibility. They are also HIPAA and Sarbanes-Oxley compliant so they can be used for some types of sensitive data. Be sure to check out their FAQ about open source developers and educational discounts.



Scientific Computing Services & Support (SCSS) provides solutions to research problems and scientific endeavors that require advanced computing tools. Our staff has experience with a variety of computing architectures, commercial/open-source software and programming languages.

Facility Homepage: http://www.rrc.uic.edu/scs

