Do’s and Don’ts for Streaming File Uploads to Azure Blob Storage with .NET MVC

March 29, 2021 - Rachel Hagerman


What’s the big deal about file uploads? Well, the big deal is that it is a tricky operation. Implement file uploads the wrong way, and you may end up with memory leaks, server slowdowns, out-of-memory errors, and, worst of all, unhappy users.

With Azure Blob Storage, there are multiple ways to implement file uploads. But if you want to let your users upload large files, you will almost certainly want to do it using streams. You’ll find a lot of file upload examples out there that use what I call the “small file” methods, such as IFormFile, a byte array, or a memory stream buffer. These are fine for small files, but I wouldn’t recommend them for file sizes over 2MB. For larger files, we need to be much more careful about how we process them.

What NOT to do

Here are some of the Don’ts for uploading large files to Azure Blob Storage with .NET MVC:

DON’T do it if you don’t have to

You may be able to use client-side direct uploads if your architecture supports generating SAS (Shared Access Signature) upload URIs, and if you don’t need to process the upload through your API. Handling large file uploads is complex, and before tackling it you should see if you can offload that functionality to Azure Blob Storage entirely.
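For illustration, here is a minimal sketch of generating a short-lived, write-only SAS URI with the v12 Azure.Storage.Blobs library so the client can upload straight to the container. The helper name and the 15-minute expiry are assumptions for this example, and it assumes the BlobContainerClient was built with a shared key credential (otherwise you would go through a user delegation key instead).

using System;
using Azure.Storage.Blobs;
using Azure.Storage.Sas;

public static class DirectUploadHelper
{
    // Hypothetical helper: returns a URI the client can PUT the file to directly,
    // so the upload never passes through our API at all.
    public static Uri GetUploadSasUri(BlobContainerClient containerClient, string blobName)
    {
        var blobClient = containerClient.GetBlobClient(blobName);

        // GenerateSasUri only works when the client was created with a shared key
        // credential or a connection string containing the account key.
        return blobClient.GenerateSasUri(
            BlobSasPermissions.Create | BlobSasPermissions.Write,
            DateTimeOffset.UtcNow.AddMinutes(15));
    }
}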

DON’T use IFormFile for large files

If you let MVC try to bind to an IFormFile, it will attempt to spool the entire file into memory, which is exactly what we don’t want to do with large files.
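For contrast, this is roughly the kind of action we’re steering away from, and it’s the same IFormFile pattern that shows up again in the profiling section below as the memory-hungry baseline. The route and the injected blobContainerClient field are assumptions for this sketch; the exact upload call doesn’t matter here, the point is that the whole file has already been buffered before our code runs.

/// <summary>
/// The "small file" approach: MVC binds the whole upload before our code ever runs
/// </summary>
[HttpPost("smallupload")]
public async Task<IActionResult> UploadDocumentSmall(IFormFile file)
{
    // fine for small files, but the file is already fully buffered at this point
    using (var stream = file.OpenReadStream())
    {
        await blobContainerClient.UploadBlobAsync(file.FileName, stream);
    }

    return Ok();
}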

DON’T model bind at all, in fact

MVC is very good at model binding from the web request. But when it comes to files, any sort of model binding will try to…you guessed it, read the entire file into memory. This is slow and it is wasteful if all we want to do is forward the data right on to Azure Blob Storage.

DON’T use any memory streams

This one should be kind of obvious, because what does a memory stream do? Yes, read the file into memory. For the same reasons as above, we don’t want to do this.

DON’T use a byte array either

Yep, same reason. Your byte array will work fine for small files or light loading, but how long will you have to wait to put that large file into that byte array? And if there are multiple files? Just don’t do it, there is a better way.

So what are the DOs?

There is one example in Microsoft’s documentation that covers this topic very well for .NET MVC, and it is here, in the last section about large files. In fact, if you are reading this article, I highly recommend you read that entire document and the related example, because it covers the large file vs. small file differences and has a lot of great information. And just go ahead and download the whole example, because it has some of the pieces we need. At the time of this article, the latest version of the sample code available is for .NET Core 3.0, but the pieces we need will work just fine with .NET 5.

The other piece we need is getting the file to Azure Blob Storage during the upload process. To do that, we are going to use several of the helpers and guidance from the MVC example on file uploads. Here are the important parts.

DO use a multipart form-data request

You’ll see this in the file upload example. Multipart (multipart/form-data) requests are a special type of request designed for sending streams, and they can also carry multiple files or pieces of data in a single request. I think the explanation in the Swagger documentation is also really helpful for understanding this type of request.

The multipart request (which can actually be for a single file) can be read with a MultipartReader that does NOT need to spool the body of the request into memory. By using the multipart form-data request you can also support sending additional data through the request.

It is important to note that although it has “multi-part” in the name, a multipart request does not mean that a single file will be sent in parts. It is not the same as file “chunking”, although the name sounds similar. Chunking is a separate technique for file uploads; if you need features such as the ability to pause and resume or retry partial uploads, chunking may be the way you need to go.
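As a concrete illustration, here is roughly what the client side of such a request looks like using HttpClient and MultipartFormDataContent. The endpoint URL and the form field name are assumptions for this sketch; Postman or a browser form builds the same kind of request for you.

using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class UploadClient
{
    // Sends one file as a single section of a multipart/form-data request.
    // Each section gets its own Content-Disposition header, which is what the
    // MultipartReader on the server walks through one section at a time.
    public static async Task UploadAsync(string filePath)
    {
        using (var http = new HttpClient())
        using (var form = new MultipartFormDataContent())
        using (var fileStream = File.OpenRead(filePath))
        {
            form.Add(new StreamContent(fileStream), "file", Path.GetFileName(filePath));

            var response = await http.PostAsync("https://localhost:5001/api/documents/streamupload", form);
            response.EnsureSuccessStatusCode();
        }
    }
}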

DO prevent MVC from model-binding the request

The example linked above has an attribute class that works perfectly for this: DisableFormValueModelBindingAttribute.cs. With it, we can disable the model binding on the Controller Action that we want to use.
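If you don’t want to pull down the whole sample just yet, the attribute in it looks roughly like this (check the linked sample for the exact version): a resource filter that removes the value provider factories that would otherwise read the form, and therefore the file, before the action runs.

using System;
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.Filters;
using Microsoft.AspNetCore.Mvc.ModelBinding;

[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method)]
public class DisableFormValueModelBindingAttribute : Attribute, IResourceFilter
{
    public void OnResourceExecuting(ResourceExecutingContext context)
    {
        // removing these factories means MVC never tries to read the form body itself
        var factories = context.ValueProviderFactories;
        factories.RemoveType<FormValueProviderFactory>();
        factories.RemoveType<FormFileValueProviderFactory>();
        factories.RemoveType<JQueryFormValueProviderFactory>();
    }

    public void OnResourceExecuted(ResourceExecutedContext context)
    {
    }
}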

DO increase or disable the request size limitation

This depends on your requirements. You can set the size to something reasonable for the file sizes you want to allow. If your files are larger than 256MB (the current maximum for a single block upload to blob storage), you may need to do the streaming setup described here and ALSO chunk the file across multiple blocks. Be sure to read the most current documentation to make sure your file sizes are supported by the method you choose.

/// <summary>
/// Upload a document using our streaming method
/// </summary>
/// <returns>A collection of document models</returns>
[DisableFormValueModelBinding]
[ProducesResponseType(typeof(List<DocumentModel>), 200)]
[DisableRequestSizeLimit]
[HttpPost("streamupload")]
public async Task<IActionResult> UploadDocumentStream()
...
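The action above simply removes the limit with [DisableRequestSizeLimit]. If you would rather cap it at a known maximum instead, here is a sketch of what that looks like, assuming Kestrel hosting; the 256MB figure is only an example, and if you host behind IIS you will also need to raise maxAllowedContentLength in web.config.

// per-action: allow up to ~256 MB instead of removing the limit entirely
[RequestSizeLimit(268435456)]
[HttpPost("streamupload")]
public async Task<IActionResult> UploadDocumentStream()
...

// or server-wide, in Startup.ConfigureServices
// (KestrelServerOptions lives in Microsoft.AspNetCore.Server.Kestrel.Core,
//  FormOptions in Microsoft.AspNetCore.Http.Features)
services.Configure<KestrelServerOptions>(options =>
{
    options.Limits.MaxRequestBodySize = 268435456;
});

services.Configure<FormOptions>(options =>
{
    // also relevant if any other part of the app reads form data the normal way
    options.MultipartBodyLengthLimit = 268435456;
});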

DO process the boundaries of the request and send the stream to Azure Blob Storage

Again, this comes mostly from Microsoft’s example, with some special processing to copy the request body stream for a single file to Azure Blob Storage. The file’s content type and filename can be read without touching the stream, but remember that neither of these can be trusted. You should encode the filename, and if you really want to block unauthorized types, you could go even further by reading the first few bytes of the stream and verifying the type.

var sectionFileName = contentDisposition.FileName.Value;
// use an encoded filename in case there is anything weird
var encodedFileName = WebUtility.HtmlEncode(Path.GetFileName(sectionFileName));
// now make it unique
var uniqueFileName = $"{Guid.NewGuid()}_{encodedFileName}";

// read the section filename to get the content type
var fileContentType = MimeTypeHelper.GetMimeType(sectionFileName);

// check the mime type against our list of allowed types
if (!allowedTypes.Contains(fileContentType.ToLower()))
{
    return new ResultModel<List<DocumentModel>>("fileType", "File type not allowed: " + fileContentType);
}
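If you want to go the extra step of checking the file’s actual content, here is a rough sketch of the idea for a PDF-only check (the %PDF signature). It is an assumption-heavy example: because section.Body is forward-only, the bytes we peek at have to be written to the blob ourselves, so this version swaps the single UploadFromStreamAsync call used later for CloudBlockBlob.OpenWriteAsync plus a CopyToAsync.

// peek at the first few bytes without buffering the whole stream
var header = new byte[4];
var headerLength = await section.Body.ReadAsync(header, 0, header.Length);
var looksLikePdf = headerLength == 4 &&
                   header[0] == 0x25 && header[1] == 0x50 &&
                   header[2] == 0x44 && header[3] == 0x46; // "%PDF"

if (fileContentType == "application/pdf" && !looksLikePdf)
{
    return new ResultModel<List<DocumentModel>>("fileType", "File content does not match type: " + encodedFileName);
}

// the request stream can't be rewound, so write the peeked bytes first,
// then stream the remainder of the section body into the same blob
CloudBlockBlob blob = new CloudBlockBlob(blobClient.Uri);
using (var blobStream = await blob.OpenWriteAsync())
{
    await blobStream.WriteAsync(header, 0, headerLength);
    await section.Body.CopyToAsync(blobStream);
    await blobStream.CommitAsync();
}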

DO look at the final position of the stream to get the file size

If you want to get or save the filesize, you can check the position of the stream after uploading it to blob storage. Do this instead of trying to get the length of the stream beforehand.

DO remove any signing key from the Uri if you are preventing direct downloads

The Uri that is generated as part of the blob will include an access token at the end. If you don’t want to let your users have direct blob access, you can trim this part off.

// trick to get the size without reading the stream in memory
var size = section.Body.Position;

// check size limit in case somehow a larger file got through. we can't do it until after the upload because we don't want to put the stream in memory
if (maxBytes < size)
{
    await blobClient.DeleteIfExistsAsync();
    return new ResultModel<List<DocumentModel>>("fileSize", "File too large: " + encodedFileName);
}

var doc = new DocumentModel()
{
    FileName = encodedFileName,
    MimeType = fileContentType,
    FileSize = size,
    // Do NOT include Uri query since it has the SAS credentials; This will return the URL without the querystring.
    // UrlDecode to convert %2F into "/" since Azure Storage returns it encoded. This prevents the folder from being included in the filename.
    Url = WebUtility.UrlDecode(blobClient.Uri.GetLeftPart(UriPartial.Path))
};

DO use a stream upload method to blob storage

There are multiple upload methods available, but make sure you choose one that has an input of a Stream, and use the section.Body stream to send the upload.

var blobClient = blobContainerClient.GetBlobClient(uniqueFileName);

// use a CloudBlockBlob because both BlockBlobClient and BlobClient buffer into memory for uploads
CloudBlockBlob blob = new CloudBlockBlob(blobClient.Uri);
await blob.UploadFromStreamAsync(section.Body);

// set the type after the upload, otherwise will get an error that blob does not exist
await blobClient.SetHttpHeadersAsync(new BlobHttpHeaders { ContentType = fileContentType });

DO performance-profile your results

This may be the most important instruction. After you’ve written your code, run it in Release mode using the Visual Studio performance profiling tools. Compare your profiling results to those of a known memory-hungry method, such as IFormFile. Beware that different versions of the Azure Blob Storage library, and different implementations, may perform very differently. Here were some of my results.

To do this simple profiling, I used Postman to upload multiple files of around 20MB each across several requests. By using a collection, or by opening multiple tabs, you can submit multiple requests at a time to see how the application’s memory is consumed.

[Screenshot: Postman setup for sending concurrent upload requests]

First, using an IFormFile. You can see the memory usage increases rapidly for each request using this method.

[Profiling results: IFormFile uploads]

Next, using the latest version (v12) of the Azure Blob Storage library and a Stream upload method. Notice that it’s not much better than IFormFile! Although the v12 BlobClient is the latest way to interact with blob storage, when I look at the memory snapshots of this operation it has internal buffers (at least, at the time of this writing) that cause it to perform poorly when used in this way.

var blobClient = blobContainerClient.GetBlobClient(uniqueFileName);

await blobClient.UploadAsync(section.Body);

[Profiling results: v12 BlobClient stream uploads]

But, using almost identical code and the previous library version, which uses CloudBlockBlob instead of BlobClient, we see much better memory performance. The same file uploads result in a small increase (due to resource consumption that eventually goes back down with garbage collection), but nothing near the ~600MB consumption seen above. I’m sure whatever memory issues exist in the latest library will be resolved eventually, but for now, I will use this method.

// use a CloudBlockBlob because both BlockBlobClient and BlobClient buffer into memory for uploads
CloudBlockBlob blob = new CloudBlockBlob(blobClient.Uri);
await blob.UploadFromStreamAsync(section.Body);

[Profiling results: CloudBlockBlob stream uploads]

For your reference, here is a version of the upload service methods from that last profiling result:

/// <summary>
/// Upload multipart content from a request body
/// </summary>
/// <param name="requestBody">body stream from the request</param>
/// <param name="contentType">content type from the request</param>
/// <returns></returns>
public async Task<ResultModel<List<DocumentModel>>> UploadMultipartDocumentRequest(Stream requestBody, string contentType)
{
  // configuration values hardcoded here for testing
  var bytes = 104857600; // 100 MB
  var types = new List<string> { "application/pdf", "image/jpeg", "image/png" };
  var docs = await this.UploadMultipartContent(requestBody, contentType, types, bytes);

  if (docs.Success)
  {
    foreach (var doc in docs.Result)
    {
      // here we could save the document data to a database for tracking
      if (doc?.Url != null)
      {
        Debug.WriteLine($"Document saved: {doc.Url}");
      }
    }
  }

  return docs;
}

/// <summary>
/// Upload multipart content from a request body
/// based on microsoft example https://github.com/dotnet/AspNetCore.Docs/tree/main/aspnetcore/mvc/models/file-uploads/samples/
/// and large file streaming example https://docs.microsoft.com/en-us/aspnet/core/mvc/models/file-uploads?view=aspnetcore-5.0#upload-large-files-with-streaming
/// can accept multiple files in multipart stream
/// </summary>
/// <param name="requestBody">the stream from the request body</param>
/// <param name="contentType">content type from the request</param>
/// <param name="allowedTypes">list of allowed file types</param>
/// <param name="maxBytes">max bytes allowed</param>
/// <returns>a collection of document models</returns>
public async Task<ResultModel<List<DocumentModel>>> UploadMultipartContent(Stream requestBody, string contentType, List<string> allowedTypes, int maxBytes)
{
  // Check if HttpRequest (Form Data) is a Multipart Content Type
  if (!IsMultipartContentType(contentType))
  {
    return new ResultModel<List<DocumentModel>>("requestType", $"Expected a multipart request, but got {contentType}");
  }

  FormOptions defaultFormOptions = new FormOptions();
  // Create a Collection of KeyValue Pairs.
  var formAccumulator = new KeyValueAccumulator();

  // Determine the Multipart Boundary.
  var boundary = GetBoundary(MediaTypeHeaderValue.Parse(contentType), defaultFormOptions.MultipartBoundaryLengthLimit);

  var reader = new MultipartReader(boundary, requestBody);

  var section = await reader.ReadNextSectionAsync();

  List<DocumentModel> docList = new List<DocumentModel>();

  var blobContainerClient = GetBlobContainerClient();

  // Loop through each 'Section', starting with the current 'Section'.
  while (section != null)
  {
    // Check if the current 'Section' has a ContentDispositionHeader.
    var hasContentDispositionHeader = ContentDispositionHeaderValue.TryParse(section.ContentDisposition, out ContentDispositionHeaderValue contentDisposition);

    if (hasContentDispositionHeader)
    {
      if (HasFileContentDisposition(contentDisposition))
      {
        try
        {
          var sectionFileName = contentDisposition.FileName.Value;
          // use an encoded filename in case there is anything weird
          var encodedFileName = WebUtility.HtmlEncode(Path.GetFileName(sectionFileName));
          // now make it unique
          var uniqueFileName = $"{Guid.NewGuid()}_{encodedFileName}";

          // read the section filename to get the content type
          var fileContentType = MimeTypeHelper.GetMimeType(sectionFileName);

          // check the mime type against our list of allowed types
          if (!allowedTypes.Contains(fileContentType.ToLower()))
          {
            return new ResultModel<List<DocumentModel>>("fileType", "File type not allowed: " + fileContentType);
          }

          var blobClient = blobContainerClient.GetBlobClient(uniqueFileName);

          // use a CloudBlockBlob because both BlockBlobClient and BlobClient buffer into memory for uploads
          CloudBlockBlob blob = new CloudBlockBlob(blobClient.Uri);
          await blob.UploadFromStreamAsync(section.Body);

          // set the type after the upload, otherwise will get an error that blob does not exist
          await blobClient.SetHttpHeadersAsync(new BlobHttpHeaders { ContentType = fileContentType });

          // trick to get the size without reading the stream in memory
          var size = section.Body.Position;

          // check size limit in case somehow a larger file got through. we can't do it until after the upload because we don't want to put the stream in memory
          if (maxBytes < size)
          {
            await blobClient.DeleteIfExistsAsync();
            return new ResultModel<List<DocumentModel>>("fileSize", "File too large: " + encodedFileName);
          }

          var doc = new DocumentModel()
          {
            FileName = encodedFileName,
            MimeType = fileContentType,
            FileSize = size,
            // Do NOT include Uri query since it has the SAS credentials; This will return the URL without the querystring.
            // UrlDecode to convert %2F into "/" since Azure Storage returns it encoded. This prevents the folder from being included in the filename.
            Url = WebUtility.UrlDecode(blobClient.Uri.GetLeftPart(UriPartial.Path))
          };
          docList.Add(doc);
        }
        catch (Exception e)
        {
          Console.Write(e.Message);
          // could be specific azure error types to look for here
          return new ResultModel<List<DocumentModel>>(null, "Could not upload file: " + e.Message);
        }
      }
      else if (HasFormDataContentDisposition(contentDisposition))
      {
        // if for some reason other form data is sent it would get processed here
        var key = HeaderUtilities.RemoveQuotes(contentDisposition.Name);
        var encoding = GetEncoding(section);
        using (var streamReader = new StreamReader(section.Body, encoding, detectEncodingFromByteOrderMarks: true, bufferSize: 1024, leaveOpen: true))
        {
          var value = await streamReader.ReadToEndAsync();
          if (String.Equals(value, "undefined", StringComparison.OrdinalIgnoreCase))
          {
            value = String.Empty;
          }
          formAccumulator.Append(key.Value, value);

          if (formAccumulator.ValueCount > defaultFormOptions.ValueCountLimit)
          {
            return new ResultModel<List<DocumentModel>>(null, $"Form key count limit {defaultFormOptions.ValueCountLimit} exceeded.");
          }
        }
      }
    }
    // Begin reading the next 'Section' inside the 'Body' of the Request.
    section = await reader.ReadNextSectionAsync();
  }

  return new ResultModel<List<DocumentModel>>(docList);
}
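A couple of pieces referenced above aren’t shown here: IsMultipartContentType, GetBoundary, HasFileContentDisposition, HasFormDataContentDisposition, and GetEncoding all come from the helpers in the linked Microsoft sample, while GetBlobContainerClient is our own wiring. A minimal version of that helper, assuming a connection string and container name pulled from configuration, might look like this:

private BlobContainerClient GetBlobContainerClient()
{
  // "connectionString" and "containerName" are assumed configuration values
  var containerClient = new BlobContainerClient(connectionString, containerName);

  // make sure the container exists before handing out blob clients
  containerClient.CreateIfNotExists();

  return containerClient;
}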

I hope you find this useful as you tackle file upload operations of your own.

Rachel Hagerman

Rachel is a full-stack remote software engineer and architect with over 10 years of .NET stack development experience. She started her career in circuitry design and automated test systems before becoming a software consultant, but her true passion has always been for building real, useful, elegant software.
