Split PDF Files into Individual Pages using Azure Function App
Managing PDF files often requires splitting them into individual pages for easier processing or sharing. In this blog, we'll walk through creating an Azure Function App that splits a PDF file into individual pages and returns them as a single zip file. This solution leverages the power of Azure's serverless computing and integrates seamlessly with other services.
Prerequisites
- Azure Subscription: Ensure you have an active Azure account.
- Azure Storage Account: Required for storing function app files.
- Visual Studio or VS Code: For developing the function app.
- Azure Functions Core Tools: Install locally to test the function app.
- PDF Library: We'll use a library like PdfSharp or iText7.
- Zip Library: The .NET System.IO.Compression namespace provides zip functionality.
1. Create the Azure Function App
1.1. Set Up the Project
- Open Visual Studio or VS Code.
- Create a new Azure Functions project:
  - Choose "HTTP trigger" as the template.
  - Provide a name like SplitPdfFunction.
  - Select .NET as the runtime.
1.2. Install Required Libraries
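Install the necessary NuGet packages from the Package Manager Console. On modern .NET (Core) targets, System.IO.Compression and ZipFile already ship with the framework, so the last two packages are generally only needed on older targets: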
Install-Package PdfSharpCore
Install-Package System.IO.Compression
Install-Package System.IO.Compression.ZipFile
2. Writing the Function
2.1. Function Logic
Below is the complete function logic for splitting a PDF file and returning a zip file:
using System;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;
// The PdfSharpCore package exposes its types under PdfSharpCore.*, not PdfSharp.*
using PdfSharpCore.Pdf;
using PdfSharpCore.Pdf.IO;

namespace PdfSplitterFunction
{
    public static class SplitPdfFunction
    {
        [FunctionName("SplitPdf")]
        public static async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Function, "post", Route = null)] HttpRequest req,
            ILogger log)
        {
            log.LogInformation("SplitPdf function triggered.");

            // Reject non-form requests up front; reading the form would otherwise throw
            if (!req.HasFormContentType)
            {
                return new BadRequestObjectResult("Please upload a valid PDF file.");
            }

            // Unique temp paths so that concurrent requests do not collide
            string workId = Guid.NewGuid().ToString();
            string tempDir = Path.Combine(Path.GetTempPath(), workId);
            string zipPath = Path.Combine(Path.GetTempPath(), workId + ".zip");

            try
            {
                // Read the uploaded PDF file from the "file" form field
                var form = await req.ReadFormAsync();
                var pdfFile = form.Files["file"];
                if (pdfFile == null || pdfFile.Length == 0)
                {
                    return new BadRequestObjectResult("Please upload a valid PDF file.");
                }

                using (var memoryStream = new MemoryStream())
                {
                    await pdfFile.CopyToAsync(memoryStream);
                    memoryStream.Position = 0;

                    // Load the PDF in Import mode so its pages can be copied into new documents
                    var pdfDocument = PdfReader.Open(memoryStream, PdfDocumentOpenMode.Import);

                    // Create a temporary directory for individual pages
                    Directory.CreateDirectory(tempDir);

                    // Split PDF into individual pages
                    for (int i = 0; i < pdfDocument.PageCount; i++)
                    {
                        var newPdf = new PdfDocument();
                        newPdf.AddPage(pdfDocument.Pages[i]);
                        string pagePath = Path.Combine(tempDir, $"Page-{i + 1}.pdf");
                        newPdf.Save(pagePath);
                    }
                }

                // Create a zip file and read it into memory
                ZipFile.CreateFromDirectory(tempDir, zipPath);
                var zipBytes = await File.ReadAllBytesAsync(zipPath);

                // Return the zip file
                return new FileContentResult(zipBytes, "application/zip")
                {
                    FileDownloadName = "SplitPages.zip"
                };
            }
            catch (Exception ex)
            {
                // Pass the exception itself so that the stack trace is logged
                log.LogError(ex, "Error splitting PDF.");
                return new StatusCodeResult(StatusCodes.Status500InternalServerError);
            }
            finally
            {
                // Clean up temporary files even when processing fails
                if (Directory.Exists(tempDir)) Directory.Delete(tempDir, true);
                if (File.Exists(zipPath)) File.Delete(zipPath);
            }
        }
    }
}
2.2. Key Points
- Upload Handling: The function reads the uploaded PDF file from the request's form files (the "file" field).
- Splitting Pages: Each page is extracted from a document opened with PdfReader and saved as an individual PDF.
- Zipping: The System.IO.Compression library is used to package the split PDFs into a zip file.
- Cleanup: Temporary files and directories are deleted after processing.
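The zipping and cleanup steps above go through the temp directory. As an alternative, the archive can be built entirely in memory, which removes the need for cleanup. The following is a minimal sketch of that idea, not the function's actual implementation: InMemoryZip and Build are hypothetical names introduced here, and the helper uses only System.IO.Compression.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration; not part of the original function.
public static class InMemoryZip
{
    // Packs (entry name, content) pairs into a zip archive and returns its bytes.
    public static byte[] Build(IEnumerable<KeyValuePair<string, byte[]>> files)
    {
        using (var zipStream = new MemoryStream())
        {
            // leaveOpen: true keeps zipStream usable after the archive is disposed
            using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Create, leaveOpen: true))
            {
                foreach (var file in files)
                {
                    ZipArchiveEntry entry = archive.CreateEntry(file.Key, CompressionLevel.Optimal);
                    using (Stream entryStream = entry.Open())
                    {
                        entryStream.Write(file.Value, 0, file.Value.Length);
                    }
                }
            } // Disposing the archive writes the zip central directory

            return zipStream.ToArray();
        }
    }
}
```

In the function, each page's bytes would come from saving the single-page document to a MemoryStream instead of a file path, assuming the PDF library's stream overload of Save is available. The result of Build can then be passed straight to FileContentResult. Because everything is held in memory, this approach suits small and medium PDFs better than very large ones.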
3. Deploying the Function App
3.1. Publish to Azure
- Right-click the project in Visual Studio and select "Publish."
- Choose "Azure" as the target.
- Select your Azure subscription and Function App.
- Deploy the project.
3.2. Test the Function App
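- Navigate to the Function App in the Azure portal and copy the function's URL. Because the function uses AuthorizationLevel.Function, the request must carry a function key, either as the code query parameter or in the x-functions-key header.
- Use tools like Postman or a custom HTML form to upload a PDF file to the endpoint as multipart/form-data, in a form field named file.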
4. Testing Locally
- Run the function locally using func start.
- Use a tool like Postman to send a POST request with a PDF file (multipart/form-data, form field named file) to http://localhost:7071/api/SplitPdf.
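If you prefer code over Postman, the same request can be sent from a small C# client. This is a sketch under stated assumptions: SplitPdfClient, BuildUpload, and DownloadSplitPagesAsync are hypothetical names introduced here, and the only details taken from the function are the "file" form field and the zip response.

```csharp
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical test client for illustration; not part of the function app.
public static class SplitPdfClient
{
    // Wraps PDF bytes in a multipart/form-data body under the "file" field,
    // which is the field name the function reads.
    public static MultipartFormDataContent BuildUpload(byte[] pdfBytes, string fileName)
    {
        var filePart = new ByteArrayContent(pdfBytes);
        filePart.Headers.ContentType = new MediaTypeHeaderValue("application/pdf");

        var form = new MultipartFormDataContent();
        form.Add(filePart, "file", fileName);
        return form;
    }

    // Posts the PDF to the endpoint and saves the returned zip to zipPath.
    public static async Task DownloadSplitPagesAsync(string endpoint, string pdfPath, string zipPath)
    {
        byte[] pdfBytes = await File.ReadAllBytesAsync(pdfPath);

        using (var client = new HttpClient())
        using (var form = BuildUpload(pdfBytes, Path.GetFileName(pdfPath)))
        using (HttpResponseMessage response = await client.PostAsync(endpoint, form))
        {
            // Throws HttpRequestException on 4xx/5xx responses
            response.EnsureSuccessStatusCode();
            byte[] zipBytes = await response.Content.ReadAsByteArrayAsync();
            await File.WriteAllBytesAsync(zipPath, zipBytes);
        }
    }
}
```

A call such as await SplitPdfClient.DownloadSplitPagesAsync("http://localhost:7071/api/SplitPdf", "sample.pdf", "SplitPages.zip") would exercise the local endpoint, where sample.pdf is a placeholder for any PDF on disk. Against the deployed function, the endpoint would also need the function key.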
5. Enhancements
- Authentication: Add security with Azure AD or API keys.
- Logging: Enhance logging for better monitoring and debugging.
- Error Handling: Implement more granular error handling.
- Blob Storage: Store the zip file in Azure Blob Storage for larger workflows.
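As one possible example of the more granular error handling listed above, the function could reject uploads that are not PDFs with a 400 response before the PDF library tries to parse them. The sketch below checks for the "%PDF-" signature that PDF files begin with. PdfValidation and LooksLikePdf are hypothetical names introduced here. The check is deliberately strict: it requires the signature at the very start of a seekable stream, although some PDF readers tolerate a few leading bytes.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of the original function.
public static class PdfValidation
{
    private static readonly byte[] PdfSignature = Encoding.ASCII.GetBytes("%PDF-");

    // Returns true when the stream starts with the "%PDF-" signature.
    // Assumes a seekable stream; the original position is restored afterwards.
    public static bool LooksLikePdf(Stream stream)
    {
        long start = stream.Position;
        var header = new byte[PdfSignature.Length];

        // Read may return fewer bytes than requested, so loop until the header is full
        int read = 0;
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break; // end of stream
            read += n;
        }
        stream.Position = start;

        if (read < header.Length) return false;
        for (int i = 0; i < header.Length; i++)
        {
            if (header[i] != PdfSignature[i]) return false;
        }
        return true;
    }
}
```

In the function, this could be called right after memoryStream.Position = 0, returning a BadRequestObjectResult when it yields false. Malformed uploads would then produce a 400 instead of falling through to the generic 500 handler.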
Conclusion
This Azure Function App demonstrates how to split a PDF file into individual pages and package them as a zip file. By using Azure Functions, you can create scalable and efficient solutions for PDF processing, integrating seamlessly with other Azure services.
- Submitted By Vibhuti Singh
- Category ms-azure
- Created On 11-Jan-2025