Unstructured Loader

With the Unstructured Loader, you can load various types of files and extract their content using the Unstructured API.

Installation

Install peer dependencies:

npm install unstructured-client --include=dev

Add Environment Variables

.env
# You can get one from https://unstructured.io/api-key-hosted
UNSTRUCTURED_API_KEY="YOUR_SAMPLE_API_KEY"

Copy the code

Add the following code to your utils/unstructuredLoader.ts file:

unstructuredLoader.ts
import { UnstructuredClient } from "unstructured-client";
import { PartitionResponse } from "unstructured-client/sdk/models/operations";
import {
  PartitionParameters,
  Strategy,
} from "unstructured-client/sdk/models/shared";
import * as fs from "fs";
 
interface UnstructuredLoaderProps {
  apiKey: string;
  baseUrl?: string;
}
 
interface LoadUnstructuredDirectoryDataParams {
  filePath: string;
  fileName: string;
  options?: Omit<PartitionParameters, "files">;
  returnText?: boolean;
}
 
interface LoadUnstructuredFileDataParams {
  fileContent: Uint8Array;
  fileName: string;
  returnText?: boolean;
  options?: Omit<PartitionParameters, "files">;
}
 
export class UnstructuredLoader {
  private client: UnstructuredClient;
 
  constructor(props: UnstructuredLoaderProps) {
    const { apiKey, baseUrl } = props;
 
    if (!apiKey || apiKey.trim().length === 0) {
      throw new Error("No API key provided for Unstructured!");
    }
 
    this.client = new UnstructuredClient({
      ...(baseUrl && baseUrl.trim().length !== 0 ? { serverURL: baseUrl } : {}),
      security: {
        apiKeyAuth: apiKey,
      },
    });
  }
 
  async loadUnstructuredDirectoryData(
    params: LoadUnstructuredDirectoryDataParams
  ) {
    const { filePath } = params;
    const fileContent = fs.readFileSync(filePath);
    return this.processFileData({ ...params, fileContent });
  }
 
  async loadUnstructuredFileData(params: LoadUnstructuredFileDataParams) {
    return this.processFileData(params);
  }
 
  private async processFileData({
    fileContent,
    fileName,
    options,
    returnText,
  }: LoadUnstructuredFileDataParams) {
    try {
      const res: PartitionResponse = await this.client.general.partition({
        partitionParameters: {
          files: {
            content: fileContent,
            fileName,
          },
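          // Spread caller options first so an explicit `strategy: undefined`
          // cannot overwrite the default below.
          ...options,
          strategy: options?.strategy ?? Strategy.Auto,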
        },
      });
 
      if (res.statusCode !== 200) {
        throw new Error(`Unexpected status code: ${res.statusCode}`);
      }
 
      if (!res.elements || res.elements.length === 0) {
        throw new Error("No elements returned in the response");
      }
 
      return returnText ? this.extractText(res.elements) : res.elements;
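    } catch (error: unknown) {
      // Non-Error values can be thrown too, so derive the message defensively.
      const message = error instanceof Error ? error.message : String(error);
      throw new Error(`Error processing file data: ${message}`);
    }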
  }
 
  private extractText(elements: Array<{ [k: string]: any }>): string {
    return elements
      .map((el) => el.text)
      .filter(Boolean)
      .join("\n");
  }
}
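
To illustrate what `returnText: true` produces, the sketch below copies the `extractText` logic into a standalone function and runs it on hand-written sample elements. The `SampleElement` type and the sample data are illustrative assumptions, not real API output.

```typescript
// Standalone copy of the loader's private extractText logic.
type SampleElement = { text?: string; [k: string]: unknown };

function extractTextFrom(elements: SampleElement[]): string {
  return elements
    .map((el) => el.text)
    .filter(Boolean) // drops undefined and empty strings
    .join("\n");
}

// Illustrative sample data, not real API output.
const sampleElements: SampleElement[] = [
  { type: "Title", text: "Quarterly Report" },
  { type: "Image", text: "" },
  { type: "NarrativeText", text: "Revenue grew in Q3." },
];

// Elements without text are skipped; the rest are joined by newlines.
const joinedText = extractTextFrom(sampleElements);
```

Because empty strings are filtered out, elements that carry no text (such as the image above) leave no blank lines in the result.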
 
 

Usage

Initialize client

Initialize the UnstructuredLoader client.

import { UnstructuredLoader } from "@utils/unstructuredLoader";
 
const loader = new UnstructuredLoader({
  // process.env values are `string | undefined`; the constructor throws on an empty key.
  apiKey: process.env.UNSTRUCTURED_API_KEY ?? "",
});
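
Since environment variables may be unset, it can help to fail fast with a message that names the missing variable. The following is a minimal sketch of such a guard; `requireEnv` and `EnvLike` are hypothetical names that are not part of the loader or of the Unstructured SDK.

```typescript
// Hypothetical helper (not part of the loader): returns a required
// environment variable or throws a descriptive error.
type EnvLike = Record<string, string | undefined>;

function requireEnv(env: EnvLike, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim().length === 0) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage, assuming a Node.js runtime:
// const loader = new UnstructuredLoader({
//   apiKey: requireEnv(process.env, "UNSTRUCTURED_API_KEY"),
// });
```

Passing the environment object in as an argument keeps the helper easy to test without touching the real process environment.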
 

Load files from local directory

Provide a local file path to extract that file's content.

const elementsFromDirectory = await loader.loadUnstructuredDirectoryData({
  filePath: "./sample.png",
  fileName: "Sample_File",
  returnText: true,
});

Load files directly

Files can also be loaded directly from memory. This example assumes the file is received as FormData.

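const data = await request.formData();
const file = data.get("file");
 
// FormData values can be strings or missing, so verify before reading.
if (!(file instanceof File)) {
  throw new Error('Expected a file upload in the "file" field');
}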
 
const arrayBuffer = await file.arrayBuffer();
const uint8Array = new Uint8Array(arrayBuffer);
 
const elementsFromFile = await loader.loadUnstructuredFileData({
  fileContent: uint8Array,
  fileName: "Sample_File",
  returnText: true,
});
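
When `returnText` is omitted or false, the loader returns the raw elements array instead of a string. The sketch below shows one way that array might be post-processed, assuming each element carries `type` and `text` fields. `LoadedElement`, `groupTextByType`, and the sample data are hypothetical and only for illustration.

```typescript
// Assumed element shape: only `type` and `text` are relied on here.
interface LoadedElement {
  type: string;
  text?: string;
  [k: string]: unknown;
}

// Hypothetical helper: collects element text into buckets keyed by type.
function groupTextByType(elements: LoadedElement[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const el of elements) {
    if (!el.text) continue; // skip elements without text
    const bucket = groups.get(el.type) ?? [];
    bucket.push(el.text);
    groups.set(el.type, bucket);
  }
  return groups;
}

// Illustrative input, not real API output.
const loadedElements: LoadedElement[] = [
  { type: "Title", text: "Invoice" },
  { type: "NarrativeText", text: "Payment is due in 30 days." },
  { type: "NarrativeText", text: "Late fees may apply." },
];

const titles = groupTextByType(loadedElements).get("Title") ?? [];
```

Keeping the elements rather than the flattened text preserves each element's type, which is useful when headings and body text need different handling.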

Props

UnstructuredLoader

| Prop | Type | Description | Default |
| --- | --- | --- | --- |
| apiKey | string | The API key for Unstructured.io | "" |
| baseUrl | string? | Server URL in case of self-hosting | "" |

loadUnstructuredDirectoryData

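| Prop | Type | Description |
| --- | --- | --- |
| filePath | string | The local path of the file to load. |
| fileName | string | Name of the file. |
| returnText | boolean? | If true, returns the extracted text as a single string instead of the elements array. |
| options | Omit<PartitionParameters, "files">? | Additional partition options as specified in the Unstructured documentation. |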

loadUnstructuredFileData

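| Prop | Type | Description |
| --- | --- | --- |
| fileContent | Uint8Array | The file's content as a Uint8Array. |
| fileName | string | Name of the file. |
| returnText | boolean? | If true, returns the extracted text as a single string instead of the elements array. |
| options | Omit<PartitionParameters, "files">? | Additional partition options as specified in the Unstructured documentation. |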

Credits

This component is built on top of the Unstructured TypeScript SDK.