s3_text_loader

Documentation for AwsS3TextLoader

Functionality

The AwsS3TextLoader class loads text files from AWS S3 storage and decodes them into strings. It extends the AwsS3DataLoader class, adding text-specific handling with a configurable character encoding.

Parameters

  • retry_config: Retry strategy configuration for network calls.
  • features: Expected dataset features.
  • encoding: Character encoding used to decode the text (default: "utf-8").
  • kwargs: Additional keyword arguments, such as AWS credentials.
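
The encoding parameter matters because the same bytes can decode to different text, or fail to decode at all, depending on the codec. The sketch below uses only Python's built-in bytes.decode and does not touch AwsS3TextLoader itself; the sample string and variable names are illustrative assumptions.

```python
# Illustration only: plain Python codecs, not the loader's internals.
raw = "café".encode("utf-8")  # bytes as they might be stored in an S3 object

as_utf8 = raw.decode("utf-8")      # matching codec: text round-trips
as_latin1 = raw.decode("latin-1")  # wrong codec: decodes, but yields mojibake

# Bytes written as Latin-1 are not valid UTF-8, so strict decoding raises.
strict_failed = False
try:
    "café".encode("latin-1").decode("utf-8")
except UnicodeDecodeError:
    strict_failed = True

print(as_utf8, as_latin1, strict_failed)
```

If the files in a bucket were written with an encoding other than UTF-8, passing that encoding to the loader is likely the safer choice than relying on the default.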

Usage

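from embedding_studio.data_storage.loaders.cloud_storage.s3.s3_text_loader import AwsS3TextLoader
# Import path for S3FileMeta is assumed; adjust it to match your installation.
from embedding_studio.data_storage.loaders.cloud_storage.s3.item_meta import S3FileMeta

# Create a loader that decodes downloaded files as UTF-8.
loader = AwsS3TextLoader(
    encoding="utf-8",
    aws_access_key_id="YOUR_KEY",
    aws_secret_access_key="YOUR_SECRET"
)
# Download and decode one object from the bucket.
text_content = loader.load_items([
    S3FileMeta(bucket="my-bucket", file="document.txt")
])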

Documentation for AwsS3TextLoader._get_item

Functionality

This method reads a downloaded BytesIO stream and decodes its bytes into a text string using the loader's configured encoding.

Parameters

  • file (io.BytesIO): A binary stream of the file from S3 that is to be decoded into a text string.

Return Value

  • str: The decoded text content from the BytesIO stream.

Usage

  • Purpose: Convert binary content from S3 into a text string by applying the loader's decoding mechanism.

Example

from embedding_studio.data_storage.loaders.cloud_storage.s3.s3_text_loader import AwsS3TextLoader
import io

loader = AwsS3TextLoader(
    encoding="utf-8",
    aws_access_key_id="YOUR_KEY",
    aws_secret_access_key="YOUR_SECRET"
)

with open("document.txt", "rb") as f:
    text = loader._get_item(io.BytesIO(f.read()))

print(text)
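
For readers who want to see the decoding step in isolation, the following is a minimal sketch of what converting a BytesIO stream to text can look like. The function decode_stream is a hypothetical stand-in written for this illustration; it is not the library's code, and the actual _get_item implementation may differ.

```python
import io


def decode_stream(file: io.BytesIO, encoding: str = "utf-8") -> str:
    """Hypothetical stand-in: read all bytes from the stream and decode them."""
    return file.read().decode(encoding)


# Simulate a downloaded S3 object with an in-memory stream.
stream = io.BytesIO("héllo wörld\n".encode("utf-8"))
text = decode_stream(stream)
print(text)
```

Note that read() consumes the stream, so calling the function again on the same BytesIO object returns an empty string unless the position is reset with seek(0).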
