Large Data Volumes (LDV) in Salesforce

If you are working with Salesforce or preparing for the Data Architecture and Management Designer certification, understanding Large Data Volumes (LDV) is essential.

As organizations grow, data grows rapidly. What works for thousands of records may fail when your system reaches millions. This is where LDV concepts, performance strategies, and data architecture decisions become critical.

In this blog, we will cover everything step by step in simple English so that you can clearly understand how to handle LDV scenarios in Salesforce.


What is Large Data Volume (LDV)?

Large Data Volume is not a fixed number. It generally refers to situations where your Salesforce org has:

  • Hundreds of thousands to millions of records
  • Complex relationships
  • Heavy reporting and queries

When data grows at this scale, you may face:

  • Slow SOQL queries
  • Slow reports and dashboards
  • Delayed search results
  • Longer data load times

Why is LDV Important?

If LDV is not handled properly, it directly impacts:

  • System performance
  • User experience
  • Data processing speed

A poorly optimized system can frustrate users and slow down business operations. That’s why designing for scale is important from the beginning.


Under the Hood: How Salesforce Works

To understand LDV, you need to know how Salesforce stores and retrieves data.

Salesforce Platform Structure

Salesforce works on three main layers:

  • Metadata Table → Stores object and field definitions
  • Data Table → Stores actual records
  • Virtual Database Layer → Combines metadata + data for users

When you write a SOQL query, Salesforce internally converts it into SQL and fetches data using optimized paths.


How Search Works in Salesforce

  • When a record is created or updated, it takes some time (typically up to about 15 minutes) for it to be indexed
  • Salesforce searches the search index, not the database directly
  • Results are filtered based on the user's access (security)
  • Only a limited result set is returned, for performance

This is why sometimes new records don’t appear immediately in search.
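
As an illustration, a SOSL search like the one below runs against the search index rather than the underlying tables (the search term and objects are examples, not from a real org):

```soql
FIND {Acme*} IN NAME FIELDS
RETURNING Account(Id, Name), Contact(Id, Name)
```

If the matching record was created only seconds ago, it may not be returned until the index catches up.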


Force.com Query Optimizer

The Force.com Query Optimizer decides:

  • Which index to use
  • Whether to scan the full table
  • How to join tables

It ensures that your query runs in the most efficient way possible. Writing optimized queries helps the optimizer make better decisions.
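
For example (both queries are illustrative), the optimizer will usually drive the first query below through the index on CreatedDate, while the second tends to force a full table scan on a large object because negative filters cannot use an index:

```soql
SELECT Id FROM Account WHERE CreatedDate = LAST_N_DAYS:7

SELECT Id FROM Account WHERE Industry != 'Banking'
```

The Query Plan tool in the Developer Console can show which access path the optimizer chooses for a given query.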


Skinny Tables in Salesforce

What is a Skinny Table?

A skinny table is a special table created by Salesforce Support that combines frequently used fields into one table for faster access.

Key Facts
  • Combines standard + custom fields
  • Improves query performance
  • Best for objects with millions of records
  • Does not include soft-deleted records
  • Kept in sync automatically as source records change

Use skinny tables when performance becomes a serious issue.


Indexing Principles

What is an Index?

An index is a sorted structure that helps Salesforce quickly find records without scanning the entire table.

Example

If you search:

SELECT Id FROM Account WHERE CreatedDate > 2024-01-01T00:00:00Z

Salesforce uses an index on CreatedDate to quickly find matching records.


Standard vs Custom Index

Standard Index Fields

Salesforce automatically indexes:

  • Id
  • Name
  • CreatedDate
  • LastModifiedDate
  • Email
  • Lookup & Master-Detail fields
  • External Id

Custom Index

You can request Salesforce to create indexes on:

  • Frequently queried fields
  • Formula fields (only deterministic formulas)

Custom indexes are not supported for:

  • Long text area fields
  • Multi-select picklists
  • Binary fields
  • Currency fields (in a multi-currency org)

Indexing Pointers

To make queries faster:

  • Use indexed fields in the WHERE clause
  • Avoid filtering on null values
  • Avoid leading wildcards (LIKE '%abc')
  • Keep the query selective

A query is considered selective when its filter returns only a small fraction of the object's total records. As a guideline, Salesforce applies fixed thresholds; for example, a custom index is typically used only when the filter matches about 10% of the first million records or fewer.
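
As a sketch, the two LIKE filters below look similar but behave very differently at scale: the first can use the index on Name, while the leading wildcard in the second forces a full scan (the account name is an example):

```soql
SELECT Id, Name FROM Account WHERE Name LIKE 'Acme%'

SELECT Id, Name FROM Account WHERE Name LIKE '%Acme%'
```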


Best Practices for LDV Performance

  • Always filter data using indexed fields
  • Select only required fields in queries
  • Break large queries into smaller ones
  • Avoid unnecessary joins
  • Use selective filters

These small changes can significantly improve performance.
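
One common way to break a large query into smaller ones is Batch Apex, which processes the result set in small chunks. A minimal sketch, with an illustrative class name and filter:

```apex
// Illustrative Batch Apex skeleton: the platform splits the query result
// into chunks (up to 200 records each) and calls execute() once per chunk.
global class AccountReviewBatch implements Database.Batchable<SObject> {
    global Database.QueryLocator start(Database.BatchableContext bc) {
        // Select only the required fields, filtered on an indexed field
        return Database.getQueryLocator(
            'SELECT Id, Name FROM Account WHERE CreatedDate = LAST_N_DAYS:30');
    }
    global void execute(Database.BatchableContext bc, List<Account> scope) {
        // Process one small chunk at a time
    }
    global void finish(Database.BatchableContext bc) {}
}
```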


Common Data Considerations

Ownership Skew

What is it?

Ownership skew occurs when a single user (or queue) owns more than 10,000 records of one object.

Why is it a problem?
  • Heavy sharing calculations
  • Slow role hierarchy updates
How to avoid it?
  • Distribute record ownership across multiple users
  • Avoid assigning all records to a single integration user
  • Use assignment rules to spread ownership
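
Distributing ownership can be sketched as a simple round-robin over a pool of owners. The owner Ids and record list below are assumptions for illustration:

```apex
// Assumed: ownerPool holds the Ids of several real users or queues,
// and newLeads is the list of records being assigned.
List<Id> ownerPool = new List<Id>{ /* user or queue Ids */ };
Integer i = 0;
for (Lead l : newLeads) {
    // Round-robin assignment keeps any single owner below the skew threshold
    l.OwnerId = ownerPool[Math.mod(i, ownerPool.size())];
    i++;
}
```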

Parenting Skew

What is it?

Parenting skew occurs when more than 10,000 child records are linked to a single parent record.

Why is it a problem?
  • Record locking issues
  • Slow updates
How to avoid it?
  • Distribute child records across multiple parent records
  • Consider an alternative data model (for example, several "bucket" parents instead of one)

Sharing Considerations

Org Wide Defaults (OWD)
  • Use Public Read/Write where possible
  • Reduce dependency on sharing rules
  • Use “Controlled by Parent” when applicable

Sharing Calculation

Parallel Sharing Rule Recalculation

Splits sharing rule recalculation across multiple parallel processes so large recalculations finish faster.

Deferred Sharing Calculation
  • Pause sharing calculations during large updates
  • Resume after changes are complete

This is very useful during data migration.


Granular Locking

Salesforce locks records during updates to maintain data integrity.
Granular locking reduces lock conflicts by allowing parallel updates when possible.


Data Load Strategy (Step-by-Step)

Handling LDV during data migration is critical. Follow this structured approach.


Step 1: Configure Your Organization
  • Enable deferred sharing calculation
  • Set OWD to Public Read/Write temporarily
  • Disable:
    • Triggers
    • Workflows
    • Validation rules

This reduces processing overhead.
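
Disabling automation is often handled with a bypass flag checked at the top of each trigger. This is a common community pattern, sketched here with illustrative names:

```apex
// Illustrative bypass pattern: set the flag (often backed by a Custom
// Setting or Custom Permission) before a bulk load so triggers exit early.
public class TriggerControl {
    public static Boolean bypassTriggers = false;
}

// At the top of each trigger body:
// if (TriggerControl.bypassTriggers) { return; }
```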


Step 2: Prepare the Data
  • Clean and validate data
  • Remove duplicates
  • Ensure correct relationships
  • Test in sandbox

Good data preparation avoids failures later.


Step 3: Execute Data Load
  • Load parent records first
  • Then load child records
  • Use the Bulk API for large volumes
  • Group records by parent Id

Tips:

  • Use insert/update instead of upsert when possible
  • Only update changed fields
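
For smaller programmatic loads from Apex, partial success keeps one bad row from rolling back an entire batch. A hedged sketch with assumed data:

```apex
// Assumed: children is already populated and sorted by AccountId,
// so records sharing a parent land in the same batch.
List<Contact> children = new List<Contact>();
Database.SaveResult[] results = Database.insert(children, false);
for (Database.SaveResult r : results) {
    if (!r.isSuccess()) {
        // Capture the errors for reprocessing instead of failing the load
        System.debug(r.getErrors());
    }
}
```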

Step 4: Configure for Production
  • Re-enable automation
  • Restore OWD settings
  • Apply sharing rules carefully
  • Monitor system performance

Off-Platform Approaches

When data becomes too large, consider storing some data outside Salesforce.


Archiving Data

Why archive?
  • Improve performance
  • Reduce storage cost
  • Maintain clean data

Approaches

1. Middleware-Based Archiving

Use external systems to store old data.

2. Using Heroku

Store large datasets and access them when needed.

3. Big Objects

Used for storing massive historical data inside Salesforce.

4. Third-Party Tools

Popular tools:

  • Odaseva
  • OwnBackup

Final Thoughts

Large Data Volume in Salesforce is not just a technical topic; it is a design challenge. If you plan your data model, indexing, sharing, and data load strategy properly, you can avoid most performance issues.

For anyone preparing for certification or working on real-world projects, mastering LDV concepts will give you a strong advantage.

Start with basics, test your queries, monitor performance, and continuously optimize your system. That is the key to building scalable Salesforce applications.
