Add Dataset to Project

This guide provides instructions on how to add datasets to your labeling projects. Datasets are collections of data files (images, text files, or other media) that need to be labeled by annotators. You can add datasets to both new projects and existing projects.


Adding Dataset to a New Project

After creating a new project, follow these steps to add your first dataset:

Step 1: Navigate to Project Data Section

  1. Go to your Projects page
  2. Select the project where you want to add data
  3. Click on the Data tab or section in the project

Checking for Existing Data

Look for the Datasets section. If no data exists, this section will be marked in red, indicating you need to add datasets before labeling can begin.

Step 2: Add Your First Dataset

  1. Click the Add dataset button
  2. A file selection dialog will open where you can:
    • Browse your local file system
    • Select single or multiple files
    • Choose from supported file formats

Important Notes
  • Ensure your files meet the project's format requirements
  • Check file size limits before uploading
  • Verify that files are properly formatted and readable
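
The checks above can be run locally before opening the upload dialog. The following is a minimal sketch, not part of the platform: the allowed extensions and the 50 MiB size cap are placeholder assumptions, so replace them with the limits your project actually enforces.

```python
from pathlib import Path

# Assumed values for illustration only; use your project's real requirements.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".txt"}
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024  # hypothetical 50 MiB cap


def check_file(path: Path) -> list:
    """Return a list of problems for one file (an empty list means it looks OK)."""
    if not path.is_file():
        return [f"{path.name}: not a regular file"]
    problems = []
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"{path.name}: unsupported extension {path.suffix!r}")
    size = path.stat().st_size
    if size == 0:
        problems.append(f"{path.name}: file is empty")
    elif size > MAX_FILE_SIZE_BYTES:
        problems.append(f"{path.name}: {size} bytes exceeds the size limit")
    try:
        # Reading one byte confirms the file can be opened and read.
        with path.open("rb") as handle:
            handle.read(1)
    except OSError as exc:
        problems.append(f"{path.name}: not readable ({exc})")
    return problems


def check_folder(folder: Path) -> dict:
    """Map each problematic file path (as a string) to its list of problems."""
    report = {}
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            problems = check_file(path)
            if problems:
                report[str(path)] = problems
    return report
```

Calling `check_folder(Path("my_dataset"))` returns an empty dictionary when every file passes; otherwise it lists the files to fix before uploading.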

Step 3: Choose or Create Dataset Name

After selecting your files, you'll be prompted to name your dataset:

  • Select an existing dataset name from the dropdown list, or
  • Enter a name to create a new dataset

Dataset Naming Best Practices
  • Use descriptive names that indicate content or purpose
  • Include version numbers if uploading multiple versions (e.g., "Dataset_v1", "Dataset_2024_01")
  • Use consistent naming conventions across all datasets
  • Avoid special characters that might cause issues
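
One way to apply these naming practices consistently is to generate names with a small helper. This is a sketch under assumptions: the set of "safe" characters (letters, digits, underscore, hyphen) and the `_vN` suffix are illustrative choices, not rules confirmed by the platform.

```python
import re

# Assumption: anything other than letters, digits, "_" and "-" counts as
# a special character and is replaced.
_UNSAFE_RUN = re.compile(r"[^A-Za-z0-9_-]+")


def make_dataset_name(description: str, version: int) -> str:
    """Build a name such as 'street_signs_v1' from a free-text description."""
    cleaned = _UNSAFE_RUN.sub("_", description.strip()).strip("_")
    if not cleaned:
        raise ValueError("description contains no usable characters")
    if version < 1:
        raise ValueError("version must be 1 or greater")
    return f"{cleaned}_v{version}"


def is_safe_name(name: str) -> bool:
    """True if the name is non-empty and contains only the assumed-safe characters."""
    return bool(name) and _UNSAFE_RUN.search(name) is None


print(make_dataset_name("street signs (batch #2)", 1))  # street_signs_batch_2_v1
```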

Step 4: Review Cost Estimate and Confirm

Before the upload starts, you are shown an estimated cost for the dataset and asked to confirm:

  • Click Yes to confirm and proceed with adding the data
  • Click No to cancel and return to the previous screen

Cost Estimate

The actual cost may vary slightly based on final processing results, but the estimate should be very close to the final amount.

Step 5: Upload Confirmation

Once the data is successfully uploaded, you will receive an automated email confirmation containing:

  • Data file name: Name of the uploaded file(s)
  • Number of sentences uploaded: Total count of processed data items
  • Number of sentences created: Count of unique items successfully added
  • Number of duplicate sentences: Count of items rejected due to duplication

Email Confirmation

Save the email for your records. Use it to verify the upload completed successfully and reference it if you need to contact support.
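
When verifying an upload from the email, a quick sanity check is that the three counts agree. The sketch below assumes that every uploaded item is either created or rejected as a duplicate, so uploaded = created + duplicates; the guide does not state this relation explicitly, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UploadConfirmation:
    """Counts copied by hand from the confirmation email (hypothetical structure)."""
    file_name: str
    uploaded: int    # "Number of sentences uploaded"
    created: int     # "Number of sentences created"
    duplicates: int  # "Number of duplicate sentences"


def counts_are_consistent(confirmation: UploadConfirmation) -> bool:
    # Assumption: each uploaded item is either created or rejected as a duplicate.
    return confirmation.uploaded == confirmation.created + confirmation.duplicates


email = UploadConfirmation("batch_01.csv", uploaded=100, created=92, duplicates=8)
print(counts_are_consistent(email))  # True
```

If the check fails, the email is a useful reference to include when contacting support.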

Step 6: View Dataset Details

After receiving the confirmation email, return to the project dashboard to view detailed information about your uploaded dataset.

In the Data section, you can view comprehensive statistics about your dataset:

No. | Field Name         | Description
1   | Creation Timestamp | Date and time when the dataset was created and uploaded
2   | Dataset Name       | Name assigned to the dataset during upload
3   | Priority           | Priority order for processing (higher priority processed first)
4   | No.WI              | Total number of Work Items (WI) in the dataset
5   | New                | Number of work items waiting to be assigned
6   | In Progress        | Number of items currently being labeled
7   | Fixing             | Number of items returned for correction
8   | Delivered          | Number of completed items ready for review
9   | Done               | Number of fully completed and approved items
10  | Delete dataset     | Action button to delete the entire dataset

Understanding the Statistics:

  • Monitor progression: New → In Progress → Delivered → Done
  • Use "Fixing" count to identify quality issues
  • The sum of all status counts should equal the total "No.WI" count
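
The consistency rule above is simple arithmetic and can be checked against numbers copied from the Data section. This is a minimal sketch; the dictionary keys are hypothetical names for the table columns, and the figures are made up for illustration.

```python
STATUS_FIELDS = ("new", "in_progress", "fixing", "delivered", "done")


def status_total(row: dict) -> int:
    """Sum the five status counts of one dataset row."""
    return sum(row[field] for field in STATUS_FIELDS)


def is_consistent(row: dict) -> bool:
    """True when the status counts add up to the No.WI total."""
    return status_total(row) == row["no_wi"]


def percent_done(row: dict) -> float:
    """Share of work items in the Done state, as a percentage."""
    return 100 * row["done"] / row["no_wi"] if row["no_wi"] else 0.0


# Illustrative values only.
row = {"no_wi": 200, "new": 50, "in_progress": 30,
       "fixing": 10, "delivered": 40, "done": 70}
print(is_consistent(row), percent_done(row))  # True 35.0
```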

Managing Dataset Content

View Individual Work Item Details

To view detailed information about individual work items within your dataset:

  1. Click the View Details icon next to the dataset entry
  2. A detailed view will open showing all work items

Details View Fields:

No. | Field Name      | Description
1   | Image Name      | Unique identifier or filename of the data item
2   | Data            | Preview or thumbnail of the actual image/data
3   | Stage completed | Number of labeling stages completed
4   | Status          | Current status (New, In Progress, Delivered, Done, Fixing)
5   | Delete WI       | Action button to delete this specific Work Item

Additional Features:

  • Filter work items by status, annotator, or other criteria
  • Search for specific items by name or identifier
  • Select multiple items for batch operations
  • Export lists or reports of work items

View Data by Labeler (Annotator)

Filter and view work items based on which annotator is assigned or has worked on them.

How to filter by labeler:

  1. Navigate to the dataset details view
  2. Locate the Filter by Labeler section (sidebar or dropdown menu)
  3. Select the annotator's name
  4. The view will update to show only work items for that annotator

Use Cases:

  • Quality control: Review work completed by specific annotators
  • Performance monitoring: Track individual annotator productivity
  • Workload management: Ensure balanced work distribution
  • Issue resolution: Identify and address problems with specific annotators' work
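
If you export the work item list, the same per-annotator view can be reproduced offline. The sketch below assumes a hypothetical export in which each row holds the item name, the annotator, and the status; the actual export format may differ.

```python
from collections import Counter, defaultdict

# Hypothetical export rows: (item name, annotator, status).
WORK_ITEMS = [
    ("img_001.jpg", "alice", "Done"),
    ("img_002.jpg", "alice", "Fixing"),
    ("img_003.jpg", "bob", "In Progress"),
    ("img_004.jpg", "bob", "Done"),
    ("img_005.jpg", "bob", "Done"),
]


def status_counts_by_labeler(items):
    """Count work items per status for each annotator."""
    summary = defaultdict(Counter)
    for _name, labeler, status in items:
        summary[labeler][status] += 1
    return {labeler: dict(counts) for labeler, counts in summary.items()}


def items_for_labeler(items, labeler):
    """Return the names of the work items handled by one annotator."""
    return [name for name, who, _status in items if who == labeler]


print(status_counts_by_labeler(WORK_ITEMS))
print(items_for_labeler(WORK_ITEMS, "alice"))
```

A high "Fixing" count for one annotator relative to the others is the kind of signal the quality control use case looks for.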


Deleting Datasets and Work Items

Deleting an Entire Dataset

To remove an entire dataset from your project:

  1. Navigate to the dataset overview section
  2. Locate the dataset you want to delete
  3. Click the Delete Dataset icon (trash can symbol)
  4. Confirm the deletion when prompted

Important Considerations

Refund Policy:

  • Upon successful deletion, the previously calculated cost will be automatically refunded
  • Refund amount matches the original cost charged during upload

Deletion Restrictions:

  • You can only delete datasets that are newly uploaded and not yet labeled
  • If work items have been assigned, started, or completed, deletion may be restricted
  • Datasets in active labeling workflows may have deletion limitations

What Happens When You Delete:

  • All work items within the dataset are removed
  • Associated costs are refunded
  • Dataset is permanently removed from the system; treat the deletion as irreversible
  • Any assignments to annotators are automatically cancelled

Before Deleting:

  • Verify no annotators are currently working on items in the dataset
  • Ensure you have a backup of the data if needed
  • Confirm deletion is necessary and cannot be resolved through other means

Deleting Individual Work Items

To remove specific items from a dataset:

  1. Click View Details to see all work items in the dataset
  2. Select the item you want to delete
  3. Click the Delete icon associated with that item
  4. Review the refund amount displayed
  5. Confirm the deletion action

Refund Information
  • Refund amount represents the cost returned for that specific work item
  • Refunds are calculated proportionally based on the original dataset cost
  • Refund is automatically processed and added to your account balance
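
As a worked example of a proportional refund, the sketch below assumes every work item carries an equal share of the original dataset cost. The platform's exact formula and rounding are not documented here, so treat the refund amount shown in the confirmation dialog as authoritative.

```python
from decimal import Decimal, ROUND_HALF_UP


def item_refund(dataset_cost: Decimal, total_items: int, deleted_items: int = 1) -> Decimal:
    """Estimate the refund for deleted items, assuming equal cost per work item."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    if not 0 <= deleted_items <= total_items:
        raise ValueError("deleted_items must be between 0 and total_items")
    share = dataset_cost * deleted_items / total_items
    # Round to two decimal places (assumed currency precision).
    return share.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Illustrative numbers: a dataset that cost 120.00 with 400 work items.
print(item_refund(Decimal("120.00"), 400, 3))  # 0.90
```

Deleting all items in this model returns the full original cost, which matches the dataset-level refund policy described earlier.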

Deletion Requirements:

  • Work items must be newly uploaded and unlabeled
  • Items assigned to annotators or with labeling progress cannot be deleted
  • Items in completed or delivered batches may have deletion restrictions
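
The deletion requirements can be summarized as a simple predicate. This is only an illustration of the rule as stated above, with hypothetical field names; the platform itself decides whether the Delete action is available.

```python
def can_delete_item(status: str, assigned_to=None) -> bool:
    """An item is deletable only if it is still New and not assigned to anyone."""
    return status == "New" and assigned_to is None


def can_delete_dataset(items) -> bool:
    """A dataset is deletable only if every (status, assigned_to) pair is deletable."""
    return all(can_delete_item(status, who) for status, who in items)


print(can_delete_item("New"))                              # True
print(can_delete_item("In Progress", "alice"))             # False
print(can_delete_dataset([("New", None), ("Done", "bob")]))  # False
```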

Use Cases:

  • Remove duplicate or incorrect items accidentally uploaded
  • Clean up test data included in the dataset
  • Remove items that don't meet quality standards
  • Adjust dataset size by removing unnecessary items


Adding Dataset to a Project with Existing Data

When your project already contains datasets, you can add additional datasets following similar steps with some important differences.

Step 1: Access Dataset Overview

Navigate to the Dataset overview section. You'll see:

  • List of all uploaded datasets
  • Current status and progress for each dataset
  • Options to manage existing datasets
  • The Add Data button for adding new datasets

Step 2: Add New Dataset

Click the Add Data icon (button, plus sign, or "Add Dataset" link) to begin uploading a new dataset.

The upload process follows the same steps as for a new project:

  1. Select your data files
  2. Name your dataset
  3. Review estimated costs
  4. Confirm the upload

Step 3: Handle Duplicate Data Detection

When adding datasets to an existing project, the system automatically checks for duplicate data items across all datasets.

What happens with duplicates:

  • Automatic Detection: The system scans your new dataset against all existing datasets in the project
  • Upload Blocked: If duplicates are detected, the upload is blocked until they are resolved
  • Error Message: You'll receive a message indicating that duplicate data was found

Duplicate Detection
  • The system compares data items based on content, not just filenames
  • Exact matches and near-duplicates may be flagged
  • Duplicate detection helps maintain data quality and prevents redundant labeling
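
Content-based comparison can be approximated locally before uploading. The sketch below hashes file contents with SHA-256 and reports files in a new folder whose content already exists in a folder of previously uploaded data. It only catches exact byte-for-byte matches, not near-duplicates, and it is not the platform's own detection method; the folder layout is an assumption.

```python
import hashlib
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(new_dir: Path, existing_dir: Path) -> list:
    """List files in new_dir whose content matches an existing file
    or an earlier file in the same new batch."""
    seen = {content_hash(p) for p in existing_dir.rglob("*") if p.is_file()}
    duplicates = []
    for path in sorted(new_dir.rglob("*")):
        if not path.is_file():
            continue
        file_hash = content_hash(path)
        if file_hash in seen:
            duplicates.append(path)
        else:
            seen.add(file_hash)
    return duplicates
```

Because the comparison uses content rather than filenames, a renamed copy of an already uploaded file is still reported.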

How to resolve duplicate issues:

  1. Review the Error Message: Check which items are duplicates
  2. Remove Duplicates: Clean your dataset by removing duplicate items
  3. Verify Data: Double-check you're not accidentally re-uploading existing data
  4. Re-attempt Upload: Try uploading again after removing duplicates

Best Practices for Multiple Datasets
  • Naming Convention: Use consistent naming (e.g., "Dataset_v1", "Dataset_v2", "Dataset_2024_01")
  • Version Control: Keep track of which datasets contain which data
  • Incremental Uploads: Consider uploading in smaller batches for better management
  • Documentation: Maintain records of what each dataset contains

Step 4: Post-Upload Verification

After successfully adding a new dataset:

  1. Verify Upload: Confirm the new dataset appears in your dataset list
  2. Review Statistics: Check that work item counts match expectations
  3. Check Costs: Review updated cost calculations
  4. Monitor Progress: Track how the new dataset progresses through the labeling workflow

Managing Multiple Datasets:

  • Priority Settings: Adjust priorities if certain datasets should be processed first
  • Status Monitoring: Track progress of each dataset independently
  • Bulk Operations: Some operations allow managing multiple datasets simultaneously


Additional Considerations

Workflow Integration

  • New datasets are automatically integrated into your existing project workflow
  • Annotators assigned to the project can access items from all datasets
  • Progress tracking works across all datasets in the project

Cost Management

  • Each new dataset adds to your total project cost
  • Review cost estimates before confirming each upload
  • Monitor your overall project budget as you add more datasets

Quality Assurance

  • Maintain consistency across datasets to ensure labeling quality
  • Use similar data formats and structures when possible
  • Consider dataset organization strategies for large projects

Summary

Quick Checklist for Adding Datasets:

  • Open the project and go to its Data section
  • Click "Add dataset" button
  • Select files from your local system
  • Choose or create dataset name
  • Review estimated cost
  • Confirm upload
  • Verify email confirmation
  • Check dataset details and statistics

Remember:

  • Complete your dataset setup before starting labeling
  • Monitor dataset statistics regularly
  • Use consistent naming conventions
  • Handle duplicates before uploading
  • Keep track of costs and budgets