How to import annotations on HTML data and sample import formats.
You can use the Python SDK to import annotations on HTML data.
This page shows how to declare different annotation types (as Python dictionaries and NDJSON objects) and demonstrates the import process.
A Python notebook demonstrates these steps and can be run directly in Google Colab.
Supported annotations
To import annotations in Labelbox, you need to create the annotations payload. This section shows how to declare annotations for each supported annotation type.
You can declare annotations as Python SDK annotation types (preferred) or as NDJSON objects.
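To make the NDJSON option concrete: NDJSON is newline-delimited JSON, with one object per line. The following is a minimal sketch using only the Python standard library. The `to_ndjson` helper is hypothetical (it is not part of the Labelbox SDK), and the sample payloads only mirror the dictionary shapes used on this page.

```python
import json


def to_ndjson(annotations):
    """Serialize a list of dicts as newline-delimited JSON (one object per line)."""
    return "\n".join(json.dumps(a, sort_keys=True) for a in annotations)


# Sample payloads shaped like the NDJSON dictionaries on this page.
sample_annotations = [
    {"name": "text_html", "answer": "sample text"},
    {"name": "checklist_html", "answers": [{"name": "first_checklist_answer"}]},
]

ndjson_text = to_ndjson(sample_annotations)
print(ndjson_text)
```

When you import through the SDK you pass the dictionaries themselves, so a helper like this is mainly useful for inspecting or saving a payload.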
Classification: Radio (single-choice)
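radio_annotation = lb_types.ClassificationAnnotation(
    name="radio_html",  # must match your ontology feature's name
    value=lb_types.Radio(
        answer=lb_types.ClassificationAnswer(name="first_radio_answer")
    )
)
radio_annotation_ndjson = {
    'name': 'radio_html',
    'answer': {'name': 'first_radio_answer'}
}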
Classification: Checklist (multi-choice)
checklist_annotation = lb_types.ClassificationAnnotation(
    name="checklist_html",  # must match your ontology feature's name
    value=lb_types.Checklist(
        answer=[
            lb_types.ClassificationAnswer(
                name="first_checklist_answer"
            ),
            lb_types.ClassificationAnswer(
                name="second_checklist_answer"
            )
        ]
    )
)
checklist_annotation_ndjson = {
    'name': 'checklist_html',
    'answers': [
        {'name': 'first_checklist_answer'},
        {'name': 'second_checklist_answer'}
    ]
}
Classification: Free-form text
text_annotation = lb_types.ClassificationAnnotation(
    name="text_html",  # must match your ontology feature's name
    value=lb_types.Text(answer="sample text")
)
text_annotation_ndjson = {
    'name': 'text_html',
    'answer': 'sample text',
}
Example: Import prelabels or ground truth
The steps to import annotations as prelabels (model-assisted labeling) are very similar to the steps to import annotations as ground truth labels. The two scenarios differ only in Steps 5 and 6, which describe the differences for each.
Before you start
You will need to import these libraries to use the code examples in this section.
import labelbox as lb
import uuid
import labelbox.types as lb_types
Replace API key
API_KEY = ""
client = lb.Client(API_KEY)
Step 1: Import data rows
The data row must be uploaded to Catalog before attaching annotations.
This example shows how to create an HTML data row.
global_key = "sample_html_1.html"
asset = {
    "row_data": "https://storage.googleapis.com/labelbox-datasets/html_sample_data/sample_html_1.html",
    "global_key": global_key
}
dataset = client.create_dataset(
    name="html_annotation_import_demo_dataset",
    iam_integration=None  # Removing this argument will default to the organization's default IAM integration
)
task = dataset.create_data_rows([asset])
task.wait_till_done()
print("Errors:", task.errors)
print("Failed data rows: ", task.failed_data_rows)
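If you run this step more than once, reusing the same global key may cause the data row creation to fail. This assumes that global keys must be unique within your organization. One way to avoid collisions is to add a random suffix to the key. The sketch below uses only the standard library; `unique_global_key` is a hypothetical helper, not an SDK function.

```python
import uuid


def unique_global_key(filename):
    """Return the filename with a short random suffix inserted before its extension."""
    stem, dot, ext = filename.rpartition(".")
    suffix = uuid.uuid4().hex[:8]
    if not dot:
        # No extension present: append the suffix to the whole name.
        return f"{filename}-{suffix}"
    return f"{stem}-{suffix}.{ext}"


example_key = unique_global_key("sample_html_1.html")
print(example_key)
```

If you adopt this approach, use the generated value for global_key everywhere the later steps refer to it.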
Step 2: Create an ontology
Your project ontology should include all tools and classifications required by your annotations. To ensure that features match the schema, the tool names and classification names should match the name fields in your annotations.
To illustrate, suppose you set name to text_html when you created the text annotation. When creating the ontology, use the same value in the name field of the text classification. Follow the same process for each tool and classification created in the ontology.
ontology_builder = lb.OntologyBuilder(
    classifications=[
        lb.Classification(
            class_type=lb.Classification.Type.TEXT,
            name="text_html"
        ),
        lb.Classification(
            class_type=lb.Classification.Type.CHECKLIST,
            name="checklist_html",
            options=[
                lb.Option(value="first_checklist_answer"),
                lb.Option(value="second_checklist_answer")
            ]
        ),
        lb.Classification(
            class_type=lb.Classification.Type.RADIO,
            name="radio_html",
            options=[
                lb.Option(value="first_radio_answer"),
                lb.Option(value="second_radio_answer")
            ]
        )
    ]
)
ontology = client.create_ontology(
    "Ontology HTML Annotations",
    ontology_builder.asdict(),
    media_type=lb.MediaType.Html
)
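Because mismatched names are an easy mistake to make, it can help to check annotation names against the ontology's feature names before importing. The following is a minimal sketch over plain dictionaries. `find_unmatched_names` is a hypothetical helper, and the list of ontology names is typed in by hand here rather than read from the SDK.

```python
def find_unmatched_names(annotations, ontology_names):
    """Return annotation names that have no matching ontology feature name."""
    known = set(ontology_names)
    return [a["name"] for a in annotations if a["name"] not in known]


# Feature names copied from the ontology created in this step.
ontology_names = ["text_html", "checklist_html", "radio_html"]

annotations = [
    {"name": "text_html", "answer": "sample text"},
    # Deliberate typo to show what the check reports.
    {"name": "checklist_htm", "answers": [{"name": "first_checklist_answer"}]},
]

print(find_unmatched_names(annotations, ontology_names))  # ['checklist_htm']
```

An empty list means every annotation name has a counterpart in the ontology.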
Step 3: Create labeling project
Connect the ontology to the labeling project.
project = client.create_project(
    name="html_project",
    media_type=lb.MediaType.Html
)
# Set up your ontology
project.setup_editor(ontology)
Step 4: Send data rows to project
batch = project.create_batch(
    "first-batch-html-demo",  # Each batch in a project must have a unique name
    global_keys=[global_key],  # Paginated collection of data row objects, list of data row ids or global keys
    priority=5  # priority between 1 (highest) and 5 (lowest)
)
print("Batch: ", batch)
Step 5: Create annotation payloads
Use the earlier examples for help creating annotation payloads.
The examples below show how to compose annotations into labels attached to the data rows, first as Python annotation types and then as NDJSON objects.
label = []
label.append(
    lb_types.Label(
        data=lb_types.HTMLData(
            global_key=global_key
        ),
        annotations=[
            text_annotation,
            checklist_annotation,
            radio_annotation
        ]
    )
)
label_ndjson = []
for annotations in [text_annotation_ndjson,
                    checklist_annotation_ndjson,
                    radio_annotation_ndjson]:
    annotations.update({
        'dataRow': {
            'globalKey': global_key
        }
    })
    label_ndjson.append(annotations)
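The NDJSON loop above modifies the original annotation dictionaries in place. If you want to reuse the same annotation templates for several data rows, a copying variant may be safer. The following is a minimal sketch; `attach_data_row` is a hypothetical helper, not part of the SDK.

```python
import copy


def attach_data_row(annotations, global_key):
    """Return copies of the annotations, each tagged with the data row's global key."""
    tagged = []
    for annotation in annotations:
        item = copy.deepcopy(annotation)  # leave the caller's dict untouched
        item["dataRow"] = {"globalKey": global_key}
        tagged.append(item)
    return tagged


templates = [{"name": "text_html", "answer": "sample text"}]
payload = attach_data_row(templates, "sample_html_1.html")

print(payload)    # copies that include the dataRow reference
print(templates)  # originals, still without a dataRow key
```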
Step 6: Import annotation payload
Whether you're uploading annotations as prelabels (model-assisted labeling) or as ground truth labels, pass your annotation payload as the value of the predictions or labels parameter, respectively.
Option A: Upload as prelabels (model-assisted labeling)
upload_job = lb.MALPredictionImport.create_from_objects(
    client=client,
    project_id=project.uid,
    name=f"mal_job-{str(uuid.uuid4())}",
    predictions=label
)
upload_job.wait_until_done()
print("Errors:", upload_job.errors)
print("Status of uploads: ", upload_job.statuses)
Option B: Upload as ground truth
upload_job = lb.LabelImport.create_from_objects(
    client=client,
    project_id=project.uid,
    name="label_import_job" + str(uuid.uuid4()),
    labels=label_ndjson
)
# Wait for the import to finish before reading its errors
upload_job.wait_until_done()
print("Errors:", upload_job.errors)
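For larger imports, printing the raw results can be hard to read. The sketch below summarizes a list of status entries. It assumes each entry is a dictionary with a "status" key whose value is "SUCCESS" for a successful annotation; that shape is an assumption, so check the actual output of upload_job.statuses before relying on it. `summarize_statuses` is a hypothetical helper, and the sample entries are made up for illustration.

```python
from collections import Counter


def summarize_statuses(statuses):
    """Count entries per status value and collect the entries that did not succeed."""
    counts = Counter(s.get("status", "UNKNOWN") for s in statuses)
    failures = [s for s in statuses if s.get("status") != "SUCCESS"]
    return counts, failures


# Made-up status entries; the real shape returned by the SDK may differ.
sample_statuses = [
    {"uuid": "a", "status": "SUCCESS", "errors": []},
    {"uuid": "b", "status": "FAILURE", "errors": [{"message": "example error"}]},
]

counts, failures = summarize_statuses(sample_statuses)
print(dict(counts))
print([f["uuid"] for f in failures])
```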