Add secure, type-aware custom object serialization#154
Conversation
Rewrite the JSON codec in shared.py to emit plain JSON (no internal type marker) and add type-directed deserialization via an optional expected_type. Custom objects round-trip everywhere: - call_activity/call_sub_orchestrator/call_entity gain return_type; wait_for_external_event gains data_type; these also refine the returned task's static type via overloads. - Inbound payloads (orchestrator/activity/entity inputs) and call_activity results are reconstructed from function type annotations (new internal type_discovery module), best-effort and conservative. - Entity get_state and new client OrchestrationState.get_input/get_output/get_custom_status accessors route through the shared codec. - Fix nested-dataclass round-trip bug; chain serialization errors with the original cause. Legacy AUTO_SERIALIZED payloads still deserialize for in-flight replay.
…m-type-serialization # Conflicts: # CHANGELOG.md
|
Expanding on the The gap todaySerialization is currently hardcoded to A Python-equivalent shape# durabletask/serialization.py (new public API)
from abc import ABC, abstractmethod
from typing import Any
class DataConverter(ABC):
@abstractmethod
def serialize(self, value: Any) -> str | None:
...
@abstractmethod
def deserialize(self, data: str | None, target_type: type | None = None) -> Any:
...The default preserves today's behavior, so it's a non-breaking refactor: class JsonDataConverter(DataConverter):
def serialize(self, value):
return None if value is None else shared.to_json(value)
def deserialize(self, data, target_type=None):
return None if data is None else shared.from_json(data, target_type)Then wire it through the worker/client once (a constructor arg defaulting to Pydantic exampleA team using pydantic models could drop in a converter that gives full validation on the way in and pydantic's JSON encoding on the way out, while still falling back to stdlib JSON for plain dicts/dataclasses: import pydantic
from durabletask.internal import shared
from durabletask.serialization import DataConverter
class PydanticDataConverter(DataConverter):
"""Round-trips pydantic models; falls back to stdlib JSON for everything else."""
def serialize(self, value):
if value is None:
return None
if isinstance(value, pydantic.BaseModel):
return value.model_dump_json() # pydantic v2
return shared.to_json(value) # dataclasses, dicts, ...
def deserialize(self, data, target_type=None):
if data is None:
return None
if (
isinstance(target_type, type)
and issubclass(target_type, pydantic.BaseModel)
):
return target_type.model_validate_json(data) # pydantic v2, validates
return shared.from_json(data, target_type)Register it once on each end: converter = PydanticDataConverter()
worker = TaskHubGrpcWorker(data_converter=converter)
client = TaskHubGrpcClient(data_converter=converter)…and pydantic models round-trip everywhere through the same type-directed flow already in this PR: class Order(pydantic.BaseModel):
customer: str
total: float
class Receipt(pydantic.BaseModel):
confirmation_id: str
def orchestrator(ctx, order: Order): # discovery -> Order (validated)
receipt = yield ctx.call_activity(charge, input=order, return_type=Receipt)
return receipt.confirmation_idHere Why this is worth considering over the duck-typed hooks
This is complementary to the PR, not a blocker: the Note One open question for the design: whether the converter is global (one per worker/client) like .NET, or can also be overridden per call (e.g. |
| return vars(o) | ||
| to_json_hook = getattr(o, "to_json", None) | ||
| if callable(to_json_hook): | ||
| return to_json_hook() |
There was a problem hiding this comment.
The to_json() hook name is a footgun here. We feed its return value straight to json.dumps(default=...), which expects a JSON-serializable object (dict/list/etc.) — but the name strongly implies it returns a JSON string. A type whose to_json() returns a string (the conventional meaning) gets double-encoded into a quoted JSON string. Consider naming the hook to_dict (with from_dict on the read side), or at least documenting loudly that it must return a structure, not a string.
| """``default`` hook for :func:`json.dumps` that emits plain JSON. | ||
|
|
||
| Called only for values the JSON encoder cannot natively serialize. Note that | ||
| namedtuples are handled natively by the encoder (serialized as JSON arrays) |
There was a problem hiding this comment.
Worth calling out that this is a real wire-format change for namedtuples, not just an internal detail. The old encoder emitted {field: value, ...} + marker and decoded to a SimpleNamespace; now a namedtuple serializes as a JSON array and decodes to a list. There's also no clean way to restore it — coerce_to_type([...], MyNamedTuple) calls MyNamedTuple([...]), which assigns the whole list to the first field. examples/human_interaction.py raises a namedtuple("Approval", ["approver"]) external event and then reads .approver, so it will break. The PR summary says the on-the-wire format stays stable, which isn't quite true for namedtuples — might be worth either preserving the dict form or documenting the change explicitly.
| if isinstance(value, expected_type): | ||
| return value | ||
|
|
||
| from_json_hook = getattr(expected_type, "from_json", None) |
There was a problem hiding this comment.
The from_json() hook contract is that it receives an already-parsed value (a dict), not a JSON string. That's the opposite of what many libraries mean by from_json (anything modeled on json.loads, or pydantic v1's parse_raw). A class defining from_json(cls, s: str) will silently get a dict and misbehave. Same naming concern as the to_json side — from_dict would be less ambiguous, or document the "receives a parsed value, not a string" contract explicitly.
| if origin is typing.Union or origin is types.UnionType: | ||
| # Optional[T] / Union[...]: if the value already matches a member type, | ||
| # keep it; otherwise coerce to the first non-None member. | ||
| non_none = [a for a in args if a is not type(None)] |
There was a problem hiding this comment.
Minor: for a non-Optional Union[A, B] where the value matches neither member by isinstance, this always coerces to A. Fine for the dominant Optional[T] case, but it can silently mis-coerce genuine unions. Consider leaving the value untouched when it matches no member rather than forcing the first non-None arg.
| # An explicit return_type takes precedence; otherwise, when an activity | ||
| # function reference is supplied, discover its return annotation. | ||
| if return_type is None and not isinstance(activity, str): | ||
| return_type = type_discovery.activity_output_type(activity) |
There was a problem hiding this comment.
This is where discovery silently opts an orchestration into strict result coercion. Combined with the result path having no fallback (see my comment on the taskCompleted handler), merely adding a return annotation to an activity changes runtime behavior: if a payload fails to coerce to the discovered type, the orchestration now throws during replay instead of returning the raw value. Input discovery falls back to the raw value on failure — the result side should probably match, at least for discovered (implicit) types. An explicit return_type=, where fail-fast is defensible, is a different story.
| if not ph.is_empty(event.taskCompleted.result): | ||
| result = shared.from_json(event.taskCompleted.result.value) | ||
| result = shared.from_json( | ||
| event.taskCompleted.result.value, activity_task._expected_type # pyright: ignore[reportPrivateUsage] |
There was a problem hiding this comment.
Asymmetry to flag: the input-coercion paths (orchestrator/activity/entity inputs) wrap coerce_to_type in try/except and fall back to the raw value with a debug log, but the result paths — this taskCompleted handler plus sub-orchestration / external-event / entity-operation completion — call from_json(value, expected_type) with no fallback, so a coercion failure raises inside replay. Worth making the two sides consistent, or at least documenting that result coercion is strict while input coercion is best-effort.
| return callable(getattr(cast(Any, annotation), "from_json", None)) | ||
|
|
||
|
|
||
| @functools.lru_cache(maxsize=None) |
There was a problem hiding this comment.
lru_cache(maxsize=None) keyed by the function object is unbounded. For the normal case (module-level orchestrators/activities) the key set is small, but a worker that registers dynamically-created functions/closures would accumulate entries for the lifetime of the process. A bounded maxsize, or keying on something stable, would avoid a slow leak.


Summary
Reworks user-payload JSON serialization to be secure and type-aware while
keeping the on-the-wire format stable. Custom objects (dataclasses and
from_json()-capable types) now round-trip everywhere in the programming model,and deserialization is driven by caller-supplied or annotation-discovered types
rather than by the payload itself.
Motivation
The previous codec (
shared.to_json/from_json) had real problems:AUTO_SERIALIZEDmarker, so nested dataclasses/namedtuples decodedinconsistently, and everything came back as a
SimpleNamespacerather than theoriginal type.
reconstruct arbitrary classes named in the payload is classic insecure
deserialization (OWASP A08). The secure model is type-directed decoding,
where the destination type comes from the caller/annotations, never the wire.
This also aligns the SDK with the direction of
azure-functions-durableand the.NET
DataConverter(plain JSON + caller-supplied target type).What changed
Type-directed deserialization
call_activity,call_sub_orchestrator, andcall_entitygain an optionalreturn_type;wait_for_external_eventgainsdata_type. When provided, theresult/event is coerced to that type (dataclasses incl. nested /
Optional/listfields, andfrom_json()-capable types). These also refine the statictype of the returned task via
@overload(e.g.
call_activity(..., return_type=Foo) -> CompletableTask[Foo]).client.OrchestrationState:get_input(),get_output(),get_custom_status()— each takes an optionalexpected_typeand is overloaded to return
T | None. Rawserialized_*fields are retained.get_state(intended_type=...)now routes through the shared codec(dataclass +
from_jsonsupport).Annotation-based discovery (new
durabletask/internal/type_discovery.py)call_activityresults are reconstructed from the function's type annotationswhen no explicit type is supplied. Discovery is best-effort and
conservative: builtins and unannotated/unknown types pass through unchanged,
and a payload that fails to coerce falls back to the raw value.
Codec / behavior
to_jsonnow emits plain JSON (no internal type marker). Objects exposingto_json()are serializable.TypeErrorthat chains the original cause andnames the offending type.
AUTO_SERIALIZEDmarker) still deserialize — into aSimpleNamespacewhen notype is supplied, or stripped and coerced when an
expected_typeis given — soin-flight orchestrations replay cleanly across the upgrade.
Behavior change to note
Decoding without a type now yields a plain
dict/list(previously aSimpleNamespacefor marked payloads). Callers that want the original typeshould pass
return_type/data_type/expected_type, or rely on annotationdiscovery. Documented under
## UnreleasedinCHANGELOG.md.Tests & validation
test_serialization.py,test_type_discovery.py,test_orchestration_state.py, plus additions to the entity/orchestrationexecutor tests (codec round-trips, nested dataclasses,
expected_typeprecedence, legacy-marker back-compat, error chaining, annotation discovery,
static-type inference).
pyright(0 errors),flake8(source + tests),pytest(non-e2e), andpymarkdownall green.durabletask-azuremanagedunit tests pass (it reuses the core client/worker).
Note
Opening as a draft for review. Static-type inference is refined on the task
object; the
result = yield ctx.call_activity(...)pattern still yieldsAnydue to Python generator send-type limitations (would require an await-style API).