Python tracking and troubleshooting¶
Python, another runtime in the Agent process, is also garbage collected. Datadog offers three tools with the Agent that can help you identify memory issues:
- Python memory telemetry (Python 3 only)
- Tracemalloc
- Pympler
Python memory telemetry¶
Python memory telemetry hooks into low-level allocator routines to provide a coarse view of the total memory allocated by the Python memory manager.
Python memory telemetry is only available when using Python 3 (Python 2 lacks the hooks necessary to implement this).
Python memory telemetry is part of the Agent internal telemetry and is enabled by default. Set telemetry.python_memory: false to disable it.
Internal name | Default metric name | Description |
---|---|---|
pymem__alloc | datadog.agent.pymem.alloc | Total number of bytes allocated since the start of the Agent. |
pymem__inuse | datadog.agent.pymem.inuse | Number of bytes currently allocated by the Python interpreter. |
The Python memory manager internally maintains a small reserve of unused memory, so the numbers provided by this tool may be slightly larger than the memory actually used by the Python code.
This telemetry represents memory allocated by pymalloc and the raw allocator (see Memory management in the Python manual). It does not include memory allocated by native extensions and libraries directly via libc.
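The relationship between the two metrics can be pictured with a toy model. This is only an illustrative sketch: the `AllocCounters` class and its method names are hypothetical and are not the Agent's implementation, which hooks the interpreter's allocator routines natively.

```python
class AllocCounters:
    """Toy model of a cumulative counter and an in-use counter (hypothetical)."""

    def __init__(self):
        self.alloc = 0  # bytes ever allocated; only grows
        self.inuse = 0  # bytes allocated and not yet freed

    def on_alloc(self, size):
        # Every allocation increases both counters.
        self.alloc += size
        self.inuse += size

    def on_free(self, size):
        # A free only lowers the in-use counter.
        self.inuse -= size


counters = AllocCounters()
counters.on_alloc(4096)
counters.on_alloc(1024)
counters.on_free(4096)
print(counters.alloc, counters.inuse)  # prints: 5120 1024
```

Under this model, a steadily climbing in-use value is the signal worth investigating, since the cumulative value rises in any healthy process too.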
Tracemalloc¶
Tracemalloc is part of the CPython interpreter, and tracks allocations and frees. It's implemented efficiently and runs with relatively low overhead. It also allows the user to compare memory at different points in time to help identify issues.
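As a minimal sketch of comparing memory at two points in time, the standard library `tracemalloc` module can be used directly, outside the Agent. The `leaky` function and the `retained` list are hypothetical stand-ins for a workload that holds on to memory.

```python
import tracemalloc


def leaky(store, n):
    # Hypothetical workload: keeps references alive so memory grows.
    store.extend(bytearray(1024) for _ in range(n))


tracemalloc.start()
retained = []
before = tracemalloc.take_snapshot()
leaky(retained, 500)
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group allocations by source line and list the largest changes first.
top_changes = after.compare_to(before, "lineno")
for stat in top_changes[:3]:
    print(stat)
```

The first entry should point at the line inside `leaky` that allocates the byte arrays, which is the kind of signal that helps locate a leak.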
Tracemalloc is disabled by default, and only requires the user to enable a flag in the agent config:
Note: Enabling Tracemalloc reduces the number of check runners to 1. The Agent enforces this because otherwise the allocations of multiple checks overlap in time, which makes the Tracemalloc output difficult to debug. A single runner ensures Python checks are executed sequentially, producing more sensible output for debugging purposes.
Once this feature is enabled, the metric datadog.agent.profile.memory.check_run_alloc will begin populating in Datadog. The metric is basic and only reflects the memory allocated by a check over time, in each check run, but it is still helpful for identifying regressions and leaks. The metric itself has two tags associated with it:
- check_name
- check_version
The two tags should help identify the sources of leaks and memory usage regressions, as well as the version in which they were introduced.
For more granular control of how tracemalloc runs, an additional set of flags can be applied on a check-by-check basis in each check's config file, using the following directives in the init_config section:
- frames: the number of stack frames to consider. Please note that this is the total number of frames considered, not the depth of the call-tree. Therefore, in some cases, you may need to set this value to a considerably high value to get a good enough understanding of how your agent is behaving. Default: 100.
- gc: whether or not to run the garbage collector before each snapshot to remove noise. Garbage collections will not run by default (?) while tracemalloc is in action. That is to allow us to more easily identify sources of allocations without the interference of the GC. Note that the GC is not permanently disabled, this is only enforced during the check run while tracemalloc is tracking allocations. Default: disabled.
- combine: whether or not to aggregate over all traceback frames. Useful only to tell which particular usage of a function triggered areas of interest.
- sort: what to group results by between: lineno | filename | traceback. Default: lineno.
- limit: the maximum number of sorted results to show. Default: 30.
- diff: how to order diff results between:
    - absolute: absolute value of the difference between consecutive snapshots. Default.
    - positive: same as absolute, but memory increases will be shown first.
- filters: comma-separated list of file path glob patterns to filter by.
- unit: the binary unit to represent memory usage (kib, mb, etc.). Default: dynamic.
- verbose: whether or not to include potentially noisy sources. Default: false.
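To get a feel for what several of these options control, the sketch below shows roughly how they correspond to the standard library `tracemalloc` API. It is an approximation under stated assumptions, not the Agent's code: the helper names `snapshot_with` and `top_diffs` and their parameters are hypothetical, and `combine`, `unit`, and `verbose` are not modeled.

```python
import gc
import tracemalloc


def snapshot_with(run_gc=False, filters=()):
    """Take a snapshot, optionally collecting garbage first (the gc option)."""
    if run_gc:
        gc.collect()
    snapshot = tracemalloc.take_snapshot()
    if filters:
        # filters: keep only traces whose file path matches a glob pattern.
        snapshot = snapshot.filter_traces(
            [tracemalloc.Filter(True, pattern) for pattern in filters]
        )
    return snapshot


def top_diffs(before, after, sort="lineno", limit=30, diff="absolute"):
    """Compare two snapshots using the sort, limit and diff options."""
    # compare_to already orders results by absolute size difference.
    stats = after.compare_to(before, sort)
    if diff == "positive":
        # Same ordering, but entries with memory increases come first.
        stats.sort(key=lambda s: (s.size_diff <= 0, -abs(s.size_diff)))
    return stats[:limit]


tracemalloc.start(25)  # frames: number of stack frames stored per allocation
before = snapshot_with(run_gc=True)
cache = [str(i) * 50 for i in range(2000)]  # hypothetical retained data
after = snapshot_with(run_gc=True)
tracemalloc.stop()

for stat in top_diffs(before, after, sort="lineno", limit=5, diff="positive"):
    print(stat)
```

Passing `sort="traceback"` only gives richer results when more than one frame is stored, which is one reason a higher frames value can be worth its overhead.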
You may also want to run tracemalloc and look at the actual debug information generated by the feature for a particular check, beyond just metrics. To do this, use the check command with its optional -m flag. Running a check as follows will produce detailed memory allocation output for the check:
That will print some memory information to the screen, for instance:
#1: python3.7/abc.py:143: 10.69 KiB
return _abc_subclasscheck(cls, subclass)
#2: simplejson/decoder.py:400: 6.84 KiB
return self.scan_once(s, idx=_w(s, idx).end())
#3: go_expvar/go_expvar.py:142: 4.85 KiB
metric_tags = list(metric.get(TAGS, []))
#4: go_expvar/go_expvar.py:241: 4.45 KiB
results.extend(self.deep_get(new_content, keys[1:], traversed_path + [str(new_key)]))
...
The command also stores the profiling information for further inspection if necessary.
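Output with the same shape as the sample above can be produced with the standard library alone. The following sketch is modeled on the pretty-printing recipe in the Python `tracemalloc` documentation; the `format_top` helper is hypothetical and is not the Agent's formatter.

```python
import linecache
import os
import tracemalloc


def format_top(snapshot, limit=4):
    """Return report lines for the largest allocation sites in a snapshot."""
    lines = []
    stats = snapshot.statistics("lineno")
    for index, stat in enumerate(stats[:limit], 1):
        frame = stat.traceback[0]
        # Keep only the last two path components to shorten the output.
        short_name = os.sep.join(frame.filename.split(os.sep)[-2:])
        size_kib = stat.size / 1024
        lines.append(f"#{index}: {short_name}:{frame.lineno}: {size_kib:.2f} KiB")
        # Show the source line when it can be read from disk.
        source = linecache.getline(frame.filename, frame.lineno).strip()
        if source:
            lines.append(f"    {source}")
    return lines


tracemalloc.start()
payload = [bytes(2048) for _ in range(200)]  # hypothetical allocations
snap = tracemalloc.take_snapshot()
tracemalloc.stop()

report = format_top(snap)
print("\n".join(report))
```

Each entry names an allocation site and the memory attributed to it, followed by the source line when it is available.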
There are additional hidden flags available when performing memory profiling. These flags map directly to the configuration options described above and will define and override the tracemalloc behavior. Because these flags are hidden and not meant for the end user, they are not listed when issuing a datadog-agent check --help command. The command flags are:
- -m-frames
- -m-gc
- -m-combine
- -m-sort
- -m-limit
- -m-diff
- -m-filters
- -m-unit
- -m-verbose
Additionally, there is one other command switch:
- -m-dir: an existing directory in which to store memory profiling data, ignoring clean-up.
The directory above must be writable by the user running the Agent, typically the dd-agent user. Once the check command completes, you will be able to find the memory profile files created in the corresponding directory for your delight and careful inspection :)