Static analysis in GCC 10

March 26, 2020
David Malcolm
Related topics:
Security, C, C#, C++, Open source


    I work at Red Hat on GCC, the GNU Compiler Collection. For the next major release of GCC, GCC 10, I've been implementing a new -fanalyzer option: A static analysis pass to identify various problems at compile-time, rather than at runtime.

    My thinking here is that it's best to catch problems as early as possible as the code is written, using the compiler the code is written in as part of the compile-edit-debug cycle, rather than having static analysis as an extra tool "on the side" (perhaps proprietary). Hence, it seems worthwhile to have a static analyzer built into the compiler that can see exactly the same code as the compiler sees—because it is the compiler.

    Static analysis is, of course, a huge problem to tackle. For this release, I've focused on the kinds of problems seen in C code (in particular, double-free bugs), but with a view toward creating a framework that we can expand on in subsequent releases, when we can add more checks and support languages other than C.

    My hope is that the analyzer provides a decent amount of extra checking while not being too expensive. I've aimed for -fanalyzer to "merely" double the compile time as a reasonable trade-off for the extra checks. I haven't succeeded yet, as you'll see below, but I'm working on it.

    Right now the code is in GCC's master branch for GCC 10 and can be tried out on Compiler Explorer, aka godbolt.org. It works well for small and medium-sized examples, but there are bugs that mean it's not ready for production use. I'm working hard on fixing things in the hope that the feature will be meaningfully usable for C code by the time of GCC 10's release (likely in April).

    Diagnostic paths

    Here's the simplest possible example of a double-free bug:

    #include <stdlib.h>
    
    void test(void *ptr)
    {
      free(ptr);
      free(ptr);
    }
    

    GCC 10 with -fanalyzer reports it as follows:

    $ gcc -c -fanalyzer double-free-1.c
    double-free-1.c: In function ‘test’:
    double-free-1.c:6:3: warning: double-‘free’ of ‘ptr’ [CWE-415] [-Wanalyzer-double-free]
        6 |   free(ptr);
          |   ^~~~~~~~~
      ‘test’: events 1-2
        |
        |    5 |   free(ptr);
        |      |   ^~~~~~~~~
        |      |   |
        |      |   (1) first ‘free’ here
        |    6 |   free(ptr);
        |      |   ~~~~~~~~~
        |      |   |
        |      |   (2) second ‘free’ here; first ‘free’ was at (1)
        |
    

    This output shows that GCC has learned some new tricks. First, diagnostics can now carry Common Weakness Enumeration (CWE) identifiers. In this example, the double-free diagnostic is tagged with CWE-415. This tag hopefully makes the output clearer, improves precision, and gives you something simple to type into search engines. So far, only diagnostics from -fanalyzer have been tagged with CWE weakness identifiers.

    If you're using GCC 10 with a suitable terminal (e.g., a recent gnome-terminal), the CWE identifier is a clickable hyperlink that takes you to a description of the problem. Speaking of hyperlinks: for many releases, GCC has printed the option controlling each warning it emits. As of GCC 10, that option text is also a clickable hyperlink (again, assuming a sufficiently capable terminal), which should take you to the documentation for that option. This applies to any warning, not just the ones relating to the analyzer.

    Second, GCC diagnostics can now have a chain of events associated with them, describing a path through the code that triggers the problem. Given the lack of control flow in the above example, it has just two events, but you can see how the second event refers to the first event in its description.
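
    As a rough illustration of where control flow would add events to such a path, here is a sketch in which the second free only happens on one branch. The function name cleanup_on_error and its failed parameter are hypothetical, made up for this illustration. No analyzer output is shown for it, since the exact events reported may vary between GCC versions.

```c
#include <stdlib.h>

/* Hypothetical example: 'cleanup_on_error' and 'failed' are
   illustrative names, not taken from any real code base.  */
void cleanup_on_error(void *ptr, int failed)
{
  if (failed)
    free(ptr);   /* first free, only on the error path */

  /* ... other work that does not touch ptr ... */

  free(ptr);     /* second free of the same pointer when 'failed' is nonzero */
}
```

    The bug is only reachable when failed is nonzero, so a path describing it would need to mention the branch as well as the two calls to free.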

    Here's a more involved example. Can you see the issue in the following code? (Hint: It's not a double-free this time):

    #include <setjmp.h>
    #include <stdlib.h>
    
    static jmp_buf env;
    
    static void inner(void)
    {
      longjmp(env, 1);
    }
    
    static void middle(void)
    {
      void *ptr = malloc(1024);
      inner();
      free(ptr);
    }
    
    void outer(void)
    {
      int i;
    
      i = setjmp(env);
      if (i == 0)
        middle();
    }
    

    Here's what GCC's -fanalyzer reports, which shows the interprocedural control flow via ASCII art:

    $ gcc -c -fanalyzer longjmp-demo.c
    longjmp-demo.c: In function ‘inner’:
    longjmp-demo.c:8:3: warning: leak of ‘ptr’ [CWE-401] [-Wanalyzer-malloc-leak]
        8 |   longjmp(env, 1);
          |   ^~~~~~~~~~~~~~~
      ‘outer’: event 1
        |
        |   18 | void outer(void)
        |      |      ^~~~~
        |      |      |
        |      |      (1) entry to ‘outer’
        |
      ‘outer’: event 2
        |
        |   22 |   i = setjmp(env);
        |      |       ^~~~~~
        |      |       |
        |      |       (2) ‘setjmp’ called here
        |
      ‘outer’: events 3-5
        |
        |   23 |   if (i == 0)
        |      |      ^
        |      |      |
        |      |      (3) following ‘true’ branch (when ‘i == 0’)...
        |   24 |     middle();
        |      |     ~~~~~~~~
        |      |     |
        |      |     (4) ...to here
        |      |     (5) calling ‘middle’ from ‘outer’
        |
        +--> ‘middle’: events 6-8
               |
               |   11 | static void middle(void)
               |      |             ^~~~~~
               |      |             |
               |      |             (6) entry to ‘middle’
               |   12 | {
               |   13 |   void *ptr = malloc(1024);
               |      |               ~~~~~~~~~~~~
               |      |               |
               |      |               (7) allocated here
               |   14 |   inner();
               |      |   ~~~~~~~
               |      |   |
               |      |   (8) calling ‘inner’ from ‘middle’
               |
               +--> ‘inner’: events 9-11
                      |
                      |    6 | static void inner(void)
                      |      |             ^~~~~
                      |      |             |
                      |      |             (9) entry to ‘inner’
                      |    7 | {
                      |    8 |   longjmp(env, 1);
                      |      |   ~~~~~~~~~~~~~~~
                      |      |   |
                      |      |   (10) ‘ptr’ leaks here; was allocated at (7)
                      |      |   (11) rewinding from ‘longjmp’ in ‘inner’...
                      |
        <-------------+
        |
      ‘outer’: event 12
        |
        |   22 |   i = setjmp(env);
        |      |       ^~~~~~
        |      |       |
        |      |       (12) ...to ‘setjmp’ in ‘outer’ (saved at (2))
        |
    

    The above is rather verbose, though perhaps it needs to be to convey what's going on, given the use of setjmp and longjmp. I hope the description is reasonably clear: There's a memory leak that occurs when the call to longjmp unwinds the stack back to outer past the cleanup point in middle, without invoking the cleanup.
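
    One possible way to avoid this kind of leak, sketched under assumptions, is to keep the allocation somewhere that survives the longjmp and release it in the function that called setjmp. The file-scope variable pending is a hypothetical name introduced for this sketch, and whether the analyzer stays quiet on this version has not been verified here.

```c
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf env;

/* Hypothetical: holds the allocation so that outer() can release it
   even when control leaves middle() via longjmp.  It has static
   storage, so its value is well defined after the jump.  */
static void *pending;

static void inner(void)
{
  longjmp(env, 1);
}

static void middle(void)
{
  pending = malloc(1024);
  inner();
  /* Not reached in this sketch, because inner() always jumps.  */
  free(pending);
  pending = NULL;
}

void outer(void)
{
  if (setjmp(env) == 0)
    middle();

  /* Reached both after a normal return and after the longjmp.
     free(NULL) is a no-op, so this is safe on either path.  */
  free(pending);
  pending = NULL;
}
```

    The design choice here is to move the cleanup to the one place that every path passes through, rather than relying on code after the call to inner.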

    If you don't like the ASCII art above, you can view the events as separate "note" diagnostics with -fdiagnostics-path-format=separate-events:

    $ gcc -c -fanalyzer -fdiagnostics-path-format=separate-events longjmp-demo.c
    longjmp-demo.c: In function ‘inner’:
    longjmp-demo.c:8:3: warning: leak of ‘ptr’ [CWE-401] [-Wanalyzer-malloc-leak]
        8 |   longjmp(env, 1);
          |   ^~~~~~~~~~~~~~~
    longjmp-demo.c:18:6: note: (1) entry to ‘outer’
       18 | void outer(void)
          |      ^~~~~
    In file included from longjmp-demo.c:1:
    longjmp-demo.c:22:7: note: (2) ‘setjmp’ called here
       22 |   i = setjmp(env);
          |       ^~~~~~
    longjmp-demo.c:23:6: note: (3) following ‘true’ branch (when ‘i == 0’)...
       23 |   if (i == 0)
          |      ^
    longjmp-demo.c:24:5: note: (4) ...to here
       24 |     middle();
          |     ^~~~~~~~
    longjmp-demo.c:24:5: note: (5) calling ‘middle’ from ‘outer’
    longjmp-demo.c:11:13: note: (6) entry to ‘middle’
       11 | static void middle(void)
          |             ^~~~~~
    longjmp-demo.c:13:15: note: (7) allocated here
       13 |   void *ptr = malloc(1024);
          |               ^~~~~~~~~~~~
    longjmp-demo.c:14:3: note: (8) calling ‘inner’ from ‘middle’
       14 |   inner();
          |   ^~~~~~~
    longjmp-demo.c:6:13: note: (9) entry to ‘inner’
        6 | static void inner(void)
          |             ^~~~~
    longjmp-demo.c:8:3: note: (10) ‘ptr’ leaks here; was allocated at (7)
        8 |   longjmp(env, 1);
          |   ^~~~~~~~~~~~~~~
    longjmp-demo.c:8:3: note: (11) rewinding from ‘longjmp’ in ‘inner’...
    In file included from longjmp-demo.c:1:
    longjmp-demo.c:22:7: note: (12) ...to ‘setjmp’ in ‘outer’ (saved at (2))
       22 |   i = setjmp(env);
          |       ^~~~~~
    

    or turn them off altogether with -fdiagnostics-path-format=none. There's also a JSON output format.

    All of the new diagnostics have a -Wanalyzer-SOMETHING name: We've already seen -Wanalyzer-double-free and -Wanalyzer-malloc-leak above. These diagnostics are all enabled when -fanalyzer is enabled, but they can be selectively disabled via the -Wno-analyzer-SOMETHING variants (e.g., via pragmas).
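
    As a minimal sketch of the pragma route, assuming GCC 10 or later, a single analyzer warning can be suppressed for a region of code with GCC's diagnostic pragmas. The function name legacy_release is hypothetical. Compilers that do not know the option may warn about the pragma itself.

```c
#include <stdlib.h>

/* Save the current diagnostic settings, then ignore one analyzer
   warning for the code up to the matching 'pop'.  */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wanalyzer-double-free"

/* Hypothetical function used only to show the pragma placement.  */
void legacy_release(void *ptr)
{
  free(ptr);
}

#pragma GCC diagnostic pop
```

    To disable a warning for a whole translation unit instead, the -Wno- form can be passed on the command line, for example: gcc -c -fanalyzer -Wno-analyzer-malloc-leak file.c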

    What are the new warnings?

    As well as double-free detection, there are checks for malloc and fopen leaks:

    #include <stdio.h>
    #include <stdlib.h>
    
    void test(const char *filename)
    {
      FILE *f = fopen(filename, "r");
      void *p = malloc(1024);
      /* do stuff */
    }
    
    $ gcc -c -fanalyzer leak.c
    leak.c: In function ‘test’:
    leak.c:9:1: warning: leak of ‘p’ [CWE-401] [-Wanalyzer-malloc-leak]
        9 | }
          | ^
      ‘test’: events 1-2
        |
        |    7 |   void *p = malloc(1024);
        |      |             ^~~~~~~~~~~~
        |      |             |
        |      |             (1) allocated here
        |    8 |   /* do stuff */
        |    9 | }
        |      | ~
        |      | |
        |      | (2) ‘p’ leaks here; was allocated at (1)
        |
    leak.c:9:1: warning: leak of FILE ‘f’ [CWE-775] [-Wanalyzer-file-leak]
        9 | }
          | ^
      ‘test’: events 1-2
        |
        |    6 |   FILE *f = fopen(filename, "r");
        |      |             ^~~~~~~~~~~~~~~~~~~~
        |      |             |
        |      |             (1) opened here
        |......
        |    9 | }
        |      | ~
        |      | |
        |      | (2) ‘f’ leaks here; was opened at (1)
        |
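
    For comparison, here is a sketch of the same function with both resources released on every path. The name test_no_leak and the int return value are assumptions added for illustration, and the analyzer's output for this version is not shown.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical fixed variant: returns 0 on success, -1 on failure.  */
int test_no_leak(const char *filename)
{
  FILE *f = fopen(filename, "r");
  if (!f)
    return -1;

  void *p = malloc(1024);
  if (!p)
    {
      fclose(f);   /* release the file before the early return */
      return -1;
    }

  /* do stuff */

  free(p);
  fclose(f);
  return 0;
}
```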
    

    For using memory after it has been freed:

    #include <stdlib.h>
    
    struct link { struct link *next; };
    
    int free_a_list_badly(struct link *n)
    {
      while (n) {
        free(n);
        n = n->next;
      }
    }
    
    $ gcc -c -fanalyzer use-after-free.c
    use-after-free.c: In function ‘free_a_list_badly’:
    use-after-free.c:9:7: warning: use after ‘free’ of ‘n’ [CWE-416] [-Wanalyzer-use-after-free]
        9 |     n = n->next;
          |     ~~^~~~~~~~~
      ‘free_a_list_badly’: events 1-4
        |
        |    7 |   while (n) {
        |      |         ^
        |      |         |
        |      |         (1) following ‘true’ branch (when ‘n’ is non-NULL)...
        |    8 |     free(n);
        |      |     ~~~~~~~
        |      |     |
        |      |     (2) ...to here
        |      |     (3) freed here
        |    9 |     n = n->next;
        |      |     ~~~~~~~~~~~
        |      |       |
        |      |       (4) use after ‘free’ of ‘n’; freed at (3)
        |
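
    The usual fix for this pattern, shown here as a sketch, is to read the next pointer before freeing the node. The name free_a_list is an assumption made for illustration.

```c
#include <stdlib.h>

struct link { struct link *next; };

/* Hypothetical fixed variant of the loop above.  */
void free_a_list(struct link *n)
{
  while (n) {
    struct link *next = n->next;  /* read the field while n is still valid */
    free(n);
    n = next;
  }
}
```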
    

    For freeing a non-heap pointer:

    #include <stdlib.h>
    
    void test(int n)
    {
      int buf[10];
      int *ptr;
    
      if (n < 10)
        ptr = buf;
      else
        ptr = (int *)malloc(sizeof (int) * n);
    
      /* do stuff.  */
    
      /* oops; this free should be conditionalized.  */
      free(ptr);
    }
    
    $ gcc -c -fanalyzer heap-vs-stack.c
    heap-vs-stack.c: In function ‘test’:
    heap-vs-stack.c:16:3: warning: ‘free’ of ‘ptr’ which points to memory not on the heap [CWE-590] [-Wanalyzer-free-of-non-heap]
       16 |   free(ptr);
          |   ^~~~~~~~~
      ‘test’: events 1-4
        |
        |    8 |   if (n < 10)
        |      |      ^
        |      |      |
        |      |      (1) following ‘true’ branch (when ‘n <= 9’)...
        |    9 |     ptr = buf;
        |      |     ~~~~~~~~~
        |      |         |
        |      |         (2) ...to here
        |      |         (3) pointer is from here
        |......
        |   16 |   free(ptr);
        |      |   ~~~~~~~~~
        |      |   |
        |      |   (4) call to ‘free’ here
        |
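
    A sketch of the conditionalized free that the comment in the example asks for might look like the following. The name use_buffer and the int return value are assumptions introduced here.

```c
#include <stdlib.h>

/* Hypothetical fixed variant: returns 0 on success, -1 if malloc fails.  */
int use_buffer(int n)
{
  int buf[10];
  int *ptr;

  if (n < 10)
    ptr = buf;
  else
    ptr = malloc(sizeof (int) * n);

  if (!ptr)
    return -1;

  /* do stuff.  */

  /* Only free the pointer when it came from malloc.  */
  if (ptr != buf)
    free(ptr);

  return 0;
}
```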
    

    For using a function that's known to be unsafe to use inside a signal handler:

    #include <stdio.h>
    #include <signal.h>
    
    extern void body_of_program(void);
    
    void custom_logger(const char *msg)
    {
      fprintf(stderr, "LOG: %s", msg);
    }
    
    static void handler(int signum)
    {
      custom_logger("got signal");
    }
    
    int main(int argc, const char *argv)
    {
      custom_logger("started");
    
      signal(SIGINT, handler);
    
      body_of_program();
    
      custom_logger("stopped");
    
      return 0;
    }
    
    $ gcc -c -fanalyzer signal.c
    signal.c: In function ‘custom_logger’:
    signal.c:8:3: warning: call to ‘fprintf’ from within signal handler [CWE-479] [-Wanalyzer-unsafe-call-within-signal-handler]
        8 |   fprintf(stderr, "LOG: %s", msg);
          |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      ‘main’: events 1-2
        |
        |   16 | int main(int argc, const char *argv)
        |      |     ^~~~
        |      |     |
        |      |     (1) entry to ‘main’
        |......
        |   20 |   signal(SIGINT, handler);
        |      |   ~~~~~~~~~~~~~~~~~~~~~~~
        |      |   |
        |      |   (2) registering ‘handler’ as signal handler
        |
      event 3
        |
        |cc1:
        | (3): later on, when the signal is delivered to the process
        |
        +--> ‘handler’: events 4-5
               |
               |   11 | static void handler(int signum)
               |      |             ^~~~~~~
               |      |             |
               |      |             (4) entry to ‘handler’
               |   12 | {
               |   13 |   custom_logger("got signal");
               |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~~~
               |      |   |
               |      |   (5) calling ‘custom_logger’ from ‘handler’
               |
               +--> ‘custom_logger’: events 6-7
                      |
                      |    6 | void custom_logger(const char *msg)
                      |      |      ^~~~~~~~~~~~~
                      |      |      |
                      |      |      (6) entry to ‘custom_logger’
                      |    7 | {
                      |    8 |   fprintf(stderr, "LOG: %s", msg);
                      |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                      |      |   |
                      |      |   (7) call to ‘fprintf’ from within signal handler
                      |
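
    A common way to avoid this problem, sketched here under assumptions, is to have the handler only set a flag of type volatile sig_atomic_t and to do the logging later from normal code. The helper names install_handler and report_pending_signal are hypothetical.

```c
#include <signal.h>
#include <stdio.h>

/* Set by the handler, read and cleared by normal code.  */
static volatile sig_atomic_t got_signal;

static void handler(int signum)
{
  (void)signum;
  got_signal = 1;   /* no stdio calls inside the handler */
}

void custom_logger(const char *msg)
{
  fprintf(stderr, "LOG: %s\n", msg);
}

/* Hypothetical helper: registers the handler, returns 0 on success.  */
int install_handler(void)
{
  return signal(SIGINT, handler) == SIG_ERR ? -1 : 0;
}

/* Hypothetical helper: call from the main loop, outside the handler.
   Returns 1 if a signal was pending and has now been logged.  */
int report_pending_signal(void)
{
  if (!got_signal)
    return 0;
  got_signal = 0;
  custom_logger("got signal");
  return 1;
}
```

    The logging call still uses fprintf, but it now runs in ordinary program context rather than inside the signal handler.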
    

    There are other warnings as well.

    What's left to do?

    As it stands, the checker works well on small- and medium-sized examples, but there are two problem areas I'm running into as I scale it up to real-world C code. First, there are bugs in my state-management code. Within the checker are classes for describing program state in an abstract way. The checker explores the program, building a directed graph of (point, state) pairs with logic for simplifying state and merging state at control flow join-points.

    In theory, if the state gets too complicated, the checker is meant to go into a least-defined state, but there are bugs with this approach that lead to the number of states at a given point exploding, which then leads to the checker running slowly, eventually hitting a safety limit, and not fully exploring the program. To fix this, I've been rewriting the guts of the state-management code. I hope to land the rewrite in "master" next week.
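
    To make the idea of merging state at a join point more concrete, here is a deliberately tiny sketch. It is not GCC's representation: the enum ptr_state, its values, and merge_states are all hypothetical. It tracks one abstract state for one pointer and falls back to a least-defined value when two incoming states disagree.

```c
/* Toy abstract states for a single pointer (hypothetical names).
   PS_UNKNOWN plays the role of the least-defined state.  */
enum ptr_state { PS_START, PS_NONNULL, PS_FREED, PS_UNKNOWN };

/* Merge the states arriving along two control-flow edges.
   If they agree, keep the state; otherwise give up precision.  */
enum ptr_state merge_states(enum ptr_state a, enum ptr_state b)
{
  return a == b ? a : PS_UNKNOWN;
}
```

    Merging like this keeps the number of states per program point bounded, at the cost of losing information whenever the paths disagree.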

    Second, even if we do fully explore the program, the paths through the code generated by -fanalyzer are sometimes ludicrously verbose. The worst I've seen is a 110-event path for the use of uninitialized data reported when compiling GCC itself. I think this one was a false positive, but clearly it's unreasonable to expect users to wade through something like that.

    The analyzer tries to find the shortest feasible path through the (point, state) graph, generates a chain of events from it, and then tries to simplify the chain. Effectively, it's applying a series of peephole optimizations to the chain of events to come up with a minimal chain that expresses the problem.
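
    The shortest-path step can be pictured with a generic breadth-first search over a small graph. This is a toy sketch, not the analyzer's code: it ignores feasibility and event simplification entirely, and the names MAX_NODES and shortest_path are assumptions made for illustration.

```c
#define MAX_NODES 16

/* Toy graph: adj[u][v] is nonzero when there is an edge u -> v.
   Writes the shortest path from 'start' to 'goal' into 'path' and
   returns its length in nodes, or 0 if 'goal' is unreachable.
   'n' is the number of nodes in use and must not exceed MAX_NODES.  */
int shortest_path(unsigned char adj[MAX_NODES][MAX_NODES], int n,
                  int start, int goal, int path[MAX_NODES])
{
  int prev[MAX_NODES], queue[MAX_NODES];
  int head = 0, tail = 0;

  for (int i = 0; i < MAX_NODES; i++)
    prev[i] = -1;                 /* -1 means "not visited yet" */

  prev[start] = start;
  queue[tail++] = start;

  /* Breadth-first search: each node is enqueued at most once.  */
  while (head < tail) {
    int u = queue[head++];
    if (u == goal)
      break;
    for (int v = 0; v < n; v++)
      if (adj[u][v] && prev[v] < 0) {
        prev[v] = u;
        queue[tail++] = v;
      }
  }

  if (prev[goal] < 0)
    return 0;

  /* Walk predecessors back from the goal, then reverse in place.  */
  int len = 0;
  for (int v = goal; ; v = prev[v]) {
    path[len++] = v;
    if (v == start)
      break;
  }
  for (int i = 0, j = len - 1; i < j; i++, j--) {
    int t = path[i];
    path[i] = path[j];
    path[j] = t;
  }
  return len;
}
```

    In a graph with edges 0 -> 1 -> 2 -> 3 and a direct edge 0 -> 3, the search from 0 to 3 picks the two-node path over the four-node one, which is the kind of shortening that keeps an event chain small.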

    I recently implemented a way of filtering irrelevant control-flow edges from the path, which ought to help, and I'm working on a similar patch to eliminate redundant interprocedural edges.

    To give a concrete example, I tried the analyzer on a real bug (albeit one from fifteen years ago)—CVE-2005-1689, a double-free vulnerability in krb5 1.4.1. It correctly identifies the bug with no false positives, but the output is currently 170 lines of stderr. Rather than showing the output inline here, you can see it at this link.

    Initially, the above was 1187 lines of stderr. I fixed various bugs and implemented more simplifications to get it down to 170 lines. Part of the problem is that the free is being done through a krb5_xfree macro and the path-printing code shows how each macro is expanded each time an event occurs within a macro. Perhaps the output should only show each macro expansion once per diagnostic. Also, the first few events in each diagnostic are interprocedural logic that's not really relevant to the user (I'm working on a fix for that). With these changes, the output should be considerably shorter.

    Perhaps a better interface might write out a separate HTML file, one per warning, and emit a "note" giving the location of the additional information?

    I want to give the end-user enough information to act on a warning, but without overwhelming them. Are there better ways of presenting this? Let me know in the comments.

    Trying it out

    GCC 10 will be in Fedora 32, which should be out in a couple of months.

    For simple code examples, you can play around with the new GCC online at godbolt.org (select gcc "trunk" and add -fanalyzer to the compiler options).

    Have fun!

    Last updated: June 29, 2020
