Argo CD is an awesome tool. It helps teams demystify the deployment process on Kubernetes by providing a visual representation of the deployments in the cluster, and the GitOps methodology gives a consistent and understandable structure to your configuration files.
But what's the best way to scale when adding more teams? How can we make sure that we're building our GitOps in a way that facilitates self-service and security? That's what we'll discuss in this blog post.
Kartverket has been using Argo CD and GitOps for several years, and we've built an architecture that solves our needs for scale and self-service. Here we'll share our learnings and discuss why our teams are so happy with our Argo setup.
Multi-tenancy in Argo CD
So you've deployed Argo CD on your multi-tenant cluster and given your teams access to the user interface. Let's imagine we now have tens of teams and hundreds of applications in the Argo UI. Once we scale out to more than a handful of users, some issues start to appear, for example:
- How do you organize your apps and projects?
- How do you make sure no two teams accidentally (or maliciously) use the same namespace?
- How can we make sure teams clean up unused deployment resources?
- How do you seamlessly deploy to multiple clusters?
As a platform team we often find ourselves thinking that everyone loves infrastructure and Kubernetes as much as we do. This is not the case! Most people have not had the joy of having their childhood ruined by installing Linux on their school laptops and configuring WLAN drivers using ndiswrapper. Believe it or not, most people just want tools to get out of their way and let them do their job, be that programming, testing or anything else. Not every team is going to be an expert in Kubernetes and Argo. So should we expect all teams to know what a deletion finalizer is? What about the intricacies of server-side apply vs. client-side apply?
It's our responsibility as a platform team to make the user experience of deploying to Kubernetes as user friendly as possible. After implementing an architecture built with UX in mind we've had the joy of seeing people who are extremely skeptical of Kubernetes and the cloud be won over by how easy it is to get your workloads running on Kubernetes. This is thanks to the consistent user experience and built-in best practices of the apps-repo architecture. But we're getting ahead of ourselves, first we need to talk about a few abstractions that make this possible.
What are ApplicationSets?
In Argo CD there's an advanced feature that allows for automating the creation of Argo CD Applications, called ApplicationSets. Using an ApplicationSet we can essentially make a template that generates Argo CD Applications based on files or folders in a Git repository, sort of like a ReplicaSet for Pods. Using ApplicationSets we can build in features and assumptions and provide the teams with a user experience that essentially boils down to "add a file to a repo and it gets deployed to the cluster". The purest form of GitOps. No messing around with Argo CD Applications and Projects.
A core Argo CD component called the ApplicationSet controller will detect any ApplicationSet resources deployed to the cluster and read them. After this, it will periodically scan the repo configured in the ApplicationSet resource and generate Application resources, which in turn scan a repo for manifest files and sync them to the cluster. So in other words: ApplicationSet -> Application -> Deployments.
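If you haven't worked with Applications before, a minimal sketch looks something like this: a pointer to a path in a Git repo plus a destination cluster and namespace to sync it to. The repo, cluster and names below are placeholders for illustration.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: 'https://github.com/example/example-apps.git'  # placeholder repo
    path: env/example-cluster/example-namespace             # placeholder path
    targetRevision: HEAD
  destination:
    name: example-cluster
    namespace: example-namespace
  syncPolicy:
    automated: {}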
For this to work you need a Git repo containing manifest files. You could have the teams put these manifest files into their source code repositories, but this is not considered best practice. Usually you would put your manifests into a separate repo so that changes to the manifests don't conflict with changes in the source code. At Kartverket we call this manifest repo an apps repo.
Introducing apps repositories
The apps repo is where the product teams put their manifests. It has a consistent structure and is designed to be read by an Argo CD ApplicationSet. It also has a lot of nifty features that enable self-service which we'll get back to.
First, let's have a look at the structure of an apps repo.
teamname-apps/
  env/
    clustername/
      namespace/
        example.yaml
In the simplest of terms, this tree describes where to deploy a given manifest. Using a directory tree makes setting up an ApplicationSet for this repo trivial.
Consider this example ApplicationSet:
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: exampleteam-apps
  namespace: argocd
spec:
  generators:
    - git:
        directories:
          - path: env/*/*
        repoURL: 'https://github.com/kartverket/exampleteam-apps.git'
        revision: HEAD
  goTemplate: true
  goTemplateOptions:
    - missingkey=error
  template:
    metadata:
      name: '{{ .path.basename }}'
    spec:
      destination:
        namespace: '{{ index .path.segments 2 }}'
        name: '{{ index .path.segments 1 }}'
      project: exampleteam
      source:
        path: '{{ .path.path }}'
        repoURL: 'https://github.com/kartverket/exampleteam-apps.git'
        targetRevision: HEAD
      syncPolicy:
        syncOptions:
          - CreateNamespace=true
        automated:
          prune: true
          allowEmpty: true
          selfHeal: true
With this ApplicationSet, any directory within env/*/* will be picked up by the ApplicationSet controller and a new Argo CD Application will be created based on the template in the template object. This enables a product team to create any number of applications for their products.
An example use for this is a product team wanting a namespace for each of their products. Instead of having to order a new namespace from the platform team when they create a new product, they can simply create it themselves by adding a new directory with the same name as the namespace they want. A new Kubernetes namespace will be automatically created thanks to the CreateNamespace=true sync option.
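To make the mapping concrete: for a hypothetical new directory env/foo-cluster/foo-newproduct, the template above would render an Application roughly like the sketch below, and the first sync creates the foo-newproduct namespace in the foo-cluster cluster.
# Sketch of the Application generated for env/foo-cluster/foo-newproduct
metadata:
  name: foo-newproduct         # {{ .path.basename }}
spec:
  destination:
    name: foo-cluster          # {{ index .path.segments 1 }}
    namespace: foo-newproduct  # {{ index .path.segments 2 }}
  source:
    path: env/foo-cluster/foo-newproduct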
Ephemeral namespaces, a.k.a. preview namespaces, are another use case. Say a team wants to review a change before merging it to main. They could review the change in the Pull Request, but this removes us from the end user's perspective and is not suitable for non-technical people. With a preview environment the team's CI will automatically create a new directory in the apps repo when a PR is created, and thus get a complete deployment with the change in question. This enables end-to-end testing in a browser, and also allows non-technical people to do QA before a change is merged. When it is merged, another workflow can automatically delete the directory, which cleans up and deletes the preview environment.
Our convention is that namespaces are formatted as productname-branch. This allows teams to have multiple deploys per product, and also multiple products per team. So when a new PR is created, all a team needs to do is automate the creation of a new directory by using CI tools like GitHub Actions to create a new commit in the apps repo. This also gives them the flexibility to create it as a PR in the apps repo, but for ephemeral namespaces this is usually not necessary.
For example:
footeam-apps/
  env/
    foo-cluster/
      foo-main/
        app.yaml
      foo-feature-123/
        app.yaml
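When the PR is closed or merged, a small companion workflow can remove the directory again and tear down the preview environment. A hedged sketch follows, where the apps repo name, the cluster directory and the APPS_REPO_TOKEN secret are assumptions for illustration:
name: cleanup-preview
on:
  pull_request:
    types: [closed]
jobs:
  remove-preview:
    runs-on: ubuntu-latest
    steps:
      - name: Check out the apps repo
        uses: actions/checkout@v4
        with:
          repository: kartverket/footeam-apps    # assumed apps repo
          token: ${{ secrets.APPS_REPO_TOKEN }}  # assumed token with write access
      - name: Remove the preview directory
        run: |
          namespace="foo-${{ github.head_ref }}"
          rm -rf "./env/foo-cluster/$namespace"
          git config --global user.email "github-actions@github.com"
          git config --global user.name "GitHub Actions"
          git commit -am "Remove preview environment $namespace"
          git push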
Automating and avoiding duplication
Depending on the complexity of the apps repo, the number of products and branches, and a subjective "ickiness" about duplicating files (can you spell DRY?), you have several options for automating the creation of new namespaces.
Simple repos will probably be fine with directories containing simple YAML files that are synced to the cluster. Newer product teams especially appreciate the simplicity of this approach. To optimize for this you may consider using a template directory at the base containing some example files that are copied into the sub-directories. A pseudo-coded GitHub Action that uses a frontend.yaml template from the templates directory could look like the following:
jobs:
  build:
    # Build a container image and push it
  deploy:
    needs: build
    strategy:
      matrix:
        env: ['dev', 'test', 'prod']
    steps:
      # .. Checkout repo & other setup ..
      - name: Deploy to ${{ matrix.env }}
        run: |
          namespace="myapp-${{ github.ref_name }}"
          path="./env/atkv3-${{ matrix.env }}/$namespace"
          mkdir -p "$path"
          cp templates/frontend.yaml "$path/frontend.yaml"
          # Patch the image tag locally and write the result back to the manifest
          kubectl patch --local --type merge \
            -f "$path/frontend.yaml" \
            -p '{"spec":{"image":"${{needs.build.outputs.container_image_tag}}"}}' \
            -o yaml > "$path/frontend.yaml.tmp"
          mv "$path/frontend.yaml.tmp" "$path/frontend.yaml"
          git config --global user.email "github-actions@github.com"
          git config --global user.name "GitHub Actions"
          git add "$path"
          git commit -m "Deploy ${{ matrix.env }} version ${{ github.ref_name }}"
          git push
This works for most simple apps. Our experience, however, is that as teams mature and get more experienced with Kubernetes and Argo CD, they add more complexity and want more control. At this point most teams will migrate to using Jsonnet to enable referencing and extending a reusable library shared between multiple components. SKIP also provides some common manifests via ArgoKit, a Jsonnet library.
Kustomize is also a common choice, widely used by SKIP for our own infrastructure, but not really widespread among the other teams.
Despite Argo supporting Helm we mostly avoid using it to create reusable templates due to the complexity of templating YAML. Jsonnet is superior in this regard.
"Fixing indentation errors in YAML templates in a Helm chart" — memenetes (@memenetes), December 8, 2022
Security considerations
You may be wondering: "This seems great and all, but what about the security implications of allowing teams to create and edit namespaces in a multi-tenant cluster? That seems really dangerous!".
First of all, I love you for thinking about security. We need more people like you. Second, Argo CD has some great features we can leverage to make this work without removing the self-service nature of the apps repo architecture.
Prefixes
In order to make this work we need to give each team a set of prefixes. A prefix will usually be the name of a product that a product team has responsibility for maintaining. The only important part is that it is unique and that no other teams have been allocated the same prefix. At Kartverket this is done by the platform team as part of the team onboarding process.
The prefix is used as part of all namespaces that are created by the teams. In the example namespace product-feature-123, product is the prefix. Giving each team a set of prefixes helps them separate products into easily identifiable namespaces and ensures that a product team does not accidentally use another team's namespace.
Since each product team has an apps repo with the ability to name their directories as they wish, how can we enforce this? This is where Argo CD's Projects come into play.
Argo CD Projects provide a logical grouping of applications, which is useful when Argo CD is used by multiple teams. It also contains a field that allows allowlisting which clusters and namespaces are usable by a project.
Add the following to a Project to only allow this project to create and sync to namespaces prefixed with myprefix-.
metadata:
  name: exampleteam
spec:
  destinations:
    - namespace: 'myprefix-*'
      server: '*'
If you scroll back up to the ApplicationSet example above, you will see that it only creates applications with the project exampleteam. This will automatically wire any applications created to the destination rules we've defined in this project and therefore deny any attempts by a team to use prefixes that they have not been allocated.
The crucial part here is that ApplicationSets and Projects are provisioned by the platform team, and therefore build in these security features. These resources must not be editable by the teams, or an attacker could simply add exclusions.
Namespace resources
Another way this could be abused is if a team is able to create Namespace resources in their apps repository. This should be denied using Argo and/or cluster policies.
If a team is able to create Namespace resources (or other cluster-scoped resources) in their namespace, an attacker can use this to break their namespace "encapsulation". Imagine for example if one could use their apps repo to sync a Namespace resource named kube-system into their env/foo-cluster/foo-main directory. Argo CD would allow this, as the manifests are read into an Argo CD Application. Then the attacker could delete the namespace and take down the cluster.
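The Project can help enforce this on the Argo CD side as well, since an AppProject can restrict which cluster-scoped resource kinds Applications in the project may sync. A minimal sketch, extending the exampleteam Project from earlier (verify the exact fields against your Argo CD version):
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: exampleteam
  namespace: argocd
spec:
  destinations:
    - namespace: 'myprefix-*'
      server: '*'
  # With an empty whitelist, no cluster-scoped resources (Namespace included)
  # can be synced by Applications in this project
  clusterResourceWhitelist: []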
It's useful in this multi-tenancy scenario to think of namespaces as resources owned by the platform team and namespace-scoped resources as owned by the product teams. This is considered a best practice, and was reiterated at KubeCon Europe 2024 by Marco De Benedictis. Allowing product teams to edit namespaces can open up a ton of attack vectors, like disabling Pod Security Admission controllers, allowing an attacker to create privileged containers which can compromise the host node.
Friends don't let friends edit namespaces!
Self-service customization
So we've set up an ApplicationSet that configures best practices and secure defaults for product teams! Great! But now that team with experienced cloud engineers really wants to customize their Argo configuration. Maybe they want one app to have auto sync on, but another app to have it turned off. Maybe they want to disable self-healing for a short period to manually edit in the cluster. In any case, how can we let teams change this configuration themselves when applications are provisioned by the ApplicationSet resource?
We could let the teams edit the ApplicationSet. In our case this would mean the teams need to learn about the ApplicationSet abstraction, Go templates and SKIP's internal GitOps repo structure. This is overkill when a team usually just wants to flip a flag between true and false for a directory. There could also be security implications with allowing teams to edit ApplicationSet resources that could break encapsulation, which we want to avoid.
Another option would be to have the teams contact the platform team and ask us to change some config for them. This is not in line with our thinking, as we want the teams to be able to work autonomously for most operations like this. It would also mean we'd be given a lot of menial tasks, leaving us less time for more meaningful work, and we'd become a bottleneck for the teams.
A third option is setting the ApplicationSet sync policy to create-only. This would configure the ApplicationSet controller to create Application resources, but prevent any further modification, such as deletion or modification of Application fields. This would allow a team to edit the application in the UI after creation, for example disabling auto sync. This last option is user friendly, but in violation of GitOps principles where config lives in Git and not in a database. If you run Argo stateless like we do, this would also mean the changes disappear when the pod restarts.
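For reference, that policy is set on the ApplicationSet itself; a minimal sketch (check the field against your Argo CD version, as policy overrides may need to be enabled on the controller):
spec:
  syncPolicy:
    # Create Applications from the template, but never update or delete them afterwards
    applicationsSync: create-only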
Because none of these options seemed to be the best, we created a better solution. By using a combination of generators and the new template patch feature in Argo CD 2.8, we can look through every directory in the apps repo for a configuration file called config.json.
Let's look at an example config.json file. This example file is committed in the apps repo to the env/foo-cluster/foo-main directory.
{
  "tool": "kustomize",
  "autoSync": false
}
This file is not required, but if it is found, the values configured there override a set of default values in the ApplicationSet template. These flags are then used to determine how the resulting Application will behave. This means the team is able to change the values they care about per directory of their apps repo:
footeam-apps/
  env/
    foo-cluster/
      foo-main/
        config.json
        app.yaml
      foo-feature-123/
        config.json
        app.yaml
      foo-feature-with-default-config/
        app.yaml
Additionally, since the platform team is in control of the template, we can eliminate the ability to maliciously change the template by parsing the inputs in a secure way.
Example ApplicationSet
Let's look at how we can write an ApplicationSet that allows us to use config.json files.
First, we need to configure the ApplicationSet to look through all directories, and at the same time use a config.json file if it is found. This is perhaps the least intuitive part of this new ApplicationSet, so let's walk through it step by step.
We start by creating a merge generator, which will merge two generators. The key thing here is that it only merges if the key matches in both generators, so this allows us to first find all directories (the default), then the directories that contain config.json files (the override).
generators:
  - merge:
      generators:
        - # default
        - # override
      mergeKeys:
        - key
Now we're going to add the generator from before into the default. The only difference is we're doing this using a matrix generator. Doing this combines the parameters generated by the two child generators, which gives us the values from the git generator like before, but also a set of default values we can use in our template later if the config.json file is not provided.
We're also using a value from the git generator to assign a key that will uniquely identify this directory for the merge generator later.
generators:
  - merge:
      generators:
        - matrix:
            generators:
              - git:
                  directories:
                    - path: env/*/*
                  repoURL: 'https://github.com/kartverket/exampleteam-apps.git'
                  revision: HEAD
              - list:
                  elements:
                    - allowEmpty: false
                      autoSync: true
                      key: '{{ .path.basenameNormalized }}'
                      prune: true
                      selfHeal: true
                      tool: directory
        - # override
      mergeKeys:
        - key
Now we use a variant of the git generator to find all config.json files in the same repo and extract the values from them. Again we're using the key field to uniquely identify this directory so that it will be merged with the correct directory in the merge generator.
We're repeating the default values here as well, since not all fields are required and we don't want them to be overwritten as null in the resulting merge.
generators:
  - merge:
      generators:
        - matrix:
            generators:
              - git:
                  directories:
                    - path: env/*/*
                  repoURL: 'https://github.com/kartverket/exampleteam-apps.git'
                  revision: HEAD
              - list:
                  elements:
                    - allowEmpty: false
                      autoSync: true
                      key: '{{ .path.basenameNormalized }}'
                      prune: true
                      selfHeal: true
                      tool: directory
        - matrix:
            generators:
              - git:
                  files:
                    - path: env/*/*/config.json
                  repoURL: 'https://github.com/kartverket/exampleteam-apps.git'
                  revision: HEAD
              - list:
                  elements:
                    - allowEmpty: false
                      autoSync: true
                      key: '{{ .path.basenameNormalized }}'
                      prune: true
                      selfHeal: true
                      tool: directory
      mergeKeys:
        - key
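With this in place, a config.json does not need to set every field. A hypothetical file that only flips one flag still results in a complete set of parameters, since the remaining values come from the repeated defaults:
{
  "autoSync": false
}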
That's it for the generator! Now we can use these variables in the templatePatch field (and other fields). In this case we want to set syncPolicy options, so we need to use templatePatch, as Go templates don't work for objects.
We're also adding a special case where for directory sources (the default) we exclude config.json files, as we don't want to sync the config file with Argo. This allows us to extend it later to add options for other tools like Kustomize or Helm.
Keep in mind that we don't want users to inject maliciously formed patches, so we cast booleans to booleans.
templatePatch: |
  spec:
    source:
      directory:
        {{- if eq .tool "directory" }}
        exclude: config.json
        {{- end }}
    {{- if .autoSync }}
    syncPolicy:
      automated:
        allowEmpty: {{ .allowEmpty | toJson }}
        prune: {{ .prune | toJson }}
        selfHeal: {{ .selfHeal | toJson }}
    {{- end }}
Complete ApplicationSet
Here is a complete ApplicationSet containing all the features we've discussed so far.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: exampleteam-apps
  namespace: argocd
spec:
  generators:
    - merge:
        generators:
          - matrix:
              generators:
                - git:
                    directories:
                      - path: env/*/*
                    repoURL: 'https://github.com/kartverket/exampleteam-apps.git'
                    revision: HEAD
                - list:
                    elements:
                      - allowEmpty: false
                        autoSync: true
                        key: '{{ .path.basenameNormalized }}'
                        prune: true
                        selfHeal: true
                        tool: directory
          - matrix:
              generators:
                - git:
                    files:
                      - path: env/*/*/config.json
                    repoURL: 'https://github.com/kartverket/exampleteam-apps.git'
                    revision: HEAD
                - list:
                    elements:
                      - allowEmpty: false
                        autoSync: true
                        key: '{{ .path.basenameNormalized }}'
                        prune: true
                        selfHeal: true
                        tool: directory
        mergeKeys:
          - key
  goTemplate: true
  goTemplateOptions:
    - missingkey=error
  template:
    metadata:
      name: '{{ .path.basenameNormalized }}'
    spec:
      destination:
        namespace: '{{ index .path.segments 2 }}'
        name: '{{ index .path.segments 1 }}'
      project: exampleteam
      source:
        path: '{{ .path.path }}'
        repoURL: 'https://github.com/kartverket/exampleteam-apps.git'
        targetRevision: HEAD
      syncPolicy:
        managedNamespaceMetadata:
          labels:
            app.kubernetes.io/managed-by: argocd
            pod-security.kubernetes.io/audit: restricted
            team: exampleteam
        syncOptions:
          - CreateNamespace=true
          - ServerSideApply=true
          - PrunePropagationPolicy=background
  templatePatch: |
    spec:
      source:
        directory:
          {{- if eq .tool "directory" }}
          exclude: config.json
          {{- end }}
      {{- if .autoSync }}
      syncPolicy:
        automated:
          allowEmpty: {{ .allowEmpty | toJson }}
          prune: {{ .prune | toJson }}
          selfHeal: {{ .selfHeal | toJson }}
      {{- end }}
Results
With Argo CD and the apps repo architecture, we've seen some real improvements in our deploy system. Teams find it to be incredibly intuitive to just update a file in Git and have it be instantly reflected in Argo CD and Kubernetes, especially when combined with Argo CD auto-sync.
Onboarding new teams is quick and easy, since just putting files into a Git repo is something most developers are already familiar with. We just show them the structure of the apps repo and they're good to go. A team can go from not having any experience with Kubernetes to deploying their first application in a matter of minutes.
Migrating from one cluster to another is also a breeze. Just move manifests from one directory under env to another, and the ApplicationSet will take care of the rest. This is especially useful for teams that want to start developing with new cloud native principles on-premises, modernizing the application and eventually moving to the cloud.
I feel the key part of this architecture is the config.json file. It allows a degree of customization that is not possible with the default ApplicationSet template and was for us the last missing piece. It allows teams to change configuration without needing to know about the ApplicationSet abstraction, and it allows the platform team to enforce security and best practices.
Tradeoffs
But of course, there are some drawbacks. Like always, it's tradeoffs all the way down.
Since a product team uses an apps repo to organize their apps, moving apps from one team to another will require migrating files from one repo to another. This requires some manual work to prevent Argo from deleting the entire namespace when the directory is removed from the old repo. Usually this is not a big issue, and moving projects between teams happens very rarely, but it's something to keep in mind.
There is also a risk that a team could accidentally delete a namespace by removing a directory in the apps repo. We have mitigated this by disabling auto-sync for most mission critical applications in production.
And finally, projects that don't have clear ownership or shared ownership can be tricky to place into a repo. You could make an apps repo for a "pseudo-team" consisting of the teams that need access, but generally we find that it's better that all products have a clear singular main owner. This also prevents diffusion of responsibility.
Thank you for reading!
We hope you found this article helpful and informative. Getting into ApplicationSets can be a bit tricky, so we hope we managed to convey the most important parts in a clear and understandable way. Thanks for reading!
We recently created a Mastodon account @kv_plattform! If you want to contact us or discuss this article, feel free to reach out to us there.