Jets 101
Overview
Teaching: 40 min
Exercises: 20 min
Questions
What is a jet?
Are there different types of jets? What is a reclustering algorithm?
Which types of jets do we use in CMS?
Objectives
Learn about jets, their properties, types and reclustering algorithms.
Learn about the differences between GenJets, CaloJets, and PFJets.
After following the instructions in the setup:
cd <YOUR WORKING DIRECTORY>/notebooks/DAS/
source /cvmfs/sft.cern.ch/lcg/views/LCG_104/x86_64-centos7-gcc11-opt/setup.sh
jupyter notebook --no-browser --port=8888 --ip 127.0.0.1
This will open a jupyter notebook tree with various notebooks.
Jet Basics
Jets as signatures of quarks and gluons
Most collisions at hadron colliders involve quark and gluon scattering: protons contain quarks and gluons, so proton collisions are really quark and gluon collisions.
Only particles in a “color singlet” state are observed in nature, due to color confinement, so the scattered quarks and gluons hadronize into collimated sprays of particles: jets. The kinematic properties of a jet tend to resemble those of the initial parton. What we try to do in our detectors is to measure the resulting particles to get access to the particle/parton level:
What is the composition of jets?
Energy composition: About 65% charged hadrons, 25% neutral pions (photons), 10% neutral hadrons.
What is a jet?
Looking at an event display from our data
How do you determine which particles are included in a jet?
From a list of particles one can form jets, objects that reconstruct the shower of particles produced from a quark or gluon. Each particle belonging to a jet is known as a constituent. Each has a 4-vector that can be used for further studies. This gives us a more generalised picture: almost everything becomes a jet: g/q/t/W/Z/H/PU
We need a jet algorithm to collect the particles in a shower. This defines a Clustering Algorithm. A good jet algorithm is infrared and collinear safe. The set of hard jets should be unchanged by soft emission and collinear splitting.
Jet Clustering Algorithms
Most jet algorithms at hadron colliders use a so-called “clustering sequence”. This is essentially a pairwise examination of the input four vectors. If the pair satisfy some criteria, they are merged. The process is repeated until the entire list of constituents is exhausted.
These algorithms follow this recipe:
- iteratively find the two particles in the event which are closest in some distance measure and combine them.
- Defining $d_{ij} = \min(p_{ti}^{2p}, p_{tj}^{2p}) \, \Delta R_{ij}^{2}/R^{2}$ and $d_{iB} = p_{ti}^{2p}$. We combine two particles if $d_{ij} < d_{iB}$.
- if $p=1$ then kt algorithm (KT)
- if $p=0$ then Cambridge Aachen algorithm (CA)
- if $p=-1$ then antikt algorithm (AK)
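- If instead $d_{iB}$ is the smallest distance ($d_{ij} > d_{iB}$ for every $j$), call $i$ a final jet and remove it from the list. Stop when no particles remain.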
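The recipe above can be sketched in a few lines of plain Python. This is only an illustrative, slow $O(N^3)$ toy, not how Fastjet implements it. The function names, the `(px, py, pz, E)` tuple layout, and the choice of summing four-vectors when merging are assumptions made for this sketch. When a beam distance $d_{iB}$ is the smallest quantity, the object is promoted to a final jet and removed from the list.

```python
import math

def from_pt_eta_phi(pt, eta, phi):
    """Build a massless (px, py, pz, E) tuple (hypothetical helper for this sketch)."""
    return (pt * math.cos(phi), pt * math.sin(phi),
            pt * math.sinh(eta), pt * math.cosh(eta))

def pt2(v):
    """Squared transverse momentum."""
    return v[0] ** 2 + v[1] ** 2

def rapidity(v):
    return 0.5 * math.log((v[3] + v[2]) / (v[3] - v[2]))

def delta_r2(a, b):
    """Squared distance in the (rapidity, phi) plane, with phi wrapped."""
    dphi = abs(math.atan2(a[1], a[0]) - math.atan2(b[1], b[0]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return (rapidity(a) - rapidity(b)) ** 2 + dphi ** 2

def cluster(particles, R=0.4, p=-1):
    """Generalised-kt clustering: p=1 (kt), p=0 (CA), p=-1 (anti-kt)."""
    pseudojets = [tuple(v) for v in particles]
    jets = []
    while pseudojets:
        best = None  # (distance, i, j); j is None for a beam distance
        for i, a in enumerate(pseudojets):
            d_ib = pt2(a) ** p  # (pt^2)^p = pt^(2p)
            if best is None or d_ib < best[0]:
                best = (d_ib, i, None)
            for j in range(i + 1, len(pseudojets)):
                b = pseudojets[j]
                d_ij = min(pt2(a) ** p, pt2(b) ** p) * delta_r2(a, b) / R ** 2
                if d_ij < best[0]:
                    best = (d_ij, i, j)
        _, i, j = best
        if j is None:
            # Beam distance is smallest: i becomes a final jet.
            jets.append(pseudojets.pop(i))
        else:
            # Pair distance is smallest: merge i and j by adding four-vectors.
            b = pseudojets.pop(j)  # j > i, so remove j first
            a = pseudojets.pop(i)
            pseudojets.append(tuple(x + y for x, y in zip(a, b)))
    return sorted(jets, key=pt2, reverse=True)

# Made-up event: two nearby particles and one far away in phi.
example = [from_pt_eta_phi(100.0, 0.0, 0.0),
           from_pt_eta_phi(20.0, 0.1, 0.1),
           from_pt_eta_phi(50.0, 0.0, 3.0)]
for name, power in (("kt", 1), ("CA", 0), ("anti-kt", -1)):
    print(name, "finds", len(cluster(example, R=0.4, p=power)), "jets")
```

The three algorithms differ only in the exponent `p`, which changes the order in which pairs are merged: kt starts from the softest particles, anti-kt from the hardest.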
How do the different jet algorithms look in our events?
Comparison of jet areas for four different jet algorithms, from “The anti-kt Clustering Algorithm” by Cacciari, Salam, and Soyez [JHEP04, 063 (2008), arXiv:0802.1189].
Some excellent references about jet algorithms can be found here:
- Toward Jetography by Gavin Salam.
- Jets in Hadron-Hadron Collisions by Ellis, Huston, Hatakeyama, Loch, and Toennesmann
- The Catchment Area of Jets by Cacciari, Salam, and Soyez.
- The anti-kt Clustering Algorithm by Cacciari, Salam, and Soyez.
Fastjet
The package used to implement the clustering algorithms in modern colliders is called Fastjet. This package is used ubiquitously in jet reconstruction, even though it is sometimes hidden in our reconstruction code. If you want to know more about Fastjet, we encourage you to check their website www.fastjet.fr in your free time.
Jet types at the LHC
Jets are reconstructed physics objects representing the hadronization and fragmentation of quarks and gluons. CMS primarily uses anti-$k_{\mathrm{T}}$ jets with a cone size of $R=0.4$ (AK4) to reconstruct this jet type. We have algorithms that distinguish heavy-flavour (b or c) quarks (which are in the domain of the BTV POG), quark- vs gluon-originated jets, and jets from the main $pp$ collision versus jets formed primarily from pileup particles.
However, quarks and gluons are only part of the story! At the LHC, the typical collision energy is much greater than the mass scale of the known SM particles, and hence, even heavier particles like top quarks, W/Z/Higgs bosons, and heavy beyond-the-Standard-Model particles can be produced with large Lorentz boosts. When these particles decay to quarks and gluons, their decay products are collimated and overlap in the detector, making them difficult to reconstruct as individual AK4 jets.
Therefore, LHC analyses use jet algorithms with a large radius parameter to reconstruct these objects, called “large radius” or “fat” jets. CMS uses anti-$k_{\mathrm{T}}$ jets with $R=0.8$ (AK8) as the standard large-radius jet, while ATLAS uses AK10.
You can also read these excellent overviews of jet substructure techniques:
- Boosted objects: a probe of beyond the Standard Model physics by Abdesselam et al.
- Looking inside jets: an introduction to jet substructure and boosted-object phenomenology by Marzani, Soyez, and Spannowsky.
Exercise 1.1
Open a notebook
Several ways exist to determine the “area” of the jet over which the input constituents lie. This is very important in correcting pileup, as we will see, because some algorithms tend to “consume” more constituents than others and hence are more susceptible to pileup. Furthermore, the amount of energy inside a jet due to pileup is proportional to the area, so it is essential to know the jet area to correct this effect.
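To make the proportionality concrete, here is a minimal sketch of an area-based pileup estimate. It assumes that pileup deposits a roughly uniform transverse-momentum density `rho` per unit area, and it approximates the jet area by that of a circle of radius $R$. Both the name `rho` and the value used below are illustrative assumptions, not CMS numbers; the actual corrections are discussed later in the tutorial.

```python
import math

def cone_area(R):
    """Area of an idealised circular jet of radius R in the eta-phi plane."""
    return math.pi * R ** 2

def pileup_offset(rho, area):
    """Pileup pT expected inside a jet, assuming a uniform density rho."""
    return rho * area

def subtract_pileup(pt, rho, area):
    """Remove the estimated pileup contribution, never going below zero."""
    return max(pt - pileup_offset(rho, area), 0.0)

rho = 20.0  # GeV per unit area, made-up value for illustration only
for label, R in (("AK4", 0.4), ("AK8", 0.8)):
    area = cone_area(R)
    print(f"{label}: area = {area:.3f}, pileup offset = {pileup_offset(rho, area):.1f} GeV")
```

Doubling the radius quadruples the area, so under this assumption an AK8 jet collects about four times as much pileup as an AK4 jet.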
In the first exercise we will compare jet areas for different types of jets.
For this part, open the notebook called Jets_101.ipynb (if it is not already open) and run Exercise 1.1.
Discussion 1.1
Before you run Exercise 1.1, what type of distribution do you expect for the areas of the AK4 and AK8 jets?
Question 1.1
After exercise 1.1: Try modifying the plotting cell to add vertical lines at area values corresponding to $\pi R^2$. Do the histogram peaks line up with these values?
Solution 1.1
Add these lines in the plotting cell:
plt.axvline(x=np.pi*(0.4*0.4), color='b', linestyle='--')
plt.axvline(x=np.pi*(0.8*0.8), color='r', linestyle='--')
Jet Inputs and the CMS jet nomenclature
The jet algorithms take as input a set of 4-vectors. At CMS, the most popular jet type is the “Particle Flow Jet,” which attempts to use the entire detector at once and derive single four vectors representing specific particles. For this reason, it is very comparable (ideally) to clustering generator-level four-vectors.
Monte Carlo Generator-level Jets (GenJets)
GenJets are pure Monte Carlo simulated jets. They are helpful for analysis with MC samples. GenJets are formed by clustering the four-momenta of Monte Carlo truth particles. This may include “invisible” particles (muons, neutrinos, WIMPs, etc.).
As no detector effects are involved, the jet response (or jet energy scale) is 1, and the jet resolution is perfect, by definition.
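As a sketch of what these two quantities mean in practice, the snippet below assumes the common definition of response as the ratio of reconstructed to generator-level $p_T$ for matched jets, and takes the resolution as the width of that ratio divided by its mean. The function names and all numbers are invented for illustration.

```python
import statistics

def response(pt_reco, pt_gen):
    """Response of one matched jet pair: reconstructed pT over generator pT."""
    return pt_reco / pt_gen

def summarize(pairs):
    """Return (mean response, relative resolution) for (pt_reco, pt_gen) pairs."""
    ratios = [response(reco, gen) for reco, gen in pairs]
    mean = statistics.mean(ratios)
    resolution = statistics.stdev(ratios) / mean
    return mean, resolution

# GenJets compared with themselves: response 1 and zero width by construction.
gen_vs_gen = [(50.0, 50.0), (100.0, 100.0), (200.0, 200.0)]
print(summarize(gen_vs_gen))

# Made-up reconstructed jets matched to the same GenJets.
reco_vs_gen = [(46.0, 50.0), (97.0, 100.0), (204.0, 200.0)]
print(summarize(reco_vs_gen))
```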
GenJets include information about the 4-vectors of the constituent particles, the hadronic and electromagnetic components of the energy, etc.
Calorimeter Jets (CaloJets)
CaloJets are formed from energy deposits in the calorimeters (hadronic and electromagnetic), with no tracking information considered. In the barrel region, a calorimeter tower consists of a single HCAL cell and the associated 5x5 array of ECAL crystals (the HCAL-ECAL association is similar but more complicated in the endcap region). The four-momentum of a tower is assigned from the energy of the tower, assuming zero mass, with the direction corresponding to the tower position from the interaction point.
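The massless-tower assignment can be written out explicitly. In this sketch the tower is described by its energy and by the pseudorapidity and azimuth of its position as seen from the interaction point; the function name and the `(px, py, pz, E)` layout are assumptions made for the example.

```python
import math

def tower_p4(energy, eta, phi):
    """Massless four-momentum (px, py, pz, E) for a calorimeter tower.

    With zero mass, |p| = E, so pT = E / cosh(eta).
    """
    pt = energy / math.cosh(eta)
    return (pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta), energy)

px, py, pz, e = tower_p4(energy=10.0, eta=1.2, phi=0.5)
mass_squared = e ** 2 - (px ** 2 + py ** 2 + pz ** 2)
print(f"pT = {math.hypot(px, py):.3f} GeV, m^2 = {mass_squared:.2e}")
```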
In CMS, CaloJets are used less often than PFJets. Their use includes performance studies to disentangle tracker and calorimeter effects and trigger-level analyses where the tracker is neglected to reduce the event processing time. ATLAS makes much more use of CaloJets, as their version of particle flow is less mature than CMS’s.
Particle Flow Jets (PFJets)
Particle Flow candidates (PFCandidates) combine information from various detectors to estimate particle properties based on their assigned identities (photon, electron, muon, charged hadron, neutral hadron). PFJets are created by clustering PFCandidates into jets and contain information about contributions of every particle class: Electromagnetic/hadronic, Charged/neutral, etc. The jet response is high. The jet pT resolution is good, starting at 15–20% at low pT and asymptotically reaching 5% at high pT.
In CMS we recluster two types of PFJets:
- CHS jets = “Charged Hadron Subtracted” jets = remove charged PF particles associated to non-primary vertices (remove charged pileup). These are the default in Run 2.
- PUPPI jets = PF constituents have been weighted/removed by an algorithm (PUPPI) which is designed to remove pileup contamination (more info in PU section). These are the default in Run 3.
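The idea behind CHS can be illustrated with a toy filter. The dictionary fields (`charge`, `vertex`) and the convention that vertex index 0 is the primary vertex are assumptions for this sketch; the real vertex association in CMS is more involved. PUPPI is not sketched here, since it assigns per-particle weights rather than applying a simple cut.

```python
def charged_hadron_subtraction(candidates, primary_vertex=0):
    """Drop charged candidates associated with a non-primary vertex.

    Neutral candidates carry no track, so they are always kept, as are
    charged candidates with no vertex association (vertex is None).
    """
    kept = []
    for cand in candidates:
        is_charged = cand["charge"] != 0
        from_pileup = cand["vertex"] is not None and cand["vertex"] != primary_vertex
        if is_charged and from_pileup:
            continue
        kept.append(cand)
    return kept

# Made-up PF candidates for illustration.
candidates = [
    {"pt": 30.0, "charge": +1, "vertex": 0},    # charged, primary vertex
    {"pt": 5.0, "charge": -1, "vertex": 3},     # charged pileup: removed
    {"pt": 12.0, "charge": 0, "vertex": None},  # neutral: kept
    {"pt": 2.0, "charge": +1, "vertex": None},  # charged, unassociated: kept
]
print([c["pt"] for c in charged_hadron_subtraction(candidates)])
```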
Full jet and MET reconstruction in CMS
Exercise 1.2
Open a notebook
For this part, open the notebook called Jets_101.ipynb (if it is not already open) and run Exercise 1.2.
Question 1.2
After running the notebook’s Exercise 1.2: As you can see, the agreement between CaloJets, GenJets, and PFJets could be better! Can you guess why?
Solution 1.2
We need to apply the jet energy corrections (JEC) described in the next exercise. But before doing that, we’ll review the jet clustering algorithms used in CMS.
Jet types and algorithms in CMS
The standard jet algorithms are all implemented in the CMS reconstruction software, CMSSW. However, a few algorithms with specific parameters (namely AK4, AK8, and CA15) have become standard tools in CMS; these jet types are extensively studied by the JetMET POG, and are highly recommended. These algorithms are included in the centrally produced CMS samples, at the AOD, miniAOD, and nanoAOD data tiers (note that miniAOD and nanoAOD are most commonly used for analysis, while AOD is much less common these days, and is not widely available on the grid). Other algorithms can be implemented and tested using the JetToolbox (more in the following link).
In this part of the tutorial, you will learn how to access the jet collection included in the CMS datasets, compare the different jet types, and create your own collections.
AOD
This twiki summarizes the respective labels by which each jet collection can be retrieved from the event record for general AOD files. This format is currently used for specialized studies, but you can use the other formats for most analyses.
MiniAOD
Three main jet collections are stored in the MiniAOD format, as described here.
- slimmedJets: AK4 energy-corrected jets using charged hadron subtraction (CHS) as the pileup removal algorithm. Jets are selected with $p_T > 10$ GeV (a typical analysis cut is at least $p_T > 20$ GeV). This is the default jet collection for CMS analyses in Run 2. In this collection, you can find the outputs of the following jet algorithms, as well as other jet-related quantities:
- b-tagging
- Pileup jet ID
- Quark/gluon likelihood info embedded.
- slimmedJetsPUPPI: AK4 energy-corrected jets using the PUPPI algorithm for pileup removal. This collection will be the default for Run 3 analyses.
- slimmedJetsAK8: AK8 energy-corrected jets using the PUPPI algorithm for pileup removal. Jets are selected with $p_T > 170$ GeV and stored with all information, including PF candidate links (a typical analysis cut is at least $p_T > 200$ GeV). This has been the default collection for boosted jets in Run 2. In this collection, you can find the outputs of the following jet algorithms, as well as other jet-related quantities:
- Softdrop mass
- n-subjettiness and energy correlation variables
- Access to softdrop subjets with pT >30 GeV: minimal information for 3 leading jets.
- Access to the associated AK8 CHS jet four-momentum, including soft drop and pruned mass, and n-subjettiness.
Examples of how to access jet collections in miniAOD samples
Below are two examples of how to access jet collections from these samples. This exercise does not intend for you to modify code in order to access these collections, but rather for you to look at the code and get an idea about how you could access this information if needed.
In C++
Please take a look at the file jmedas_miniAODAnalyzer.C with your favourite code viewer. You can run this code by using the python config file jmedas_miniAODtest.py from your terminal once you have set up a CMSSW environment and downloaded this JMEDAS package. This script will only print out some information about the jets in that sample. Again, the most important part of this exercise is to get familiar with how to access jet collections from miniAOD. Take a good look at the output this script prints to your terminal.
cmsRun $CMSSW_BASE/src/Analysis/JMEDAS/scripts/jmedas_miniAODtest.py
In Python
Now take a look at the file jmedas_miniAODtest_purePython.py. This code can be run with plain python in your terminal. As in the C++ case, the output of this job is some information about jets. The most important part of the exercise is to get familiar with how to access jet collections from miniAOD using python.
python $CMSSW_BASE/src/Analysis/JMEDAS/scripts/jmedas_miniAODtest_purePython.py
NanoAOD
NanoAOD is a “flat tree” format, meaning you can access the information directly with simple ROOT or even simple Python tools (like numpy or pandas). This format is recommended for analyses in CMS, unless one needs to access other variables not stored in nanoAOD. This tutorial will only use nanoAOD files.
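To give a feel for what “flat” means, the sketch below mimics a few events using plain Python lists. The branch names `nJet`, `Jet_pt`, and `Jet_eta` follow the nanoAOD naming convention, but the values, the `select_jets` helper, and the cut thresholds are made up for illustration. In a real analysis these arrays would be read from a file with a tool such as coffea.

```python
# Each event holds a jet count and one array per jet quantity.
events = [
    {"nJet": 3, "Jet_pt": [120.5, 45.2, 18.0], "Jet_eta": [0.3, -1.7, 3.1]},
    {"nJet": 1, "Jet_pt": [22.4], "Jet_eta": [2.0]},
    {"nJet": 0, "Jet_pt": [], "Jet_eta": []},
]

def select_jets(event, pt_min=30.0, abs_eta_max=2.4):
    """Return the indices of jets passing simple kinematic cuts."""
    return [
        i for i in range(event["nJet"])
        if event["Jet_pt"][i] > pt_min and abs(event["Jet_eta"][i]) < abs_eta_max
    ]

selected = [select_jets(event) for event in events]
print(selected)
```

Because every jet quantity is a simple per-event array, a selection is just an index list that can be reused on any other `Jet_*` branch.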
In nanoAOD, only AK4 CHS jets (Jet) and AK8 PUPPI jets (FatJet) are stored in Run 2. For Run 3, AK4 and AK8 jets are PUPPI jets. The jets in nanoAOD are similar to those in miniAOD, but not identical (for example, the $p_{\mathrm{T}}$ cuts might be different). In short:
- Jet = ak4PFJetsCHS
- pT >15 GeV
- Similar to miniAOD content, but many more (up-to-date) quantities (e.g. JEC)
- FatJet = ak8PFJetsPUPPI
- Similar content to miniAOD, but many more (up-to-date) quantities such as DeepXXX taggers
A full set of variables for each jet collection can be found on this website.
It is also possible to customize nanoAOD. JME/BTV have their own extended format with more jet collections and/or PF candidates. It is a common format for “automatised” workflows and ML training.
Note
There are several advanced tools on the market which allow you to do sophisticated analysis using nanoAOD format, including RDataFrame, NanoAOD-tools, or Coffea. We encourage you to look at them and use the one you like the most. However, we are going to use coffea for this tutorial.
Jet properties
Some of the jet properties that we can find in nanoAOD are:
- Jet 4-vector = sum of all constituent particle 4-vectors: energy, pT, η, Φ
- Jet mass
- Jet constituent multiplicities (PF) ex. charged multiplicity
- Jet constituent fractions, ex. charged hadron energy fraction
- Jet area = area in η-Φ plane in which an infinitely soft particle will be clustered with the jet
- Jet tagging information
- and many more
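The first two properties in the list can be illustrated with a short sketch that sums constituent four-vectors and derives the usual kinematic quantities. The `(px, py, pz, E)` tuple layout and the function names are assumptions for the example, and the constituents are made up.

```python
import math

def sum_p4(constituents):
    """Add (px, py, pz, E) four-vectors component by component."""
    return tuple(sum(c[k] for c in constituents) for k in range(4))

def kinematics(p4):
    """Return pt, eta, phi and mass of a four-vector (assumes pt > 0)."""
    px, py, pz, e = p4
    pt = math.hypot(px, py)
    mass_squared = e ** 2 - (pt ** 2 + pz ** 2)
    return {
        "pt": pt,
        "eta": math.asinh(pz / pt),
        "phi": math.atan2(py, px),
        # Guard against tiny negative values from rounding.
        "mass": math.sqrt(max(mass_squared, 0.0)),
    }

# Two massless constituents at right angles in the transverse plane.
jet = sum_p4([(30.0, 0.0, 0.0, 30.0), (0.0, 40.0, 0.0, 40.0)])
print(kinematics(jet))
```

Even though each constituent here is massless, the jet has a non-zero mass because the constituents are not collinear.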
Exercise 1.3
Open a notebook
This preliminary exercise will illustrate some of the basic properties of jets, like the four-momentum quantities: pt, eta, phi, and mass. We will use nanoAOD files, a format currently widely used within the CMS Collaboration. For more information about nanoAOD follow this link. At the end of the notebook, you will be able to see all the quantities stored in the Jet collection.
For this part, open the notebook called Jets_101.ipynb and run Exercise 1.3.
Discussion 1.2
Have you seen these jet quantities before? Were you expecting something different?
Discussion 1.3
Did you plot other jet quantities stored in nanoAOD? Do you understand the meaning of them?
Key Points
A jet is a physics object representing hadronic showers interacting with our detectors. A jet is usually associated with the physical representation of quarks and gluons, but it can be more than that depending on its origin and the algorithm used to define it.
A jet is defined by its reclustering algorithm and its constituents. In current experiments, jets are reclustered using the anti-kt algorithm. Depending on their constituents, in CMS we call jets reclustered from generator-level particles GenJets, those from calorimeter clusters CaloJets, and those from particle flow candidates PFJets.