Jets 101
Overview
Teaching: 40 min
Exercises: 20 min
Questions
What is a jet?
Are there different types of jets? What is a reclustering algorithm?
Which types of jets do we use in CMS?
Objectives
Learn about jets, their properties, types and reclustering algorithms.
Learn about the difference between GenJets, CaloJets, and PFJets.
After following the instructions in the setup:
cd <YOUR WORKING DIRECTORY>/notebooks/DAS/
source /cvmfs/sft.cern.ch/lcg/views/LCG_104/x86_64-centos7-gcc11-opt/setup.sh
jupyter notebook --no-browser --port=8888 --ip 127.0.0.1
This will open a jupyter notebook tree with various notebooks.
Jet Basics
Jets as signatures of quarks and gluons
Most collisions at hadron colliders involve the scattering of quarks and gluons. Protons contain quarks and gluons, so proton collisions are really quark and gluon collisions.
Only particles in a “color singlet” state are observed in nature, due to color confinement, so the scattered quarks and gluons hadronize into collimated sprays of particles: jets. The kinematic properties of a jet tend to resemble those of the initial parton. What we try to do in our detectors is to measure the decay products to get access to the particle/parton level.
What is the composition of jets?
Energy composition: About 65% charged hadrons, 25% neutral pions (photons), 10% neutral hadrons.
What is a jet?
Looking at an event display from our data
How do you determine which particles are included in a jet?
From a list of particles one can form jets, objects that reconstruct the shower of particles produced from a quark or gluon. Each particle belonging to a jet is known as a constituent. Each has a 4-vector that can be used for further studies. This gives us a more generalised picture: almost everything becomes a jet: g/q/t/W/Z/H/PU.
We need a jet algorithm to collect the particles in a shower. This defines a Clustering Algorithm. A good jet algorithm is infrared and collinear safe. The set of hard jets should be unchanged by soft emission and collinear splitting.
Jet Clustering Algorithms
Most jet algorithms at hadron colliders use a so-called “clustering sequence”. This is essentially a pairwise examination of the input four vectors. If the pair satisfy some criteria, they are merged. The process is repeated until the entire list of constituents is exhausted.
These algorithms follow this recipe:
- iteratively find the two particles in the event which are closest in some distance measure and combine them.
- Defining $d_{ij} = \min(p^{2p}_{ti},p^{2p}_{tj}) \Delta R^{2}_{ij}/R^2$ and $d_{iB} = p_{ti}^{2p}$, we combine two particles if $d_{ij} < d_{iB}$.
- if $p=1$ then kt algorithm (KT)
- if $p=0$ then Cambridge/Aachen algorithm (CA)
- if $p=-1$ then anti-kt algorithm (AK)
- Stop when $d_{ij} > d_{iB}$.
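The recipe above can be sketched in a few lines of Python. This is a minimal, slow illustration and not how FastJet implements it. The function names, the `(pt, rapidity, phi)` tuple layout, and the pt-weighted recombination are assumptions made for this sketch (real jet finders add four-vectors). It uses the usual inclusive formulation: if the smallest distance is a $d_{ij}$ the pair is merged, and if it is a $d_{iB}$ the particle $i$ is declared a final jet.

```python
import math

def wrap(phi):
    """Map an angle into the interval (-pi, pi]."""
    while phi > math.pi:
        phi -= 2.0 * math.pi
    while phi <= -math.pi:
        phi += 2.0 * math.pi
    return phi

def delta_r2(a, b):
    """Squared (rapidity, phi) distance between two (pt, y, phi) tuples."""
    return (a[1] - b[1]) ** 2 + wrap(a[2] - b[2]) ** 2

def merge(a, b):
    """Simplified pt-weighted recombination (an assumption of this sketch)."""
    pt = a[0] + b[0]
    y = (a[0] * a[1] + b[0] * b[1]) / pt
    phi = wrap(a[2] + b[0] / pt * wrap(b[2] - a[2]))
    return (pt, y, phi)

def cluster(particles, R=0.4, p=-1):
    """Inclusive clustering: p=1 is kt, p=0 is Cambridge/Aachen, p=-1 is anti-kt."""
    pseudojets = list(particles)
    jets = []
    while pseudojets:
        # Start from the smallest beam distance d_iB = pt_i^(2p)
        best = min(range(len(pseudojets)), key=lambda i: pseudojets[i][0] ** (2 * p))
        d_min, pair = pseudojets[best][0] ** (2 * p), None
        # Look for a pair distance d_ij that is smaller still
        for i in range(len(pseudojets)):
            for j in range(i + 1, len(pseudojets)):
                a, b = pseudojets[i], pseudojets[j]
                d_ij = min(a[0] ** (2 * p), b[0] ** (2 * p)) * delta_r2(a, b) / R ** 2
                if d_ij < d_min:
                    d_min, pair = d_ij, (i, j)
        if pair is None:
            jets.append(pseudojets.pop(best))  # d_iB is the minimum: i is a jet
        else:
            i, j = pair
            merged = merge(pseudojets[i], pseudojets[j])
            pseudojets.pop(j)  # remove the larger index first
            pseudojets.pop(i)
            pseudojets.append(merged)
    return sorted(jets, key=lambda jet: -jet[0])

# Toy event (invented values): two hard particles, each with a softer neighbour
particles = [(100.0, 0.0, 0.0), (20.0, 0.1, 0.1), (50.0, 2.0, 2.0), (1.0, 2.1, 1.9)]
for name, p in (("kt", 1), ("CA", 0), ("anti-kt", -1)):
    print(name, [round(jet[0], 1) for jet in cluster(particles, R=0.4, p=p)])
```

In this toy event all three choices of $p$ find the same two jets; the algorithms differ in the order in which particles are merged, which matters for the jet shapes and areas discussed next.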
How do the different jet algorithms look in our events?
Comparison of jet areas for four different jet algorithms, from “The anti-kt Clustering Algorithm” by Cacciari, Salam, and Soyez [JHEP04, 063 (2008), arXiv:0802.1189].
Some excellent references about jet algorithms can be found here:
- Toward Jetography by Gavin Salam.
- Jets in Hadron-Hadron Collisions by Ellis, Huston, Hatakeyama, Loch, and Toennesmann
- The Catchment Area of Jets by Cacciari, Salam, and Soyez.
- The anti-kt Clustering Algorithm by Cacciari, Salam, and Soyez.
Fastjet
The package used to implement the clustering algorithms in modern colliders is called FastJet. This package is used ubiquitously in the reconstruction of jets, even though it is sometimes hidden in our reconstruction code. If you want to know more about FastJet, we encourage you to check their website www.fastjet.fr in your free time.
Jet types at the LHC
Jets are reconstructed physics objects representing the hadronization and fragmentation of quarks and gluons. CMS primarily uses anti-$k_{\mathrm{T}}$ jets with a cone-size of $R=0.4$ to reconstruct this jet type. We have algorithms that distinguish heavy-flavour (b or c) quarks (which are in the domain of the BTV POG), quark- vs gluon-originated jets, and jets from the main $pp$ collision versus jets formed primarily from pileup particles.
However, quarks and gluons are only part of the story! At the LHC, the typical collision energy is much greater than the mass scale of the known SM particles, and hence, even heavier particles like top quarks, W/Z/Higgs bosons, and heavy beyond-the-Standard-Model particles can be produced with large Lorentz boosts. When these particles decay to quarks and gluons, their decay products are collimated and overlap in the detector, making them difficult to reconstruct as individual AK4 jets.
Therefore, LHC analyses use jet algorithms with a large radius parameter to reconstruct these objects, called “large radius” or “fat” jets. CMS uses anti-$k_{\mathrm{T}}$ jets with $R=0.8$ (AK8) as the standard large-radius jet, while ATLAS uses AK10.
You can also read these excellent overviews of jet substructure techniques:
- Boosted objects: a probe of beyond the Standard Model physics by Abdesselam et al.
- Looking inside jets: an introduction to jet substructure and boosted-object phenomenology by Marzani, Soyez, and Spannowsky.
Exercise 1.1
Open a notebook
Several ways exist to determine the “area” of the jet over which the input constituents lie. This is very important in correcting pileup, as we will see, because some algorithms tend to “consume” more constituents than others and hence are more susceptible to pileup. Furthermore, the amount of energy inside a jet due to pileup is proportional to the area, so it is essential to know the jet area to correct this effect.
In the first exercise we will compare jet areas for different types of jets.
For this part, open the notebook called
Jets_101.ipynb
(if it is not opened) and run Exercise 1.1
Discussion 1.1
Before you run the exercise 1.1, what type of distribution do you expect for the areas of the AK4 and AK8 jets?
Question 1.1
After exercise 1.1: Try modifying the plotting cell to add vertical lines at area values corresponding to $\pi R^2$. Do the histogram peaks line up with these values?
Solution 1.1
Add these lines in the plotting cell:
plt.axvline(x=np.pi*(0.4*0.4), color='b', linestyle='--')
plt.axvline(x=np.pi*(0.8*0.8), color='r', linestyle='--')
Jet Inputs and the CMS jet nomenclature
The jet algorithms take as input a set of 4-vectors. At CMS, the most popular jet type is the “Particle Flow Jet,” which attempts to use the entire detector at once and derive single four vectors representing specific particles. For this reason, it is very comparable (ideally) to clustering generator-level four-vectors.
Monte Carlo Generator-level Jets (GenJets)
GenJets are pure Monte Carlo simulated jets. They are helpful for analysis with MC samples. GenJets are formed by clustering the four-momenta of Monte Carlo truth particles. This may include “invisible” particles (muons, neutrinos, WIMPs, etc.).
As no detector effects are involved, the jet response (or jet energy scale) is 1, and the jet resolution is perfect, by definition.
GenJets include information about the 4-vectors of constituent particles, the energy’s hadronic and electromagnetic components, etc.
Calorimeter Jets (CaloJets)
CaloJets are formed from energy deposits in the calorimeters (hadronic and electromagnetic), with no tracking information considered. In the barrel region, a calorimeter tower consists of a single HCAL cell and the associated 5x5 array of ECAL crystals (the HCAL-ECAL association is similar but more complicated in the endcap region). The four-momentum of a tower is assigned from the energy of the tower, assuming zero mass, with the direction corresponding to the tower position from the interaction point.
In CMS, CaloJets are used less often than PFJets. Their use includes performance studies to disentangle tracker and calorimeter effects and trigger-level analyses where the tracker is neglected to reduce the event processing time. ATLAS makes much more use of CaloJets, as their version of particle flow is less mature than CMS’s.
Particle Flow Jets (PFJets)
Particle Flow candidates (PFCandidates) combine information from various detectors to estimate particle properties based on their assigned identities (photon, electron, muon, charged hadron, neutral hadron). PFJets are created by clustering PFCandidates into jets and contain information about contributions of every particle class: Electromagnetic/hadronic, Charged/neutral, etc. The jet response is high. The jet pT resolution is good, starting at 15–20% at low pT and asymptotically reaching 5% at high pT.
In CMS we recluster two types of PFJets:
- CHS jets = “Charged Hadron Subtracted” jets = remove charged PF particles associated with non-primary vertices (remove charged pileup). These are the default in Run 2.
- PUPPI jets = PF constituents have been weighted/removed by an algorithm (PUPPI) which is designed to remove pileup contamination (more info in PU section). These are the default in Run 3.
Full jet and MET reconstruction in CMS
Exercise 1.2
Open a notebook
For this part, open the notebook called
Jets_101.ipynb
(if it is not opened) and run Exercise 1.2
Question 1.2
After running the notebook’s Exercise 1.2: As you can see, the agreement between CaloJets, GenJets, and PFJets could be better! Can you guess why?
Solution 1.2
We need to apply the jet energy corrections (JEC) described in the next exercise. But before doing that, we’ll review the jet clustering algorithms used in CMS.
Jet types and algorithms in CMS
The standard jet algorithms are all implemented in the CMS reconstruction software, CMSSW. However, a few algorithms with specific parameters (namely AK4, AK8, and CA15) have become standard tools in CMS; these jet types are extensively studied by the JetMET POG, and are highly recommended. These algorithms are included in the centrally produced CMS samples, at the AOD, miniAOD, and nanoAOD data tiers (note that miniAOD and nanoAOD are most commonly used for analysis, while AOD is much less common these days, and is not widely available on the grid). Other algorithms can be implemented and tested using the JetToolbox (more in the following link).
In this part of the tutorial, you will learn how to access the jet collection included in the CMS datasets, compare the different jet types, and create your own collections.
AOD
This twiki summarizes the respective labels by which each jet collection can be retrieved from the event record for general AOD files. This format is currently used for specialized studies, but you can use the other formats for most analyses.
MiniAOD
Three main jet collections are stored in the MiniAOD format, as described here.
- slimmedJets: are AK4 energy-corrected jets using charged hadron subtraction (CHS) as the pileup removal algorithm. Jets are selected with $p_T > 10$ GeV (a typical analysis cut is at least $p_T > 20$ GeV). This is the default jet collection for CMS analyses for Run II. In this collection, you can find the following jet algorithms, as well as other jet-related quantities:
- b-tagging
- Pileup jet ID
- Quark/gluon likelihood info embedded.
- slimmedJetsPUPPI: are AK4 energy-corrected jets using the PUPPI algorithm for pileup removal. This collection will be the default for Run III analyses.
- slimmedJetsAK8: are AK8 energy-corrected jets using the PUPPI algorithm for pileup removal. Jets are selected with $p_T > 170$ GeV with all information, including PF candidate links (a typical analysis cut is at least $p_T > 200$ GeV). This has been the default collection for boosted jets in Run II. In this collection, you can find the following jet algorithms, as well as other jet-related quantities:
- Softdrop mass
- n-subjettiness and energy correlation variables
- Access to softdrop subjets with pT >30 GeV: minimal information for 3 leading jets.
- Access to the associated AK8 CHS jet four-momentum, including soft drop and pruned mass, and n-subjettiness.
Examples of how to access jet collections in miniAOD samples
Below are two examples of how to access jet collections from these samples. This exercise does not intend for you to modify code in order to access these collections, but rather for you to look at the code and get an idea about how you could access this information if needed.
In C++
Please take a look at the file
jmedas_miniAODAnalyzer.C
with your favourite code viewer. You can run this code by using the python config file
jmedas_miniAODtest.py
from your terminal once you have set up a CMSSW environment and downloaded this JMEDAS package. This script will only print out some information about the jets in that sample. Again, the most important part of this exercise is to get familiar with how to access jet collections from miniAOD. Take a good look at the output this script prints to your terminal.
cmsRun $CMSSW_BASE/src/Analysis/JMEDAS/scripts/jmedas_miniAODtest.py
In Python
Now take a look at the file
jmedas_miniAODtest_purePython.py
This code can be run with plain python in your terminal. As in the C++ case, the output of this job is some information about jets. The most important part of the exercise is to get familiar with how to access jet collections from miniAOD using python.
python $CMSSW_BASE/src/Analysis/JMEDAS/scripts/jmedas_miniAODtest_purePython.py
NanoAOD
NanoAOD is a “flat tree” format, meaning you can access the information directly with simple ROOT or even simple Python tools (like numpy or pandas). This format is recommended for analyses in CMS, unless one needs to access other variables not stored in nanoAOD. This tutorial will only use nanoAOD files.
In nanoAOD, only AK4 CHS jets ( Jet ) and AK8 PUPPI jets ( FatJet ) are stored in Run 2. For Run 3, AK4 and AK8 jets are PUPPI jets. The jets in nanoAOD are similar to those in miniAOD, but not identical (for example, the $p_{\mathrm{T}}$ cuts might be different). In short:
- Jet = ak4PFJetsCHS
- pT >15 GeV
- Similar to miniAOD content, but many more (up-to-date) quantities (e.g. JEC)
- FatJet = ak8PFJetsPUPPI
- Similar content to miniAOD, but many more (up-to-date) quantities such as DeepXXX taggers
A full set of variables for each jet collection can be found in this website.
It is also possible to customize nanoAOD. JME/BTV have their own extended format with more jet collections and/or PF candidates. It is a common format for “automatised” workflows and ML training.
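As a rough illustration of what “flat” means here: each jet quantity is a separate branch (for example `Jet_pt`) holding a variable-length list per event, whose length is given by a counter branch (`nJet`). Read as columns, this amounts to a flattened list of values plus per-event counts. The sketch below uses invented numbers and plain Python lists instead of a real file, and the helper name `split_by_counts` is hypothetical; in practice tools such as coffea do this regrouping for you.

```python
def split_by_counts(counts, flat_values):
    """Regroup a flattened branch into one list per event using per-event counts."""
    if sum(counts) != len(flat_values):
        raise ValueError("counts do not add up to the number of stored values")
    events, start = [], 0
    for n in counts:
        events.append(flat_values[start:start + n])
        start += n
    return events

# Toy content (invented values): three events with 2, 0 and 3 jets
nJet = [2, 0, 3]
Jet_pt = [55.2, 31.0, 120.4, 48.9, 22.3]

jets_per_event = split_by_counts(nJet, Jet_pt)

# Leading-jet pT per event, or None when the event has no jets
leading_pt = [max(pts) if pts else None for pts in jets_per_event]
print(leading_pt)
```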
Note
There are several advanced tools on the market which allow you to do sophisticated analysis using nanoAOD format, including RDataFrame, NanoAOD-tools, or Coffea. We encourage you to look at them and use the one you like the most. However, we are going to use coffea for this tutorial.
Jet properties
A short list of jet properties that we can find in nanoAOD are:
- Jet 4-vector = sum of all constituent particle 4-vectors: energy, pT, η, Φ
- Jet mass
- Jet constituent multiplicities (PF) ex. charged multiplicity
- Jet constituent fractions, ex. charged hadron energy fraction
- Jet area = area in η-Φ plane in which an infinitely soft particle will be clustered with the jet
- Jet tagging information
- and many more
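To make the first two items concrete, here is a minimal sketch of how a jet four-vector and its mass follow from the constituents. The constituent values are invented and the function names are hypothetical; each constituent is given as `(pt, eta, phi, mass)`, converted to Cartesian components, and summed.

```python
import math

def to_cartesian(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz

def jet_from_constituents(constituents):
    """Sum constituent four-vectors and return the jet (pt, eta, phi, mass)."""
    e = px = py = pz = 0.0
    for constituent in constituents:
        ce, cx, cy, cz = to_cartesian(*constituent)
        e, px, py, pz = e + ce, px + cx, py + cy, pz + cz
    pt = math.hypot(px, py)
    eta = math.asinh(pz / pt)
    phi = math.atan2(py, px)
    mass2 = e * e - (px * px + py * py + pz * pz)
    return pt, eta, phi, math.sqrt(max(mass2, 0.0))

# Toy constituents (invented values): two charged pions and a photon
constituents = [(30.0, 0.10, 0.20, 0.14), (20.0, -0.05, 0.35, 0.0), (5.0, 0.30, 0.00, 0.14)]
jet_pt, jet_eta, jet_phi, jet_mass = jet_from_constituents(constituents)
print(f"pt={jet_pt:.2f} eta={jet_eta:.3f} phi={jet_phi:.3f} mass={jet_mass:.2f}")
```

Note that the jet mass comes mostly from the opening angles between constituents, not from the (small) constituent masses.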
Exercise 1.3
Open a notebook
This preliminary exercise will illustrate some of the basic properties of jets, like the four-momentum quantities: pt, eta, phi, and mass. We will use nanoAOD files, which are currently widely used within the CMS Collaboration. For more information about nanoAOD follow this link. At the end of the notebook, you will be able to see all the quantities stored in the Jet collection.
For this part, open the notebook called
Jets_101.ipynb
and run Exercise 1.3
Discussion 1.2
Have you seen these jet quantities before? Were you expecting something different?
Discussion 1.3
Did you plot other jet quantities stored in nanoAOD? Do you understand the meaning of them?
Key Points
A jet is a physical object representing hadronic showers interacting with our detectors. A jet is usually associated with the physical representation of quarks and gluons, but it can be more than that depending on its origin and the algorithm used to define it.
A jet is defined by its reclustering algorithm and its constituents. In current experiments, jets are reclustered using the anti-kt algorithm. Depending on their constituents, in CMS we call jets reclustered from generator-level particles GenJets, from calorimeter clusters CaloJets, and from particle flow candidates PFJets.
Pileup Reweighting and Pileup Mitigation
Overview
Teaching: 40 min
Exercises: 20 min
Questions
What is pileup and how does it affect jets?
What are the basic jet quality criteria?
Objectives
Learn about the pileup mitigation techniques used at CMS.
Learn about the basic jet quality criteria.
After following the instructions in the setup (if you have not done it yet) :
cd <YOUR WORKING DIRECTORY>/notebooks/DAS/
source /cvmfs/sft.cern.ch/lcg/views/LCG_104/x86_64-centos7-gcc11-opt/setup.sh
jupyter notebook --no-browser --port=8888 --ip 127.0.0.1
This will open a jupyter notebook tree with various notebooks.
What is pileup?
Pileup refers to the additional interactions that occur in each bunch crossing because the instantaneous bunch-by-bunch luminosity is very high. Here “additional” implies that there is a hard-scatter interaction that has caused the event to fire the trigger. The total inelastic cross section is approximately 80 mb, so if the luminosity per crossing is of the order of $1/(80\,\mathrm{mb})$ you will get one interaction per crossing, on average.
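The arithmetic behind this statement is simply $\mu = \mathcal{L}_{\text{crossing}} \times \sigma_{\text{inel}}$. A small sketch, reading the per-crossing luminosity in units of $\mathrm{mb}^{-1}$ and taking the 80 mb quoted above (the function name and the second input value are illustrative assumptions):

```python
SIGMA_INELASTIC_MB = 80.0  # approximate total inelastic cross section quoted in the text

def mean_interactions(lumi_per_crossing_inv_mb, sigma_mb=SIGMA_INELASTIC_MB):
    """Average number of interactions per bunch crossing: mu = L_crossing * sigma."""
    return lumi_per_crossing_inv_mb * sigma_mb

# A per-crossing luminosity of 1/(80 mb) gives about one interaction on average
print(mean_interactions(1.0 / 80.0))

# A (hypothetical) per-crossing luminosity forty times larger gives mu of about 40
print(mean_interactions(0.5))
```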
Types of pileup
We can define two types of pileup:
- In-time pileup: the interactions which occur in the bunch crossing that fired the trigger
- Out-of-time pileup: the interactions which occur in the bunch crossings which precede or follow the one which fired the trigger
We need to simulate out-of-time interactions, time structure of detector sensitivity and read-out, and bunch train structure. According to the detector elements used for measuring pileup:
- Tracker: only sensitive to in-time pileup
- Calorimeters: sensitive to out-of-time pileup
- Muon chambers: sensitive to out-of-time pileup
Pileup mitigation algorithms
Many clever ways have been devised to remove the effects of pileup from physics analyses and objects. Pileup affects all objects (MET, muons, etc.). We are focusing on jets today.
$\rho$ pileup correction
Imagine making a grid out of your detector, then $\rho$ is the median patch value (pT/area). Therefore, the corrected jet momentum is: \(p_T^{corr} = p_T^{raw} - (\rho \times area)\)
This works because pileup is expected to be isotropic. This is a simplistic version of what the L1 JECs do to remove pileup. More about JECs later.
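A minimal sketch of this idea, assuming equal-area grid patches and invented numbers; the function names are hypothetical, and flooring the corrected $p_T$ at zero is a choice made for the sketch, not something stated above. Using the median makes $\rho$ insensitive to the few patches that contain a hard jet.

```python
from statistics import median

def event_rho(patch_pts, patch_area):
    """Median pT density (GeV per unit area) over equal-area grid patches."""
    return median([pt / patch_area for pt in patch_pts])

def subtract_pileup(pt_raw, jet_area, rho):
    """Area-based offset subtraction: pT_corr = pT_raw - rho * area, floored at zero."""
    return max(pt_raw - rho * jet_area, 0.0)

# Toy event (invented values): summed pT in six patches of area 0.25;
# the 40 GeV patch contains a hard jet and barely moves the median
patch_pts = [2.0, 2.5, 1.5, 40.0, 2.2, 1.8]
rho = event_rho(patch_pts, patch_area=0.25)

pt_corr = subtract_pileup(pt_raw=50.0, jet_area=0.5, rho=rho)
print(f"rho = {rho:.2f} GeV per unit area, corrected pT = {pt_corr:.2f} GeV")
```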
Exercise 2.1
Before we get into mitigating pileup effects, let’s first examine measures of pileup in more detail. We will discuss event-by-event variables that can be used to characterize the pileup and this will give us some hints into thinking about how to deal with it. We can define:
- NPU: the number of pileup interactions that have been added to the event in the current bunch crossing
- mu: the true mean of the Poisson distribution from which the number of interactions in each bunch crossing has been sampled for this event
- $\rho$: rho from all PF Candidates, used e.g. for JECs
- NPV: total number of reconstructed primary vertices
Open a notebook
For the first part, open the notebook called
Pileup.ipynb
and run exercise 2.1.
Question 2.1
Why is the number of pileup interactions different from the number of primary vertices?
Solution 2.1
There is a vertex finding efficiency, which in Run I was about 72%. This means that $N_{PV}\simeq0.72{\cdot}N_{PU}$
Question 2.2
Rho is a measure of the density of the pileup in the event. It is measured in GeV per unit area. Can you think of ways we can use this information to correct for the effects of pileup?
Solution 2.2
From the jet $p_{T}$ simply subtract off the average amount of pileup expected in a jet of that size. Thus $p_{T}^{corr}{\simeq}p_{T}^{reco}-\rho{\cdot}area$
Question 2.3
This plot shows the jet composition. Generally, why do we see the mixture of photons, neutral hadrons and charged hadrons that we see?
Solution 2.3
A majority of the constituents in a jet come from pions. Pions come in neutral ($\pi^{0}$) and charged ($\pi^{\pm}$) varieties. Naively you would expect the composition to be two thirds charged hadrons and one third neutral hadrons. However, we know that $\pi^{0}$ decays to two photons, which leads to a large photon fraction.
Charged Hadron Subtraction (CHS)
Tracking is a major tool in CMS. It lets us identify most charged particles that come from non-leading primary vertices, and CHS removes these particles.
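The selection can be sketched as a simple filter. The candidate layout below is an assumption made for illustration (a `charge` and a `vertex` index, where 0 is the leading primary vertex and -1 means no vertex association); it is not the actual PFCandidate interface.

```python
def charged_hadron_subtraction(candidates):
    """Drop charged candidates matched to a non-leading primary vertex; keep the rest."""
    return [c for c in candidates if not (c["charge"] != 0 and c["vertex"] > 0)]

# Toy candidates (invented values)
candidates = [
    {"pt": 12.0, "charge": 1, "vertex": 0},    # charged, leading vertex: kept
    {"pt": 3.0, "charge": -1, "vertex": 2},    # charged, pileup vertex: removed
    {"pt": 5.0, "charge": 0, "vertex": -1},    # neutral, no tracking information: kept
    {"pt": 1.5, "charge": 1, "vertex": -1},    # charged but unassociated: kept
]

kept = charged_hadron_subtraction(candidates)
print([c["pt"] for c in kept])
```

Neutral pileup particles survive this filter, which is one reason a $\rho$-based correction is still applied to CHS jets.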
PileUp Per Particle Identification (PUPPI)
Unfortunately, pileup is not really isotropic; it is uneven:
PUPPI tries to provide an inherently local correction based on the following information: a particle from the hard-scatter process is likely to be near (geometrically) other particles from the same interaction and to have a generally higher pT. We expect particles from pileup to have no shower structure, to have generally lower pT, and to be uncorrelated with particles from the leading vertex.
Exercise 2.2
Open a notebook
For this part open the notebook called
Pileup.ipynb
and run the Exercise 2.2
Discussion 2.1
Do you see any difference in the jet pt for CHS and PUPPI jets? Were you expecting these results?
Pileup reweighting
Start with a chosen input distribution: the instantaneous luminosity for a given event is sampled from this distribution to obtain the mean number of interactions in each beam crossing. The number of interactions for each beam crossing that will be part of the event (in- and out-of-time) is taken from a Poisson distribution with the predetermined mean. The input distribution is thus smeared by convolving with a Poisson distribution in each bin. This is what the observed distribution should look like after the Poisson fluctuations of each interaction.
The goal of the pileup reweighting procedure is to match the generated pileup distribution to the one found in data:
- Step 1: Create the weights
- Step 2: Apply the event-by-event weights
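The two steps can be sketched as follows, assuming the data and MC pileup distributions are available as histograms with identical binning in the number of interactions. The histogram contents are invented and the function names are hypothetical; in the exercise the weights come from json-pog through correctionlib instead.

```python
import math

def pileup_weights(data_hist, mc_hist):
    """Step 1: per-bin weight = normalised data content / normalised MC content."""
    if len(data_hist) != len(mc_hist):
        raise ValueError("data and MC histograms must share the same binning")
    data_total, mc_total = float(sum(data_hist)), float(sum(mc_hist))
    weights = []
    for d, m in zip(data_hist, mc_hist):
        # Bins with no MC events cannot be reweighted; give them weight zero
        weights.append((d / data_total) / (m / mc_total) if m > 0 else 0.0)
    return weights

def event_weight(weights, n_true_int):
    """Step 2: look up the weight of one MC event from its true number of interactions."""
    index = math.floor(n_true_int)
    return weights[index] if 0 <= index < len(weights) else 0.0

# Toy histograms (invented values), one bin per unit of pileup
data_hist = [0, 10, 30, 40, 20]
mc_hist = [5, 20, 25, 25, 25]

weights = pileup_weights(data_hist, mc_hist)
print(weights)
print(event_weight(weights, 3.2))
```

After reweighting, the weighted MC histogram has the same shape as the data histogram, which is a useful closure check.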
Exercise 2.3
Here we are going to produce a file containing the weights used for pileup reweighting using json-pog and correctionlib.
Open a notebook
For this part open the notebook called
Pileup.ipynb
and run the Exercise 2.3
Question 2.4
Ask yourself what pileup reweighting is doing. How large do you expect the pileup weights to be?
Question 2.5
In what unit will the x-axis be plotted? Another way of asking this is what pileup variable can be measured in both data and MC and is fairly robust?
Solution 2.5
The x-axis is plotted as a function of $\mu$ as this is a true measurement of pileup (additional interactions) and not just some variable which is correlated with pileup. Other options might have been $N_{PV}$, which has an efficiency which is less than 100%, and $\rho$, which assumes that the pileup energy density is uniform. We also get different values of $\rho$ if we measure it for different regions in $\eta$ (i.e. $|\eta|<3$ or $|\eta|<5$).
Question 2.6
Why do the green and red histograms end around $\mu\approx38$?
More information
To learn more about pileup, you can follow the CMSDAS short exercise about pileup here: (FIXME)
Noise Jet ID
In order to avoid using fake jets, which can originate from a hot calorimeter cell or electronic read-out box, we need to require some basic quality criteria for jets. These criteria are collectively called “jet ID”. Details on the jet ID for PFJets can be found in the following twiki:
https://twiki.cern.ch/twiki/bin/viewauth/CMS/JetID
The JetMET POG recommends a single jet ID for most physics analyses in CMS, which corresponds to what used to be called the tight Jet ID. Some important observations from the above twiki:
- Jet ID is defined for uncorrected jets only. Never apply jet ID on corrected jets. This means that in your analysis you should apply jet ID first, and then apply JECs on those jets that pass jet ID.
- Jet ID is necessary for most analyses.
- It is complementary to “MET filters” (hit level noise rejection)
- Jet ID is fully efficient (>99%) for real, high-$p_{\mathrm{T}}$ jets used in most physics analysis. Its background rejection power is similarly high.
Exercise 2.4
Open a notebook
For this part open the notebook called
Pileup.ipynb
and run the Exercise 3.
In nanoAOD it is trivial to apply jet ID. The IDs are stored as flags, where events.Jet.jetId>=2 corresponds to tightID and events.Jet.jetId>=6 corresponds to tightLepVetoID.
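A small sketch of how such flags can be decoded, using plain integers in place of the `events.Jet.jetId` array. The bit assignment (value 2 for tight, value 4 for tightLepVeto) is an assumption consistent with the thresholds quoted above; check the nanoAOD branch documentation for the samples you use.

```python
TIGHT_BIT = 1 << 1           # value 2 (assumed meaning: passes tight ID)
TIGHT_LEP_VETO_BIT = 1 << 2  # value 4 (assumed meaning: passes tightLepVeto ID)

def passes_tight(jet_id):
    """True if the tight-ID bit is set in the jetId flag."""
    return bool(jet_id & TIGHT_BIT)

def passes_tight_lep_veto(jet_id):
    """True if the tightLepVeto bit is set in the jetId flag."""
    return bool(jet_id & TIGHT_LEP_VETO_BIT)

# Toy jetId values (invented): 0 fails everything, 2 is tight only, 6 is tight + lepton veto
jet_ids = [0, 2, 6, 6, 2]

tight = [passes_tight(j) for j in jet_ids]
tight_lep_veto = [passes_tight_lep_veto(j) for j in jet_ids]
print(tight)
print(tight_lep_veto)
```

For the values 0, 2 and 6 the bit tests give the same answer as the `>=2` and `>=6` comparisons quoted above.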
If you want to know how these flags are stored in nanoAOD, the next block shows the implementation in C++ from a miniAOD file:
Implementation in c++
There are several ways to apply jet ID. In our above exercises, we have run the cuts “on-the-fly” in our python FWLite macro (the first option here). Others are listed for your convenience.
The following examples use somewhat out of date numbers. See the above link to the JetID twiki for the current numbers.
To apply the cuts on a pat::Jet (as in miniAOD) in Python, you can do:
# Apply jet ID to uncorrected jet
nhf = jet.neutralHadronEnergy() / uncorrJet.E()
nef = jet.neutralEmEnergy() / uncorrJet.E()
chf = jet.chargedHadronEnergy() / uncorrJet.E()
cef = jet.chargedEmEnergy() / uncorrJet.E()
nconstituents = jet.numberOfDaughters()
nch = jet.chargedMultiplicity()
goodJet = (
    nhf < 0.99 and
    nef < 0.99 and
    chf > 0.00 and
    cef < 0.99 and
    nconstituents > 1 and
    nch > 0
)
To apply the cuts on a pat::Jet (as in miniAOD) in C++, you can do:
// Apply jet ID to uncorrected jet
double nhf = jet.neutralHadronEnergy() / uncorrJet.E();
double nef = jet.neutralEmEnergy() / uncorrJet.E();
double chf = jet.chargedHadronEnergy() / uncorrJet.E();
double cef = jet.chargedEmEnergy() / uncorrJet.E();
int nconstituents = jet.numberOfDaughters();
int nch = jet.chargedMultiplicity();
bool goodJet = nhf < 0.99 && nef < 0.99 && chf > 0.00 && cef < 0.99 && nconstituents > 1 && nch > 0;
To create selected jets in cmsRun:
from PhysicsTools.SelectorUtils.pfJetIDSelector_cfi import pfJetIDSelector
process.tightPatJetsPFlow = cms.EDFilter("PFJetIDSelectionFunctorFilter",
    filterParams = pfJetIDSelector.clone(quality=cms.string("TIGHT")),
    src = cms.InputTag("slimmedJets")
)
It is also possible to use the PFJetIDSelectionFunctor C++ selector (actually, either in C++ or python), but this was primarily developed in the days before PF when applying CaloJet ID was not possible very easily. Nevertheless, the functionality of more complicated selection still exists for PFJets, but is almost never used other than the few lines above. If you would still like to use that C++ class, it is documented as an example here.
Question 2.7
What do the jets with jetId represent? Were you expecting more or fewer jets with jetId==0?
Key Points
We call pileup the additional interactions that do not come from the main (hard-scatter) interaction. We must mitigate its effects to reduce the amount of noise in our events.
Many event variables help us learn how different the pileup during the data-taking period was from the pileup used in our simulations. The pileup reweighting procedure helps us calibrate the pileup profile in our simulations.
The so-called jet ID is the set of basic jet quality criteria used to remove fake jets.
Jet energy corrections and resolution
Overview
Teaching: 40 min
Exercises: 20 min
Questions
What are jet energy corrections?
What is jet energy resolution?
Objectives
Learn about how we calibrate jets in CMS.
Learn about the resolution of the jets and its effect.
After following the instructions in the setup (if you have not done it yet) :
cd <YOUR WORKING DIRECTORY>/notebooks/DAS/
source /cvmfs/sft.cern.ch/lcg/views/LCG_104/x86_64-centos7-gcc11-opt/setup.sh
jupyter notebook --no-browser --port=8888 --ip 127.0.0.1
This will open a jupyter notebook tree with various notebooks.
Jet Energy Corrections
Let’s define the jet pt response $R$ as the ratio between the measured and the true pt of a jet from simulation. We expect that the average response is different from 1 because of pileup adding energy or non-linear calorimeter response.
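As a minimal sketch of this definition, assuming each reconstructed jet has already been matched to a generator-level jet (the matched pairs below are invented, and the function names are hypothetical):

```python
from statistics import mean

def response(pt_reco, pt_gen):
    """Jet pT response: measured pT divided by the true (generator-level) pT."""
    return pt_reco / pt_gen

def mean_response(matched_pairs):
    """Average response over a list of (pt_reco, pt_gen) pairs."""
    return mean([response(reco, gen) for reco, gen in matched_pairs])

# Toy matched jets (invented values): the measured pT is systematically low
matched_pairs = [(45.0, 50.0), (92.0, 100.0), (19.0, 20.0)]

average = mean_response(matched_pairs)
print(f"mean response = {average:.3f}")
```

In practice the response is studied in bins of $p_T^{gen}$ and $\eta$ and not as a single number.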
Jet energy corrections (JEC) correct reconstructed jets (on average) back to particle level. This is done as a function of several useful variables, like $p_T^{gen}$, $\eta$, area, and pileup. CMS uses a factorized approach to JECs:
- Pileup corrections to correct for offset energy (noPU vs. PU jet matching). This is usually called L1FastJet.
- Correction to particle level jet vs. $p_T$ and $\eta$ from simulation. This is called L2Relative and L3Absolute, or L2L3 together.
- Only for data: Small residual corrections (Pileup/relative and absolute) to correct for differences between data and simulation. This is called L2L3Residuals.
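The factorized structure can be sketched as a chain of multiplicative factors, where each level is evaluated on the $p_T$ left by the previous one. Everything numerical below is invented for illustration: real corrections depend on $p_T$, $\eta$, $\rho$ and area, and are read from the official JEC files; the helper names are hypothetical.

```python
def apply_jec(pt_raw, correction_levels):
    """Apply correction factors in sequence; each level sees the pT from the previous one."""
    pt = pt_raw
    for factor in correction_levels:
        pt *= factor(pt)
    return pt

def make_offset_level(rho, area):
    """Toy L1-like level: remove an offset of rho * area, expressed as a factor."""
    return lambda pt: max(pt - rho * area, 0.0) / pt

def make_constant_level(value):
    """Toy level with a constant factor (real levels vary with pT and eta)."""
    return lambda pt: value

l1 = make_offset_level(rho=8.0, area=0.5)   # invented pileup offset of 4 GeV
l2l3 = make_constant_level(1.10)            # invented simulation-based response correction
residual = make_constant_level(1.02)        # invented data-only residual

pt_mc = apply_jec(50.0, [l1, l2l3])
pt_data = apply_jec(50.0, [l1, l2l3, residual])
print(f"MC: {pt_mc:.3f} GeV, data: {pt_data:.3f} GeV")
```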
Jet energy scale determination in data
Reminder for PUPPI jets
PUPPI jets do not need the L1 pileup corrections. Starting with Run 3, PUPPI jets are the primary jet collection.
Exercise 3.1
Open a notebook
For this part open the notebook called
Jet_Energy_Corrections.ipynb
and run the Exercise 3.1
Discussion 1.1
After running Exercise 1 of the notebook, were you expecting differences between these two distributions? Do you think the differences are large or small?
After running the Exercise 1 of the notebook, we can notice that the $p_{\mathrm{T}}$ distributions disagree quite a bit between the GenJets and PFJets. We need to apply the jet energy corrections (JECs), a sequence of corrections that address non-uniform responses in $p_{\mathrm{T}}$ and $\eta$, as well as an average correction for pileup. The JECs are often updated fairly late in the analysis cycle, simply due to the fact that the JEC experts start deriving the JECs at the same time the analyzers start developing their analyses. For this reason, it is imperative for analyzers to maintain flexibility in the JEC, and the software reflects this.
For more information and technical details on the jet energy scale calibration in CMS, look at the following link: https://cms-jerc.web.cern.ch/JEC/.
It is possible to run the JEC software “on the fly” after you’ve done your heavy processing (Ntuple creation, skimming, etc). We will now show one example of how this is done using the latest correctionlib package and the JME json-pog in the Exercise 2.
json-pog and correctionlib
Currently CMS and the JetMET POG support the use of the so-called json-pog with the correctionlib Python package, as a way to make the implementation of corrections more uniform. Specifically, JECs were delivered in the past in a zip file containing txt files where the users could find the corrections. The json-pog makes this process more generic between CMS POGs, and correctionlib makes the implementation of these corrections more generic as well.
More about json-pog in this link and correctionlib in this link.
In the notebook, using the json-pog and the correctionlib package, you find the following lines:
jerc_file = '/cvmfs/cms.cern.ch/rsync/cms-nanoAOD/jsonpog-integration/POG/JME/2018_UL/jet_jerc.json.gz'
jerc_corr = correctionlib.CorrectionSet.from_file(jerc_file)
corr = jerc_corr.compound["Summer19UL18_V5_MC_L1L2L3Res_AK4PFchs"]
where the string Summer19UL18_V5_MC_L1L2L3Res_AK4PFchs contains the JetMET nomenclature for labeling the JECs. In this example:
- Summer19UL18_V5_MC corresponds to the JEC campaign, including the data processing campaign, the JEC version, and whether it is MC or DATA.
- L1L2L3Res is the JEC type. In this case it corresponds to the set of L1FastJet, L2Relative, L3Absolute, and L2L3Residual.
- AK4PFchs is the type of jet: AK4 PF jets using CHS as the pileup removal algorithm.
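To make the naming scheme concrete, the key can be split into these three parts with plain string handling. The helper `parse_jec_key` is hypothetical and only assumes that the last two underscore-separated fields are the JEC type and the jet type, as in the example above.

```python
def parse_jec_key(key):
    """Split a JEC key into (campaign tag, correction level, jet type)."""
    tag, level, jet_type = key.rsplit("_", 2)
    return {"tag": tag, "level": level, "jet_type": jet_type}

def build_jec_key(tag, level, jet_type):
    """Assemble a key from its three parts (the inverse of parse_jec_key)."""
    return "_".join([tag, level, jet_type])

parts = parse_jec_key("Summer19UL18_V5_MC_L1L2L3Res_AK4PFchs")
print(parts)

# Example: the key of a single correction level for the same campaign and jet type
print(build_jec_key(parts["tag"], "L2Relative", parts["jet_type"]))
```

Whether a given key exists in a particular json file has to be checked against the file itself.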
Discussion 1.2
After running Exercise 2 of the notebook, how big is the difference in pt for corrected and uncorrected jets? Do you think it is larger at low or high pt?
Discussion 1.3
Why do we need to calibrate jet energy? Why is “jet response” not equal to 1? Can you think of a physics process in nature that can help us calibrate the jet response to 1?
Discussion 1.4
The amount of material in front of the CMS calorimeter varies by $\eta$. Therefore, the calorimeter response to jet is also a function of jet $\eta$. Can you think of a physics process in nature that can help us calibrate the jet response in $\eta$ to be uniform ?
JEC Uncertainties
Since we have applied the JECs to the distributions, we should also assign a systematic uncertainty to the procedure. The procedure is explained in this link, and it is part of Exercise 2.3 of the notebook.
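As a rough illustration of what assigning this systematic means in practice, a common approach is to shift each corrected jet $p_{\mathrm{T}}$ up and down by its relative JEC uncertainty and repeat the analysis with each variation. The sketch below assumes the relative uncertainties are already known per jet; the function name `jec_variations` and all numbers are made up for illustration and are not taken from the notebook.

```python
def jec_variations(pt_corrected, rel_unc):
    """Return (up, down) lists of pT shifted by the per-jet relative uncertainty."""
    up = [pt * (1.0 + u) for pt, u in zip(pt_corrected, rel_unc)]
    down = [pt * (1.0 - u) for pt, u in zip(pt_corrected, rel_unc)]
    return up, down


jet_pt = [35.0, 120.0, 480.0]   # corrected pT in GeV (made-up values)
jec_unc = [0.03, 0.015, 0.01]   # relative JEC uncertainties (made-up values)

pt_up, pt_down = jec_variations(jet_pt, jec_unc)
for nominal, up, down in zip(jet_pt, pt_up, pt_down):
    print(f"nominal {nominal:7.2f}  up {up:7.2f}  down {down:7.2f}")
```

Per jet, the nominal value always lies between the two variations. After filling histograms this is no longer guaranteed bin by bin, because jets can migrate between bins.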
Exercise 3.2
Open a notebook
For this part open the notebook called
Jet_Energy_Corrections.ipynb
and run Exercise 3.
Question 1.1
After running Exercise 3 of the notebook, does the result make sense? Is the nominal histogram always between the up and down variations, and should it be?
Jet Energy Resolution
Jets are stochastic objects. The content of jets fluctuates quite a lot, and the content also depends on what actually caused the jet (uds quarks, gluons, etc). In addition, there are experimental limitations to the measurement of jets. Both of these aspects limit the accuracy to which we can measure the 4-momentum of a jet. The way to quantify our accuracy of measuring jet energy is called the jet energy resolution (JER). If you have a group of single pions that have the same energy, the energy measured by CMS will not be exactly the same every time, but will typically follow a (roughly) Gaussian distribution with a mean and a width. The mean is corrected using the jet energy corrections. It is impossible to “correct” for all resolution effects on a jet-by-jet basis, although regression techniques can account for many effects.
As such, there will always be some experimental and theoretical uncertainty in the jet energy measurement, and this is seen as a non-zero jet energy resolution. There are also other jet-related resolutions, such as the jet angular resolution and the jet mass resolution, but JER is what we most often have to deal with. Jets measured in data typically have worse resolution than simulated jets. Because of this, it is important to ‘smear’ the MC jets with jet energy resolution (JER) scale factors, so that measured and simulated jets are on equal footing in analyses. We will demonstrate how to apply the JER scale factors, since that is applicable to all analyses that use jets.
More information can be found in the jet resolution guide.
The resolution is measured in data in different eta bins, and was approximately 10% with a 10% uncertainty for 7 TeV and 8 TeV data. For precision, it is important to use the correctly measured resolutions, but a reasonable approximation is to assume a flat 10% uncertainty for simplicity.
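To make the idea of smearing concrete, here is a minimal sketch of one commonly used recipe: if a simulated jet is matched to a particle-level jet, the difference between reconstructed and particle-level $p_{\mathrm{T}}$ is stretched by the scale factor; otherwise Gaussian noise is added so that the resolution widens by the same factor. The function name `smear_pt`, the flat 10% resolution default, and the example numbers are assumptions for illustration; the coffea implementation used in the notebook may differ in its details.

```python
import math
import random


def smear_pt(pt_reco, sf, pt_gen=None, rel_resolution=0.1, rng=random):
    """Return a smeared pT for one simulated jet.

    sf             : data/MC resolution scale factor
    pt_gen         : matched particle-level pT, or None if there is no match
    rel_resolution : relative pT resolution in simulation (assumed flat here)
    """
    if pt_gen is not None:
        # Scaling: stretch the reco-minus-gen difference by the scale factor.
        factor = 1.0 + (sf - 1.0) * (pt_reco - pt_gen) / pt_reco
    else:
        # Stochastic: add Gaussian noise that widens the resolution by sf.
        extra_width = math.sqrt(max(sf * sf - 1.0, 0.0))
        factor = 1.0 + rng.gauss(0.0, rel_resolution) * extra_width
    return pt_reco * max(factor, 0.0)  # never let the pT become negative


# Matched jet: reco 110 GeV, gen 100 GeV, scale factor 1.2 (made-up numbers).
print(smear_pt(110.0, sf=1.2, pt_gen=100.0))
# Unmatched jet: the result varies from call to call.
print(smear_pt(110.0, sf=1.2))
```

With a scale factor of exactly 1 both branches leave the jet unchanged, which is a quick sanity check for any smearing code.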
Open a notebook
For this part open the notebook called
Jet_Energy_Corrections.ipynb
and run Exercise 4.
In the notebook we will use the coffea implementation to apply JER to nanoAOD events. Note that the function used to apply the corrections will be updated soon to be compatible with json-pog.
Discussion
Let’s look at the simple dijet resonance peak shown below.
The plot comes from a dijet resonance analysis. It was produced from an MC sample of Randall-Sundrum gravitons (RSGs) with m=3 TeV decaying to two quarks. The resulting signature is two high-$p_{\mathrm{T}}$ jets, with a truth-level invariant mass of 3 TeV.
Can you see the effect that the correction and the smearing have?
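For reference, the dijet invariant mass in such a plot is built from the four-vectors of the two jets. The sketch below shows the arithmetic for jets given as ($p_{\mathrm{T}}$, $\eta$, $\phi$, mass); the function names are hypothetical and the two jets are an idealized back-to-back, massless pair chosen so that the result is easy to check by hand.

```python
import math


def four_vector(pt, eta, phi, mass):
    """Convert (pT, eta, phi, mass) to (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def dijet_mass(jet1, jet2):
    """Invariant mass of the sum of two jets, each given as (pT, eta, phi, mass)."""
    e1, px1, py1, pz1 = four_vector(*jet1)
    e2, px2, py2, pz2 = four_vector(*jet2)
    e, px, py, pz = e1 + e2, px1 + px2, py1 + py2, pz1 + pz2
    m2 = e * e - (px * px + py * py + pz * pz)
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding errors


# Two massless central jets of 1500 GeV each, back to back in phi.
mjj = dijet_mass((1500.0, 0.0, 0.0, 0.0), (1500.0, 0.0, math.pi, 0.0))
print(f"m_jj = {mjj:.1f} GeV")
```

Because the mass is built from the jet momenta, any shift of the jet energy scale moves the peak position, while the resolution determines its width.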
Key Points
The measured energy of jets differs between data and simulation for many reasons; in CMS we calibrate it in a series of steps.
Jets are stochastic objects whose content fluctuates a lot. We measure the jet energy resolution to account for these effects.
Jet Substructure
Overview
Teaching: 40 min
Exercises: 20 min
Questions
What is jet substructure?
How can we distinguish jets originating from W bosons or top quarks?
Objectives
Learn about high-pt AK8 jets (FatJet)
Learn about the different substructure variables and taggers
Learn ways to identify boosted W and top quarks
After following the instructions in the setup:
cd <YOUR WORKING DIRECTORY>/notebooks/DAS/
source /cvmfs/sft.cern.ch/lcg/views/LCG_104/x86_64-centos7-gcc11-opt/setup.sh
jupyter notebook --no-browser --port=8888 --ip 127.0.0.1
This will open a jupyter notebook tree with various notebooks.
What is a jet?
In the previous episodes we discussed that a jet is a physical object representing the hadronization of quarks and gluons. We may also have encountered jets formed from random noise or pileup particles in our detectors, not necessarily coming from hard-scattered quarks and gluons. But jets can be so much more:
The internal structure of the jet constituents helps us to understand their origin.
Boosted Objects
Heavy particles that are created not at rest but with some momentum are referred to as boosted objects. Let’s analyze the example of a top quark. What happens if the top quarks are boosted, e.g. when they come from the decay of a new massive particle? The hadronic decay products are collimated, so they can be reconstructed in the same final-state object! Hadronic final states now become accessible as a dijet final state (in this case).
Jet mass
QCD jet mass is a perturbative quantity. Starting from (almost) massless partons, pQCD gives rise to a jet mass of order:
$$\left< M^2 \right> \simeq C \cdot \frac{\alpha_S}{\pi} p_T^2 R^2$$
The jet mass is therefore proportional to both R and pT. C is a form factor that depends on the originating parton and the clustering algorithm. For non-cone algorithms:
$$\left< M^2 \right> \simeq a \times \alpha_S p^2_T R^2$$
where $a$ is 0.16 for quarks and 0.37 for gluons. For heavy objects, the LO mass scale is the heavy object mass.
The mass of QCD jets changes as a function of momentum, but the mass of heavy particle jets is relatively stable. For a given mass and pT scale, choose an appropriate jet radius:
$$\Delta R \sim \frac{2m}{p_T}$$
CMS uses R = 0.8 for heavy object reconstruction. This corresponds to merged W/Z decays at pT ~200 GeV and merged top decays at pT ~400 GeV.
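The numbers quoted for R = 0.8 follow directly from the rule of thumb $\Delta R \sim 2m/p_T$. The short sketch below does the arithmetic; the function names are hypothetical and the masses are approximate values inserted for illustration.

```python
def decay_opening_angle(mass, pt):
    """Rule-of-thumb angular separation of the decay products, about 2m/pT."""
    return 2.0 * mass / pt


def min_pt_for_merging(mass, radius=0.8):
    """pT above which the rule of thumb places the decay inside one jet of radius R."""
    return 2.0 * mass / radius


W_MASS = 80.4     # GeV, approximate
TOP_MASS = 172.5  # GeV, approximate

print(f"W   merged above ~{min_pt_for_merging(W_MASS):.0f} GeV")
print(f"top merged above ~{min_pt_for_merging(TOP_MASS):.0f} GeV")
print(f"W at pT = 200 GeV: Delta R ~ {decay_opening_angle(W_MASS, 200.0):.2f}")
```

The estimate is only a scaling argument, so the thresholds should be read as approximate turn-on points rather than sharp cuts.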
Rho parameter
A useful variable for massive, fat jets is the QCD scaling parameter $\rho$, defined as:
$\rho=\log(m^2/(p_{\mathrm{T}}R)^2)$.
(Sometimes $\rho$ is defined without the log). One useful feature of this variable is that QCD jet mass grows with $p_{\mathrm{T}}$, i.e. the two quantities are strongly correlated, while $\rho$ is much less correlated with $p_{\mathrm{T}}$.
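A quick numerical check shows why $\rho$ is less correlated with $p_{\mathrm{T}}$: it depends only on the ratio $m/(p_{\mathrm{T}}R)$. The sketch below assumes the natural logarithm and R = 0.8; the function name and the example values are chosen purely for illustration.

```python
import math


def rho(mass, pt, radius=0.8):
    """QCD scaling variable rho = log(m^2 / (pT * R)^2), natural log assumed."""
    return math.log(mass ** 2 / (pt * radius) ** 2)


# Two jets with very different mass and pT but the same m/pT ratio.
print(rho(80.0, 400.0))
print(rho(160.0, 800.0))
```

Both calls give the same value, which is negative because the jet mass is smaller than $p_{\mathrm{T}}R$.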
Exercise 4.1
We can use jet mass to distinguish our boosted W and top jets from QCD. Let’s compare the AK8 jet mass of the boosted top quarks from the RS KK sample and the jets from the QCD sample. Let’s also look at the softdrop groomed jet mass combined with the PUPPI pileup subtraction algorithm for different samples.
Open a notebook
For this part, open the notebook called
Jet_Substructure.ipynb
and run Exercise 4.1.
Question 4.1
Do you think the jet mass alone can be used to identify boosted W and top jets?
Question 4.2
After running Exercise 3, in which cases do you think the $\rho$ variable can be used?
Solution 4.2
The following two plots show what QCD events look like in different $p_{T}$ ranges. It’s clear that the mass depends very strongly on $p_{T}$, while the $\rho$ shape is fairly constant vs. $p_{T}$ (ignoring $\rho<-7$ or so, which is the non-perturbative region). Having a stable shape is useful when studying QCD across a wide $p_{T}$ range.
Jet Substructure
Because boosted jets represent the hadronic products of a heavy particle produced with high momentum, some tools have been developed to study the internal structure of these jets. This topic is usually called Jet Substructure.
Jet substructure algorithms can be divided into three main groups of tools:
- grooming algorithms attempt to reduce the impact of soft contributions to the clustering sequence by adding further criteria. Examples of these algorithms are softdrop, trimming, and pruning.
- substructure variables are observables that try to quantify how many cores or prongs can be identified within the structure of the boosted jet. Examples of these variables are N-subjettiness and energy correlation functions.
- taggers are more sophisticated algorithms that attempt to identify the origin of the boosted jet. Currently taggers are based on machine-learning techniques that try to use as much information as possible in order to efficiently identify boosted W/Z/Higgs/top jets. Examples of these taggers in CMS are deepAK8/ParticleNet and deepDoubleB.
For further reading, several measurements have been performed about jet substructure:
- Studies of jet mass in dijet and W/Z+jet events (CMS).
- Jet mass and substructure of inclusive jets in sqrt(s) = 7 TeV pp collisions with the ATLAS experiment (ATLAS).
- Theory slides
- More theory slides
- Talk from Phil Harris on searching for boosted $W$ bosons.
Jet Grooming Algorithms
There have been many different approaches to jet grooming over the years. The common idea is to remove soft and wide-angle radiation from within the jet, for example by reclustering with a smaller R and removing soft subjets, or by removing constituents during clustering.
The next cartoon provides a good summary of all these algorithms:
The softdrop algorithm is the one chosen by default in CMS. Softdrop recursively declusters the jet and removes the softer component unless the soft drop condition is satisfied.
Soft wide-angle radiation fails the condition:
- As $z_{cut}$ increases, the grooming becomes more aggressive
- As $\beta$ decreases, the grooming becomes more aggressive
Example ($z_{cut}$ = 0.1):
- If $\beta = 0$, remove the softer subjet if its pT fraction is < 0.1 (~equivalent to MMDT)
- If $\beta > 0$, remove the softer subjet if its pT fraction is < x, where x increases with ΔR and has a maximum value of 0.1
- If $\beta \to \infty$, there is no grooming
- If $\beta < 0$, soft drop becomes a tagger instead of a groomer (it finds jets with hard, large-angle subjets)
Jet grooming algorithms dramatically improve the separation of QCD and top quark jets. Merged top quarks can be identified with a window around the top quark mass.
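The soft drop condition referred to above is usually written as $\min(p_{T,1}, p_{T,2})/(p_{T,1}+p_{T,2}) > z_{cut}\,(\Delta R_{12}/R_0)^{\beta}$, where the two subjets come from undoing the last clustering step and $R_0$ is the jet radius. The sketch below evaluates only this single-step condition, not the full recursive declustering; the function name and the example numbers are chosen for illustration.

```python
def passes_soft_drop(pt1, pt2, delta_r, z_cut=0.1, beta=0.0, r0=0.8):
    """Check the soft drop condition for one declustering step.

    pt1, pt2 : transverse momenta of the two subjets
    delta_r  : angular distance between the subjets (must be > 0 if beta < 0)
    r0       : jet radius used to normalize the angle
    """
    z = min(pt1, pt2) / (pt1 + pt2)
    threshold = z_cut * (delta_r / r0) ** beta
    return z > threshold


# beta = 0: a subjet carrying 6% of the pT is dropped at any angle.
print(passes_soft_drop(94.0, 6.0, delta_r=0.4, beta=0.0))
# beta = 1: the same splitting at half the jet radius survives (threshold 0.05).
print(passes_soft_drop(94.0, 6.0, delta_r=0.4, beta=1.0))
```

When the condition fails, the softer subjet is discarded and the procedure continues on the harder one; when it passes, the declustering stops and the current jet is the groomed jet.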
Exercise 4.2
In this part of the tutorial, we will compare different substructure algorithms as well as some commonly used substructure variables.
Open a notebook
For this part, open the notebook called
Jet_Substructure.ipynb
and run Exercise 4.2.
Question 4.3
Look at the following histogram, which compares ungroomed, pruned, soft drop (SD), PUPPI, and SD+PUPPI jets. Note that the histogram has two peaks. What do these correspond to? How do the algorithms affect the relative size of the two populations?
Substructure variables
Knowing how many final-state objects to expect from these decays, we can look inside the jet for the expected substructure:
- Top decays → 3 subjets
- W/Z/H decays → 2 subjets
A quantity called N-subjettiness is a measure of how consistent a jet is with a hypothesized number of subjets. N-subjettiness is defined as:
$$\tau_N = \frac{1}{\sum_i p_{T,i} \cdot R} \sum_i p_{T,i} \cdot \min ( \Delta R_{1,i}, \ldots, \Delta R_{N,i} )$$
The variable $\tau_N$ gives a sense of how many prongs or cores can be found inside the jet; it is small when the jet is consistent with N or fewer subjets. It is known that the N-subjettiness variables themselves ($\tau_{N}$) do not provide good discrimination power, but their ratios do. A ratio $\tau_{MN} = \dfrac{\tau_M}{\tau_N}$ tests whether the jet is more M-prong-like than N-prong-like. For instance, we expect 2 prongs for boosted jets originating from hadronic W decays, while we expect 1 prong for high-pt jets from QCD multijet processes. The most common N-subjettiness ratios are $\tau_{21}$ and $\tau_{32}$.
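The definition of $\tau_N$ can be turned into a few lines of code. The sketch below takes the N subjet axes as given, whereas in practice they are determined by a dedicated axis-finding step; the function names and the toy constituents are hypothetical and chosen so that the result can be verified by hand.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane, with phi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def tau_n(constituents, axes, radius=0.8):
    """N-subjettiness for constituents [(pt, eta, phi)] and N given axes [(eta, phi)]."""
    numerator = 0.0
    total_pt = 0.0
    for pt, eta, phi in constituents:
        # Each constituent contributes its distance to the closest axis.
        numerator += pt * min(delta_r(eta, phi, a_eta, a_phi) for a_eta, a_phi in axes)
        total_pt += pt
    return numerator / (total_pt * radius)


# Toy two-prong jet: two equal constituents at eta = +0.2 and eta = -0.2.
constituents = [(100.0, 0.2, 0.0), (100.0, -0.2, 0.0)]
tau1 = tau_n(constituents, axes=[(0.0, 0.0)])
tau2 = tau_n(constituents, axes=[(0.2, 0.0), (-0.2, 0.0)])
print(tau1, tau2)
```

For this idealized jet, $\tau_2$ vanishes while $\tau_1$ does not, so $\tau_{21}$ is zero, the extreme two-prong case. Real W jets have small but non-zero $\tau_{21}$, and QCD jets tend toward larger values.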
Another commonly used substructure variable is the energy correlation function $N2$. Like $\tau_{21}$, $N2$ tests whether the boosted jet is compatible with a 2-prong jet hypothesis.
Exercise 4.3
Open a notebook
For this part, open the notebook called
Jet_Substructure.ipynb
and run Exercise 4.3.
Question 4.4
Look at the histogram comparing $\tau_{21}$. What can you say about the histogram? Is $\tau_{21}$ telling you something about the nature of the boosted jets selected?
Question 4.5
Look at the histogram comparing $\tau_{32}$. What can you say about the histogram? Is $\tau_{32}$ telling you something about the nature of the boosted jets selected?
Question 4.6
Look at the histograms comparing $N2$ and $N3$. What can you say about the histograms? Are these variables telling you something about the nature of the boosted jets selected?
Taggers
In this part of the tutorial, we will look at how different substructure algorithms can be used to identify jets originating from boosted W’s and tops. Specifically, we’ll see how these identification tools are used to separate these boosted jets from those originating from Standard Model QCD, a dominant process at the LHC.
W tagging
top tagging
Tagging with machine learning
W/top tagging was one of the first places where ML was adopted in CMS. Several of these algorithms have been studied (JME-18-002), with “deepAK8/ParticleNet” being the most widely used within CMS.
Exercise 4.4
Open a notebook
For this part, open the notebook called
Jet_Substructure.ipynb
and run Exercise 4.4.
Question 4.7
- Why can we use a ttbar sample to talk about W-tagging?
- What cuts would you place on these variables to distinguish W bosons from QCD?
- So far, which variable looks more promising?
Question 4.8
- What cut would you apply to select boosted top quarks?
- For both the W and top selections, what other variable(s) could we cut on in addition?
Go Further
- You can learn more about jet grooming from the jet substructure exercise and about PUPPI from the pileup mitigation exercise.
- We briefly mentioned that you can combine variables for even better discrimination. In CMS, we do this to build our jet taggers. For the simple taggers, we often combine cuts on jet substructure variables and jet mass. The more sophisticated taggers, which are used more and more widely within CMS, use deep neural networks. To learn about building a machine learning tagger, check out the machine learning short exercise.
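A simple cut-based tagger of the kind described above can be sketched as a mass window combined with an N-subjettiness ratio cut. All thresholds below are illustrative assumptions, not official CMS working points, and the function names are hypothetical.

```python
def is_w_candidate(msoftdrop, tau21, mass_window=(65.0, 105.0), tau21_max=0.45):
    """Toy W tagger: groomed mass near the W mass and a 2-prong-like jet.

    The mass window (GeV) and the tau21 cut are illustrative values only.
    """
    low, high = mass_window
    return low < msoftdrop < high and tau21 < tau21_max


def is_top_candidate(msoftdrop, tau32, mass_window=(105.0, 210.0), tau32_max=0.65):
    """Toy top tagger: groomed mass near the top mass and a 3-prong-like jet.

    The mass window (GeV) and the tau32 cut are illustrative values only.
    """
    low, high = mass_window
    return low < msoftdrop < high and tau32 < tau32_max


print(is_w_candidate(msoftdrop=82.0, tau21=0.30))    # W-like jet
print(is_w_candidate(msoftdrop=20.0, tau21=0.70))    # QCD-like jet
print(is_top_candidate(msoftdrop=170.0, tau32=0.40)) # top-like jet
```

Tightening the cuts lowers the QCD mistag rate at the cost of signal efficiency, which is the trade-off to keep in mind when answering the questions above.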
What about boosted Higgs?
CMS also has a rich program of boosted Higgs to bb/cc taggers; however, they are usually studied by the b-tagging group (BTV). Look at their documentation for more information.
Key Points
Jet substructure is the field that studies the internal structure of high-pt jets, usually clustered with a larger jet radius (AK8).
Grooming algorithms like softdrop, and substructure variables like the N-subjettiness ratios, help us to identify the origin of these jets.
Over the years, more state-of-the-art taggers involving ML have been implemented in CMS. They help us identify boosted jets more effectively.