This lesson is being piloted (Beta version)

HATS@LPC 2024: Jets

Jets 101

Overview

Teaching: 40 min
Exercises: 20 min
Questions
  • What is a jet?

  • Are there different types of jets? What is a recluster algorithm?

  • Which types of jets do we use in CMS?

Objectives
  • Learn about jets, their properties, types and reclustering algorithms.

  • Learn about the difference between GenJets, CaloJets, and PFJets.

After following the instructions in the setup:

cd <YOUR WORKING DIRECTORY>/notebooks/DAS/
source /cvmfs/sft.cern.ch/lcg/views/LCG_104/x86_64-centos7-gcc11-opt/setup.sh
jupyter notebook --no-browser --port=8888 --ip 127.0.0.1

This will open a jupyter notebook tree with various notebooks.

Jet Basics

Jets as signatures of quarks and gluons

Most collisions at hadron colliders involve quark or gluon scattering: protons contain quarks and gluons, so proton collisions are really quark and gluon collisions.

Parton showering → Hadronization → Jet of color neutral particles

Only particles in a “color singlet” state are observed in nature, due to color confinement. The kinematic properties of the jet tend to resemble those of the initial partons. What we try to do in our detectors is to measure the decay products to get access to the particle/parton level:

After the particles interact with our detector we can reconstruct other stable particles.

What is the composition of jets?

Energy composition: About 65% charged hadrons, 25% neutral pions (photons), 10% neutral hadrons.

What is a jet?

Looking at an event display from our data

How do you determine which particles are included in a jet?

From a list of particles one can form jets, objects that reconstruct the shower of particles produced from a quark or gluon. Each particle belonging to a jet is known as a constituent. Each constituent has a 4-vector that can be used for further studies. This gives us a more generalised picture: almost everything becomes a jet: g/q/t/W/Z/H/PU

We need a jet algorithm to collect the particles in a shower. This defines a clustering algorithm. A good jet algorithm is infrared and collinear safe: the set of hard jets should be unchanged by soft emission and collinear splitting.

Jet Clustering Algorithms

Most jet algorithms at hadron colliders use a so-called “clustering sequence”. This is essentially a pairwise examination of the input four vectors. If the pair satisfy some criteria, they are merged. The process is repeated until the entire list of constituents is exhausted.
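As a rough illustration of the pairwise procedure described above, here is a minimal pure-Python sketch of a generalized-$k_{\mathrm{T}}$ sequential recombination. The distance measures ($d_{ij}=\min(p_{T,i}^{2p},p_{T,j}^{2p})\,\Delta R_{ij}^2/R^2$ and $d_{iB}=p_{T,i}^{2p}$), the E-scheme recombination, the function names, and the three input particles are assumptions made for this sketch. Real analyses use FastJet, which is much faster than this O(N³) loop.

```python
import math

def four_vector(pt, y, phi):
    """Build a massless (px, py, pz, E) four-vector from pt, rapidity and phi."""
    return (pt * math.cos(phi), pt * math.sin(phi),
            pt * math.sinh(y), pt * math.cosh(y))

def pt2(v):
    """Squared transverse momentum."""
    return v[0] ** 2 + v[1] ** 2

def rapidity(v):
    return 0.5 * math.log((v[3] + v[2]) / (v[3] - v[2]))

def delta_r2(a, b):
    """Squared distance in the (rapidity, phi) plane."""
    dphi = abs(math.atan2(a[1], a[0]) - math.atan2(b[1], b[0]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return (rapidity(a) - rapidity(b)) ** 2 + dphi ** 2

def cluster(particles, R=0.4, p=-1):
    """Generalized-kT clustering: p=-1 anti-kT, p=0 Cambridge/Aachen, p=1 kT."""
    objects = list(particles)
    jets = []
    while objects:
        # Find the smallest distance among all beam distances d_iB
        # and all pairwise distances d_ij.
        best = None  # (distance, i, j), with j=None for a beam distance
        for i, a in enumerate(objects):
            d_ib = pt2(a) ** p
            if best is None or d_ib < best[0]:
                best = (d_ib, i, None)
            for j in range(i + 1, len(objects)):
                b = objects[j]
                d_ij = min(pt2(a) ** p, pt2(b) ** p) * delta_r2(a, b) / R ** 2
                if d_ij < best[0]:
                    best = (d_ij, i, j)
        _, i, j = best
        if j is None:
            # Closest to the beam: promote the object to a final jet.
            jets.append(objects.pop(i))
        else:
            # Merge the pair by adding four-vectors (E-scheme).
            merged = tuple(x + y for x, y in zip(objects[i], objects[j]))
            objects.pop(j)  # j > i, so remove j first
            objects.pop(i)
            objects.append(merged)
    return sorted(jets, key=pt2, reverse=True)

# Hypothetical toy event: two nearby particles and one far away.
particles = [four_vector(100.0, 0.0, 0.0),
             four_vector(20.0, 0.1, 0.1),
             four_vector(50.0, 1.5, 3.0)]
jets = cluster(particles, R=0.4, p=-1)
for jet in jets:
    print(f"jet pt = {math.sqrt(pt2(jet)):.1f} GeV")
```

Changing `p` to 0 or 1 switches the same loop to Cambridge/Aachen or $k_{\mathrm{T}}$ ordering, which is one way to see that these algorithms differ only in how they weight momentum in the distance measure.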

These algorithms follow this recipe:

A more visual way of thinking about the reclustering algorithm:

What do the different jet algorithms look like in our events?

Comparison of jet areas for four different jet algorithms, from “The anti-$k_t$ jet clustering algorithm” by Cacciari, Salam, and Soyez [JHEP 04 (2008) 063, arXiv:0802.1189].

Some excellent references about jet algorithms can be found here:

Fastjet

The package used to implement the clustering algorithms at modern colliders is called Fastjet. This package is used ubiquitously in jet reconstruction, even though it is sometimes hidden in our reconstruction code. If you want to know more about Fastjet, we encourage you to check their website www.fastjet.fr in your free time.

Jet types at the LHC

Jets are reconstructed physics objects representing the hadronization and fragmentation of quarks and gluons. CMS primarily uses anti-$k_{\mathrm{T}}$ jets with a cone size of $R=0.4$ (AK4) to reconstruct this jet type. We have algorithms that distinguish heavy-flavour (b or c) quarks (which are in the domain of the BTV POG), quark- vs gluon-originated jets, and jets from the main $pp$ collision versus jets formed primarily from pileup particles.

However, quarks and gluons are only part of the story! At the LHC, the typical collision energy is much greater than the mass scale of the known SM particles, and hence, even heavier particles like top quarks, W/Z/Higgs bosons, and heavy beyond-the-Standard-Model particles can be produced with large Lorentz boosts. When these particles decay to quarks and gluons, their decay products are collimated and overlap in the detector, making them difficult to reconstruct as individual AK4 jets.

Therefore, LHC analyses use jet algorithms with a large radius parameter to reconstruct these objects, called “large radius” or “fat” jets. CMS uses anti-$k_{\mathrm{T}}$ jets with $R=0.8$ (AK8) as the standard large-radius jet, while ATLAS uses AK10.

You can also read these excellent overviews of jet substructure techniques:

Exercise 1.1

Open a notebook

Several ways exist to determine the “area” of the jet over which the input constituents lie. This is very important in correcting pileup, as we will see, because some algorithms tend to “consume” more constituents than others and hence are more susceptible to pileup. Furthermore, the amount of energy inside a jet due to pileup is proportional to the area, so it is essential to know the jet area to correct this effect.

In the first exercise we will compare jet areas for different types of jets.

For this part, open the notebook called Jets_101.ipynb (if it is not opened) and run Exercise 1.1

Discussion 1.1

Before you run Exercise 1.1, what type of distribution do you expect for the areas of the AK4 and AK8 jets?

Question 1.1

After exercise 1.1: Try modifying the plotting cell to add vertical lines at area values corresponding to $\pi R^2$. Do the histogram peaks line up with these values?

Solution 1.1

Add these lines in the plotting cell:

plt.axvline(x=np.pi*(0.4*0.4), color='b', linestyle='--')
plt.axvline(x=np.pi*(0.8*0.8), color='r', linestyle='--')

Jet Inputs and the CMS jet nomenclature

The jet algorithms take as input a set of 4-vectors. At CMS, the most popular jet type is the “Particle Flow Jet,” which attempts to use the entire detector at once and derive single four vectors representing specific particles. For this reason, it is very comparable (ideally) to clustering generator-level four-vectors.

Monte Carlo Generator-level Jets (GenJets)

GenJets are pure Monte Carlo simulated jets. They are helpful for analysis with MC samples. GenJets are formed by clustering the four-momenta of Monte Carlo truth particles. This may include “invisible” particles (muons, neutrinos, WIMPs, etc.).

As no detector effects are involved, the jet response (or jet energy scale) is 1, and the jet resolution is perfect, by definition.

GenJets include information about the 4-vectors of constituent particles, the energy’s hadronic and electromagnetic components, etc.

Calorimeter Jets (CaloJets)

CaloJets are formed from energy deposits in the calorimeters (hadronic and electromagnetic), with no tracking information considered. In the barrel region, a calorimeter tower consists of a single HCAL cell and the associated 5x5 array of ECAL crystals (the HCAL-ECAL association is similar but more complicated in the endcap region). The four-momentum of a tower is assigned from the energy of the tower, assuming zero mass, with the direction corresponding to the tower position from the interaction point.
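The tower four-momentum assignment described above can be sketched in a few lines. This is only an illustration under the stated massless assumption; the function name and the example tower values are hypothetical and are not taken from CMSSW.

```python
import math

def tower_four_vector(energy, eta, phi):
    """Massless (px, py, pz, E) for a tower seen from the interaction point.

    For a massless object |p| = E, so pt = E / cosh(eta).
    """
    pt = energy / math.cosh(eta)
    return (pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta), energy)

# Hypothetical tower: 50 GeV deposited at eta = 1.2, phi = 0.5
px, py, pz, e = tower_four_vector(50.0, 1.2, 0.5)
mass_squared = e ** 2 - (px ** 2 + py ** 2 + pz ** 2)
print(f"pt = {math.hypot(px, py):.2f} GeV, m^2 = {mass_squared:.2e} GeV^2")
```

Summing such tower four-vectors inside a jet is what gives a CaloJet a non-zero mass even though each input is massless.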

In CMS, CaloJets are used less often than PFJets. Their use includes performance studies to disentangle tracker and calorimeter effects and trigger-level analyses where the tracker is neglected to reduce the event processing time. ATLAS makes much more use of CaloJets, as their version of particle flow is less mature than CMS’s.

Particle Flow Jets (PFJets)

Particle Flow candidates (PFCandidates) combine information from various detectors to estimate particle properties based on their assigned identities (photon, electron, muon, charged hadron, neutral hadron). PFJets are created by clustering PFCandidates into jets and contain information about contributions of every particle class: Electromagnetic/hadronic, Charged/neutral, etc. The jet response is high. The jet pT resolution is good, starting at 15–20% at low pT and asymptotically reaching 5% at high pT.

In CMS we recluster two types of PFJets:

Full jet and MET reconstruction in CMS

Exercise 1.2

Open a notebook

For this part, open the notebook called Jets_101.ipynb (if it is not opened) and run Exercise 1.2

Question 1.2

After running the notebook’s Exercise 1.2: As you can see, the agreement between CaloJets, GenJets, and PFJets could be better! Can you guess why?

Solution 1.2

We need to apply the jet energy corrections (JEC) described in the next exercise. But before doing that, we’ll review the jet clustering algorithms used in CMS.

Jet types and algorithms in CMS

The standard jet algorithms are all implemented in the CMS reconstruction software, CMSSW. However, a few algorithms with specific parameters (namely AK4, AK8, and CA15) have become standard tools in CMS; these jet types are extensively studied by the JetMET POG, and are highly recommended. These algorithms are included in the centrally produced CMS samples, at the AOD, miniAOD, and nanoAOD data tiers (note that miniAOD and nanoAOD are most commonly used for analysis, while AOD is much less common these days, and is not widely available on the grid). Other algorithms can be implemented and tested using the JetToolbox (more in the following link).

In this part of the tutorial, you will learn how to access the jet collection included in the CMS datasets, compare the different jet types, and create your own collections.

AOD

This twiki summarizes the respective labels by which each jet collection can be retrieved from the event record for general AOD files. This format is currently used for specialized studies, but you can use the other formats for most analyses.

MiniAOD

Three main jet collections are stored in the MiniAOD format, as described here.

Examples of how to access jet collections in miniAOD samples

Below are two examples of how to access jet collections from these samples. This exercise does not intend for you to modify code in order to access these collections, but rather for you to look at the code and get an idea about how you could access this information if needed.

In C++

Please take a look at the file jmedas_miniAODAnalyzer.C with your favourite code viewer. You can run this code by using the python config file jmedas_miniAODtest.py from your terminal once you have set up a CMSSW environment and downloaded this JMEDAS package. This script will only print out some information about the jets in that sample. Again, the most important part of this exercise is to get familiar with how to access jet collections from miniAOD. Take a good look at what this script prints to your terminal.

cmsRun $CMSSW_BASE/src/Analysis/JMEDAS/scripts/jmedas_miniAODtest.py

In Python

Now take a look at the file jmedas_miniAODtest_purePython.py. This code can be run with plain python in your terminal. As in the C++ case, the output of this job is some information about jets. The most important part of the exercise is to get familiar with how to access jet collections from miniAOD using python.

python $CMSSW_BASE/src/Analysis/JMEDAS/scripts/jmedas_miniAODtest_purePython.py

NanoAOD

NanoAOD is a “flat tree” format, meaning you can access the information directly with simple ROOT or even simple Python tools (like numpy or pandas). This format is recommended for analyses in CMS, unless one needs to access other variables not stored in nanoAOD. This tutorial will only use nanoAOD files.

In nanoAOD, only AK4 CHS jets ( Jet ) and AK8 PUPPI jets ( FatJet ) are stored in Run 2. For Run 3, AK4 and AK8 jets are PUPPI jets. The jets in nanoAOD are similar to those in miniAOD, but not identical (for example, the $p_{\mathrm{T}}$ cuts might be different). In short:

A full set of variables for each jet collection can be found in this website.

It is also possible to customize nanoAOD. JME/BTV have their own extended format with more jet collections and/or PF candidates. It is a common format for “automatised” workflows and ML training.

Note

There are several advanced tools on the market which allow you to do sophisticated analysis using nanoAOD format, including RDataFrame, NanoAOD-tools, or Coffea. We encourage you to look at them and use the one you like the most. However, we are going to use coffea for this tutorial.

Jet properties

A short list of jet properties that we can find in nanoAOD is:

Exercise 1.3

Open a notebook

This preliminary exercise will illustrate some of the basic properties of jets, like the four-momentum quantities: pt, eta, phi, and mass. We will use nanoAOD files, which are currently widely used within the CMS Collaboration. For more information about nanoAOD follow this link. At the end of the notebook, you will be able to see all the quantities stored in the Jet collection.

For this part, open the notebook called Jets_101.ipynb and run Exercise 1.3

Discussion 1.2

Have you seen these jet quantities before? Were you expecting something different?

Discussion 1.3

Did you plot other jet quantities stored in nanoAOD? Do you understand the meaning of them?

Key Points

  • A jet is a physics object representing hadronic showers interacting with our detectors. Jets are usually associated with quarks and gluons, but they can be more than that, depending on their origin and the algorithm used to define them.

  • A jet is defined by its reclustering algorithm and its constituents. In current experiments, jets are reclustered using the anti-kt algorithm. Depending on their constituents, in CMS we call jets reclustered from generator-level particles GenJets, from calorimeter clusters CaloJets, and from particle flow candidates PFJets.


Pileup Reweighting and Pileup Mitigation

Overview

Teaching: 40 min
Exercises: 20 min
Questions
  • What is pileup and how does it affect jets?

  • What are the basic jet quality criteria?

Objectives
  • Learn about the pileup mitigation techniques used at CMS.

  • Learn about the basic jet quality criteria.

After following the instructions in the setup (if you have not done it yet) :

cd <YOUR WORKING DIRECTORY>/notebooks/DAS/
source /cvmfs/sft.cern.ch/lcg/views/LCG_104/x86_64-centos7-gcc11-opt/setup.sh
jupyter notebook --no-browser --port=8888 --ip 127.0.0.1

This will open a jupyter notebook tree with various notebooks.

What is pileup?

Pileup refers to the additional interactions that occur in each bunch crossing because the instantaneous bunch-by-bunch luminosity is very high. Here, “additional” implies that there is a hard-scatter interaction that has caused the event to fire the trigger. The total inelastic cross section is approximately 80 mb, so if the luminosity per crossing is of the order of $(80\,\mathrm{mb})^{-1}$, you will get one interaction per crossing, on average.
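To make the arithmetic concrete, the mean number of interactions per crossing is the inelastic cross section times the luminosity delivered per crossing. The sketch below assumes an instantaneous luminosity of 2×10³⁴ cm⁻² s⁻¹ and 2500 colliding bunches; both are illustrative values chosen here, not numbers from this lesson. The 80 mb cross section is taken from the text, and 11245 Hz is the LHC revolution frequency.

```python
INELASTIC_XSEC_MB = 80.0               # from the text above
MB_TO_CM2 = 1e-27                      # 1 mb = 1e-27 cm^2
LHC_REVOLUTION_FREQUENCY_HZ = 11245.0  # one bunch crosses the detector this often

def mean_interactions(inst_lumi_cm2_s, n_bunches, xsec_mb=INELASTIC_XSEC_MB):
    """Mean number of inelastic interactions per bunch crossing."""
    crossings_per_second = n_bunches * LHC_REVOLUTION_FREQUENCY_HZ
    lumi_per_crossing = inst_lumi_cm2_s / crossings_per_second  # cm^-2
    return lumi_per_crossing * xsec_mb * MB_TO_CM2

# Illustrative (assumed) machine parameters
mu = mean_interactions(inst_lumi_cm2_s=2e34, n_bunches=2500)
print(f"mean interactions per crossing: {mu:.1f}")
```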

Types of pileup

We can define two types of pileup:

We need to simulate out-of-time interactions, time structure of detector sensitivity and read-out, and bunch train structure. According to the detector elements used for measuring pileup:

Pileup mitigation algorithms

Many clever ways have been devised to remove the effects of pileup from physics analyses and objects. Pileup affects all objects (MET, muons, etc.). We are focusing on jets today.

$\rho$ pileup correction

Imagine dividing your detector into a grid of patches; $\rho$ is then the median of the patch values of $p_T$/area. The corrected jet momentum is therefore: $p_T^{corr} = p_T^{raw} - \rho \times area$

This works because pileup is expected to be isotropic. This is a simplistic version of what the L1 JECs do to remove pileup. More about JECs later.
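A minimal sketch of this median-based subtraction, using made-up patch values: the function names, the patch area, and all numbers are assumptions for illustration only. Note how the median is insensitive to the one patch that contains a hard jet.

```python
from statistics import median

def rho_from_patches(patch_pts, patch_area):
    """Median pt density (GeV per unit area) over equal-area patches."""
    return median([pt / patch_area for pt in patch_pts])

def subtract_pileup(pt_raw, jet_area, rho):
    """Area-based pileup subtraction: pt_corr = pt_raw - rho * area."""
    return pt_raw - rho * jet_area

# Hypothetical event: seven patches of area 0.25, one containing a hard jet
patch_pts = [3.0, 4.0, 5.0, 4.5, 60.0, 3.5, 4.0]
rho = rho_from_patches(patch_pts, patch_area=0.25)
pt_corr = subtract_pileup(pt_raw=60.0, jet_area=0.5, rho=rho)
print(f"rho = {rho} GeV per unit area, corrected pt = {pt_corr} GeV")
```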

Exercise 2.1

Before we get into mitigating pileup effects, let’s first examine measures of pileup in more detail. We will discuss event-by-event variables that can be used to characterize the pileup and this will give us some hints into thinking about how to deal with it. We can define:

Open a notebook

For the first part, open the notebook called Pileup.ipynb and run exercise 2.1.

Question 2.1

Why is the number of pileup interactions different from the number of primary vertices?

Solution 2.1

There is a vertex finding efficiency, which in Run I was about 72%. This means that $N_{PV}\simeq0.72{\cdot}N_{PU}$

Question 2.2

Rho is a measure of the density of the pileup in the event. It’s measured in terms of GeV per unit area. Can you think of ways we can use this information to correct for the effects of pileup?

Solution 2.2

From the jet $p_{T}$ simply subtract off the average amount of pileup expected in a jet of that size. Thus $p_{T}^{corr}{\simeq}p_{T}^{reco}-\rho{\cdot}area$

Question 2.3

This plot shows the jet composition. Generally, why do we see this particular mixture of photons, neutral hadrons, and charged hadrons?

Jet Composition Vs. Pt

Solution 2.3

A majority of the constituents in a jet come from pions. Pions come in neutral ($\pi^{0}$) and charged ($\pi^{\pm}$) varieties. Naively you would expect the composition to be two thirds charged hadrons and one third neutral hadrons. However, we know that $\pi^{0}$ decays to two photons, which leads to a large photon fraction.

Jet Composition MC

Charged Hadron Subtraction (CHS)

Tracking is a major tool in CMS. We can identify most charged particles that come from non-leading primary vertices; CHS removes these particles before jet clustering.
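The idea can be sketched as a simple filter over candidates. The `Candidate` class, its `vertex` field (index of the associated primary vertex, with 0 for the leading one and -1 for none), and the example values are hypothetical simplifications; the actual CMS implementation relies on the vertex association stored with each PF candidate.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    pt: float
    charge: int
    vertex: int  # assumed convention: 0 = leading PV, >0 = pileup PV, -1 = none

def apply_chs(candidates):
    """Drop charged candidates associated with a non-leading primary vertex."""
    return [c for c in candidates if not (c.charge != 0 and c.vertex > 0)]

# Hypothetical candidates
candidates = [
    Candidate(pt=25.0, charge=+1, vertex=0),   # charged, leading PV: kept
    Candidate(pt=3.0, charge=-1, vertex=4),    # charged, pileup PV: removed
    Candidate(pt=8.0, charge=0, vertex=-1),    # neutral: kept (no track)
    Candidate(pt=1.5, charge=+1, vertex=2),    # charged, pileup PV: removed
]
kept = apply_chs(candidates)
print([c.pt for c in kept])
```

Neutral candidates always survive this filter, which is why CHS jets still need an area-based correction for neutral pileup.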

PileUp Per Particle Identification (PUPPI)

Unfortunately, pileup is not really isotropic, it is uneven:

PUPPI aims for an inherently local correction based on the following information: a particle from the hard-scatter process is likely to be near (geometrically) other particles from the same interaction and to have a generally higher pT. We expect particles from pileup to have no shower structure, have generally lower pT, and be uncorrelated with particles from the leading vertex.

Exercise 2.2

Open a notebook

For this part open the notebook called Pileup.ipynb and run the Exercise 2.2

Discussion 2.1

Do you see any difference in the jet pt for CHS and PUPPI jets? Were you expecting these results?

Pileup reweighting

Start with a chosen input distribution: the instantaneous luminosity for a given event is sampled from this distribution to obtain the mean number of interactions in each beam crossing. The number of interactions for each beam crossing that will be part of the event (in- and out-of-time) is taken from a Poisson distribution with the predetermined mean. The input distribution is thus smeared by convolving with a Poisson distribution in each bin. This is what the observed distribution should look like after the Poisson fluctuations of each interaction.

The goal of the pileup reweighting procedure is to match the generated pileup distribution to the one found in data:
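One common way to express this matching is as a per-bin ratio of the normalized data and MC pileup distributions. The sketch below is a simplified illustration with made-up histograms; the function name and bin contents are assumptions, and in practice the weights come from centrally provided files.

```python
def pileup_weights(data_counts, mc_counts):
    """Per-bin weights: normalized data distribution over normalized MC one."""
    data_total = sum(data_counts)
    mc_total = sum(mc_counts)
    weights = []
    for data, mc in zip(data_counts, mc_counts):
        mc_fraction = mc / mc_total
        # Bins with no simulated events cannot be reweighted; assign weight 0.
        weights.append((data / data_total) / mc_fraction if mc_fraction > 0 else 0.0)
    return weights

# Hypothetical histograms binned in the number of interactions
data_counts = [10, 30, 40, 20]
mc_counts = [25, 25, 25, 25]
weights = pileup_weights(data_counts, mc_counts)
reweighted_mc = [w * mc for w, mc in zip(weights, mc_counts)]
print(weights)
```

After reweighting, the MC histogram has the same shape as the data while keeping its overall normalization.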

Exercise 2.3

Here we are going to produce a file containing the weights used for pileup reweighting using json-pog and correctionlib.

Open a notebook

For this part open the notebook called Pileup.ipynb and run the Exercise 2.3

Question 2.4

Ask yourself what pileup reweighting is doing. How large do you expect the pileup weights to be?

Question 2.5

In what unit will the x-axis be plotted? Another way of asking this is what pileup variable can be measured in both data and MC and is fairly robust?

Solution 2.5

The x-axis is plotted as a function of $\mu$ as this is a true measurement of pileup (additional interactions) and not just some variable which is correlated with pileup. Other options might have been $N_{PV}$, which has an efficiency which is less than 100%, and $\rho$, which assumes that the pileup energy density is uniform. We also get different values of $\rho$ if we measure it for different regions in $\eta$ (i.e. $|\eta|<3$ or $|\eta|<5$).

Zmumu_npv, Zmumu_rho, Zmumu_npv_nputruth, Zmumu_rho_nputruth

Question 2.6

Why do the green and red histograms end around $\mu\approx38$?

More information

To learn more about pileup, you can follow the CMSDAS short exercise about pileup here: (FIXME)

Noise Jet ID

In order to avoid using fake jets, which can originate from a hot calorimeter cell or electronic read-out box, we need to require some basic quality criteria for jets. These criteria are collectively called “jet ID”. Details on the jet ID for PFJets can be found in the following twiki:

https://twiki.cern.ch/twiki/bin/viewauth/CMS/JetID

The JetMET POG recommends a single jet ID for most physics analyses in CMS, which corresponds to what used to be called the tight Jet ID. Some important observations from the above twiki:

Exercise 2.4

Open a notebook

For this part open the notebook called Pileup.ipynb and run the Exercise 3.

In nanoAOD it is trivial to apply the jet ID. The IDs are stored as flags, where events.Jet.jetId>=2 corresponds to tightID and events.Jet.jetId>=6 corresponds to tightLepVetoID.
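The thresholds above are consistent with a bit-flag encoding, which can be decoded explicitly. This sketch assumes that the bit with value 2 means “passes tight” and the bit with value 4 means “passes tightLepVeto”, so that a jet passing both has jetId = 6. That layout is an assumption inferred from the thresholds, and the helper names and example values are hypothetical.

```python
TIGHT_BIT = 1 << 1           # value 2 (assumed bit layout)
TIGHT_LEP_VETO_BIT = 1 << 2  # value 4 (assumed bit layout)

def passes_tight(jet_id):
    return bool(jet_id & TIGHT_BIT)

def passes_tight_lep_veto(jet_id):
    return bool(jet_id & TIGHT_LEP_VETO_BIT)

# Hypothetical jetId values for three jets
jet_ids = [0, 2, 6]
tight = [passes_tight(j) for j in jet_ids]
tight_lep_veto = [passes_tight_lep_veto(j) for j in jet_ids]
print(tight, tight_lep_veto)
```

For the values 0, 2, and 6 the bitwise test and the `>=` comparison select the same jets; the bitwise form simply makes the meaning of each flag explicit.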

If you want to know how these flags are stored in nanoAOD, the next block shows the implementation in C++ from a miniAOD file:

Implementation in c++

There are several ways to apply jet ID. In our above exercises, we have run the cuts “on-the-fly” in our python FWLite macro (the first option here). Others are listed for your convenience.

The following examples use somewhat out of date numbers. See the above link to the JetID twiki for the current numbers.

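To apply the cuts on pat::Jet (like in miniAOD) in python, you can do:

# Apply jet ID using energy fractions relative to the uncorrected jet energy.
# `uncorrJet` is the uncorrected jet four-vector and must be defined beforehand
# (for example, uncorrJet = jet.correctedP4(0)).
nhf = jet.neutralHadronEnergy() / uncorrJet.E()  # neutral hadron fraction
nef = jet.neutralEmEnergy() / uncorrJet.E()      # neutral EM fraction
chf = jet.chargedHadronEnergy() / uncorrJet.E()  # charged hadron fraction
cef = jet.chargedEmEnergy() / uncorrJet.E()      # charged EM fraction
nconstituents = jet.numberOfDaughters()
nch = jet.chargedMultiplicity()
goodJet = (
    nhf < 0.99
    and nef < 0.99
    and chf > 0.00
    and cef < 0.99
    and nconstituents > 1
    and nch > 0
)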

To apply the cuts on pat::Jet (like in miniAOD) in C++ then you can do:

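// Apply jet ID using energy fractions relative to the uncorrected jet energy.
// uncorrJet is the uncorrected jet four-vector and must be defined beforehand.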
double nhf = jet.neutralHadronEnergy() / uncorrJet.E();
double nef = jet.neutralEmEnergy() / uncorrJet.E();
double chf = jet.chargedHadronEnergy() / uncorrJet.E();
double cef = jet.chargedEmEnergy() / uncorrJet.E();
int nconstituents = jet.numberOfDaughters();
int nch = jet.chargedMultiplicity();
bool goodJet =
  nhf < 0.99 &&
  nef < 0.99 &&
  chf > 0.00 &&
  cef < 0.99 &&
  nconstituents > 1 &&
  nch > 0;

To create selected jets in cmsRun:

from PhysicsTools.SelectorUtils.pfJetIDSelector_cfi import pfJetIDSelector
process.tightPatJetsPFlow = cms.EDFilter("PFJetIDSelectionFunctorFilter",
                                         filterParams = pfJetIDSelector.clone(quality=cms.string("TIGHT")),
                                         src = cms.InputTag("slimmedJets")
                                         )

It is also possible to use the PFJetIDSelectionFunctor C++ selector (actually, either in C++ or python), but this was primarily developed in the days before PF when applying CaloJet ID was not possible very easily. Nevertheless, the functionality of more complicated selection still exists for PFJets, but is almost never used other than the few lines above. If you would still like to use that C++ class, it is documented as an example here.

Question 2.7

What do the jets with jetId represent? Were you expecting more or fewer jets with jetId==0?

Key Points

  • Pileup refers to the additional interactions in a bunch crossing that do not come from the main hard-scatter interaction. We must mitigate its effects to reduce the amount of noise in our events.

  • Many event variables help us learn how the pileup during the data-taking period differed from the pileup used in our simulations. The pileup reweighting procedure helps us calibrate the pileup profile in our simulations.

  • The so-called jet ID is the set of basic jet quality criteria used to remove fake jets.


Jet energy corrections and resolution

Overview

Teaching: 40 min
Exercises: 20 min
Questions
  • What are jet energy corrections?

  • What is jet energy resolution?

Objectives
  • Learn about how we calibrate jets in CMS.

  • Learn about the resolution of the jets and its effect.

After following the instructions in the setup (if you have not done it yet) :

cd <YOUR WORKING DIRECTORY>/notebooks/DAS/
source /cvmfs/sft.cern.ch/lcg/views/LCG_104/x86_64-centos7-gcc11-opt/setup.sh
jupyter notebook --no-browser --port=8888 --ip 127.0.0.1

This will open a jupyter notebook tree with various notebooks.

Jet Energy Corrections

Let’s define the jet pt response $R$ as the ratio between the measured and the true pt of a jet from simulation. We expect that the average response is different from 1 because of pileup adding energy or non-linear calorimeter response.
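As a toy illustration of this definition, the sketch below computes the response for a few matched (reconstructed, generated) jet pairs and the average correction factor implied by it. The pt values are invented for illustration, and a real derivation bins the response in $p_T$ and $\eta$ rather than averaging over all jets.

```python
from statistics import mean

def response(pt_reco, pt_gen):
    """Jet pt response: measured pt divided by true (generator-level) pt."""
    return pt_reco / pt_gen

# Hypothetical matched (reco pt, gen pt) pairs in GeV
matched_jets = [(45.0, 50.0), (92.0, 100.0), (190.0, 200.0)]
responses = [response(reco, gen) for reco, gen in matched_jets]
mean_response = mean(responses)

# Multiplying reconstructed pt by 1/<R> brings the average response to 1.
correction_factor = 1.0 / mean_response
print(f"<R> = {mean_response:.3f}, correction = {correction_factor:.3f}")
```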

Jet energy corrections (JECs) correct reconstructed jets (on average) back to particle level. The corrections are derived as functions of many useful quantities, like $p_T^{gen}$, $\eta$, area, and pileup. CMS uses a factorized approach to JECs:

Jet energy scale determination in data

Reminder for PUPPI jets

PUPPI jets do not need the L1 pileup corrections. Starting in Run 3, PUPPI jets are the primary jet collection.

Exercise 3.1

Open a notebook

For this part open the notebook called Jet_Energy_Corrections.ipynb and run the Exercise 3.1

Discussion 1.1

After running Exercise 1 of the notebook, were you expecting differences between these two distributions? Do you think the differences are large or small?

After running Exercise 1 of the notebook, we notice that the $p_{\mathrm{T}}$ distributions disagree quite a bit between the GenJets and PFJets. We need to apply the jet energy corrections (JECs), a sequence of corrections that address non-uniform responses in $p_{\mathrm{T}}$ and $\eta$, as well as an average correction for pileup. The JECs are often updated fairly late in the analysis cycle, simply because the JEC experts start deriving the JECs at the same time the analyzers start developing their analyses. For this reason, it is imperative for analyzers to maintain flexibility in the JEC, and the software reflects this.

For more information and technical details on the jet energy scale calibration in CMS, look at the following link: https://cms-jerc.web.cern.ch/JEC/.

It is possible to run the JEC software “on the fly” after you’ve done your heavy processing (Ntuple creation, skimming, etc). We will now show one example of how this is done using the latest correctionlib package and the JME json-pog in Exercise 2.

json-pog and correctionlib

Currently, CMS and the JetMET POG support the use of the so-called json-pog with the correctionlib python package, as a way to make the implementation of corrections more uniform.

Specifically, JECs were delivered in the past in a zip file containing txt files where the users could find the corrections. The json-pog makes this process more generic between CMS POGs, and correctionlib makes the implementation of these corrections more generic as well.

More about json-pog in this link and correctionlib in this link.

In the notebook, using the json-pog and the correctionlib package, you find the following lines:

jerc_file = '/cvmfs/cms.cern.ch/rsync/cms-nanoAOD/jsonpog-integration/POG/JME/2018_UL/jet_jerc.json.gz'
jerc_corr = correctionlib.CorrectionSet.from_file(jerc_file)

corr = jerc_corr.compound["Summer19UL18_V5_MC_L1L2L3Res_AK4PFchs"]

where the string Summer19UL18_V5_MC_L1L2L3Res_AK4PFchs contains the jetMET nomenclature for labeling the JECs. In this example:

Discussion 1.2

After running Exercise 2 of the notebook, how big is the difference in pt for corrected and uncorrected jets? Do you think it is larger at low or high pt?

Discussion 1.3

Why do we need to calibrate jet energy? Why is “jet response” not equal to 1? Can you think of a physics process in nature that can help us calibrate the jet response to 1?

Discussion 1.4

The amount of material in front of the CMS calorimeter varies with $\eta$. Therefore, the calorimeter response to jets is also a function of jet $\eta$. Can you think of a physics process in nature that can help us calibrate the jet response to be uniform in $\eta$?

JEC Uncertainties

Since we’ve applied the JECs to the distributions, we should also assign a systematic uncertainty to the procedure. The procedure is explained in this link, and this is part of the Exercise 2.3 of the notebook.

Exercise 3.2

Open a notebook

For this part open the notebook called Jet_Energy_Corrections.ipynb and run the Exercise 3.

Question 1.1

After running the Exercise 3 of the notebook, does the result make sense? Is the nominal histogram always between the up and down variations, and should it be?

Jet Energy Resolution

Jets are stochastic objects. The content of jets fluctuates quite a lot, and the content also depends on what actually caused the jet (uds quarks, gluons, etc). In addition, there are experimental limitations to the measurement of jets. Both of these aspects limit the accuracy to which we can measure the 4-momentum of a jet. The way to quantify our accuracy of measuring jet energy is called the jet energy resolution (JER). If you have a group of single pions that have the same energy, the energy measured by CMS will not be exactly the same every time, but will typically follow a (roughly) Gaussian distribution with a mean and a width. The mean is corrected using the jet energy corrections. It is impossible to “correct” for all resolution effects on a jet-by-jet basis, although regression techniques can account for many effects.

As such, there will always be some experimental and theoretical uncertainty in the jet energy measurement, and this is seen as non-zero jet energy resolution. There are also other jet-related resolutions, such as jet angular resolution and jet mass resolution, but JER is what we most often have to deal with. Jets measured in data typically have worse resolution than simulated jets. Because of this, it is important to ‘smear’ the MC jets with jet energy resolution (JER) scale factors, so that measured and simulated jets are on equal footing in analyses. We will demonstrate how to apply the JER scale factors, since that is applicable for all analyses that use jets.
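To give a feeling for what smearing means in practice, here is a minimal sketch of a commonly used hybrid approach: if a simulated jet has a matched generator-level jet, the reco-gen difference is scaled by the JER scale factor; otherwise a random Gaussian shift is applied. This particular recipe, the function name, and the numbers are assumptions made for illustration; the notebook uses the coffea implementation instead.

```python
import math
import random

def smear_factor(pt, scale_factor, resolution, pt_gen=None, rng=random):
    """Multiplicative factor to apply to a simulated jet's four-momentum.

    pt           -- reconstructed (corrected) jet pt
    scale_factor -- data/MC resolution scale factor
    resolution   -- relative pt resolution in simulation (sigma_pt / pt)
    pt_gen       -- pt of the matched generator-level jet, if any
    """
    if pt_gen is not None:
        # Scaling method: stretch the reco-gen difference by the scale factor.
        factor = 1.0 + (scale_factor - 1.0) * (pt - pt_gen) / pt
    else:
        # Stochastic method: add the extra resolution in quadrature.
        sigma = resolution * math.sqrt(max(scale_factor ** 2 - 1.0, 0.0))
        factor = 1.0 + rng.gauss(0.0, sigma)
    return max(factor, 0.0)  # never let the smeared pt become negative

# Hypothetical jet: reco pt 100 GeV, gen pt 90 GeV, scale factor 1.1
matched = smear_factor(100.0, 1.1, 0.1, pt_gen=90.0)
unmatched = smear_factor(100.0, 1.1, 0.1, rng=random.Random(42))
print(f"matched: {matched:.3f}, unmatched: {unmatched:.3f}")
```

With a scale factor of exactly 1 both branches return 1, so jets are left untouched when data and simulation already have the same resolution.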

More information can be found in the jet resolution guide.

The resolution is measured in data for different eta bins, and was approximately 10% with a 10% uncertainty for 7 TeV and 8 TeV data. For precision, it is important to use the correctly measured resolutions, but a reasonable calculation is to assume a flat 10% uncertainty for simplicity.

Open a notebook

For this part open the notebook called Jet_Energy_Corrections.ipynb and run the Exercise 4.

In the notebook we will use the coffea implementation to apply JER to nanoAOD events. Notice that the function used to apply corrections will be updated soon to be compatible with json-pog.

Discussion

Let’s look at a simple dijet resonance peak shown below.

Jet Resolution plot for a dijet resonance analysis.

It corresponds to a dijet resonance peak analysis. The plot was produced from an MC sample of Randall-Sundrum gravitons (RSGs) with m=3 TeV decaying to two quarks. The resulting signature is two high-$p_{\mathrm{T}}$ jets, with a truth-level invariant mass of 3 TeV.

Can you see the effect that the correction and the smearing have?

Key Points

  • The energy of jets differs between data and simulation, for many reasons, and in CMS we calibrate it in a series of steps.

  • Jets are stochastic objects whose content fluctuates a lot. We measure the jet energy resolution to account for these effects.


Jet Substructure

Overview

Teaching: 40 min
Exercises: 20 min
Questions
  • What is jet substructure?

  • How do we distinguish jets originating from W bosons or top quarks?

Objectives
  • Learn about high-pt AK8 jets (FatJet)

  • Learn about the different substructure variables and taggers

  • Learn ways to identify boosted W and top quarks

After following the instructions in the setup:

cd <YOUR WORKING DIRECTORY>/notebooks/DAS/
source /cvmfs/sft.cern.ch/lcg/views/LCG_104/x86_64-centos7-gcc11-opt/setup.sh
jupyter notebook --no-browser --port=8888 --ip 127.0.0.1

This will open a jupyter notebook tree with various notebooks.

What is a jet?

In the previous episodes we discussed that the jet is a physical object representing the hadronization of quakrs and gluons. Perhaps we have encounter that a jet can be formed from random noise or pileup particles in our detectors, not necessarily coming from hard scattered quarks and gluons, but jets can be so much more:

The internal structure of the jet constituents helps us to understand their origin.

Boosted Objects

Heavy particles that are created not at rest but with some momentum are referred to as boosted objects. Let’s analyze the example of a top quark. What happens if the top quarks are boosted, e.g. when they come from the decay of a new massive particle? Their hadronic decay products are collimated, so they can be reconstructed in the same final-state object. Hadronic final states then become accessible as a dijet final state (in this case).

Jet mass

QCD jet mass is a perturbative quantity. From the initial (almost) massless partons, pQCD gives rise to a jet mass of order:

[\left< M^2 \right> \simeq C \cdot \frac{\alpha_S}{\pi} p_T^2 R^2]

Jet mass is proportional to R and pT. C is a form factor that depends on the originating parton and the clustering algorithm. For non-cone algorithms:

[\left< M^2 \right> \simeq a \times \alpha_S p^2_T R^2]

where $a$ is 0.16 for quarks and 0.37 for gluons. For heavy objects, the LO mass scale is the heavy object mass.
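As a rough numerical illustration of the scaling above, the sketch below evaluates $\sqrt{\left< M^2 \right>}$ for quark and gluon jets at a few $p_T$ values. The value of $\alpha_S$ used here (0.1) and the function name `qcd_jet_mass_scale` are assumptions chosen for illustration; they are not taken from the lesson notebooks.

```python
import math

# Assumed illustrative value of the strong coupling (not from the lesson).
ALPHA_S = 0.1

# Coefficients "a" quoted in the text for non-cone algorithms.
A_COEFF = {"quark": 0.16, "gluon": 0.37}


def qcd_jet_mass_scale(pt, radius, parton, alpha_s=ALPHA_S):
    """Return sqrt(<M^2>) using <M^2> ~ a * alpha_s * pt^2 * R^2.

    The result has the same units as pt (e.g. GeV).
    """
    return math.sqrt(A_COEFF[parton] * alpha_s) * pt * radius


if __name__ == "__main__":
    for parton in ("quark", "gluon"):
        for pt in (200.0, 400.0, 800.0):
            mass = qcd_jet_mass_scale(pt, 0.8, parton)
            print(f"{parton:5s} jet, pT = {pt:6.1f} GeV, R = 0.8: "
                  f"typical mass ~ {mass:5.1f} GeV")
```

The estimate grows linearly with both $p_T$ and R, and gluon jets come out heavier than quark jets at the same $p_T$.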

The mass of QCD jets changes as a function of momentum, but the mass of heavy particle jets is relatively stable. For a given mass and pT scale, choose an appropriate jet radius:

[\Delta R \sim \frac{2m}{p_T}]

CMS uses R = 0.8 for heavy object reconstruction. This corresponds to merged W/Z decays at pT ~200 GeV and merged top decays at pT ~400 GeV.
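The rule of thumb $\Delta R \sim 2m/p_T$ can be inverted to estimate the $p_T$ above which the decay products fit inside a jet of a given radius. The following is a minimal sketch; the W and top masses are approximate values assumed here, and the function names are hypothetical.

```python
# Approximate particle masses in GeV (assumed values for illustration).
W_MASS = 80.4
TOP_MASS = 172.5


def decay_opening_angle(mass, pt):
    """Approximate angular separation of the decay products: dR ~ 2m / pT."""
    return 2.0 * mass / pt


def min_pt_for_merging(mass, radius=0.8):
    """pT above which the decay products roughly fit in a jet of this radius."""
    return 2.0 * mass / radius


if __name__ == "__main__":
    for name, mass in (("W", W_MASS), ("top", TOP_MASS)):
        pt_min = min_pt_for_merging(mass)
        print(f"{name}: merged in an R = 0.8 jet for pT above ~{pt_min:.0f} GeV")
```

With these inputs the thresholds come out at about 201 GeV for the W and about 431 GeV for the top, consistent with the approximate values quoted above.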

Rho parameter

A useful variable for massive, fat jets is the QCD scaling parameter $\rho$, defined as:

$\rho=\log(m^2/(p_{\mathrm{T}}R)^2)$.

(Sometimes $\rho$ is defined without the log.) One useful feature of this variable is that QCD jet mass grows with $p_{\mathrm{T}}$, i.e. the two quantities are strongly correlated, while $\rho$ is much less correlated with $p_{\mathrm{T}}$.
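A short sketch of the $\rho$ definition above is shown here. It assumes the natural logarithm and R = 0.8; the function name `rho` and the toy jet values are illustrative and not part of the lesson notebooks.

```python
import math


def rho(mass, pt, radius=0.8):
    """QCD scaling variable: rho = log(m^2 / (pT * R)^2), natural log assumed."""
    return math.log(mass ** 2 / (pt * radius) ** 2)


if __name__ == "__main__":
    # Toy jets whose mass grows in proportion to pT, as for QCD jets.
    for mass, pt in ((40.0, 250.0), (80.0, 500.0), (160.0, 1000.0)):
        print(f"m = {mass:6.1f} GeV, pT = {pt:7.1f} GeV -> rho = {rho(mass, pt):.3f}")
```

Because $\rho$ depends only on the ratio $m/(p_T R)$, the three toy jets give the same value even though their masses differ by a factor of four.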

Exercise 4.1

We can use jet mass to distinguish our boosted W and top jets from QCD. Let’s compare the AK8 jet mass of the boosted top quarks from the RS KK sample and the jets from the QCD sample. Let’s also look at the softdrop groomed jet mass combined with the PUPPI pileup subtraction algorithm for different samples.

Open a notebook

For this part, open the notebook called Jet_Substructure.ipynb and run Exercise 4.1.

Question 4.1

Do you think the jet mass alone can be used to identify boosted W and top jets?

Question 4.2

After running Exercise 3, in which cases do you think the $\rho$ variable can be used?

Solution 4.2

The following two plots show what QCD events look like in different $p_{T}$ ranges. It’s clear that the mass depends very strongly on $p_{T}$, while the $\rho$ shape is fairly constant vs. $p_{T}$ (ignoring $\rho<-7$ or so, which is the non-perturbative region). Having a stable shape is useful when studying QCD across a wide $p_{T}$ range.

Jet Substructure

Because boosted jets represent the hadronic products of a heavy particle produced with high momentum, some tools have been developed to study the internal structure of these jets. This topic is usually called Jet Substructure.

Jet substructure algorithms can be divided into three main tools:

For further reading, several measurements of jet substructure have been performed:

Jet Grooming Algorithms

There have been many different approaches to jet grooming over the years. The common idea is to remove soft and wide-angle radiation from within the jet, for example by reclustering with a smaller R and removing soft subjets, or by removing constituents during clustering.

The next cartoon provides a good summary of all these algorithms:

The softdrop algorithm is the one chosen by default at CMS. Softdrop recursively declusters the jet. At each step it removes the softer component unless the soft drop condition is satisfied, in which case the procedure stops.

Soft wide angle radiation fails the condition:

Example (zcut = 0.1) :

Jet grooming algorithms dramatically improve the separation of QCD and top quark jets. Merged top quarks can be identified with a window around the top quark mass.
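The soft drop condition mentioned above is commonly written as $\min(p_{T,1}, p_{T,2}) / (p_{T,1} + p_{T,2}) > z_{cut} \, (\Delta R_{12} / R_0)^{\beta}$ for the two subjets obtained by undoing one clustering step. The sketch below checks this condition for a single declustering step only. The defaults $\beta = 0$ and $R_0 = 0.8$ are assumptions made for illustration, and the function name is hypothetical; this is not the implementation used in the notebooks.

```python
def passes_soft_drop(pt1, pt2, delta_r, z_cut=0.1, beta=0.0, r0=0.8):
    """Check the soft drop condition for one declustering step.

    pt1, pt2 : transverse momenta of the two subjets
    delta_r  : angular distance between the two subjets
    z_cut    : momentum-sharing threshold
    beta     : angular exponent (beta = 0 removes the angular dependence)
    r0       : jet radius used to normalise delta_r
    """
    z = min(pt1, pt2) / (pt1 + pt2)
    return z > z_cut * (delta_r / r0) ** beta


if __name__ == "__main__":
    # A soft emission carrying about 5% of the momentum fails for z_cut = 0.1.
    print(passes_soft_drop(100.0, 5.0, delta_r=0.4))
    # A balanced splitting, as expected for a W -> qq decay, passes.
    print(passes_soft_drop(100.0, 30.0, delta_r=0.4))
```

In a full groomer, a failing step drops the softer subjet and the check is repeated on the harder one, so the loop over the clustering history is the part this sketch leaves out.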

Exercise 4.2

In this part of the tutorial, we will compare different substructure algorithms as well as some commonly used substructure variables.

Open a notebook

For this part, open the notebook called Jet_Substructure.ipynb and run Exercise 4.2.

Question 4.3

Look at the following histogram, which compares ungroomed, pruned, soft drop (SD), PUPPI, and SD+PUPPI jets. Note that the histogram has two peaks. What do these correspond to? How do the algorithms affect the relative size of the two populations?

Substructure variables

Knowing how many final state objects to expect from these decays we can look inside the jet for the expected substructure:

[\tau_N = \frac{1}{\sum_i p_{T,i} \cdot R} \sum_i p_{T,i} \cdot \min ( \Delta R_{1,i}, \ldots, \Delta R_{N,i} )]

The variable $\tau_N$ gives a sense of how many prongs or cores can be found inside the jet: it is small when the jet is compatible with N or fewer prongs. It is known that the N-subjettiness variables themselves ($\tau_{N}$) do not provide good discrimination power, but their ratios do. A ratio $\tau_{MN} = \dfrac{\tau_M}{\tau_N}$ tests whether the jet is more M-prong than N-prong. For instance, we expect 2 prongs for boosted jets originating from hadronic W decays, while we expect 1 prong for high-pT jets from QCD multijet processes. The most common N-subjettiness ratios are $\tau_{21}$ and $\tau_{32}$.
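To make the $\tau_N$ formula concrete, here is a minimal sketch that evaluates it for a toy two-prong jet. The candidate axes are placed by hand, which is an assumption for illustration (real implementations determine the axes with a dedicated procedure), and the names `delta_r` and `tau_n` as well as the toy constituents are hypothetical.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane, with phi wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def tau_n(constituents, axes, radius=0.8):
    """N-subjettiness for given axes.

    constituents : list of (pt, eta, phi) tuples
    axes         : list of (eta, phi) tuples, one per candidate subjet
    """
    weighted = sum(
        pt * min(delta_r(eta, phi, a_eta, a_phi) for a_eta, a_phi in axes)
        for pt, eta, phi in constituents
    )
    norm = sum(pt for pt, _, _ in constituents) * radius
    return weighted / norm


if __name__ == "__main__":
    # Toy two-prong jet: two hard cores plus a soft particle near each core.
    jet = [(100.0, 0.2, 0.0), (100.0, -0.2, 0.0),
           (5.0, 0.25, 0.05), (5.0, -0.25, -0.05)]
    tau1 = tau_n(jet, axes=[(0.0, 0.0)])               # one axis at the jet centre
    tau2 = tau_n(jet, axes=[(0.2, 0.0), (-0.2, 0.0)])  # one axis per core
    print(f"tau1 = {tau1:.4f}, tau2 = {tau2:.4f}, tau21 = {tau2 / tau1:.4f}")
```

For this toy jet $\tau_2$ is much smaller than $\tau_1$, so $\tau_{21}$ is close to zero, which is the behaviour expected for a genuine two-prong jet.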

Another commonly used substructure variable is the energy correlation function $N2$. Similarly to $\tau_{21}$, $N2$ tests whether the boosted jet is compatible with a 2-prong jet hypothesis.

Exercise 4.3

Open a notebook

For this part, open the notebook called Jet_Substructure.ipynb and run Exercise 4.3.

Question 4.4

Look at the histogram comparing $\tau_{21}$. What can you say about the histogram? Is $\tau_{21}$ telling you something about the nature of the boosted jets selected?

Question 4.5

Look at the histogram comparing $\tau_{32}$. What can you say about the histogram? Is $\tau_{32}$ telling you something about the nature of the boosted jets selected?

Question 4.6

Look at the histograms comparing $N2$ and $N3$. What can you say about the histograms? Are these variables telling you something about the nature of the boosted jets selected?

Taggers

In this part of the tutorial, we will look at how different substructure algorithms can be used to identify jets originating from boosted W’s and tops. Specifically, we’ll see how these identification tools are used to separate these boosted jets from those originating from Standard Model QCD, a dominant process at the LHC.

W tagging

top tagging

Tagging with machine learning

W/top tagging was one of the first places where ML was adopted in CMS. We have studied several of these algorithms (JME-18-002), with “deepAK8/ParticleNet” being the most widely used within CMS.

Exercise 4.4

Open a notebook

For this part, open the notebook called Jet_Substructure.ipynb and run Exercise 4.4.

Question 4.7

  • Why can we use a ttbar sample to talk about W-tagging?
  • What cuts would you place on these variables to distinguish W bosons from QCD?
  • So far, which variable looks more promising?

Question 4.8

  • What cut would you apply to select boosted top quarks?
  • For both the W and top selections, what other variable(s) could we cut on in addition?

Go Further

What about boosted Higgs?

CMS also has a rich program of boosted Higgs to bb/cc taggers. However, they are usually studied by the b-tagging group (BTV). Look at their documentation for more information.

Key Points

  • Jet substructure is the field that studies the internal structure of high-pT jets, usually clustered with a larger jet radius (AK8).

  • Grooming algorithms like softdrop, and substructure variables like the nsubjettiness ratio help us to identify the origin of these jets.

  • Over the years, more state-of-the-art taggers involving ML have been implemented in CMS. These help us identify boosted jets more effectively.