MeasureIABase
measureia.MeasureIABase
Bases: SimInfo
Base class for MeasureIA package that includes some general methods used throughout the package.
Attributes: |
|
---|
Methods:
Name | Description |
---|---|
calculate_dot_product_arrays |
Calculates dot product of elements of two arrays |
get_ellipticity |
Given e and phi, e_+ and e_x components of ellipticity are returned. |
get_random_pairs |
Analytical RR for a (rp,pi) bin. |
get_volume_spherical_cap |
Volume of an (r,mu_r) bin. |
get_random_pairs_r_mur |
Analytical RR for a (r,mu_r) bin. |
setdiff2D |
Compares each row of a1 and a2 and returns the elements that do not overlap. |
setdiff_omit |
For rows in nested list a1, whose index is included in incl_ind, returns elements that do not overlap between the row in a1 and a2. |
_measure_w_g_i |
Measure wgg or wg+ from xi grid provided by MeasureWBox or MeasureWLightcone class methods. |
_measure_multipoles |
Measure multipoles (gg or g+) from xi grid provided by MeasureMultipolesBox or MeasureMultipolesLightcone class methods. |
_obs_estimator |
Combines elements (DD, RR, etc) of xi estimators into xi_gg or xi_g+ for MeasureIALightcone. |
assign_jackknife_patches |
Given positions of multiple samples, defines jackknife patches and returns index of every object in the sample. |
Notes
Inherits attributes from 'SimInfo', where 'boxsize', 'L_0p5' and 'snap_group' are used in this class.
Source code in src/measureia/measure_IA_base.py
class MeasureIABase(SimInfo):
"""Base class for MeasureIA package that includes some general methods used throughout the package.
Attributes
----------
Num_position : int
Number of objects in the position sample. This value is updated in jackknife realisations.
Num_shape : int
Number of objects in the shape sample. This value is updated in jackknife realisations.
r_min : float
Minimum bound of (projected) separation length; bin edge. Default is 0.1.
r_max : float
Maximum bound of (projected) separation length; bin edge. Default is 20.
r_bins : ndarray
Bin edges of the (projected) separation length (r_p or r).
pi_bins : ndarray
Bin edges of the line of sight (pi).
mu_r_bins : ndarray
Bin edges of the mu_r.
Methods
-------
calculate_dot_product_arrays()
Calculates dot product of elements of two arrays
get_ellipticity()
Given e and phi, e_+ and e_x components of ellipticity are returned.
get_random_pairs()
Analytical RR for a (rp,pi) bin.
get_volume_spherical_cap()
Volume of an (r,mu_r) bin.
get_random_pairs_r_mur()
Analytical RR for a (r,mu_r) bin.
setdiff2D()
Compares each row of a1 and a2 and returns the elements that do not overlap.
setdiff_omit()
For rows in nested list a1, whose index is included in incl_ind, returns elements that do not overlap between
the row in a1 and a2.
_measure_w_g_i()
Measure wgg or wg+ from xi grid provided by MeasureWBox or MeasureWLightcone class methods.
_measure_multipoles()
Measure multipoles (gg or g+) from xi grid provided by MeasureMultipolesBox or MeasureMultipolesLightcone
class methods.
_obs_estimator()
Combines elements (DD, RR, etc) of xi estimators into xi_gg or xi_g+ for MeasureIALightcone.
assign_jackknife_patches()
Given positions of multiple samples, defines jackknife patches and returns index of every object in the sample.
Notes
-----
Inherits attributes from 'SimInfo', where 'boxsize', 'L_0p5' and 'snap_group' are used in this class.
"""
def __init__(
self,
data,
output_file_name,
simulation=None,
snapshot=None,
separation_limits=[0.1, 20.0],
num_bins_r=8,
num_bins_pi=20,
pi_max=None,
boxsize=None,
periodicity=True,
):
"""
The __init__ method of the MeasureIABase class.
Parameters
----------
data : dict or NoneType
Dictionary with data needed for calculations.
For cartesian coordinates, the keywords are:
'Position' and 'Position_shape_sample': (N_p,3), (N_s,3) ndarrays with the x, y, z coordinates
of the N_p, N_s objects in the position and shape samples, respectively.
'Axis_Direction': (N_s,2) ndarray with the two elements of the unit vectors describing the
axis direction of the projected axis of the object shape.
'LOS': index referring back to the column number in the 'Position' samples that contains the
line-of-sight coordinate. (e.g. if the shapes are projected over the z-axis, LOS=2)
'q': (N_s) array containing the axis ratio q=b/a for each object in the shape sample.
For lightcone coordinates, the keywords are:
'Redshift' and 'Redshift_shape_sample': (N_p) and (N_s) ndarray with redshifts of position and shape samples.
'RA' and 'RA_shape_sample': (N_p) and (N_s) ndarray with RA coordinate of position and shape samples.
'DEC' and 'DEC_shape_sample': (N_p) and (N_s) ndarray with DEC coordinate of position and shape samples.
'e1' and 'e2': (N_s) arrays with the two ellipticity components e1 and e2 of the shape sample objects.
output_file_name : str
Name and filepath of the file where the output should be stored. Needs to be hdf5-type.
simulation : str or NoneType, optional
Indicator of simulation, obtaining correct boxsize in cMpc/h automatically.
Choose from [TNG100, TNG100_2, TNG300, EAGLE, HorizonAGN, FLAMINGO_L1, FLAMINGO_L2p8].
Default is None, in which case boxsize needs to be added manually; or in the case of observational data,
the pi_max.
snapshot : int or str or NoneType, optional
Number of the snapshot, which, if given, will ensure that the output file to contains a group
'Snapshot_[snapshot]'. If None, the group is omitted from the output file structure. Default is None.
separation_limits : iterable of 2 entries, optional
Bounds of the (projected) separation vector length bins in cMpc/h (so, r or r_p). Default is [0.1,20].
num_bins_r : int, optional
Number of bins for (projected) separation vector. Default is 8.
num_bins_pi : int, optional
Number of bins for line of sight (LOS) vector, pi or mu_r when multipoles are measured. Default is 20.
pi_max : int or float, optional
Bound for line of sight bins. Bounds will be [-pi_max, pi_max]. Default is None, in which case half the
boxsize will be used.
boxsize : int or float or NoneType, optional
If simulation is not included in SimInfo, a manual boxsize can be added here. Make sure simulation=None
and the boxsize units are equal to those in the data dictionary. Default is None.
periodicity : bool, optional
If True, the periodic boundary conditions of the simulation box are taken into account. If False, they are
ignored. Note that because this code used analytical randoms for the simulations, the correlations will not
be correct in this case and only DD and S+D terms should be studied. Non-periodic randoms can be measured by
providing random data to the code and considering the DD term that is measured. Correlations and covariance
matrix will need to be reconstructed from parts. [Please add a request for teh integration of this method of
this if you would like to use this option often.] Default is True.
"""
SimInfo.__init__(self, simulation, snapshot, boxsize)
self.data = data
self.output_file_name = output_file_name
self.periodicity = periodicity
if periodicity:
periodic = "periodic "
else:
periodic = ""
try:
self.Num_position = len(data["Position"]) # number of halos in position sample
self.Num_shape = len(data["Position_shape_sample"]) # number of halos in shape sample
except:
try:
self.Num_position = len(data["RA"])
self.Num_shape = len(data["RA_shape_sample"])
except:
self.Num_position = 0
self.Num_shape = 0
print("Warning: no Postion or Position_shape_sample given.")
if self.Num_position > 0:
try:
weight = self.data["weight"]
except:
self.data["weight"] = np.ones(self.Num_position)
try:
weight = self.data["weight_shape_sample"]
except:
self.data["weight_shape_sample"] = np.ones(self.Num_shape)
self.r_min = separation_limits[0] # cMpc/h
self.r_max = separation_limits[1] # cMpc/h
self.num_bins_r = num_bins_r
self.num_bins_pi = num_bins_pi
self.r_bins = np.logspace(np.log10(self.r_min), np.log10(self.r_max), self.num_bins_r + 1)
if pi_max == None:
if self.L_0p5 is None:
raise ValueError(
"Both pi_max and boxsize are None. Provide input on one of them to determine the integration limit pi_max.")
else:
pi_max = self.L_0p5
self.pi_bins = np.linspace(-pi_max, pi_max, self.num_bins_pi + 1)
self.mu_r_bins = np.linspace(-1, 1, self.num_bins_pi + 1)
if simulation == False:
print(f"MeasureIA object initialised with:\n \
observational data.\n \
There are {self.Num_shape} galaxies in the shape sample and {self.Num_position} galaxies in the position sample.\n\
The separation bin edges are given by {self.r_bins} Mpc.\n \
There are {num_bins_r} r or r_p bins and {num_bins_pi} pi bins.\n \
The maximum pi used for binning is {pi_max}.\n \
The data will be written to {self.output_file_name}")
else:
print(f"MeasureIA object initialised with:\n \
simulation {simulation} that has a {periodic}boxsize of {self.boxsize} cMpc/h.\n \
There are {self.Num_shape} galaxies in the shape sample and {self.Num_position} galaxies in the position sample.\n\
The separation bin edges are given by {self.r_bins} cMpc/h.\n \
There are {num_bins_r} r or r_p bins and {num_bins_pi} pi bins.\n \
The maximum pi used for binning is {pi_max}.\n \
The data will be written to {self.output_file_name}")
return
@staticmethod
def calculate_dot_product_arrays(a1, a2):
"""Calculates the dot product over 2 2D arrays across axis 1 so that dot_product[i] = np.dot(a1[i],a2[i])
Parameters
----------
a1 : ndarray
First array
a2 : ndarray
Second array
Returns
-------
ndarray
Dot product of columns of arrays
"""
dot_product = np.zeros(np.shape(a1)[0])
for i in np.arange(0, np.shape(a1)[1]):
dot_product += a1[:, i] * a2[:, i]
return dot_product
@staticmethod
def get_ellipticity(e, phi):
"""Calculates the radial and tangential components of the ellipticity, given the size of the ellipticty vector
and the angle between the semimajor or semiminor axis and the separation vector.
Parameters
----------
e : ndarray
size of the ellipticity vector
phi : ndarray
angle between semimajor/semiminor axis and separation vector
Returns
-------
ndarray
e_+ and e_x
"""
e_plus, e_cross = e * np.cos(2 * phi), e * np.sin(2 * phi)
return e_plus, e_cross
@staticmethod
def get_random_pairs(rp_max, rp_min, pi_max, pi_min, L3, corrtype, Num_position, Num_shape):
"""Returns analytical value of the number of pairs expected in an r_p, pi bin for a random uniform distribution.
(Singh et al. 2023)
Parameters
----------
rp_max : float
Upper bound of projected separation vector bin
rp_min : float
Lower bound of projected separation vector bin
pi_max : float
Upper bound of line of sight vector bin
pi_min : float
Lower bound of line of sight vector bin
L3 : float or int
Volume of the simulation box
corrtype : str
Correlation type, auto or cross. RR for auto is RR_cross/2.
Num_position : int
Number of objects in the position sample.
Num_shape : int
Number of objects in the shape sample.
Returns
-------
float
number of pairs in r_p, pi bin
"""
if corrtype == "auto":
RR = (
(Num_position - 1.0) * Num_shape / 2.0
* np.pi
* (rp_max ** 2 - rp_min ** 2)
* abs(pi_max - pi_min)
/ L3
) # volume is cylindrical pi*dr^2 * height
elif corrtype == "cross":
RR = Num_position * Num_shape * np.pi * (rp_max ** 2 - rp_min ** 2) * abs(pi_max - pi_min) / L3
else:
raise ValueError("Unknown input for corrtype, choose from auto or cross.")
return RR
@staticmethod
def get_volume_spherical_cap(mur, r):
"""Calculate the volume of a spherical cap.
Parameters
----------
mur : float
cos(theta), where theta is the polar angle between the apex and disk of the cap.
r : float
radius
Returns
-------
float
Volume of the spherical cap.
"""
return np.pi / 3.0 * r ** 3 * (2 + mur) * (1 - mur) ** 2
def get_random_pairs_r_mur(self, r_max, r_min, mur_max, mur_min, L3, corrtype, Num_position, Num_shape):
"""Returns analytical value of the number of pairs expected in an r, mu_r bin for a random uniform distribution.
Parameters
----------
r_max : float
Upper bound of separation vector bin
r_min : float
Lower bound of separation vector bin
mur_max : float
Upper bound of mu_r bin
mur_min : float
Lower bound of mu_r bin
L3 : float
Volume of the simulation box
corrtype : str
Correlation type, auto or cross. RR for auto is RR_cross/2.
Num_position : int
Number of objects in the position sample.
Num_shape : int
Number of objects in the shape sample.
Returns
-------
float
number of pairs in r, mu_r bin
"""
if corrtype == "auto":
RR = (
(Num_position - 1.0)
/ 2.0
* Num_shape
* (
self.get_volume_spherical_cap(mur_min, r_max)
- self.get_volume_spherical_cap(mur_max, r_max)
- (self.get_volume_spherical_cap(mur_min, r_min) - self.get_volume_spherical_cap(mur_max,
r_min))
)
/ L3
)
# volume is big cap - small cap for large - small radius
elif corrtype == "cross":
RR = (
(Num_position - 1.0)
* Num_shape
* (
self.get_volume_spherical_cap(mur_min, r_max)
- self.get_volume_spherical_cap(mur_max, r_max)
- (self.get_volume_spherical_cap(mur_min, r_min) - self.get_volume_spherical_cap(mur_max,
r_min))
)
/ L3
)
else:
raise ValueError("Unknown input for corrtype, choose from auto or cross.")
return abs(RR)
@staticmethod
def setdiff2D(a1, a2):
"""Compares each row of a1 and a2 and returns the elements that do not overlap
Parameters
----------
a1 : nested list
List containing lists of elements to compare to a2
a2 : nested list
List containing lists of elements to compare to a1
Returns
-------
nested list
For each row, the not-overlapping elements between a1 and a2
"""
assert len(a1) == len(a2), "Lengths of lists where each row is to be compared, are not the same."
diff = []
for i in np.arange(0, len(a1)):
setdiff = np.setdiff1d(a1[i], a2[i])
diff.append(setdiff)
del setdiff
return diff
@staticmethod
def setdiff_omit(a1, a2, incl_ind):
"""For rows in nested list a1, whose index is included in incl_ind, returns elements that do not overlap between
the row in a1 and a2.
Parameters
----------
a1 : nested list
List of lists or arrays where indicated rows need to be compared to a2
a2 : list or array
Elements to be compared to the row in a1 [and not included in return values].
incl_ind : list or array
Indices of rows in a1 to be compared to a2.
Returns
-------
nested list
For each included row in a1, the not-overlapping elements between a1 and a2
"""
diff = []
for i in np.arange(0, len(a1)):
if np.isin(i, incl_ind):
setdiff = np.setdiff1d(a1[i], a2)
diff.append(setdiff)
del setdiff
return diff
def _measure_w_g_i(self, dataset_name, corr_type="both", return_output=False, jk_group_name=""):
"""Measures w_gg or w_g+ for a given xi_gi dataset that has been calculated with the _measure_xi_rp_pi_sims
methods. Integrates over pi bins via sum * dpi. Stores rp, and w_gg or w_g+.
Parameters
----------
dataset_name : str
Name of xi_gg or xi_g+ dataset and name given to w_gg or w_g+ dataset when stored.
return_output : bool, optional
Output is returned if True, saved to file if False. Default value = False
corr_type : str, optional
Type of correlation function. Choose from [g+,gg,both]. Default value = "both"
jk_group_name : str, optional
Name of subgroup in hdf5 file where jackknife realisations are stored. Default value = ""
Returns
-------
ndarray
[rp, wgg] or [rp, wg+] if return_output is True
"""
if corr_type == "both":
xi_data = ["xi_g_plus", "xi_gg"]
wg_data = ["w_g_plus", "w_gg"]
elif corr_type == "g+":
xi_data = ["xi_g_plus"]
wg_data = ["w_g_plus"]
elif corr_type == "gg":
xi_data = ["xi_gg"]
wg_data = ["w_gg"]
else:
raise KeyError("Unknown value for corr_type. Choose from [g+, gg, both]")
for i in np.arange(0, len(xi_data)):
correlation_data_file = h5py.File(self.output_file_name, "a")
group = correlation_data_file[f"{self.snap_group}/w/{xi_data[i]}/{jk_group_name}"]
correlation_data = group[dataset_name][:]
pi = group[dataset_name + "_pi"]
rp = group[dataset_name + "_rp"]
dpi = (self.pi_bins[1:] - self.pi_bins[:-1])
pi_bins = self.pi_bins[:-1] + abs(dpi) / 2.0 # middle of bins
# variance = group[dataset_name + "_sigmasq"][:]
if sum(np.isin(pi, pi_bins)) == len(pi):
dpi = np.array([dpi] * len(correlation_data[:, 0]))
correlation_data = correlation_data * abs(dpi)
# sigsq_el = variance * dpi ** 2
else:
raise ValueError("Update pi bins in initialisation of object to match xi_g_plus dataset.")
w_g_i = np.sum(correlation_data, axis=1) # sum over pi values
# sigsq = np.sum(sigsq_el, axis=1)
if return_output:
output_data = np.array([rp, w_g_i]).transpose()
correlation_data_file.close()
return output_data
else:
group_out = create_group_hdf5(correlation_data_file,
f"{self.snap_group}/{wg_data[i]}/{jk_group_name}")
write_dataset_hdf5(group_out, dataset_name + "_rp", data=rp)
write_dataset_hdf5(group_out, dataset_name, data=w_g_i)
# write_dataset_hdf5(group_out, dataset_name + "_sigma", data=np.sqrt(sigsq))
correlation_data_file.close()
return
def _measure_multipoles(self, dataset_name, corr_type="both", return_output=False, jk_group_name=""):
"""Measures multipoles for a given xi_g+ or xi_gg measured by _measure_xi_r_pi_sims methods.
The data assumes xi_g+ and xi_gg to be measured in bins of r and mu_r.
Parameters
----------
dataset_name : str
Name of xi_gg or xi_g+ dataset and name given to multipoles dataset when stored.
corr_type : str, optional
Type of correlation function. Choose from [g+,gg,both]. Default value = "both"
return_output : bool, optional
Output is returned if True, saved to file if False. Default value = False.
jk_group_name : str, optional
Name of subgroup in hdf5 file where jackknife realisations are stored. Default value = ""
Returns
-------
ndarray
[r, multipoles_gg] or [r, multipoles_g+] if return_output is True
"""
correlation_data_file = h5py.File(self.output_file_name, "a")
if corr_type == "g+": # todo: expand to include ++ option
group = correlation_data_file[f"{self.snap_group}/multipoles/xi_g_plus/{jk_group_name}"]
correlation_data_list = [group[dataset_name][:]] # xi_g+ in grid of r,mur
r_list = [group[dataset_name + "_r"][:]]
mu_r_list = [group[dataset_name + "_mu_r"][:]]
sab_list = [2]
l_list = sab_list
corr_type_list = ["g_plus"]
elif corr_type == "gg":
group = correlation_data_file[f"{self.snap_group}/multipoles/xi_gg/{jk_group_name}"]
correlation_data_list = [group[dataset_name][:]] # xi_g+ in grid of rp,pi
r_list = [group[dataset_name + "_r"][:]]
mu_r_list = [group[dataset_name + "_mu_r"][:]]
sab_list = [0]
l_list = sab_list
corr_type_list = ["gg"]
elif corr_type == "both":
group = correlation_data_file[f"{self.snap_group}/multipoles/xi_g_plus/{jk_group_name}"]
correlation_data_list = [group[dataset_name][:]] # xi_g+ in grid of rp,pi
r_list = [group[dataset_name + "_r"][:]]
mu_r_list = [group[dataset_name + "_mu_r"][:]]
group = correlation_data_file[f"{self.snap_group}/multipoles/xi_gg/{jk_group_name}"]
correlation_data_list.append(group[dataset_name][:]) # xi_g+ in grid of rp,pi
r_list.append(group[dataset_name + "_r"][:])
mu_r_list.append(group[dataset_name + "_mu_r"][:])
sab_list = [2, 0]
l_list = sab_list
corr_type_list = ["g_plus", "gg"]
else:
raise KeyError("Unknown value for corr_type. Choose from [g+, gg, both]")
for i in np.arange(0, len(sab_list)):
corr_type_i = corr_type_list[i]
correlation_data = correlation_data_list[i]
r = r_list[i]
mu_r = mu_r_list[i]
sab = sab_list[i]
l = l_list[i]
L = np.zeros((len(r), len(mu_r)))
mu_r = np.array(list(mu_r) * len(r)).reshape((len(r), len(mu_r))) # make pi into grid for mu
r = np.array(list(r) * len(mu_r)).reshape((len(r), len(mu_r)))
r = r.transpose()
for n in np.arange(0, len(mu_r[:, 0])):
for m in np.arange(0, len(mu_r[0])):
L_m, dL = lpmn(l, sab, mu_r[n, m]) # make associated Legendre polynomial grid
L[n, m] = L_m[-1, -1] # grid ranges from 0 to sab and 0 to l, so last element is what we seek
dmur = (self.mu_r_bins[1:] - self.mu_r_bins[:-1])
dmu_r_array = np.array(list(dmur) * len(r)).reshape((len(r), len(dmur)))
multipoles = (
(2 * l + 1)
/ 2.0
* math.factorial(l - sab)
/ math.factorial(l + sab)
* L
* correlation_data
* dmu_r_array
)
multipoles = np.sum(multipoles, axis=1)
dsep = (self.r_bins[1:] - self.r_bins[:-1]) / 2.0
separation = self.r_bins[:-1] + abs(dsep) # middle of bins
if return_output:
correlation_data_file.close()
np.array([separation, multipoles]).transpose()
else:
group_out = create_group_hdf5(
correlation_data_file, f"{self.snap_group}/multipoles_{corr_type_i}/{jk_group_name}"
)
write_dataset_hdf5(group_out, dataset_name + "_r", data=separation)
write_dataset_hdf5(group_out, dataset_name, data=multipoles)
correlation_data_file.close()
return
def _obs_estimator(self, corr_type, IA_estimator, dataset_name, dataset_name_randoms, num_samples,
jk_group_name=""):
"""Reads various components of xi and combines into correct estimator for cluster or galaxy
lightcone alignment correlations. It then writes the xi_gg or xi_g+ in the correct place in the output file.
Parameters
----------
corr_type : list of 2 str elements
First element: ['gg', 'g+', 'both'], second: 'w' or 'multipoles'
IA_estimator : str
Chooser from 'clusters' or 'galaxies' for different estimator definition.
dataset_name : str
Name of the dataset
dataset_name_randoms : str
Name of the dataset for data with randoms as positions
num_samples : dict
Dictionary of samples sizes for position, shape and random samples. Keywords: D, S, R_D, R_S
jk_group_name : str
Name of subgroup in hdf5 file where jackknife realisations are stored. Default value = ""
Returns
-------
"""
output_file = h5py.File(self.output_file_name, "a")
if corr_type[0] == "g+" or corr_type[0] == "both":
group_gp = output_file[
f"{self.snap_group}/{corr_type[1]}/xi_g_plus/{jk_group_name}"] # /w/xi_g_plus/
SpD = group_gp[f"{dataset_name}_SplusD"][:]
SpR = group_gp[f"{dataset_name_randoms}_SplusD"][:]
group_gg = output_file[f"{self.snap_group}/{corr_type[1]}/xi_gg/{jk_group_name}"]
DD = group_gg[f"{dataset_name}_DD"][:]
if IA_estimator == "clusters":
if corr_type[0] == "gg":
SR = group_gg[f"{dataset_name}_SR"][:]
else:
SR = group_gg[f"{dataset_name_randoms}_DD"][:]
SR *= num_samples["D"] / num_samples["R_D"]
if corr_type[0] == "g+" or corr_type[0] == "both":
SpR *= num_samples["D"] / num_samples["R_D"]
correlation_gp = SpD / DD - SpR / SR
write_dataset_hdf5(group_gp, dataset_name, correlation_gp)
if corr_type[0] == "gg" or corr_type[0] == "both":
RD = group_gg[f"{dataset_name}_RD"][:]
RR = group_gg[f"{dataset_name}_RR"][:]
RD *= num_samples["S"] / num_samples["R_S"]
RR *= (num_samples["S"] / num_samples["R_S"]) * (num_samples["D"] / num_samples["R_D"])
correlation_gg = (DD - RD - SR) / RR - 1
write_dataset_hdf5(group_gg, dataset_name, correlation_gg)
elif IA_estimator == "galaxies":
RR = group_gg[f"{dataset_name}_RR"][:]
RR *= (num_samples["S"] / num_samples["R_S"]) * (num_samples["D"] / num_samples["R_D"])
if corr_type[0] == "g+" or corr_type[0] == "both":
SpR *= num_samples["D"] / num_samples["R_D"]
correlation_gp = (SpD - SpR) / RR
write_dataset_hdf5(group_gp, dataset_name, correlation_gp)
if corr_type[0] == "gg" or corr_type[0] == "both":
RD = group_gg[f"{dataset_name}_RD"][:]
if corr_type[0] == "gg":
SR = group_gg[f"{dataset_name}_SR"][:]
else:
SR = group_gg[f"{dataset_name_randoms}_DD"][:]
RD *= num_samples["S"] / num_samples["R_S"]
SR *= num_samples["D"] / num_samples["R_D"]
correlation_gg = (DD - RD - SR) / RR - 1
write_dataset_hdf5(group_gg, dataset_name, correlation_gg)
else:
raise ValueError("Unknown input for IA_estimator, choose from [clusters, galaxies].")
output_file.close()
return
def assign_jackknife_patches(self, data, randoms_data, num_jk):
"""Assigns jackknife patches to data and randoms given a number of patches.
Based on https://github.com/esheldon/kmeans_radec
Parameters
----------
data : dict
Dictionary containing position and shape sample data. Keywords: "RA", "DEC", "RA_shape_sample",
"DEC_shape_sample"
randoms_data : dict
Dictionary containing position and shape sample data of randoms. Keywords: "RA", "DEC", "RA_shape_sample",
"DEC_shape_sample"
num_jk : int
Number of jackknife patches
Returns
-------
dict
Dictionary with patch numbers for each sample. Keywords: 'position', 'shape', 'randoms_position',
'randoms_shape'
"""
jk_patches = {}
# Read the randoms file from which the jackknife regions will be created
RA = randoms_data['RA']
DEC = randoms_data['DEC']
# Define a number of jaccknife regions and find their centres using kmans
X = np.column_stack((RA, DEC))
km = kmeans_sample(X, num_jk, maxiter=100, tol=1.0e-5)
jk_labels = km.labels
jk_patches['randoms_position'] = jk_labels
RA = randoms_data['RA_shape_sample']
DEC = randoms_data['DEC_shape_sample']
X2 = np.column_stack((RA, DEC))
jk_labels = km.find_nearest(X2)
jk_patches['randoms_shape'] = jk_labels
RA = data['RA']
DEC = data['DEC']
X2 = np.column_stack((RA, DEC))
jk_labels = km.find_nearest(X2)
jk_patches['position'] = jk_labels
RA = data['RA_shape_sample']
DEC = data['DEC_shape_sample']
X2 = np.column_stack((RA, DEC))
jk_labels = km.find_nearest(X2)
jk_patches['shape'] = jk_labels
return jk_patches
__init__(data, output_file_name, simulation=None, snapshot=None, separation_limits=[0.1, 20.0], num_bins_r=8, num_bins_pi=20, pi_max=None, boxsize=None, periodicity=True)
The init method of the MeasureIABase class.
Parameters: |
|
---|
Source code in src/measureia/measure_IA_base.py
def __init__(
self,
data,
output_file_name,
simulation=None,
snapshot=None,
separation_limits=[0.1, 20.0],
num_bins_r=8,
num_bins_pi=20,
pi_max=None,
boxsize=None,
periodicity=True,
):
"""
The __init__ method of the MeasureIABase class.
Parameters
----------
data : dict or NoneType
Dictionary with data needed for calculations.
For cartesian coordinates, the keywords are:
'Position' and 'Position_shape_sample': (N_p,3), (N_s,3) ndarrays with the x, y, z coordinates
of the N_p, N_s objects in the position and shape samples, respectively.
'Axis_Direction': (N_s,2) ndarray with the two elements of the unit vectors describing the
axis direction of the projected axis of the object shape.
'LOS': index referring back to the column number in the 'Position' samples that contains the
line-of-sight coordinate. (e.g. if the shapes are projected over the z-axis, LOS=2)
'q': (N_s) array containing the axis ratio q=b/a for each object in the shape sample.
For lightcone coordinates, the keywords are:
'Redshift' and 'Redshift_shape_sample': (N_p) and (N_s) ndarray with redshifts of position and shape samples.
'RA' and 'RA_shape_sample': (N_p) and (N_s) ndarray with RA coordinate of position and shape samples.
'DEC' and 'DEC_shape_sample': (N_p) and (N_s) ndarray with DEC coordinate of position and shape samples.
'e1' and 'e2': (N_s) arrays with the two ellipticity components e1 and e2 of the shape sample objects.
output_file_name : str
Name and filepath of the file where the output should be stored. Needs to be hdf5-type.
simulation : str or NoneType, optional
Indicator of simulation, obtaining correct boxsize in cMpc/h automatically.
Choose from [TNG100, TNG100_2, TNG300, EAGLE, HorizonAGN, FLAMINGO_L1, FLAMINGO_L2p8].
Default is None, in which case boxsize needs to be added manually; or in the case of observational data,
the pi_max.
snapshot : int or str or NoneType, optional
Number of the snapshot, which, if given, will ensure that the output file to contains a group
'Snapshot_[snapshot]'. If None, the group is omitted from the output file structure. Default is None.
separation_limits : iterable of 2 entries, optional
Bounds of the (projected) separation vector length bins in cMpc/h (so, r or r_p). Default is [0.1,20].
num_bins_r : int, optional
Number of bins for (projected) separation vector. Default is 8.
num_bins_pi : int, optional
Number of bins for line of sight (LOS) vector, pi or mu_r when multipoles are measured. Default is 20.
pi_max : int or float, optional
Bound for line of sight bins. Bounds will be [-pi_max, pi_max]. Default is None, in which case half the
boxsize will be used.
boxsize : int or float or NoneType, optional
If simulation is not included in SimInfo, a manual boxsize can be added here. Make sure simulation=None
and the boxsize units are equal to those in the data dictionary. Default is None.
periodicity : bool, optional
If True, the periodic boundary conditions of the simulation box are taken into account. If False, they are
ignored. Note that because this code used analytical randoms for the simulations, the correlations will not
be correct in this case and only DD and S+D terms should be studied. Non-periodic randoms can be measured by
providing random data to the code and considering the DD term that is measured. Correlations and covariance
matrix will need to be reconstructed from parts. [Please add a request for teh integration of this method of
this if you would like to use this option often.] Default is True.
"""
SimInfo.__init__(self, simulation, snapshot, boxsize)
self.data = data
self.output_file_name = output_file_name
self.periodicity = periodicity
if periodicity:
periodic = "periodic "
else:
periodic = ""
try:
self.Num_position = len(data["Position"]) # number of halos in position sample
self.Num_shape = len(data["Position_shape_sample"]) # number of halos in shape sample
except:
try:
self.Num_position = len(data["RA"])
self.Num_shape = len(data["RA_shape_sample"])
except:
self.Num_position = 0
self.Num_shape = 0
print("Warning: no Postion or Position_shape_sample given.")
if self.Num_position > 0:
try:
weight = self.data["weight"]
except:
self.data["weight"] = np.ones(self.Num_position)
try:
weight = self.data["weight_shape_sample"]
except:
self.data["weight_shape_sample"] = np.ones(self.Num_shape)
self.r_min = separation_limits[0] # cMpc/h
self.r_max = separation_limits[1] # cMpc/h
self.num_bins_r = num_bins_r
self.num_bins_pi = num_bins_pi
self.r_bins = np.logspace(np.log10(self.r_min), np.log10(self.r_max), self.num_bins_r + 1)
if pi_max == None:
if self.L_0p5 is None:
raise ValueError(
"Both pi_max and boxsize are None. Provide input on one of them to determine the integration limit pi_max.")
else:
pi_max = self.L_0p5
self.pi_bins = np.linspace(-pi_max, pi_max, self.num_bins_pi + 1)
self.mu_r_bins = np.linspace(-1, 1, self.num_bins_pi + 1)
if simulation == False:
print(f"MeasureIA object initialised with:\n \
observational data.\n \
There are {self.Num_shape} galaxies in the shape sample and {self.Num_position} galaxies in the position sample.\n\
The separation bin edges are given by {self.r_bins} Mpc.\n \
There are {num_bins_r} r or r_p bins and {num_bins_pi} pi bins.\n \
The maximum pi used for binning is {pi_max}.\n \
The data will be written to {self.output_file_name}")
else:
print(f"MeasureIA object initialised with:\n \
simulation {simulation} that has a {periodic}boxsize of {self.boxsize} cMpc/h.\n \
There are {self.Num_shape} galaxies in the shape sample and {self.Num_position} galaxies in the position sample.\n\
The separation bin edges are given by {self.r_bins} cMpc/h.\n \
There are {num_bins_r} r or r_p bins and {num_bins_pi} pi bins.\n \
The maximum pi used for binning is {pi_max}.\n \
The data will be written to {self.output_file_name}")
return
calculate_dot_product_arrays(a1, a2)
staticmethod
Calculates the dot product over 2 2D arrays across axis 1 so that dot_product[i] = np.dot(a1[i],a2[i])
Parameters: |
|
---|
Returns: |
|
---|
Source code in src/measureia/measure_IA_base.py
@staticmethod
def calculate_dot_product_arrays(a1, a2):
"""Calculates the dot product over 2 2D arrays across axis 1 so that dot_product[i] = np.dot(a1[i],a2[i])
Parameters
----------
a1 : ndarray
First array
a2 : ndarray
Second array
Returns
-------
ndarray
Dot product of columns of arrays
"""
dot_product = np.zeros(np.shape(a1)[0])
for i in np.arange(0, np.shape(a1)[1]):
dot_product += a1[:, i] * a2[:, i]
return dot_product
get_ellipticity(e, phi)
staticmethod
Calculates the radial and tangential components of the ellipticity, given the size of the ellipticty vector and the angle between the semimajor or semiminor axis and the separation vector.
Parameters: |
|
---|
Returns: |
|
---|
Source code in src/measureia/measure_IA_base.py
@staticmethod
def get_ellipticity(e, phi):
"""Calculates the radial and tangential components of the ellipticity, given the size of the ellipticty vector
and the angle between the semimajor or semiminor axis and the separation vector.
Parameters
----------
e : ndarray
size of the ellipticity vector
phi : ndarray
angle between semimajor/semiminor axis and separation vector
Returns
-------
ndarray
e_+ and e_x
"""
e_plus, e_cross = e * np.cos(2 * phi), e * np.sin(2 * phi)
return e_plus, e_cross
get_random_pairs(rp_max, rp_min, pi_max, pi_min, L3, corrtype, Num_position, Num_shape)
staticmethod
Returns analytical value of the number of pairs expected in an r_p, pi bin for a random uniform distribution. (Singh et al. 2023)
Parameters: |
|
---|
Returns: |
|
---|
Source code in src/measureia/measure_IA_base.py
@staticmethod
def get_random_pairs(rp_max, rp_min, pi_max, pi_min, L3, corrtype, Num_position, Num_shape):
"""Returns analytical value of the number of pairs expected in an r_p, pi bin for a random uniform distribution.
(Singh et al. 2023)
Parameters
----------
rp_max : float
Upper bound of projected separation vector bin
rp_min : float
Lower bound of projected separation vector bin
pi_max : float
Upper bound of line of sight vector bin
pi_min : float
Lower bound of line of sight vector bin
L3 : float or int
Volume of the simulation box
corrtype : str
Correlation type, auto or cross. RR for auto is RR_cross/2.
Num_position : int
Number of objects in the position sample.
Num_shape : int
Number of objects in the shape sample.
Returns
-------
float
number of pairs in r_p, pi bin
"""
if corrtype == "auto":
RR = (
(Num_position - 1.0) * Num_shape / 2.0
* np.pi
* (rp_max ** 2 - rp_min ** 2)
* abs(pi_max - pi_min)
/ L3
) # volume is cylindrical pi*dr^2 * height
elif corrtype == "cross":
RR = Num_position * Num_shape * np.pi * (rp_max ** 2 - rp_min ** 2) * abs(pi_max - pi_min) / L3
else:
raise ValueError("Unknown input for corrtype, choose from auto or cross.")
return RR
get_volume_spherical_cap(mur, r)
staticmethod
Calculate the volume of a spherical cap.
Parameters: |
|
---|
Returns: |
|
---|
Source code in src/measureia/measure_IA_base.py
@staticmethod
def get_volume_spherical_cap(mur, r):
"""Calculate the volume of a spherical cap.
Parameters
----------
mur : float
cos(theta), where theta is the polar angle between the apex and disk of the cap.
r : float
radius
Returns
-------
float
Volume of the spherical cap.
"""
return np.pi / 3.0 * r ** 3 * (2 + mur) * (1 - mur) ** 2
get_random_pairs_r_mur(r_max, r_min, mur_max, mur_min, L3, corrtype, Num_position, Num_shape)
Returns analytical value of the number of pairs expected in an r, mu_r bin for a random uniform distribution.
Parameters: |
|
---|
Returns: |
|
---|
Source code in src/measureia/measure_IA_base.py
def get_random_pairs_r_mur(self, r_max, r_min, mur_max, mur_min, L3, corrtype, Num_position, Num_shape):
"""Returns analytical value of the number of pairs expected in an r, mu_r bin for a random uniform distribution.
Parameters
----------
r_max : float
Upper bound of separation vector bin
r_min : float
Lower bound of separation vector bin
mur_max : float
Upper bound of mu_r bin
mur_min : float
Lower bound of mu_r bin
L3 : float
Volume of the simulation box
corrtype : str
Correlation type, auto or cross. RR for auto is RR_cross/2.
Num_position : int
Number of objects in the position sample.
Num_shape : int
Number of objects in the shape sample.
Returns
-------
float
number of pairs in r, mu_r bin
"""
if corrtype == "auto":
RR = (
(Num_position - 1.0)
/ 2.0
* Num_shape
* (
self.get_volume_spherical_cap(mur_min, r_max)
- self.get_volume_spherical_cap(mur_max, r_max)
- (self.get_volume_spherical_cap(mur_min, r_min) - self.get_volume_spherical_cap(mur_max,
r_min))
)
/ L3
)
# volume is big cap - small cap for large - small radius
elif corrtype == "cross":
RR = (
(Num_position - 1.0)
* Num_shape
* (
self.get_volume_spherical_cap(mur_min, r_max)
- self.get_volume_spherical_cap(mur_max, r_max)
- (self.get_volume_spherical_cap(mur_min, r_min) - self.get_volume_spherical_cap(mur_max,
r_min))
)
/ L3
)
else:
raise ValueError("Unknown input for corrtype, choose from auto or cross.")
return abs(RR)
setdiff2D(a1, a2)
staticmethod
Compares each row of a1 and a2 and returns the elements that do not overlap
Parameters: |
|
---|
Returns: |
|
---|
Source code in src/measureia/measure_IA_base.py
@staticmethod
def setdiff2D(a1, a2):
"""Compares each row of a1 and a2 and returns the elements that do not overlap
Parameters
----------
a1 : nested list
List containing lists of elements to compare to a2
a2 : nested list
List containing lists of elements to compare to a1
Returns
-------
nested list
For each row, the not-overlapping elements between a1 and a2
"""
assert len(a1) == len(a2), "Lengths of lists where each row is to be compared, are not the same."
diff = []
for i in np.arange(0, len(a1)):
setdiff = np.setdiff1d(a1[i], a2[i])
diff.append(setdiff)
del setdiff
return diff
setdiff_omit(a1, a2, incl_ind)
staticmethod
For rows in nested list a1, whose index is included in incl_ind, returns elements that do not overlap between the row in a1 and a2.
Parameters: |
|
---|
Returns: |
|
---|
Source code in src/measureia/measure_IA_base.py
@staticmethod
def setdiff_omit(a1, a2, incl_ind):
"""For rows in nested list a1, whose index is included in incl_ind, returns elements that do not overlap between
the row in a1 and a2.
Parameters
----------
a1 : nested list
List of lists or arrays where indicated rows need to be compared to a2
a2 : list or array
Elements to be compared to the row in a1 [and not included in return values].
incl_ind : list or array
Indices of rows in a1 to be compared to a2.
Returns
-------
nested list
For each included row in a1, the not-overlapping elements between a1 and a2
"""
diff = []
for i in np.arange(0, len(a1)):
if np.isin(i, incl_ind):
setdiff = np.setdiff1d(a1[i], a2)
diff.append(setdiff)
del setdiff
return diff
assign_jackknife_patches(data, randoms_data, num_jk)
Assigns jackknife patches to data and randoms given a number of patches. Based on https://github.com/esheldon/kmeans_radec
Parameters: |
|
---|
Returns: |
|
---|
Source code in src/measureia/measure_IA_base.py
def assign_jackknife_patches(self, data, randoms_data, num_jk):
"""Assigns jackknife patches to data and randoms given a number of patches.
Based on https://github.com/esheldon/kmeans_radec
Parameters
----------
data : dict
Dictionary containing position and shape sample data. Keywords: "RA", "DEC", "RA_shape_sample",
"DEC_shape_sample"
randoms_data : dict
Dictionary containing position and shape sample data of randoms. Keywords: "RA", "DEC", "RA_shape_sample",
"DEC_shape_sample"
num_jk : int
Number of jackknife patches
Returns
-------
dict
Dictionary with patch numbers for each sample. Keywords: 'position', 'shape', 'randoms_position',
'randoms_shape'
"""
jk_patches = {}
# Read the randoms file from which the jackknife regions will be created
RA = randoms_data['RA']
DEC = randoms_data['DEC']
# Define a number of jaccknife regions and find their centres using kmans
X = np.column_stack((RA, DEC))
km = kmeans_sample(X, num_jk, maxiter=100, tol=1.0e-5)
jk_labels = km.labels
jk_patches['randoms_position'] = jk_labels
RA = randoms_data['RA_shape_sample']
DEC = randoms_data['DEC_shape_sample']
X2 = np.column_stack((RA, DEC))
jk_labels = km.find_nearest(X2)
jk_patches['randoms_shape'] = jk_labels
RA = data['RA']
DEC = data['DEC']
X2 = np.column_stack((RA, DEC))
jk_labels = km.find_nearest(X2)
jk_patches['position'] = jk_labels
RA = data['RA_shape_sample']
DEC = data['DEC_shape_sample']
X2 = np.column_stack((RA, DEC))
jk_labels = km.find_nearest(X2)
jk_patches['shape'] = jk_labels
return jk_patches