Automatic documentation generated from docstrings

This page is auto-generated from the docstrings of the functions, methods, classes, and modules in mass2.core. Each lowest-level module in Mass2 (i.e., each Python file) in mass2.core that should be documented and indexed for searching must be listed in this docstrings.md file. The non-core docstrings page contains the docstrings for all modules other than mass2.core.

Core

mass2.core: Core Mass2 functionality, including file I/O, microcalorimeter channel bookkeeping, recipes, fitting, filtering, and more.

Data structures and methods for handling a single microcalorimeter channel's pulse data and metadata.

BadChannel dataclass

A wrapper around Channel that includes error information.

Source code in mass2/core/channel.py, lines 1496-1503
@dataclass(frozen=True)
class BadChannel:
    """A wrapper around Channel that includes error information."""

    ch: Channel
    error_type: type | None
    error_msg: str
    backtrace: str | None

Channel dataclass

A single microcalorimeter channel's pulse data and associated metadata.

Source code in mass2/core/channel.py, lines 68-1493
@dataclass(frozen=True)  # noqa: PLR0904
class Channel:
    """A single microcalorimeter channel's pulse data and associated metadata."""

    df: pl.DataFrame = field(repr=False)
    header: ChannelHeader = field(repr=True)
    npulses: int
    subframediv: int | None = None
    noise: NoiseChannel | None = field(default=None, repr=False)
    good_expr: pl.Expr = field(default_factory=alwaysTrue)
    df_history: list[pl.DataFrame] = field(default_factory=list, repr=False)
    steps: Recipe = field(default_factory=Recipe.new_empty, repr=False)
    steps_elapsed_s: list[float] = field(default_factory=list)
    transform_raw: Callable | None = None

    def __post_init__(self) -> None:
        # If column "pulse" exists and is an Array, make sure it has the same number of samples as the header
        pulse_col = "pulse"
        if pulse_col in self.df.columns:
            dtype = self.df[pulse_col].dtype
            if isinstance(dtype, pl.Array) and dtype.size != self.header.n_samples:
                raise ValueError(f"Column '{pulse_col}' has array width {dtype.size} but header.n_samples={self.header.n_samples}")

    @property
    def shortname(self) -> str:
        """A short name for this channel, suitable for plot titles."""
        return self.header.description

    @property
    def ch_num(self) -> int:
        "Channel number, from the filename"
        return self.header.ch_num

    @property
    def frametime_s(self) -> float:
        "Sample (or frame) period in seconds, from the file header"
        return self.header.frametime_s

    @property
    def n_presamples(self) -> int:
        "Pretrigger samples in each pulse record, from the file header"
        return self.header.n_presamples

    @property
    def n_samples(self) -> int:
        "Samples per pulse, from the file header"
        return self.header.n_samples

    def mo_stepplots(self) -> mo.ui.dropdown:
        """Marimo UI element to choose and display step plots, with a dropdown to choose channel number."""
        desc_ind = {step.description: i for i, step in enumerate(self.steps)}
        first_non_summarize_step = self.steps[0]
        for step in self.steps:
            if isinstance(step, SummarizeStep):
                continue
            first_non_summarize_step = step
            break
        mo_ui = mo.ui.dropdown(
            desc_ind,
            value=first_non_summarize_step.description,
            label=f"choose step for ch {self.ch_num}",
        )

        def show() -> mo.Html:
            """Show the selected step plot."""
            return self._mo_stepplots_explicit(mo_ui)

        def step_ind() -> Any:
            """Get the selected step index from the dropdown item, if any."""
            return mo_ui.value

        mo_ui.show = show
        mo_ui.step_ind = step_ind
        return mo_ui

    def _mo_stepplots_explicit(self, mo_ui: mo.ui.dropdown) -> mo.Html:
        """Marimo UI element to choose and display step plots."""
        step_ind = mo_ui.value
        self.step_plot(step_ind)
        fig = plt.gcf()
        return mo.vstack([mo_ui, misc.show(fig)])

    def get_step(self, index: int) -> tuple[RecipeStep, int]:
        """Get the step at the given index, supporting negative indices."""
        # normalize the index to a positive index
        if index < 0:
            index = len(self.steps) + index
        step = self.steps[index]
        return step, index

    def step_plot(self, step_ind: int, **kwargs: Any) -> plt.Axes:
        """Make a debug plot for the given step index, supporting negative indices."""
        step, step_ind = self.get_step(step_ind)
        if step_ind + 1 == len(self.df_history):
            df_after = self.df
        else:
            df_after = self.df_history[step_ind + 1]
        return step.dbg_plot(df_after, **kwargs)

    def hist(
        self,
        col: str,
        bin_edges: ArrayLike,
        use_good_expr: bool = True,
        use_expr: pl.Expr = pl.lit(True),
    ) -> tuple[NDArray, NDArray]:
        """Compute a histogram of the given column, optionally filtering by good_expr and use_expr."""
        if use_good_expr and self.good_expr is not True:
            # True doesn't implement .and_, and we haven't found an Expr literal
            # equivalent that does, so we special-case True
            filter_expr = self.good_expr.and_(use_expr)
        else:
            filter_expr = use_expr

        # Group by the specified column and filter using good_expr
        df_small = (self.df.lazy().filter(filter_expr).select(col)).collect()

        values = df_small[col]
        bin_centers, counts = misc.hist_of_series(values, bin_edges)
        return bin_centers, counts

    def plot_hist(
        self,
        col: str,
        bin_edges: ArrayLike,
        axis: plt.Axes | None = None,
        use_good_expr: bool = True,
        use_expr: pl.Expr = pl.lit(True),
    ) -> tuple[NDArray, NDArray]:
        """Compute and plot a histogram of the given column, optionally filtering by good_expr and use_expr."""
        if axis is None:
            _, ax = plt.subplots()  # Create a new figure if no axis is provided
        else:
            ax = axis

        bin_centers, counts = self.hist(col, bin_edges=bin_edges, use_good_expr=use_good_expr, use_expr=use_expr)
        _, step_size = misc.midpoints_and_step_size(bin_edges)
        plt.step(bin_centers, counts, where="mid")

        # Customize the plot
        ax.set_xlabel(str(col))
        ax.set_ylabel(f"Counts per {step_size:.02f} unit bin")
        ax.set_title(f"Histogram of {col} for {self.shortname}")

        plt.tight_layout()
        return bin_centers, counts

    def plot_hists(
        self,
        col: str,
        bin_edges: ArrayLike,
        group_by_col: str,
        axis: plt.Axes | None = None,
        use_good_expr: bool = True,
        use_expr: pl.Expr = pl.lit(True),
        skip_none: bool = True,
    ) -> tuple[NDArray, dict[str, NDArray]]:
        """
        Plots histograms for the given column, grouped by the specified column.

        Parameters:
        - col (str): The column name to plot.
        - bin_edges (array-like): The edges of the bins for the histogram.
        - group_by_col (str): The column name to group by. This is required.
        - axis (matplotlib.Axes, optional): The axis to plot on. If None, a new figure is created.
        """
        if axis is None:
            _, ax = plt.subplots()  # Create a new figure if no axis is provided
        else:
            ax = axis

        if use_good_expr and self.good_expr is not True:
            # True doesn't implement .and_, and we haven't found an Expr literal
            # equivalent that does, so we special-case True
            filter_expr = self.good_expr.and_(use_expr)
        else:
            filter_expr = use_expr

        # Group by the specified column and filter using good_expr
        df_small = (self.df.lazy().filter(filter_expr).select(col, group_by_col)).collect().sort(group_by_col, descending=False)

        # Plot a histogram for each group
        counts_dict: dict[str, NDArray] = {}
        for (group_name,), group_data in df_small.group_by(group_by_col, maintain_order=True):
            if group_name is None and skip_none:
                continue
            # Get the data for the column to plot
            values = group_data[col]
            _, step_size = misc.midpoints_and_step_size(bin_edges)
            bin_centers, counts = misc.hist_of_series(values, bin_edges)
            group_name_str = str(group_name)
            counts_dict[group_name_str] = counts
            plt.step(bin_centers, counts, where="mid", label=group_name_str)
            # Plot the histogram for the current group
            # if group_name == "EBIT":
            #     ax.hist(values, bins=bin_edges, alpha=0.9, color="k", label=group_name_str)
            # else:
            #     ax.hist(values, bins=bin_edges, alpha=0.5, label=group_name_str)
            # bin_centers, counts = misc.hist_of_series(values, bin_edges)
            # plt.plot(bin_centers, counts, label=group_name)
        # Customize the plot
        ax.set_xlabel(str(col))
        if len(counts_dict) > 0:
            ax.set_ylabel(f"Counts per {step_size:.02f} unit bin")
        ax.set_title(f"Histogram of {col} grouped by {group_by_col}")

        # Add a legend to label the groups
        ax.legend(title=group_by_col)

        plt.tight_layout()
        return bin_centers, counts_dict

    def plot_scatter(  # noqa: PLR0917
        self,
        x_col: str,
        y_col: str,
        cont_color_col: str | None = None,
        color_col: str | None = None,
        use_expr: pl.Expr = pl.lit(True),
        use_good_expr: bool = True,
        skip_none: bool = True,
        axis: plt.Axes | None = None,
        annotate: bool = False,
        max_points: int | None = None,
        extended_title: bool = True,
    ) -> None:
        """Generate a scatter plot of `y_col` vs `x_col`, optionally colored by `color_col`.

        Parameters
        ----------
        x_col : str
            Name of the column to put on the x axis
        y_col : str
            Name of the column to put on the y axis
        cont_color_col : str | None, optional
            Name of the column to use for continuously coloring points, by default None
        color_col : str | None, optional
            Name of the column to discretely color points by (generally a low-cardinality
            category like "state_label"), by default None.
            At least one of `cont_color_col` and `color_col` must be None.
        use_expr : pl.Expr, optional
            An expression to select plottable points, by default pl.lit(True)
        use_good_expr : bool, optional
            Whether to apply the object's `good_expr` before plotting, by default True
        skip_none : bool, optional
            Whether to skip color categories with no name, by default True
        axis : plt.Axes | None, optional
            Axes to plot on, by default None
        annotate : bool, optional
            Whether to annotate points that are hovered over or clicked on by the mouse, by default False
        max_points : int | None, optional
            Maximum number of points allowed in the scatter plot (or if None, no maximum), by default None.
            To ensure representative coverage of all portions of the data, only 1 of each consecutive
            N points will be plotted, with N chosen to be consistent with the `max_points` requirement.
        extended_title : bool, optional
            Whether to show the use and good expressions as lines 2-3 of the plot title, by default True
        """
        # You can't have both kinds of colors: you either use `color_col` for categorical coloring,
        # or cont_color_col for continuous coloring, or neither.
        assert color_col is None or cont_color_col is None

        if axis is None:
            fig = plt.figure()
            axis = plt.gca()
        plt.sca(axis)  # set current axis so I can use plt api
        fig = plt.gcf()
        filter_expr = use_expr
        if use_good_expr:
            filter_expr = self.good_expr.and_(use_expr)
        index_name = "pulse_idx"
        # Caused errors in Polars 1.35 if this was "index". See issue #85.

        # Plot only 1 data value out of every n, if max_points argument is an integer.
        # Compute n from the ratio of all points to max_points.
        plot_every_nth = 1
        if max_points is not None:
            if max_points < self.npulses:
                plot_every_nth = 1 + (self.npulses - 1) // max_points

        columns_to_keep = [x_col, y_col, index_name]
        if color_col is not None:
            columns_to_keep.append(color_col)
        if cont_color_col is not None:
            columns_to_keep.append(cont_color_col)
        df_small = (
            self.df.lazy()
            .with_row_index(name=index_name)
            .filter(filter_expr)
            .select(*columns_to_keep)
            .gather_every(plot_every_nth)
            .collect()
        )
        lines_pnums: list[tuple[plt.Line2D, pl.Series]] = []

        if cont_color_col is not None:
            line = plt.scatter(
                df_small.select(x_col).to_series(),
                df_small.select(y_col).to_series(),
                s=3,
                c=df_small.select(cont_color_col).to_series(),
            )

        else:
            for (name,), data in df_small.group_by(color_col, maintain_order=True):
                if name is None and skip_none and color_col is not None:
                    continue
                (line,) = plt.plot(
                    data.select(x_col).to_series(),
                    data.select(y_col).to_series(),
                    ".",
                    label=name,
                )
                lines_pnums.append((line, data.select(index_name).to_series()))

        if annotate:
            annotation = axis.annotate(
                "",
                xy=(0, 0),
                xytext=(-20, 20),
                textcoords="offset points",
                bbox=dict(boxstyle="round", fc="w"),
                arrowprops=dict(arrowstyle="->"),
            )
            annotation.set_visible(False)

            def update_note(points: list) -> None:
                """Generate a matplotlib hovering note about the data point index

                Parameters
                ----------
                points : list
                    List of the plotted data points that are hovered over
                """
                # TODO: this only works if the first line object has the pulse we want.
                line, pnum = lines_pnums[0]
                x, y = line.get_data()
                annotation.xy = (x[points[0]], y[points[0]])
                text2 = " ".join([str(pnum[int(n)]) for n in points])
                if len(points) > 1:
                    text = f"Pulses [{text2}]"
                else:
                    text = f"Pulse {text2}"
                annotation.set_text(text)
                annotation.get_bbox_patch().set_alpha(0.75)

            def hover(event: MouseEvent) -> None:
                """Callback to be used when mouse hovers near a plotted point

                Parameters
                ----------
                event : MouseEvent
                    The mouse-related event; contains location information
                """
                vis = annotation.get_visible()
                if event.inaxes != axis:
                    return
                cont, ind = line.contains(event)
                if cont:
                    update_note(ind["ind"])
                    annotation.set_visible(True)
                    fig.canvas.draw_idle()
                elif vis:
                    annotation.set_visible(False)
                    fig.canvas.draw_idle()

            def click(event: MouseEvent) -> None:
                """Callback to be used when mouse clicks near a plotted point

                Parameters
                ----------
                event : MouseEvent
                    The mouse-related event; contains location information
                """
                if event.inaxes != axis:
                    return
                cont, ind = line.contains(event)
                if cont:
                    pnum = lines_pnums[0][1]
                    rownum = pnum[int(ind["ind"][0])]
                    print(f"This is pulse# {rownum}")
                    print(self.df.drop("pulse").row(rownum, named=True))

            fig.canvas.mpl_connect("motion_notify_event", hover)
            fig.canvas.mpl_connect("button_press_event", click)

        plt.xlabel(str(x_col))
        plt.ylabel(str(y_col))

        if extended_title:
            title_parts = [self.header.description]

            def truncated_str(s: str, max_len: int = 50) -> str:
                if len(s) <= max_len:
                    return s
                return s[:max_len] + "..."

            if use_expr is not pl.lit(True):
                usestr = truncated_str("Use: " + str(use_expr))
                title_parts.append(usestr)
            title_parts.append("Good: " + truncated_str(str(self.good_expr)))
            title_str = "\n".join(title_parts)
        else:
            title_str = self.header.description
        plt.title(title_str)
        if color_col is not None:
            plt.legend(title=color_col)
        plt.tight_layout()

    def plot_pulses(  # noqa: PLR0917
        self,
        length: int = 30,
        skip: int = 0,
        random: bool = False,
        record_numbers: Collection[Any] | pl.Series | None = None,
        subtract_baseline: bool = False,
        derivative: bool = False,
        summarize: bool = True,
        summary_columns: Collection[Any] | None = None,
        pulse_field: str = "pulse",
        use_expr: pl.Expr = pl.lit(True),
        use_good_expr: bool = True,
        axis: plt.Axes | None = None,
        cm: str | Colormap = "viridis_r",
    ) -> None:
        """Plot some example pulses

        Parameters
        ----------
        length : int, optional
            How many pulses to plot, by default 30
        skip : int, optional
            Start plotting at this pulse record number, by default 0
        random : bool, optional
            Whether to plot `length` randomly selected records, by default False
            If True, `skip` is ignored.
        record_numbers : Collection[Any] | pl.Series | None, optional
            Plot the specified records, numbered from 0 for the first in the dataframe, by default None.
            If given, `length`, `skip`, and `random` are ignored.
        subtract_baseline : bool, optional
            Whether to subtract the pretrigger mean before plotting each record, by default False
        derivative : bool, optional
            Whether to plot the "derivative" of a pulse (actually the successive differences), by default False
        summarize : bool, optional
            Whether to summarize key facts about each plotted pulse to the terminal, by default True
        summary_columns : Collection[Any] | None, optional
            Which specific data columns to report in the summary to the terminal, by default None
            If None, then a pre-selected set are reported.
        pulse_field : str, optional
            The column name in the polars dataframe where plottable pulses are stored, by default "pulse"
        use_expr : pl.Expr, optional
            An expression to select plottable points, by default pl.lit(True)
        use_good_expr : bool, optional
            Whether to apply the object's `good_expr` before plotting, by default True
            If True, then the existing `good_expr` will be applied AND the `use_expr` will be, too.
        axis : plt.Axes | None, optional
            Axes to plot on, by default None
            If None, create a new figure.
        cm : str | Colormap, optional
            The colormap to use for distinguishing pulses, by default "viridis_r"
        """
        pulse_type = self.df[pulse_field].dtype
        assert pulse_type in (pl.Array, pl.List), (  # noqa: PLR6201
            f"Cannot plot column '{pulse_field}' as pulse records: not a pl.Array or pl.List type"
        )

        if axis is None:
            _, axis = plt.subplots()  # Create a new figure if no axis is provided

        if use_good_expr and self.good_expr is not True:
            # True doesn't implement .and_, and we haven't found an Expr literal
            # equivalent that does, so we special-case True
            filter_expr = self.good_expr.and_(use_expr)
        else:
            filter_expr = use_expr

        if isinstance(cm, str):
            cmap = plt.get_cmap(cm)
        else:
            cmap = cm

        lf = self.df.lazy().with_row_index("Record #")
        if record_numbers is None:
            if random:
                title = f"{length} random pulses"
                lf = lf.filter(filter_expr).collect().sample(length).lazy().sort("Record #")
            else:
                lf = lf.filter(filter_expr).slice(skip, length)
                title = f"{length} selected pulses"
        else:
            title = f"Pulses #{record_numbers}"
            lf = lf.filter(pl.col("Record #").is_in(record_numbers))
        plt.title(f"{title} from Chan {self.ch_num}")
        if summarize:
            # Preferred data info to print to terminal.
            if summary_columns is None:
                summary_columns = [
                    "Record #",
                    "pretrig_mean",
                    "pulse_rms",
                    "pulse_average",
                    "rise_time",
                    "peak_value",
                    "energy_5lagy",
                    "state_label",
                ]
            # Remove preferred column if it doesn't exist
            columns = [c for c in summary_columns if c in lf.collect_schema().names()]
            summary_df = lf.select(columns).collect()
            summary_df.show(limit=None)

        frametime_ms = self.frametime_s * 1e3
        sample_x = np.arange(self.header.n_samples) - self.header.n_presamples

        def samples2ms(s: ArrayLike) -> ArrayLike:
            return np.asarray(s) * frametime_ms

        def ms2samples(ms: ArrayLike) -> ArrayLike:
            return np.asarray(ms) / frametime_ms

        upper_axis = axis.secondary_xaxis("top", functions=(samples2ms, ms2samples))
        upper_axis.set_xlabel("Time after trigger (ms)")
        plt.xlabel("Samples after trigger")

        plot_columns = (pulse_field, "pretrig_mean")
        df = lf.select(plot_columns).collect()
        N = len(df)
        pulses = df[pulse_field]
        ptmean = df["pretrig_mean"]
        for i in range(N):
            pulse = pulses[i].to_numpy()
            color = cmap(i / N)
            if subtract_baseline:
                pulse = pulse - ptmean[i]  # noqa: PLR6104
            if derivative:
                pulse = np.hstack((0, np.diff(pulse)))
            axis.plot(sample_x, pulse, color=color)

    def good_series(self, col: str, use_expr: pl.Expr = pl.lit(True)) -> pl.Series:
        """Return a Polars Series of the given column, filtered by good_expr and use_expr."""
        return mass2.misc.good_series(self.df, col, self.good_expr, use_expr)

    @property
    def last_avg_pulse(self) -> NDArray | None:
        """Return the average pulse stored in the last recipe step that's an optimal filter step

        Returns
        -------
        NDArray | None
            The last filtering step's signal model, or None if no such step
        """
        for step in reversed(self.steps):
            if isinstance(step, OptimalFilterStep):
                return step.filter_maker.signal_model
        return None

    @property
    def last_filter(self) -> NDArray | None:
        """Return the average pulse stored in the last recipe step that's an optimal filter step

        Returns
        -------
        NDArray | None
            The last filtering step's signal model, or None if no such step
        """
        for step in reversed(self.steps):
            if isinstance(step, OptimalFilterStep):
                return step.filter.values
        return None

    @property
    def last_v_over_dv(self) -> float | None:
        """Return the predicted V/dV stored in the last recipe step that's an optimal filter step

        Returns
        -------
        float | None
            The last filtering step's predicted V/dV ratio, or None if no such step
        """
        for step in reversed(self.steps):
            if isinstance(step, OptimalFilterStep):
                return step.filter.predicted_v_over_dv
        return None

    @property
    def last_noise_psd(self) -> tuple[NDArray, NDArray] | None:
        """Return the noise PSD stored in the last recipe step that's an optimal filter step

        Returns
        -------
        tuple[NDArray, NDArray] | None
            The last filtering step's (frequencies, noise spectrum), or None if no such step
        """
        for step in reversed(self.steps):
            if isinstance(step, OptimalFilterStep) and step.spectrum is not None:
                return step.spectrum.frequencies, step.spectrum.psd
        return None

    @property
    def last_noise_autocorrelation(self) -> NDArray | None:
        """Return the noise autocorrelation stored in the last recipe step that's an optimal filter step

        Returns
        -------
        NDArray | None
            The last filtering step's noise autocorrelation, or None if no such step
        """
        for step in reversed(self.steps):
            if isinstance(step, OptimalFilterStep) and step.spectrum is not None:
                return step.spectrum.autocorr_vec
        return None

    def rough_cal_combinatoric(
        self,
        line_names: list[str],
        uncalibrated_col: str,
        calibrated_col: str,
        ph_smoothing_fwhm: float,
        n_extra: int = 3,
        use_expr: pl.Expr = pl.lit(True),
    ) -> "Channel":
        """Learn a rough calibration by trying all combinatorically possible peak assignments."""
        step = mass2.core.RoughCalibrationStep.learn_combinatoric(
            self,
            line_names,
            uncalibrated_col=uncalibrated_col,
            calibrated_col=calibrated_col,
            ph_smoothing_fwhm=ph_smoothing_fwhm,
            n_extra=n_extra,
            use_expr=use_expr,
        )
        return self.with_step(step)

    def rough_cal_combinatoric_height_info(
        self,
        line_names: list[str],
        line_heights_allowed: list[list[int]],
        uncalibrated_col: str,
        calibrated_col: str,
        ph_smoothing_fwhm: float,
        n_extra: int = 3,
        use_expr: pl.Expr = pl.lit(True),
    ) -> "Channel":
        """Learn a rough calibration by trying all combinatorically possible peak assignments,
        using known relative peak heights to limit the possibilities."""
        step = mass2.core.RoughCalibrationStep.learn_combinatoric_height_info(
            self,
            line_names,
            line_heights_allowed,
            uncalibrated_col=uncalibrated_col,
            calibrated_col=calibrated_col,
            ph_smoothing_fwhm=ph_smoothing_fwhm,
            n_extra=n_extra,
            use_expr=use_expr,
        )
        return self.with_step(step)

    def rough_cal(  # noqa: PLR0917
        self,
        line_names: list[str | float],
        uncalibrated_col: str = "filtValue",
        calibrated_col: str | None = None,
        use_expr: pl.Expr = pl.lit(True),
        max_fractional_energy_error_3rd_assignment: float = 0.1,
        min_gain_fraction_at_ph_30k: float = 0.25,
        fwhm_pulse_height_units: float = 75,
        n_extra_peaks: int = 10,
        acceptable_rms_residual_e: float = 10,
    ) -> "Channel":
        """Learn a rough calibration by trying to assign the 3 brightest peaks,
        then fitting a line to those and looking for other peaks that fit that line.
        """
        step = mass2.core.RoughCalibrationStep.learn_3peak(
            self,
            line_names,
            uncalibrated_col,
            calibrated_col,
            use_expr,
            max_fractional_energy_error_3rd_assignment,
            min_gain_fraction_at_ph_30k,
            fwhm_pulse_height_units,
            n_extra_peaks,
            acceptable_rms_residual_e,
        )
        return self.with_step(step)

    def with_step(self, step: RecipeStep) -> "Channel":
        """Return a new Channel with the given step applied to generate new columns in the dataframe."""
        t_start = time.time()
        df2 = step.calc_from_df(self.df)
        elapsed_s = time.time() - t_start
        ch2 = dataclasses.replace(
            self,
            df=df2,
            good_expr=step.good_expr,
            df_history=self.df_history + [self.df],
            steps=self.steps.with_step(step),
            steps_elapsed_s=self.steps_elapsed_s + [elapsed_s],
        )
        return ch2

    def with_steps(self, steps: Recipe) -> "Channel":
        """Return a new Channel with the given steps applied to generate new columns in the dataframe."""
        ch2 = self
        for step in steps:
            ch2 = ch2.with_step(step)
        return ch2

    def with_good_expr(self, good_expr: pl.Expr, replace: bool = False) -> "Channel":
        """Return a new Channel with the given good_expr, combined with the existing good_expr by "and",
        of by replacing it entirely if `replace` is True."""
        # the default value of self.good_expr is pl.lit(True)
        # and_(True) will just add visual noise when looking at good_expr and not affect behavior
        if not replace and good_expr is not True and not good_expr.meta.eq(pl.lit(True)):
            good_expr = good_expr.and_(self.good_expr)
        return dataclasses.replace(self, good_expr=good_expr)

    def with_column_map_step(self, input_col: str, output_col: str, f: Callable) -> "Channel":
        """f should take a numpy array and return a numpy array with the same number of elements"""
        step = mass2.core.recipe.ColumnAsNumpyMapStep([input_col], [output_col], good_expr=self.good_expr, use_expr=pl.lit(True), f=f)
        return self.with_step(step)

    def with_good_expr_pretrig_rms_and_postpeak_deriv(
        self, n_sigma_pretrig_rms: float = 20, n_sigma_postpeak_deriv: float = 20, replace: bool = False
    ) -> "Channel":
        """Set good_expr to exclude pulses with pretrigger RMS or postpeak derivative above outlier-resistant thresholds."""
        max_postpeak_deriv = misc.outlier_resistant_nsigma_above_mid(
            self.df["postpeak_deriv"].to_numpy(), nsigma=n_sigma_postpeak_deriv
        )
        max_pretrig_rms = misc.outlier_resistant_nsigma_above_mid(self.df["pretrig_rms"].to_numpy(), nsigma=n_sigma_pretrig_rms)
        good_expr = (pl.col("postpeak_deriv") < max_postpeak_deriv).and_(pl.col("pretrig_rms") < max_pretrig_rms)
        return self.with_good_expr(good_expr, replace)

    def with_range_around_median(self, col: str, range_up: float, range_down: float) -> "Channel":
        """Set good_expr to exclude pulses with `col` outside the given range around its median."""
        med = np.median(self.df[col].to_numpy())
        return self.with_good_expr(pl.col(col).is_between(med - range_down, med + range_up))

    def with_good_expr_below_nsigma_outlier_resistant(
        self, col_nsigma_pairs: Iterable[tuple[str, float]], replace: bool = False, use_prev_good_expr: bool = True
    ) -> "Channel":
        """Set good_expr to exclude pulses with any of the given columns above outlier-resistant thresholds.
        Always sets lower limit at 0, so don't use for values that can be negative
        """
        if use_prev_good_expr:
            df = self.df.lazy().select(pl.exclude("pulse")).filter(self.good_expr).collect()
        else:
            df = self.df
        for i, (col, nsigma) in enumerate(col_nsigma_pairs):
            max_for_col = misc.outlier_resistant_nsigma_above_mid(df[col].to_numpy(), nsigma=nsigma)
            this_iter_good_expr = pl.col(col).is_between(0, max_for_col)
            if i == 0:
                good_expr = this_iter_good_expr
            else:
                good_expr = good_expr.and_(this_iter_good_expr)
        return self.with_good_expr(good_expr, replace)
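
The outlier-resistant thresholds used above come from `mass2.misc`; the key idea is to estimate location and spread from percentiles rather than the ordinary mean and standard deviation, so a handful of wild pulses cannot inflate the cut. A minimal pure-Python stand-in (the function name and the quartile-based estimator are assumptions for illustration, not the mass2 implementation):

```python
import statistics

def nsigma_above_mid(values, nsigma=20.0):
    """Roughly mid + nsigma*sigma, with both estimated from quartiles."""
    q1, mid, q3 = statistics.quantiles(values, n=4)  # 25th/50th/75th percentiles
    sigma = (q3 - q1) / 1.349  # the IQR of a Gaussian is about 1.349 sigma
    return mid + nsigma * sigma

# A single extreme outlier barely moves the threshold:
clean = [10.0] * 50 + [11.0] * 50
assert abs(nsigma_above_mid(clean + [1000.0], 5.0) - nsigma_above_mid(clean + [12.0], 5.0)) < 1.0
```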

    def with_good_expr_nsigma_range_outlier_resistant(
        self, col_nsigma_pairs: Iterable[tuple[str, float]], replace: bool = False, use_prev_good_expr: bool = True
    ) -> "Channel":
        """Set good_expr to exclude pulses with any of the given columns above outlier-resistant thresholds.
        Always sets lower limit at 0, so don't use for values that can be negative
        """
        if use_prev_good_expr:
            df = self.df.lazy().select(pl.exclude("pulse")).filter(self.good_expr).collect()
        else:
            df = self.df
        for i, (col, nsigma) in enumerate(col_nsigma_pairs):
            min_for_col, max_for_col = misc.outlier_resistant_nsigma_range_from_mid(df[col].to_numpy(), nsigma=nsigma)
            this_iter_good_expr = pl.col(col).is_between(min_for_col, max_for_col)
            if i == 0:
                good_expr = this_iter_good_expr
            else:
                good_expr = good_expr.and_(this_iter_good_expr)
        return self.with_good_expr(good_expr, replace)

    @functools.cache
    def typical_peak_ind(self, col: str = "pulse") -> int:
        """Return the typical peak index of the given column, using the median peak index for the first 100 pulses."""
        raw = self.df.limit(100)[col].to_numpy()
        if self.transform_raw is not None:
            raw = self.transform_raw(raw)
        return int(np.median(raw.argmax(axis=1)))
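
The method above takes the argmax of each of the first 100 records and uses the median of those indices. The same idea as a self-contained sketch operating on plain lists instead of a Polars pulse column (the helper name is illustrative):

```python
import statistics

def typical_peak_index(records):
    """Median of each record's peak (argmax) position."""
    peak_indices = [rec.index(max(rec)) for rec in records]
    return int(statistics.median(peak_indices))

typical_peak_index([[0, 1, 5, 2], [0, 4, 1, 0], [0, 2, 6, 1]])  # argmaxes 2, 1, 2 -> 2
```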

    def summarize_pulses(self, col: str = "pulse", pretrigger_ignore_samples: int = 0, peak_index: int | None = None) -> "Channel":
        """Summarize the pulses, adding columns for pulse height, pretrigger mean, etc."""
        if peak_index is None:
            peak_index = self.typical_peak_ind(col)
        out_names = mass2.core.pulse_algorithms.result_dtype.names
        # mypy (incorrectly) thinks `out_names` might be None, and `list(None)` is forbidden. Assertion makes it happy again.
        assert out_names is not None
        outputs = list(out_names)
        step = SummarizeStep(
            inputs=[col],
            output=outputs,
            good_expr=self.good_expr,
            use_expr=pl.lit(True),
            frametime_s=self.frametime_s,
            peak_index=peak_index,
            pulse_col=col,
            pretrigger_ignore_samples=pretrigger_ignore_samples,
            n_presamples=self.n_presamples,
            transform_raw=self.transform_raw,
        )
        return self.with_step(step)

    def correct_pretrig_mean_jumps(
        self, uncorrected: str = "pretrig_mean", corrected: str = "ptm_jf", period: int = 4096
    ) -> "Channel":
        """Correct pretrigger mean jumps in the raw pulse data, writing to a new column."""
        step = mass2.core.recipe.PretrigMeanJumpFixStep(
            inputs=[uncorrected],
            output=[corrected],
            good_expr=self.good_expr,
            use_expr=pl.lit(True),
            period=period,
        )
        return self.with_step(step)
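
`PretrigMeanJumpFixStep` removes discrete jumps of size `period` (4096 by default, e.g. from a wrapping DAC) in the pretrigger-mean column. A hedged sketch of the underlying arithmetic, not the actual step code: shift each value by whichever integer multiple of the period brings it closest to the first value.

```python
def fix_jumps(values, period=4096):
    """Remove integer-multiple-of-period offsets relative to the first value."""
    ref = values[0]
    return [v - period * round((v - ref) / period) for v in values]

fix_jumps([100, 110, 4196, 105])  # the wrapped 4196 comes back to 100
```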

    def with_select_step(self, col_expr_dict: dict[str, pl.Expr]) -> "Channel":
        """
        This step is meant for interactive exploration; it's basically like the df.select() method, but it's saved as a step.
        """
        extract = mass2.misc.extract_column_names_from_polars_expr
        inputs: set[str] = set()
        for expr in col_expr_dict.values():
            inputs.update(extract(expr))
        step = mass2.core.recipe.SelectStep(
            inputs=list(inputs),
            output=list(col_expr_dict.keys()),
            good_expr=self.good_expr,
            use_expr=pl.lit(True),
            col_expr_dict=col_expr_dict,
        )
        return self.with_step(step)

    def with_categorize_step(self, category_condition_dict: dict[str, pl.Expr], output_col: str = "category") -> "Channel":
        """Add a recipe step that categorizes pulses based on the given conditions."""
        # ensure the first condition is True, to be used as a fallback
        first_expr = next(iter(category_condition_dict.values()))
        if not first_expr.meta.eq(pl.lit(True)):
            category_condition_dict = {"fallback": pl.lit(True), **category_condition_dict}
        extract = mass2.misc.extract_column_names_from_polars_expr
        inputs: set[str] = set()
        for expr in category_condition_dict.values():
            inputs.update(extract(expr))
        step = mass2.core.recipe.CategorizeStep(
            inputs=list(inputs),
            output=[output_col],
            good_expr=self.good_expr,
            use_expr=pl.lit(True),
            category_condition_dict=category_condition_dict,
        )
        return self.with_step(step)
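
The guard at the top of this method ensures the first category acts as a catch-all, prepending a "fallback" entry when the caller's first condition is not a literal True. The real check compares Polars expressions via `expr.meta.eq(pl.lit(True))`; here is a plain-Python sketch of the same dict pattern, with booleans standing in for expressions:

```python
def with_fallback(conditions):
    """Prepend a catch-all category unless the first condition already is one."""
    if next(iter(conditions.values())) is not True:
        conditions = {"fallback": True, **conditions}
    return conditions

with_fallback({"big": False, "small": True})  # gains a leading "fallback" key
with_fallback({"anything": True})             # left unchanged
```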

    def compute_average_pulse(self, pulse_col: str = "pulse", use_expr: pl.Expr = pl.lit(True), limit: int = 2000) -> NDArray:
        """Compute an average pulse given a use expression.

        Parameters
        ----------
        pulse_col : str, optional
            Name of the column in self.df containing raw pulses, by default "pulse"
        use_expr : pl.Expr, optional
            Selection (in addition to self.good_expr) to use, by default pl.lit(True)
        limit : int, optional
            Use no more than this many pulses, by default 2000

        Returns
        -------
        NDArray
            The average pulse, with the mean of its first `n_presamples` samples subtracted as a baseline
        """
        avg_pulse = (
            self.df.lazy()
            .filter(self.good_expr)
            .filter(use_expr)
            .select(pulse_col)
            .limit(limit)
            .collect()
            .to_series()
            .to_numpy()
            .mean(axis=0)
        )
        avg_pulse -= avg_pulse[: self.n_presamples].mean()
        return avg_pulse
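
The computation above is a column-wise mean over the selected records followed by baseline subtraction using the pretrigger samples. A pure-Python equivalent on nested lists (an illustrative stand-in, not the Polars pipeline):

```python
def average_pulse(records, n_presamples):
    """Mean each sample across records, then zero the pretrigger baseline."""
    n = len(records)
    avg = [sum(samples) / n for samples in zip(*records)]
    baseline = sum(avg[:n_presamples]) / n_presamples
    return [a - baseline for a in avg]

average_pulse([[1, 1, 3, 5], [1, 1, 5, 7]], n_presamples=2)  # [0.0, 0.0, 3.0, 5.0]
```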

    def filter5lag(
        self,
        pulse_col: str = "pulse",
        peak_y_col: str = "5lagy",
        peak_x_col: str = "5lagx",
        f_3db: float = 25e3,
        use_expr: pl.Expr = pl.lit(True),
        time_constant_s_of_exp_to_be_orthogonal_to: float | None = None,
        fourier: bool = False,
        longest_autocorr_filter: int = 10_000,
    ) -> "Channel":
        """Compute a 5-lag optimal filter and apply it.

        Parameters
        ----------
        pulse_col : str, optional
            Which column contains raw data, by default "pulse"
        peak_y_col : str, optional
            Column to contain the optimal filter results, by default "5lagy"
        peak_x_col : str, optional
            Column to contain the 5-lag filter's estimate of arrival-time/phase, by default "5lagx"
        f_3db : float, optional
            A low-pass filter 3 dB point to apply to the computed filter, by default 25e3
        use_expr : pl.Expr, optional
            An expression to select pulses for averaging, by default pl.lit(True)
        time_constant_s_of_exp_to_be_orthogonal_to : float | None, optional
            Optionally an exponential decay time to make the filter insensitive to, by default None
        fourier : bool, optional
            Whether to use filters constructed in the Fourier domain, by default False
            The alternative, default choice is to construct time-domain filters using the noise autocorrelation
        longest_autocorr_filter: int, optional
            Don't compute noise autocorrelation-based filters if the record length exceeds this limit, by default 10000.
            (Filters based on very long autocorrelations take O(N^2) operations and memory to generate.)
            If exceeded, filters will be Fourier-space filters.

        Returns
        -------
        Channel
            This channel with an OptimalFilterStep added to the recipe.
        """
        assert self.noise
        shortening_5lag = 4  # 5-lag filters shorten the pulse by 2 on each end
        n_samples_5lag = self.n_samples - shortening_5lag
        if not fourier:
            suggest = "use `fourier=True` or increase `longest_autocorr_filter`"
            assert n_samples_5lag <= longest_autocorr_filter, (
                f"Autocorrelation not computed for records exceeding {longest_autocorr_filter}; {suggest}"
            )

        noiseresult = self.noise.spectrum(skip_autocorr_if_length_over=longest_autocorr_filter)
        if not fourier:
            assert noiseresult.autocorr_vec is not None, f"Autocorrelation not computed; {suggest}"
            Nac = len(noiseresult.autocorr_vec)
            assert n_samples_5lag <= Nac, f"Autocorrelation result ({Nac}) is too short for {n_samples_5lag}; {suggest}"

        avg_pulse = self.compute_average_pulse(pulse_col=pulse_col, use_expr=use_expr)
        filter_maker = FilterMaker(
            signal_model=avg_pulse,
            n_pretrigger=self.n_presamples,
            noise_psd=noiseresult.psd,
            noise_autocorr=noiseresult.autocorr_vec,
            sample_time_sec=self.frametime_s,
        )

        if time_constant_s_of_exp_to_be_orthogonal_to is None:
            if fourier:
                filter5lag = filter_maker.compute_fourier(f_3db=f_3db)
            else:
                filter5lag = filter_maker.compute_5lag(f_3db=f_3db)
        else:
            if fourier:
                raise NotImplementedError(
                    "Can't make filters orthogonal to an exponential AND in Fourier domain (i.e. without noise autocorrelation)"
                )
            filter5lag = filter_maker.compute_5lag_noexp(f_3db=f_3db, exp_time_seconds=time_constant_s_of_exp_to_be_orthogonal_to)
        step = OptimalFilterStep(
            inputs=["pulse"],
            output=[peak_x_col, peak_y_col],
            good_expr=self.good_expr,
            use_expr=use_expr,
            filter=filter5lag,
            spectrum=noiseresult,
            filter_maker=filter_maker,
            transform_raw=self.transform_raw,
        )
        return self.with_step(step)
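
Why "5-lag"? The filter is evaluated at five one-sample lags, and a parabola through the resulting dot products yields both a peak height and a sub-sample arrival time. A simplified sketch of that vertex interpolation using only the three central lags (an assumed illustration of the principle, not the exact mass2 fit, which uses all five points):

```python
def five_lag_peak(dots):
    """Parabola vertex through filter outputs at lags -1, 0, +1.

    dots: filter dot products at lags -2..+2; returns (phase, height).
    """
    ym1, y0, yp1 = dots[1], dots[2], dots[3]
    denom = ym1 - 2.0 * y0 + yp1         # curvature (negative near a maximum)
    x = 0.5 * (ym1 - yp1) / denom        # vertex offset, in samples
    y = y0 - 0.25 * (ym1 - yp1) * x      # vertex height
    return x, y
```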

    def compute_ats_model(self, pulse_col: str, use_expr: pl.Expr = pl.lit(True), limit: int = 2000) -> tuple[NDArray, NDArray]:
        """Compute the average pulse and arrival-time model for an ATS filter.
        We use the first `limit` pulses that pass `good_expr` and `use_expr`.

        Parameters
        ----------
        pulse_col : str
            Name of the column in self.df containing raw pulses
        use_expr : pl.Expr, optional
            Selection (in addition to self.good_expr) to use, by default pl.lit(True)
        limit : int, optional
            Use no more than this many pulses, by default 2000

        Returns
        -------
        tuple[NDArray, NDArray]
            The (average pulse, arrival-time model) arrays, each of length self.n_samples
        """
        df = (
            self.df.lazy()
            .filter(self.good_expr)
            .filter(use_expr)
            .limit(limit)
            .select(pulse_col, "pulse_rms", "promptness", "pretrig_mean")
            .collect()
        )

        # Adjust promptness: subtract a linear trend with pulse_rms
        prms = df["pulse_rms"].to_numpy()
        promptness = df["promptness"].to_numpy()
        poly = np.poly1d(np.polyfit(prms, promptness, 1))
        df = df.with_columns(promptshifted=(promptness - poly(prms)))

        # Rescale promptness quadratically to span approximately [-0.5, +0.5], dropping any pulses with abs(t) > 0.45.
        x, y, z = np.percentile(df["promptshifted"], [10, 50, 90])
        A = np.array([[x * x, x, 1], [y * y, y, 1], [z * z, z, 1]])
        param = np.linalg.solve(A, [-0.4, 0, +0.4])
        ATime = np.poly1d(param)(df["promptshifted"])
        df = df.with_columns(ATime=ATime).filter(np.abs(ATime) < 0.45).drop("promptshifted")

        # Compute mean pulse and dt model as the offset and slope of a linear fit to each pulse sample vs ATime
        pulse = df["pulse"].to_numpy()
        avg_pulse = np.zeros(self.n_samples, dtype=float)
        dt_model = np.zeros(self.n_samples, dtype=float)
        for i in range(self.n_presamples, self.n_samples):
            slope, offset = np.polyfit(df["ATime"], (pulse[:, i] - df["pretrig_mean"]), 1)
            dt_model[i] = -slope
            avg_pulse[i] = offset
        return avg_pulse, dt_model

    def filterATS(
        self,
        pulse_col: str = "pulse",
        peak_y_col: str = "ats_y",
        peak_x_col: str = "ats_x",
        f_3db: float = 25e3,
        use_expr: pl.Expr = pl.lit(True),
    ) -> "Channel":
        """Compute an arrival-time-safe (ATS) optimal filter and apply it.

        Parameters
        ----------
        pulse_col : str, optional
            Which column contains raw data, by default "pulse"
        peak_y_col : str, optional
            Column to contain the optimal filter results, by default "ats_y"
        peak_x_col : str, optional
            Column to contain the ATS filter's estimate of arrival-time/phase, by default "ats_x"
        f_3db : float, optional
            A low-pass filter 3 dB point to apply to the computed filter, by default 25e3
        use_expr : pl.Expr, optional
            An expression to select pulses for averaging, by default pl.lit(True)

        Returns
        -------
        Channel
            This channel with an OptimalFilterStep (the arrival-time-safe filter) added to the recipe.
        """
        assert self.noise
        mprms = self.good_series("pulse_rms", use_expr).median()
        use = use_expr.and_(np.abs(pl.col("pulse_rms") / mprms - 1.0) < 0.3)
        limit = 4000
        avg_pulse, dt_model = self.compute_ats_model(pulse_col, use, limit)
        noiseresult = self.noise.spectrum()
        filter_maker = FilterMaker(
            signal_model=avg_pulse,
            dt_model=dt_model,
            n_pretrigger=self.n_presamples,
            noise_psd=noiseresult.psd,
            noise_autocorr=noiseresult.autocorr_vec,
            sample_time_sec=self.frametime_s,
        )
        filter_ats = filter_maker.compute_ats(f_3db=f_3db)
        step = OptimalFilterStep(
            inputs=["pulse"],
            output=[peak_x_col, peak_y_col],
            good_expr=self.good_expr,
            use_expr=use_expr,
            filter=filter_ats,
            spectrum=noiseresult,
            filter_maker=filter_maker,
            transform_raw=self.transform_raw,
        )
        return self.with_step(step)

    def good_df(self, cols: list[str] | pl.Expr = pl.all(), use_expr: pl.Expr = pl.lit(True)) -> pl.DataFrame:
        """Return a Polars DataFrame of the given columns, filtered by good_expr and use_expr."""
        good_df = self.df.lazy().filter(self.good_expr)
        if use_expr is not True:
            good_df = good_df.filter(use_expr)
        return good_df.select(cols).collect()

    def bad_df(self, cols: list[str] | pl.Expr = pl.all(), use_expr: pl.Expr = pl.lit(True)) -> pl.DataFrame:
        """Return a Polars DataFrame of the given columns, filtered by the inverse of good_expr, and use_expr."""
        bad_df = self.df.lazy().filter(self.good_expr.not_())
        if use_expr is not True:
            bad_df = bad_df.filter(use_expr)
        return bad_df.select(cols).collect()

    def good_serieses(self, cols: list[str], use_expr: pl.Expr = pl.lit(True)) -> list[pl.Series]:
        """Return a list of Polars Series of the given columns, filtered by good_expr and use_expr."""
        df2 = self.good_df(cols, use_expr)
        return [df2[col] for col in cols]

    def driftcorrect(
        self,
        indicator_col: str = "pretrig_mean",
        uncorrected_col: str = "5lagy",
        corrected_col: str | None = None,
        use_expr: pl.Expr = pl.lit(True),
    ) -> "Channel":
        """Correct for gain drift correlated with the given indicator column."""
        # by defining a separate learn method that takes ch as an argument,
        # we can move all the code for the step outside of Channel
        step = DriftCorrectStep.learn(
            ch=self,
            indicator_col=indicator_col,
            uncorrected_col=uncorrected_col,
            corrected_col=corrected_col,
            use_expr=use_expr,
        )
        return self.with_step(step)
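
Drift correction multiplies each pulse height by a gain that varies with the indicator's deviation from a reference value; the learned step chooses the parameters that flatten the correlation. A hedged sketch of applying an already-learned linear correction (the names and the exact functional form are assumptions for illustration):

```python
def drift_correct(indicator, uncorrected, slope, reference):
    """Apply a linear gain correction keyed on an indicator variable."""
    return [u * (1.0 + slope * (ind - reference))
            for ind, u in zip(indicator, uncorrected)]

drift_correct([1.0, 2.0], [100.0, 100.0], slope=0.01, reference=1.0)
```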

    def linefit(  # noqa: PLR0917
        self,
        line: GenericLineModel | SpectralLine | str | float,
        col: str,
        use_expr: pl.Expr = pl.lit(True),
        has_linear_background: bool = False,
        has_tails: bool = False,
        dlo: float = 50,
        dhi: float = 50,
        binsize: float = 0.5,
        params_update: lmfit.Parameters = lmfit.Parameters(),
    ) -> LineModelResult:
        """Fit a spectral line to the  binned data from the given column, optionally filtering by use_expr."""
        model = mass2.calibration.algorithms.get_model(line, has_linear_background=has_linear_background, has_tails=has_tails)
        pe = model.spect.peak_energy
        _bin_edges = np.arange(pe - dlo, pe + dhi, binsize)
        df_small = self.df.lazy().filter(self.good_expr).filter(use_expr).select(col).collect()
        bin_centers, counts = misc.hist_of_series(df_small[col], _bin_edges)
        params = model.guess(counts, bin_centers=bin_centers, dph_de=1)
        params["dph_de"].set(1.0, vary=False)
        print(f"before update {params=}")
        params = params.update(params_update)
        print(f"after update {params=}")
        result = model.fit(counts, params, bin_centers=bin_centers, minimum_bins_per_fwhm=3)
        result.set_label_hints(
            binsize=bin_centers[1] - bin_centers[0],
            ds_shortname=self.header.description,
            unit_str="eV",
            attr_str=col,
            states_hint=f"{use_expr=}",
            cut_hint="",
        )
        return result

    def step_summary(self) -> list[tuple[str, float]]:
        """Return a list of (step type name, elapsed time in seconds) for each step in the recipe."""
        return [(type(a).__name__, b) for (a, b) in zip(self.steps, self.steps_elapsed_s)]

    def __hash__(self) -> int:
        """Return a hash based on the object's id."""
        # needed to make functools.cache work
        # if self or self.anything is mutated, assumptions will be broken
        # and we may get nonsense results
        return hash(id(self))

    def __eq__(self, other: object) -> bool:
        """Return True if the other object is the same object (by id)."""
        # needed to make functools.cache work
        # if self or self.anything is mutated, assumptions will be broken
        # and we may get nonsense results
        # only checks if the ids match, does not try to be equal if all contents are equal
        return id(self) == id(other)

    @classmethod
    def from_ljh(
        cls,
        path: str | Path,
        noise_path: str | Path | None = None,
        keep_posix_usec: bool = False,
        transform_raw: Callable | None = None,
    ) -> "Channel":
        """Load a Channel from an LJH file, optionally with a NoiseChannel from a corresponding noise LJH file."""
        if not noise_path:
            noise_channel = None
        else:
            noise_channel = NoiseChannel.from_ljh(noise_path)
        ljh = mass2.LJHFile.open(path)
        df, header_df = ljh.to_polars(keep_posix_usec)
        header = ChannelHeader.from_ljh_header_df(header_df)
        channel = cls(
            df, header=header, npulses=ljh.npulses, subframediv=ljh.subframediv, noise=noise_channel, transform_raw=transform_raw
        )
        return channel

    @classmethod
    def from_off(cls, off: OffFile) -> "Channel":
        """Load a Channel from an OFF file."""
        assert off._mmap is not None
        df = pl.from_numpy(np.asarray(off._mmap))
        df = (
            df.select(
                pl.from_epoch("unixnano", time_unit="ns")
                .dt.cast_time_unit("us")
                .dt.convert_time_zone(_local_timezone_name)
                .alias("timestamp")
            )
            .with_columns(df)
            .select(pl.exclude("unixnano"))
        )
        df_header = pl.DataFrame(off.header)
        df_header = df_header.with_columns(pl.Series("Filename", [off.filename]))
        header = ChannelHeader(
            f"{os.path.split(off.filename)[1]}",
            off.filename,
            off.header["ChannelNumberMatchingName"],
            off.framePeriodSeconds,
            off._mmap["recordPreSamples"][0],
            off._mmap["recordSamples"][0],
            df_header,
        )
        channel = cls(df, header, off.nRecords, subframediv=off.subframediv)
        return channel

    def with_experiment_state_df(self, df_es: pl.DataFrame, force_timestamp_monotonic: bool = False) -> "Channel":
        """Add experiment states from an existing dataframe"""

        # Make sure experiment state dataframe and self.df agree on time zones. If not, convert the former.
        times = df_es["timestamp"]
        expt_state_time_type = times.dtype
        self_time_type = self.df["timestamp"].dtype
        assert isinstance(expt_state_time_type, Datetime)
        assert isinstance(self_time_type, Datetime)
        desired_time_zone = self_time_type.time_zone
        if desired_time_zone is None:
            desired_time_zone = _local_timezone_name
        if expt_state_time_type.time_zone != desired_time_zone:
            times = times.dt.convert_time_zone(desired_time_zone)
            df_es = df_es.with_columns(timestamp=times)

        if not self.df["timestamp"].is_sorted():
            df = self.df.select(pl.col("timestamp").cum_max().alias("timestamp")).with_columns(self.df.select(pl.exclude("timestamp")))
            # print("WARNING: in with_experiment_state_df, timestamp is not monotonic, forcing it to be")
            # print("This is likely a BUG in DASTARD.")
        else:
            df = self.df
        df2 = df.join_asof(df_es, on="timestamp", strategy="backward")
        return self.with_replacement_df(df2)
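
The `join_asof(..., strategy="backward")` call gives every pulse the most recent experiment state at or before its timestamp. A self-contained sketch of what a backward as-of join computes, with both inputs assumed sorted (an illustration of the semantics, not the Polars implementation):

```python
import bisect

def asof_backward(times, state_times, states):
    """For each time, return the state whose start is the latest one <= time."""
    out = []
    for t in times:
        k = bisect.bisect_right(state_times, t) - 1
        out.append(states[k] if k >= 0 else None)
    return out

asof_backward([5, 15, 25], [0, 10, 20], ["A", "B", "C"])  # ['A', 'B', 'C']
```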

    def with_external_trigger_df(self, df_ext: pl.DataFrame) -> "Channel":
        """Add external trigger times from an existing dataframe"""
        df = self.df
        # Expect "subframecount" will be in the dataframe for LJH 2.2 files, but have to add it for OFF files:
        if "subframecount" not in df:
            df = self.df.with_columns(subframecount=pl.col("framecount") * self.subframediv)
        df2 = df.join_asof(df_ext, on="subframecount", strategy="backward", coalesce=False, suffix="_prev_ext_trig").join_asof(
            df_ext, on="subframecount", strategy="forward", coalesce=False, suffix="_next_ext_trig"
        )
        return self.with_replacement_df(df2)

    def with_replacement_df(self, df2: pl.DataFrame) -> "Channel":
        """Replace the dataframe with a new one, keeping all other attributes the same."""
        return dataclasses.replace(
            self,
            df=df2,
        )

    def with_columns(self, df2: pl.DataFrame) -> "Channel":
        """Append columns from df2 to the existing dataframe, keeping all other attributes the same."""
        df3 = self.df.with_columns(df2)
        return self.with_replacement_df(df3)

    def multifit_quadratic_gain_cal(
        self,
        multifit: MultiFit,
        previous_cal_step_index: int,
        calibrated_col: str,
        use_expr: pl.Expr = pl.lit(True),
    ) -> "Channel":
        """Fit multiple spectral lines, to create a quadratic gain calibration."""
        step = MultiFitQuadraticGainStep.learn(
            self,
            multifit_spec=multifit,
            previous_cal_step_index=previous_cal_step_index,
            calibrated_col=calibrated_col,
            use_expr=use_expr,
        )
        return self.with_step(step)

    def multifit_mass_cal(
        self,
        multifit: MultiFit,
        previous_cal_step_index: int,
        calibrated_col: str,
        use_expr: pl.Expr = pl.lit(True),
    ) -> "Channel":
        """Fit multiple spectral lines, to create a Mass1-style gain calibration."""
        step = MultiFitMassCalibrationStep.learn(
            self,
            multifit_spec=multifit,
            previous_cal_step_index=previous_cal_step_index,
            calibrated_col=calibrated_col,
            use_expr=use_expr,
        )
        return self.with_step(step)

    def concat_df(self, df: pl.DataFrame) -> "Channel":
        """Concat the given dataframe to the existing dataframe, keeping all other attributes the same.
        If the new frame `df` has a history and/or steps, those will be lost"""
        ch2 = Channel(
            mass2.core.misc.concat_dfs_with_concat_state(self.df, df),
            self.header,
            self.npulses,
            subframediv=self.subframediv,
            noise=self.noise,
            good_expr=self.good_expr,
        )
        # we won't copy over df_history and steps. I don't think you should use this when those are filled in?
        return ch2

    def concat_ch(self, ch: "Channel") -> "Channel":
        """Concat the given channel's dataframe to the existing dataframe, keeping all other attributes the same.
        If the new channel `ch` has a history and/or steps, those will be lost"""
        ch2 = self.concat_df(ch.df)
        return ch2

    def phase_correct_mass_specific_lines(
        self,
        indicator_col: str,
        uncorrected_col: str,
        line_names: Iterable[str | float],
        previous_cal_step_index: int,
        corrected_col: str | None = None,
        use_expr: pl.Expr = pl.lit(True),
    ) -> "Channel":
        """Apply phase correction to the given uncorrected column, where specific lines are used to judge the correction."""
        if corrected_col is None:
            corrected_col = uncorrected_col + "_pc"
        step = mass2.core.phase_correct_steps.phase_correct_mass_specific_lines(
            self,
            indicator_col,
            uncorrected_col,
            corrected_col,
            previous_cal_step_index,
            line_names,
            use_expr,
        )
        return self.with_step(step)

    def as_bad(self, error_type: type | None, error_msg: str, backtrace: str | None) -> "BadChannel":
        """Return a BadChannel object, which wraps this Channel and includes error information."""
        return BadChannel(self, error_type, error_msg, backtrace)

    def save_recipes(self, filename: str) -> dict[int, Recipe]:
        """Save the recipe steps to a pickle file, keyed by channel number."""
        steps = {self.ch_num: self.steps}
        misc.pickle_object(steps, filename)
        return steps

    def plot_summaries(self, use_expr_in: pl.Expr | None = None, downsample: int | None = None, log: bool = False) -> None:
        """Plot a summary of the data set, including time series and histograms of key pulse properties.

        Parameters
        ----------
        use_expr_in: pl.Expr | None, optional
            A polars expression to determine valid pulses, by default None. If None, use `self.good_expr`
        downsample: int | None, optional
            Plot only every one of `downsample` pulses in the scatter plots, by default None.
            If None, choose the smallest value so that no more than 10000 points appear
        log: bool, optional
            Whether to make the histograms have a logarithmic y-scale, by default False.
        """
        plt.figure()
        tpi_microsec = (self.typical_peak_ind() - self.n_presamples) * (1e6 * self.frametime_s)
        plottables = (
            ("pulse_rms", "Pulse RMS", "#dd00ff", None),
            ("pulse_average", "Pulse Avg", "purple", None),
            ("peak_value", "Peak value", "blue", None),
            ("pretrig_rms", "Pretrig RMS", "green", [0, 4000]),
            ("pretrig_mean", "Pretrig Mean", "#00ff26", None),
            ("postpeak_deriv", "Max PostPk deriv", "gold", [0, 200]),
            ("rise_time_µs", "Rise time (µs)", "orange", [-0.3 * tpi_microsec, 2 * tpi_microsec]),
            ("peak_time_µs", "Peak time (µs)", "red", [-0.3 * tpi_microsec, 2 * tpi_microsec]),
        )

        use_expr = self.good_expr if use_expr_in is None else use_expr_in

        if downsample is None:
            downsample = self.npulses // 10000
        downsample = max(downsample, 1)

        df = self.df.lazy().gather_every(downsample)
        df = df.with_columns(((pl.col("peak_index") - self.n_presamples) * (1e6 * self.frametime_s)).alias("peak_time_µs"))
        df = df.with_columns((pl.col("rise_time") * 1e6).alias("rise_time_µs"))
        existing_columns = df.collect_schema().names()
        preserve = [p[0] for p in plottables if p[0] in existing_columns]
        preserve.append("timestamp")
        df2 = df.filter(use_expr).select(preserve).collect()

        # Plot timeseries relative to 0 = the last UT midnight during or before the run.
        timestamp = df2["timestamp"].to_numpy()
        last_midnight = timestamp[-1].astype("datetime64[D]")
        hour_rel = (timestamp - last_midnight).astype(float) / 3600e6

        for i, (column_name, label, color, limits) in enumerate(plottables):
            if column_name not in df2:
                continue
            y = df2[column_name].to_numpy()

            # Time series scatter plots (left-hand panels)
            plt.subplot(len(plottables), 2, 1 + i * 2)
            plt.ylabel(label)
            plt.plot(hour_rel, y, ".", ms=1, color=color)
            if i == len(plottables) - 1:
                plt.xlabel("Time since last UT midnight (hours)")

            # Histogram (right-hand panels)
            plt.subplot(len(plottables), 2, 2 + i * 2)
            contents, _, _ = plt.hist(y, 200, range=limits, log=log, histtype="stepfilled", fc=color, alpha=0.5)
            if log:
                plt.ylim(ymin=contents.min())
        print(f"Plotting {len(y)} out of {self.npulses} data points")

    def fit_pulse(self, index: int = 0, col: str = "pulse", verbose: bool = True) -> LineModelResult:
        """Fit a single pulse to a 2-exponential-with-tail model, returning the fit result."""
        pulse = self.df[col][index].to_numpy()
        result = mass2.core.pulse_algorithms.fit_pulse_2exp_with_tail(pulse, npre=self.n_presamples, dt=self.frametime_s)
        if verbose:
            print(f"ch={self}")
            print(f"pulse index={index}")
            print(result.fit_report())
        return result

ch_num property

Channel number, from the filename

frametime_s property

Sample (or frame) period in seconds, from the file header

last_avg_pulse property

Return the average pulse stored in the last recipe step that's an optimal filter step

Returns:
  • NDArray | None –

    The last filtering step's signal model, or None if no such step

last_filter property

Return the filter stored in the last recipe step that's an optimal filter step

Returns:
  • NDArray | None –

    The last filtering step's filter values, or None if no such step

last_noise_autocorrelation property

Return the noise autocorrelation stored in the last recipe step that's an optimal filter step

Returns:
  • NDArray | None –

    The last filtering step's noise autocorrelation, or None if no such step

last_noise_psd property

Return the noise PSD stored in the last recipe step that's an optimal filter step

Returns:
  • tuple[NDArray, NDArray] | None –

    The last filtering step's (frequencies, noise spectrum), or None if no such step

last_v_over_dv property

Return the predicted V/dV stored in the last recipe step that's an optimal filter step

Returns:
  • float | None –

    The last filtering step's predicted V/dV ratio, or None if no such step

n_presamples property

Pretrigger samples in each pulse record, from the file header

n_samples property

Samples per pulse, from the file header

shortname property

A short name for this channel, suitable for plot titles.

__eq__(other)

Return True if the other object is the same object (by id).

Source code in mass2/core/channel.py
def __eq__(self, other: object) -> bool:
    """Return True if the other object is the same object (by id)."""
    # needed to make functools.cache work
    # if self or self.anything is mutated, assumptions will be broken
    # and we may get nonsense results
    # only checks if the ids match, does not try to be equal if all contents are equal
    return id(self) == id(other)

__hash__()

Return a hash based on the object's id.

Source code in mass2/core/channel.py
def __hash__(self) -> int:
    """Return a hash based on the object's id."""
    # needed to make functools.cache work
    # if self or self.anything is mutated, assumptions will be broken
    # and we may get nonsense results
    return hash(id(self))
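
The identity-based `__eq__`/`__hash__` pair exists so `functools.cache` can memoize functions taking a Channel, even though the dataframe contents are unhashable. A minimal sketch of the same pattern (the `Frozen` class here is illustrative, not part of Mass2):

```python
import functools
from dataclasses import dataclass, field


@dataclass(frozen=True, eq=False)
class Frozen:
    """Illustrative stand-in for Channel: holds unhashable contents."""

    values: list = field(default_factory=list)

    def __hash__(self) -> int:
        # hash by identity, so the object is usable as a cache key
        return hash(id(self))

    def __eq__(self, other: object) -> bool:
        # equal only if it is literally the same object
        return id(self) == id(other)


@functools.cache
def expensive(obj: Frozen) -> int:
    return len(obj.values)


a = Frozen([1, 2, 3])
assert expensive(a) == 3       # computed once, cached by identity
assert a != Frozen([1, 2, 3])  # same contents, but a different object
```

As the source comments warn, this only stays correct if the object is never mutated; mutation would make the cache silently return stale results.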

as_bad(error_type, error_msg, backtrace)

Return a BadChannel object, which wraps this Channel and includes error information.

Source code in mass2/core/channel.py
def as_bad(self, error_type: type | None, error_msg: str, backtrace: str | None) -> "BadChannel":
    """Return a BadChannel object, which wraps this Channel and includes error information."""
    return BadChannel(self, error_type, error_msg, backtrace)

bad_df(cols=pl.all(), use_expr=pl.lit(True))

Return a Polars DataFrame of the given columns, filtered by the inverse of good_expr and by use_expr.

Source code in mass2/core/channel.py
def bad_df(self, cols: list[str] | pl.Expr = pl.all(), use_expr: pl.Expr = pl.lit(True)) -> pl.DataFrame:
    """Return a Polars DataFrame of the given columns, filtered by the inverse of good_expr, and use_expr."""
    bad_df = self.df.lazy().filter(self.good_expr.not_())
    if use_expr is not True:
        bad_df = bad_df.filter(use_expr)
    return bad_df.select(cols).collect()

compute_ats_model(pulse_col, use_expr=pl.lit(True), limit=2000)

Compute the average pulse and arrival-time model for an ATS filter. We use the first limit pulses that pass good_expr and use_expr.

Parameters:
  • pulse_col (str) –

    Name of the column in self.df containing raw pulses

  • use_expr (Expr, default: lit(True) ) –

    Selection (in addition to self.good_expr) to use, by default pl.lit(True)

  • limit (int, default: 2000 ) –

    Use no more than this many pulses, by default 2000

Returns:
  • tuple[NDArray, NDArray] –

    The (average pulse, arrival-time model) pair, each of length n_samples

Source code in mass2/core/channel.py
def compute_ats_model(self, pulse_col: str, use_expr: pl.Expr = pl.lit(True), limit: int = 2000) -> tuple[NDArray, NDArray]:
    """Compute the average pulse and arrival-time model for an ATS filter.
    We use the first `limit` pulses that pass `good_expr` and `use_expr`.

    Parameters
    ----------
    pulse_col : str
        Name of the column in self.df containing raw pulses
    use_expr : pl.Expr, optional
        Selection (in addition to self.good_expr) to use, by default pl.lit(True)
    limit : int, optional
        Use no more than this many pulses, by default 2000

    Returns
    -------
    tuple[NDArray, NDArray]
        The (average pulse, arrival-time model) pair, each of length n_samples
    """
    df = (
        self.df.lazy()
        .filter(self.good_expr)
        .filter(use_expr)
        .limit(limit)
        .select(pulse_col, "pulse_rms", "promptness", "pretrig_mean")
        .collect()
    )

    # Adjust promptness: subtract a linear trend with pulse_rms
    prms = df["pulse_rms"].to_numpy()
    promptness = df["promptness"].to_numpy()
    poly = np.poly1d(np.polyfit(prms, promptness, 1))
    df = df.with_columns(promptshifted=(promptness - poly(prms)))

    # Rescale promptness quadratically to span approximately [-0.5, +0.5], dropping any pulses with abs(t) > 0.45.
    x, y, z = np.percentile(df["promptshifted"], [10, 50, 90])
    A = np.array([[x * x, x, 1], [y * y, y, 1], [z * z, z, 1]])
    param = np.linalg.solve(A, [-0.4, 0, +0.4])
    ATime = np.poly1d(param)(df["promptshifted"])
    df = df.with_columns(ATime=ATime).filter(np.abs(ATime) < 0.45).drop("promptshifted")

    # Compute mean pulse and dt model as the offset and slope of a linear fit to each pulse sample vs ATime
    pulse = df[pulse_col].to_numpy()
    avg_pulse = np.zeros(self.n_samples, dtype=float)
    dt_model = np.zeros(self.n_samples, dtype=float)
    for i in range(self.n_presamples, self.n_samples):
        slope, offset = np.polyfit(df["ATime"], (pulse[:, i] - df["pretrig_mean"]), 1)
        dt_model[i] = -slope
        avg_pulse[i] = offset
    return avg_pulse, dt_model
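
The percentile-to-quadratic rescaling in the middle of this method can be isolated: choose a quadratic whose values at the 10th/50th/90th percentiles of the detrended promptness are -0.4, 0, and +0.4. A self-contained sketch with synthetic data (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
promptshifted = rng.normal(0.0, 0.1, size=5000)  # stand-in for detrended promptness

# Solve for quadratic coefficients a*t**2 + b*t + c hitting the three targets
x, y, z = np.percentile(promptshifted, [10, 50, 90])
A = np.array([[x * x, x, 1], [y * y, y, 1], [z * z, z, 1]])
param = np.linalg.solve(A, [-0.4, 0.0, +0.4])
atime = np.poly1d(param)(promptshifted)  # rescaled arrival-time estimate
```

By construction the bulk of the distribution lands in roughly [-0.5, +0.5], so the later cut at |ATime| < 0.45 drops only the tails.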

compute_average_pulse(pulse_col='pulse', use_expr=pl.lit(True), limit=2000)

Compute an average pulse given a use expression.

Parameters:
  • pulse_col (str, default: 'pulse' ) –

    Name of the column in self.df containing raw pulses, by default "pulse"

  • use_expr (Expr, default: lit(True) ) –

    Selection (in addition to self.good_expr) to use, by default pl.lit(True)

  • limit (int, default: 2000 ) –

    Use no more than this many pulses, by default 2000

Returns:
  • NDArray –

    The average pulse, with its pretrigger-sample mean subtracted

Source code in mass2/core/channel.py
def compute_average_pulse(self, pulse_col: str = "pulse", use_expr: pl.Expr = pl.lit(True), limit: int = 2000) -> NDArray:
    """Compute an average pulse given a use expression.

    Parameters
    ----------
    pulse_col : str, optional
        Name of the column in self.df containing raw pulses, by default "pulse"
    use_expr : pl.Expr, optional
        Selection (in addition to self.good_expr) to use, by default pl.lit(True)
    limit : int, optional
        Use no more than this many pulses, by default 2000

    Returns
    -------
    NDArray
        The average pulse, with its pretrigger-sample mean subtracted
    """
    avg_pulse = (
        self.df.lazy()
        .filter(self.good_expr)
        .filter(use_expr)
        .select(pulse_col)
        .limit(limit)
        .collect()
        .to_series()
        .to_numpy()
        .mean(axis=0)
    )
    avg_pulse -= avg_pulse[: self.n_presamples].mean()
    return avg_pulse
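
After the polars selection, the numerical core of this method is a column-wise mean over records followed by subtraction of the pretrigger baseline. A sketch with two tiny made-up records:

```python
import numpy as np

n_presamples = 4
records = np.array([
    [10.0, 10.0, 10.0, 10.0, 30.0, 20.0, 15.0],
    [12.0, 12.0, 12.0, 12.0, 32.0, 22.0, 17.0],
])

avg_pulse = records.mean(axis=0)              # average over pulse records
avg_pulse -= avg_pulse[:n_presamples].mean()  # zero the pretrigger baseline
```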

concat_ch(ch)

Concat the given channel's dataframe to the existing dataframe, keeping all other attributes the same. If the new channel ch has a history and/or steps, those will be lost.

Source code in mass2/core/channel.py
def concat_ch(self, ch: "Channel") -> "Channel":
    """Concat the given channel's dataframe to the existing dataframe, keeping all other attributes the same.
    If the new channel `ch` has a history and/or steps, those will be lost"""
    ch2 = self.concat_df(ch.df)
    return ch2

concat_df(df)

Concat the given dataframe to the existing dataframe, keeping all other attributes the same. If the new frame df has a history and/or steps, those will be lost.

Source code in mass2/core/channel.py
def concat_df(self, df: pl.DataFrame) -> "Channel":
    """Concat the given dataframe to the existing dataframe, keeping all other attributes the same.
    If the new frame `df` has a history and/or steps, those will be lost"""
    ch2 = Channel(
        mass2.core.misc.concat_dfs_with_concat_state(self.df, df),
        self.header,
        self.npulses,
        subframediv=self.subframediv,
        noise=self.noise,
        good_expr=self.good_expr,
    )
    # we won't copy over df_history and steps. I don't think you should use this when those are filled in?
    return ch2

correct_pretrig_mean_jumps(uncorrected='pretrig_mean', corrected='ptm_jf', period=4096)

Correct pretrigger mean jumps in the raw pulse data, writing to a new column.

Source code in mass2/core/channel.py
def correct_pretrig_mean_jumps(
    self, uncorrected: str = "pretrig_mean", corrected: str = "ptm_jf", period: int = 4096
) -> "Channel":
    """Correct pretrigger mean jumps in the raw pulse data, writing to a new column."""
    step = mass2.core.recipe.PretrigMeanJumpFixStep(
        inputs=[uncorrected],
        output=[corrected],
        good_expr=self.good_expr,
        use_expr=pl.lit(True),
        period=period,
    )
    return self.with_step(step)
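
PretrigMeanJumpFixStep's exact algorithm isn't reproduced on this page, but the essence of fixed-period jump removal can be sketched: fold out integer multiples of `period` relative to a reference level (synthetic numbers; a simplified stand-in, not the real step):

```python
import numpy as np

period = 4096
pretrig_mean = np.array([100.0, 120.0, 4196.0, 4210.0, 130.0])  # two flux jumps

# Subtract the nearest integer multiple of `period`, relative to the first value
ptm_jf = pretrig_mean - period * np.round((pretrig_mean - pretrig_mean[0]) / period)
```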

driftcorrect(indicator_col='pretrig_mean', uncorrected_col='5lagy', corrected_col=None, use_expr=pl.lit(True))

Correct for gain drift correlated with the given indicator column.

Source code in mass2/core/channel.py
def driftcorrect(
    self,
    indicator_col: str = "pretrig_mean",
    uncorrected_col: str = "5lagy",
    corrected_col: str | None = None,
    use_expr: pl.Expr = pl.lit(True),
) -> "Channel":
    """Correct for gain drift correlated with the given indicator column."""
    # by defining a separate learn method that takes ch as an argument,
    # we can move all the code for the step outside of Channel
    step = DriftCorrectStep.learn(
        ch=self,
        indicator_col=indicator_col,
        uncorrected_col=uncorrected_col,
        corrected_col=corrected_col,
        use_expr=use_expr,
    )
    return self.with_step(step)
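
DriftCorrectStep.learn is defined elsewhere, but the underlying idea — remove the component of the energy estimate that is linearly correlated with the indicator — can be sketched with synthetic data. This uses a plain least-squares slope for illustration; the real step may optimize a different objective:

```python
import numpy as np

rng = np.random.default_rng(0)
indicator = rng.normal(0.0, 1.0, size=1000)     # e.g. pretrig_mean
uncorrected = 100.0 * (1.0 - 0.01 * indicator)  # gain drifts with the indicator

centered = indicator - np.median(indicator)
slope = np.polyfit(centered, uncorrected, 1)[0]  # fitted drift slope
corrected = uncorrected - slope * centered       # remove the linear trend

assert corrected.std() < 0.1 * uncorrected.std()  # the spread collapses
```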

filter5lag(pulse_col='pulse', peak_y_col='5lagy', peak_x_col='5lagx', f_3db=25000.0, use_expr=pl.lit(True), time_constant_s_of_exp_to_be_orthogonal_to=None, fourier=False, longest_autocorr_filter=10000)

Compute a 5-lag optimal filter and apply it.

Parameters:
  • pulse_col (str, default: 'pulse' ) –

    Which column contains raw data, by default "pulse"

  • peak_y_col (str, default: '5lagy' ) –

    Column to contain the optimal filter results, by default "5lagy"

  • peak_x_col (str, default: '5lagx' ) –

    Column to contain the 5-lag filter's estimate of arrival-time/phase, by default "5lagx"

  • f_3db (float, default: 25000.0 ) –

    A low-pass filter 3 dB point to apply to the computed filter, by default 25e3

  • use_expr (Expr, default: lit(True) ) –

    An expression to select pulses for averaging, by default pl.lit(True)

  • time_constant_s_of_exp_to_be_orthogonal_to (float | None, default: None ) –

    Optionally an exponential decay time to make the filter insensitive to, by default None

  • fourier (bool, default: False ) –

    Whether to use filters constructed in the Fourier domain, by default False The alternative, default choice is to construct time-domain filters using the noise autocorrelation

  • longest_autocorr_filter (int, default: 10000 ) –

    Don't compute noise autocorrelation-based filters if the record length exceeds this limit, by default 10000. (Filters based on very long autocorrelations take O(N^2) operations and memory to generate.) If exceeded, filters will be Fourier-space filters.

Returns:
  • Channel –

    This channel with an OptimalFilterStep added to the recipe.

Source code in mass2/core/channel.py
def filter5lag(
    self,
    pulse_col: str = "pulse",
    peak_y_col: str = "5lagy",
    peak_x_col: str = "5lagx",
    f_3db: float = 25e3,
    use_expr: pl.Expr = pl.lit(True),
    time_constant_s_of_exp_to_be_orthogonal_to: float | None = None,
    fourier: bool = False,
    longest_autocorr_filter: int = 10_000,
) -> "Channel":
    """Compute a 5-lag optimal filter and apply it.

    Parameters
    ----------
    pulse_col : str, optional
        Which column contains raw data, by default "pulse"
    peak_y_col : str, optional
        Column to contain the optimal filter results, by default "5lagy"
    peak_x_col : str, optional
        Column to contain the 5-lag filter's estimate of arrival-time/phase, by default "5lagx"
    f_3db : float, optional
        A low-pass filter 3 dB point to apply to the computed filter, by default 25e3
    use_expr : pl.Expr, optional
        An expression to select pulses for averaging, by default pl.lit(True)
    time_constant_s_of_exp_to_be_orthogonal_to : float | None, optional
        Optionally an exponential decay time to make the filter insensitive to, by default None
    fourier : bool, optional
        Whether to use filters constructed in the Fourier domain, by default False
        The alternative, default choice is to construct time-domain filters using the noise autocorrelation
    longest_autocorr_filter: int, optional
        Don't compute noise autocorrelation-based filters if the record length exceeds this limit, by default 10000.
        (Filters based on very long autocorrelations take O(N^2) operations and memory to generate.)
        If exceeded, filters will be Fourier-space filters.

    Returns
    -------
    Channel
        This channel with an OptimalFilterStep added to the recipe.
    """
    assert self.noise
    shortening_5lag = 4  # 5-lag filters shorten the pulse by 2 on each end
    n_samples_5lag = self.n_samples - shortening_5lag
    if not fourier:
        suggest = "use `fourier=True` or increase `longest_autocorr_filter`"
        assert n_samples_5lag <= longest_autocorr_filter, (
            f"Autocorrelation not computed for records exceeding {longest_autocorr_filter}; {suggest}"
        )

    noiseresult = self.noise.spectrum(skip_autocorr_if_length_over=longest_autocorr_filter)
    if not fourier:
        assert noiseresult.autocorr_vec is not None, f"Autocorrelation not computed; {suggest}"
        Nac = len(noiseresult.autocorr_vec)
        assert n_samples_5lag <= Nac, f"Autocorrelation result ({Nac}) is too short for {n_samples_5lag}; {suggest}"

    avg_pulse = self.compute_average_pulse(pulse_col=pulse_col, use_expr=use_expr)
    filter_maker = FilterMaker(
        signal_model=avg_pulse,
        n_pretrigger=self.n_presamples,
        noise_psd=noiseresult.psd,
        noise_autocorr=noiseresult.autocorr_vec,
        sample_time_sec=self.frametime_s,
    )

    if time_constant_s_of_exp_to_be_orthogonal_to is None:
        if fourier:
            filter5lag = filter_maker.compute_fourier(f_3db=f_3db)
        else:
            filter5lag = filter_maker.compute_5lag(f_3db=f_3db)
    else:
        if fourier:
            raise NotImplementedError(
                "Can't make filters orthogonal to an exponential AND in Fourier domain (i.e. without noise autocorrelation)"
            )
        filter5lag = filter_maker.compute_5lag_noexp(f_3db=f_3db, exp_time_seconds=time_constant_s_of_exp_to_be_orthogonal_to)
    step = OptimalFilterStep(
        inputs=["pulse"],
        output=[peak_x_col, peak_y_col],
        good_expr=self.good_expr,
        use_expr=use_expr,
        filter=filter5lag,
        spectrum=noiseresult,
        filter_maker=filter_maker,
        transform_raw=self.transform_raw,
    )
    return self.with_step(step)
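
The FilterMaker machinery is involved, but the core of a time-domain optimal filter is a noise-whitened matched filter: solve R q = s with the noise autocorrelation matrix R and signal model s, then normalize so filtering the model itself yields unit amplitude. A toy sketch with white noise, so R is the identity (names illustrative, not the mass2 API):

```python
import numpy as np

n = 50
t = np.arange(n)
signal_model = np.exp(-t / 10.0) - np.exp(-t / 2.0)  # 2-exponential pulse shape

R = np.eye(n)                  # white-noise autocorrelation matrix
q = np.linalg.solve(R, signal_model)
q /= q @ signal_model          # normalize: filtering the model gives 1.0

amplitude = q @ (3.7 * signal_model)  # noiseless pulse of amplitude 3.7
```

The O(N^2) cost of building R (and worse for solving it) is exactly why `longest_autocorr_filter` exists as an escape hatch to Fourier-domain filters.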

filterATS(pulse_col='pulse', peak_y_col='ats_y', peak_x_col='ats_x', f_3db=25000.0, use_expr=pl.lit(True))

Compute an arrival-time-safe (ATS) optimal filter and apply it.

Parameters:
  • pulse_col (str, default: 'pulse' ) –

    Which column contains raw data, by default "pulse"

  • peak_y_col (str, default: 'ats_y' ) –

    Column to contain the optimal filter results, by default "ats_y"

  • peak_x_col (str, default: 'ats_x' ) –

    Column to contain the ATS filter's estimate of arrival-time/phase, by default "ats_x"

  • f_3db (float, default: 25000.0 ) –

    A low-pass filter 3 dB point to apply to the computed filter, by default 25e3

  • use_expr (Expr, default: lit(True) ) –

    An expression to select pulses for averaging, by default pl.lit(True)

Returns:
  • Channel –

    This channel with an OptimalFilterStep (the ATS filter) added to the recipe.

Source code in mass2/core/channel.py
def filterATS(
    self,
    pulse_col: str = "pulse",
    peak_y_col: str = "ats_y",
    peak_x_col: str = "ats_x",
    f_3db: float = 25e3,
    use_expr: pl.Expr = pl.lit(True),
) -> "Channel":
    """Compute an arrival-time-safe (ATS) optimal filter and apply it.

    Parameters
    ----------
    pulse_col : str, optional
        Which column contains raw data, by default "pulse"
    peak_y_col : str, optional
        Column to contain the optimal filter results, by default "ats_y"
    peak_x_col : str, optional
        Column to contain the ATS filter's estimate of arrival-time/phase, by default "ats_x"
    f_3db : float, optional
        A low-pass filter 3 dB point to apply to the computed filter, by default 25e3
    use_expr : pl.Expr, optional
        An expression to select pulses for averaging, by default pl.lit(True)

    Returns
    -------
    Channel
        This channel with an OptimalFilterStep (the ATS filter) added to the recipe.
    """
    assert self.noise
    mprms = self.good_series("pulse_rms", use_expr).median()
    use = use_expr.and_(np.abs(pl.col("pulse_rms") / mprms - 1.0) < 0.3)
    limit = 4000
    avg_pulse, dt_model = self.compute_ats_model(pulse_col, use, limit)
    noiseresult = self.noise.spectrum()
    filter_maker = FilterMaker(
        signal_model=avg_pulse,
        dt_model=dt_model,
        n_pretrigger=self.n_presamples,
        noise_psd=noiseresult.psd,
        noise_autocorr=noiseresult.autocorr_vec,
        sample_time_sec=self.frametime_s,
    )
    filter_ats = filter_maker.compute_ats(f_3db=f_3db)
    step = OptimalFilterStep(
        inputs=["pulse"],
        output=[peak_x_col, peak_y_col],
        good_expr=self.good_expr,
        use_expr=use_expr,
        filter=filter_ats,
        spectrum=noiseresult,
        filter_maker=filter_maker,
        transform_raw=self.transform_raw,
    )
    return self.with_step(step)

fit_pulse(index=0, col='pulse', verbose=True)

Fit a single pulse to a 2-exponential-with-tail model, returning the fit result.

Source code in mass2/core/channel.py
def fit_pulse(self, index: int = 0, col: str = "pulse", verbose: bool = True) -> LineModelResult:
    """Fit a single pulse to a 2-exponential-with-tail model, returning the fit result."""
    pulse = self.df[col][index].to_numpy()
    result = mass2.core.pulse_algorithms.fit_pulse_2exp_with_tail(pulse, npre=self.n_presamples, dt=self.frametime_s)
    if verbose:
        print(f"ch={self}")
        print(f"pulse index={index}")
        print(result.fit_report())
    return result

from_ljh(path, noise_path=None, keep_posix_usec=False, transform_raw=None) classmethod

Load a Channel from an LJH file, optionally with a NoiseChannel from a corresponding noise LJH file.

Source code in mass2/core/channel.py
@classmethod
def from_ljh(
    cls,
    path: str | Path,
    noise_path: str | Path | None = None,
    keep_posix_usec: bool = False,
    transform_raw: Callable | None = None,
) -> "Channel":
    """Load a Channel from an LJH file, optionally with a NoiseChannel from a corresponding noise LJH file."""
    if not noise_path:
        noise_channel = None
    else:
        noise_channel = NoiseChannel.from_ljh(noise_path)
    ljh = mass2.LJHFile.open(path)
    df, header_df = ljh.to_polars(keep_posix_usec)
    header = ChannelHeader.from_ljh_header_df(header_df)
    channel = cls(
        df, header=header, npulses=ljh.npulses, subframediv=ljh.subframediv, noise=noise_channel, transform_raw=transform_raw
    )
    return channel

from_off(off) classmethod

Load a Channel from an OFF file.

Source code in mass2/core/channel.py
@classmethod
def from_off(cls, off: OffFile) -> "Channel":
    """Load a Channel from an OFF file."""
    assert off._mmap is not None
    df = pl.from_numpy(np.asarray(off._mmap))
    df = (
        df.select(
            pl.from_epoch("unixnano", time_unit="ns")
            .dt.cast_time_unit("us")
            .dt.convert_time_zone(_local_timezone_name)
            .alias("timestamp")
        )
        .with_columns(df)
        .select(pl.exclude("unixnano"))
    )
    df_header = pl.DataFrame(off.header)
    df_header = df_header.with_columns(pl.Series("Filename", [off.filename]))
    header = ChannelHeader(
        f"{os.path.split(off.filename)[1]}",
        off.filename,
        off.header["ChannelNumberMatchingName"],
        off.framePeriodSeconds,
        off._mmap["recordPreSamples"][0],
        off._mmap["recordSamples"][0],
        df_header,
    )
    channel = cls(df, header, off.nRecords, subframediv=off.subframediv)
    return channel

get_step(index)

Get the step at the given index, supporting negative indices.

Source code in mass2/core/channel.py
def get_step(self, index: int) -> tuple[RecipeStep, int]:
    """Get the step at the given index, supporting negative indices."""
    # normalize the index to a positive index
    if index < 0:
        index = len(self.steps) + index
    step = self.steps[index]
    return step, index

good_df(cols=pl.all(), use_expr=pl.lit(True))

Return a Polars DataFrame of the given columns, filtered by good_expr and use_expr.

Source code in mass2/core/channel.py
def good_df(self, cols: list[str] | pl.Expr = pl.all(), use_expr: pl.Expr = pl.lit(True)) -> pl.DataFrame:
    """Return a Polars DataFrame of the given columns, filtered by good_expr and use_expr."""
    good_df = self.df.lazy().filter(self.good_expr)
    if use_expr is not True:
        good_df = good_df.filter(use_expr)
    return good_df.select(cols).collect()

good_series(col, use_expr=pl.lit(True))

Return a Polars Series of the given column, filtered by good_expr and use_expr.

Source code in mass2/core/channel.py
def good_series(self, col: str, use_expr: pl.Expr = pl.lit(True)) -> pl.Series:
    """Return a Polars Series of the given column, filtered by good_expr and use_expr."""
    return mass2.misc.good_series(self.df, col, self.good_expr, use_expr)

good_serieses(cols, use_expr=pl.lit(True))

Return a list of Polars Series of the given columns, filtered by good_expr and use_expr.

Source code in mass2/core/channel.py
def good_serieses(self, cols: list[str], use_expr: pl.Expr = pl.lit(True)) -> list[pl.Series]:
    """Return a list of Polars Series of the given columns, filtered by good_expr and use_expr."""
    df2 = self.good_df(cols, use_expr)
    return [df2[col] for col in cols]

hist(col, bin_edges, use_good_expr=True, use_expr=pl.lit(True))

Compute a histogram of the given column, optionally filtering by good_expr and use_expr.

Source code in mass2/core/channel.py
def hist(
    self,
    col: str,
    bin_edges: ArrayLike,
    use_good_expr: bool = True,
    use_expr: pl.Expr = pl.lit(True),
) -> tuple[NDArray, NDArray]:
    """Compute a histogram of the given column, optionally filtering by good_expr and use_expr."""
    if use_good_expr and self.good_expr is not True:
        # True doesn't implement .and_, and we haven't found an expression-literal equivalent that does,
        # so we special-case True
        filter_expr = self.good_expr.and_(use_expr)
    else:
        filter_expr = use_expr

    # Group by the specified column and filter using good_expr
    df_small = (self.df.lazy().filter(filter_expr).select(col)).collect()

    values = df_small[col]
    bin_centers, counts = misc.hist_of_series(values, bin_edges)
    return bin_centers, counts
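
misc.hist_of_series isn't reproduced here, but the returned (bin_centers, counts) pair follows the usual numpy histogram convention: counts per bin, plus the midpoint of each pair of adjacent edges. A sketch of the equivalent computation (a hypothetical stand-in, not the actual helper):

```python
import numpy as np

values = np.array([1.2, 1.9, 2.5, 3.1, 3.3])
bin_edges = np.arange(1.0, 4.5, 1.0)                  # edges at 1, 2, 3, 4

counts, _ = np.histogram(values, bin_edges)
bin_centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])  # one center per bin
```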

linefit(line, col, use_expr=pl.lit(True), has_linear_background=False, has_tails=False, dlo=50, dhi=50, binsize=0.5, params_update=lmfit.Parameters())

Fit a spectral line to the binned data from the given column, optionally filtering by use_expr.

Source code in mass2/core/channel.py
def linefit(  # noqa: PLR0917
    self,
    line: GenericLineModel | SpectralLine | str | float,
    col: str,
    use_expr: pl.Expr = pl.lit(True),
    has_linear_background: bool = False,
    has_tails: bool = False,
    dlo: float = 50,
    dhi: float = 50,
    binsize: float = 0.5,
    params_update: lmfit.Parameters = lmfit.Parameters(),
) -> LineModelResult:
    """Fit a spectral line to the  binned data from the given column, optionally filtering by use_expr."""
    model = mass2.calibration.algorithms.get_model(line, has_linear_background=has_linear_background, has_tails=has_tails)
    pe = model.spect.peak_energy
    _bin_edges = np.arange(pe - dlo, pe + dhi, binsize)
    df_small = self.df.lazy().filter(self.good_expr).filter(use_expr).select(col).collect()
    bin_centers, counts = misc.hist_of_series(df_small[col], _bin_edges)
    params = model.guess(counts, bin_centers=bin_centers, dph_de=1)
    params["dph_de"].set(1.0, vary=False)
    print(f"before update {params=}")
    params = params.update(params_update)
    print(f"after update {params=}")
    result = model.fit(counts, params, bin_centers=bin_centers, minimum_bins_per_fwhm=3)
    result.set_label_hints(
        binsize=bin_centers[1] - bin_centers[0],
        ds_shortname=self.header.description,
        unit_str="eV",
        attr_str=col,
        states_hint=f"{use_expr=}",
        cut_hint="",
    )
    return result
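As a sketch of `linefit`'s internal binning, the histogram window spans `dlo` below to `dhi` above the line's peak energy, in steps of `binsize` (the peak energy here is a hypothetical round number, not taken from any real line model):

```python
import numpy as np

pe = 6000.0  # hypothetical peak energy in eV; linefit gets the real value from the line model
dlo, dhi, binsize = 50, 50, 0.5  # the defaults used by linefit

# Same construction as linefit's internal bin edges
bin_edges = np.arange(pe - dlo, pe + dhi, binsize)
```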

mo_stepplots()

Marimo UI element to choose and display step plots, with a dropdown to choose channel number.

Source code in mass2/core/channel.py
def mo_stepplots(self) -> mo.ui.dropdown:
    """Marimo UI element to choose and display step plots, with a dropdown to choose channel number."""
    desc_ind = {step.description: i for i, step in enumerate(self.steps)}
    first_non_summarize_step = self.steps[0]
    for step in self.steps:
        if isinstance(step, SummarizeStep):
            continue
        first_non_summarize_step = step
        break
    mo_ui = mo.ui.dropdown(
        desc_ind,
        value=first_non_summarize_step.description,
        label=f"choose step for ch {self.ch_num}",
    )

    def show() -> mo.Html:
        """Show the selected step plot."""
        return self._mo_stepplots_explicit(mo_ui)

    def step_ind() -> Any:
        """Get the selected step index from the dropdown item, if any."""
        return mo_ui.value

    mo_ui.show = show
    mo_ui.step_ind = step_ind
    return mo_ui

multifit_mass_cal(multifit, previous_cal_step_index, calibrated_col, use_expr=pl.lit(True))

Fit multiple spectral lines, to create a Mass1-style gain calibration.

Source code in mass2/core/channel.py
def multifit_mass_cal(
    self,
    multifit: MultiFit,
    previous_cal_step_index: int,
    calibrated_col: str,
    use_expr: pl.Expr = pl.lit(True),
) -> "Channel":
    """Fit multiple spectral lines, to create a Mass1-style gain calibration."""
    step = MultiFitMassCalibrationStep.learn(
        self,
        multifit_spec=multifit,
        previous_cal_step_index=previous_cal_step_index,
        calibrated_col=calibrated_col,
        use_expr=use_expr,
    )
    return self.with_step(step)

multifit_quadratic_gain_cal(multifit, previous_cal_step_index, calibrated_col, use_expr=pl.lit(True))

Fit multiple spectral lines, to create a quadratic gain calibration.

Source code in mass2/core/channel.py
def multifit_quadratic_gain_cal(
    self,
    multifit: MultiFit,
    previous_cal_step_index: int,
    calibrated_col: str,
    use_expr: pl.Expr = pl.lit(True),
) -> "Channel":
    """Fit multiple spectral lines, to create a quadratic gain calibration."""
    step = MultiFitQuadraticGainStep.learn(
        self,
        multifit_spec=multifit,
        previous_cal_step_index=previous_cal_step_index,
        calibrated_col=calibrated_col,
        use_expr=use_expr,
    )
    return self.with_step(step)

phase_correct_mass_specific_lines(indicator_col, uncorrected_col, line_names, previous_cal_step_index, corrected_col=None, use_expr=pl.lit(True))

Apply phase correction to the given uncorrected column, where specific lines are used to judge the correction.

Source code in mass2/core/channel.py
def phase_correct_mass_specific_lines(
    self,
    indicator_col: str,
    uncorrected_col: str,
    line_names: Iterable[str | float],
    previous_cal_step_index: int,
    corrected_col: str | None = None,
    use_expr: pl.Expr = pl.lit(True),
) -> "Channel":
    """Apply phase correction to the given uncorrected column, where specific lines are used to judge the correction."""
    if corrected_col is None:
        corrected_col = uncorrected_col + "_pc"
    step = mass2.core.phase_correct_steps.phase_correct_mass_specific_lines(
        self,
        indicator_col,
        uncorrected_col,
        corrected_col,
        previous_cal_step_index,
        line_names,
        use_expr,
    )
    return self.with_step(step)

plot_hist(col, bin_edges, axis=None, use_good_expr=True, use_expr=pl.lit(True))

Compute and plot a histogram of the given column, optionally filtering by good_expr and use_expr.

Source code in mass2/core/channel.py
def plot_hist(
    self,
    col: str,
    bin_edges: ArrayLike,
    axis: plt.Axes | None = None,
    use_good_expr: bool = True,
    use_expr: pl.Expr = pl.lit(True),
) -> tuple[NDArray, NDArray]:
    """Compute and plot a histogram of the given column, optionally filtering by good_expr and use_expr."""
    if axis is None:
        _, ax = plt.subplots()  # Create a new figure if no axis is provided
    else:
        ax = axis

    bin_centers, counts = self.hist(col, bin_edges=bin_edges, use_good_expr=use_good_expr, use_expr=use_expr)
    _, step_size = misc.midpoints_and_step_size(bin_edges)
    plt.step(bin_centers, counts, where="mid")

    # Customize the plot
    ax.set_xlabel(str(col))
    ax.set_ylabel(f"Counts per {step_size:.02f} unit bin")
    ax.set_title(f"Histogram of {col} for {self.shortname}")

    plt.tight_layout()
    return bin_centers, counts

plot_hists(col, bin_edges, group_by_col, axis=None, use_good_expr=True, use_expr=pl.lit(True), skip_none=True)

Plots histograms for the given column, grouped by the specified column.

Parameters:
  • col (str) –

    The column name to plot.

  • bin_edges (array-like) –

    The edges of the bins for the histogram.

  • group_by_col (str) –

    The column name to group by. This is required.

  • axis (matplotlib.Axes, optional) –

    The axis to plot on. If None, a new figure is created.

Source code in mass2/core/channel.py
def plot_hists(
    self,
    col: str,
    bin_edges: ArrayLike,
    group_by_col: str,
    axis: plt.Axes | None = None,
    use_good_expr: bool = True,
    use_expr: pl.Expr = pl.lit(True),
    skip_none: bool = True,
) -> tuple[NDArray, dict[str, NDArray]]:
    """
    Plots histograms for the given column, grouped by the specified column.

    Parameters:
    - col (str): The column name to plot.
    - bin_edges (array-like): The edges of the bins for the histogram.
    - group_by_col (str): The column name to group by. This is required.
    - axis (matplotlib.Axes, optional): The axis to plot on. If None, a new figure is created.
    """
    if axis is None:
        _, ax = plt.subplots()  # Create a new figure if no axis is provided
    else:
        ax = axis

    if use_good_expr and self.good_expr is not True:
        # True doesn't implement .and_, and we haven't found an expression
        # literal equivalent that does, so we special-case True
        filter_expr = self.good_expr.and_(use_expr)
    else:
        filter_expr = use_expr

    # Group by the specified column and filter using good_expr
    df_small = (self.df.lazy().filter(filter_expr).select(col, group_by_col)).collect().sort(group_by_col, descending=False)

    # Plot a histogram for each group
    counts_dict: dict[str, NDArray] = {}
    for (group_name,), group_data in df_small.group_by(group_by_col, maintain_order=True):
        if group_name is None and skip_none:
            continue
        # Get the data for the column to plot
        values = group_data[col]
        _, step_size = misc.midpoints_and_step_size(bin_edges)
        bin_centers, counts = misc.hist_of_series(values, bin_edges)
        group_name_str = str(group_name)
        counts_dict[group_name_str] = counts
        plt.step(bin_centers, counts, where="mid", label=group_name_str)
        # Plot the histogram for the current group
        # if group_name == "EBIT":
        #     ax.hist(values, bins=bin_edges, alpha=0.9, color="k", label=group_name_str)
        # else:
        #     ax.hist(values, bins=bin_edges, alpha=0.5, label=group_name_str)
        # bin_centers, counts = misc.hist_of_series(values, bin_edges)
        # plt.plot(bin_centers, counts, label=group_name)
    # Customize the plot
    ax.set_xlabel(str(col))
    if len(counts_dict) > 0:
        ax.set_ylabel(f"Counts per {step_size:.02f} unit bin")
    ax.set_title(f"Histogram of {col} grouped by {group_by_col}")

    # Add a legend to label the groups
    ax.legend(title=group_by_col)

    plt.tight_layout()
    return bin_centers, counts_dict

plot_pulses(length=30, skip=0, random=False, record_numbers=None, subtract_baseline=False, derivative=False, summarize=True, summary_columns=None, pulse_field='pulse', use_expr=pl.lit(True), use_good_expr=True, axis=None, cm='viridis_r')

Plot some example pulses

Parameters:
  • length (int, default: 30 ) –

    How many pulses to plot, by default 30

  • skip (int, default: 0 ) –

    Start plotting at this pulse record number, by default 0

  • random (bool, default: False ) –

    Whether to plot length randomly selected records, by default False. If True, skip is ignored.

  • record_numbers (Collection[Any] | Series | None, default: None ) –

    Plot the specified records, numbered from 0 for the first in the dataframe, by default None. If given, length, skip, and random are ignored.

  • subtract_baseline (bool, default: False ) –

    Whether to subtract the pretrigger mean before plotting each record, by default False

  • derivative (bool, default: False ) –

    Whether to plot the "derivative" of a pulse (actually the successive differences), by default False

  • summarize (bool, default: True ) –

    Whether to summarize key facts about each plotted pulse to the terminal, by default True

  • summary_columns (Collection[Any] | None, default: None ) –

    Which specific data columns to report in the summary to the terminal, by default None. If None, then a pre-selected set is reported.

  • pulse_field (str, default: 'pulse' ) –

    The column name in the polars dataframe where plottable pulses are stored, by default "pulse"

  • use_expr (Expr, default: lit(True) ) –

    An expression to select plottable points, by default pl.lit(True)

  • use_good_expr (bool, default: True ) –

    Whether to apply the object's good_expr before plotting, by default True. If True, then the existing good_expr will be applied AND the use_expr will be, too.

  • axis (Axes | None, default: None ) –

    Axes to plot on, by default None. If None, create a new figure.

  • cm (str | Colormap, default: 'viridis_r' ) –

    The colormap to use for distinguishing pulses, by default "viridis_r"

Source code in mass2/core/channel.py
def plot_pulses(  # noqa: PLR0917
    self,
    length: int = 30,
    skip: int = 0,
    random: bool = False,
    record_numbers: Collection[Any] | pl.Series | None = None,
    subtract_baseline: bool = False,
    derivative: bool = False,
    summarize: bool = True,
    summary_columns: Collection[Any] | None = None,
    pulse_field: str = "pulse",
    use_expr: pl.Expr = pl.lit(True),
    use_good_expr: bool = True,
    axis: plt.Axes | None = None,
    cm: str | Colormap = "viridis_r",
) -> None:
    """Plot some example pulses

    Parameters
    ----------
    length : int, optional
        How many pulses to plot, by default 30
    skip : int, optional
        Start plotting at this pulse record number, by default 0
    random : bool, optional
        Whether to plot `length` randomly selected records, by default False
        If True, `skip` is ignored.
    record_numbers : Collection[Any] | pl.Series | None, optional
        Plot the specified records, numbered from 0 for the first in the dataframe, by default None.
        If given, `length`, `skip`, and `random` are ignored.
    subtract_baseline : bool, optional
        Whether to subtract the pretrigger mean before plotting each record, by default False
    derivative : bool, optional
        Whether to plot the "derivative" of a pulse (actually the successive differences), by default False
    summarize : bool, optional
        Whether to summarize key facts about each plotted pulse to the terminal, by default True
    summary_columns : Collection[Any] | None, optional
        Which specific data columns to report in the summary to the terminal, by default None.
        If None, then a pre-selected set is reported.
    pulse_field : str, optional
        The column name in the polars dataframe where plottable pulses are stored, by default "pulse"
    use_expr : pl.Expr, optional
        An expression to select plottable points, by default pl.lit(True)
    use_good_expr : bool, optional
        Whether to apply the object's `good_expr` before plotting, by default True
        If True, then the existing `good_expr` will be applied AND the `use_expr` will be, too.
    axis : plt.Axes | None, optional
        Axes to plot on, by default None
        If None, create a new figure.
    cm : str | Colormap, optional
        The colormap to use for distinguishing pulses, by default "viridis_r"
    """
    pulse_type = self.df[pulse_field].dtype
    assert pulse_type in (pl.Array, pl.List), (  # noqa: PLR6201
        f"Cannot plot column '{pulse_field}' as pulse records: not a pl.Array or pl.List type"
    )

    if axis is None:
        _, axis = plt.subplots()  # Create a new figure if no axis is provided

    if use_good_expr and self.good_expr is not True:
        # True doesn't implement .and_, and we haven't found an expression
        # literal equivalent that does, so we special-case True
        filter_expr = self.good_expr.and_(use_expr)
    else:
        filter_expr = use_expr

    if isinstance(cm, str):
        cmap = plt.get_cmap(cm)
    else:
        cmap = cm

    lf = self.df.lazy().with_row_index("Record #")
    if record_numbers is None:
        if random:
            title = f"{length} random pulses"
            lf = lf.filter(filter_expr).collect().sample(length).lazy().sort("Record #")
        else:
            lf = lf.filter(filter_expr).slice(skip, length)
            title = f"{length} selected pulses"
    else:
        title = f"Pulses #{record_numbers}"
        lf = lf.filter(pl.col("Record #").is_in(record_numbers))
    plt.title(f"{title} from Chan {self.ch_num}")
    if summarize:
        # Preferred data info to print to terminal.
        if summary_columns is None:
            summary_columns = [
                "Record #",
                "pretrig_mean",
                "pulse_rms",
                "pulse_average",
                "rise_time",
                "peak_value",
                "energy_5lagy",
                "state_label",
            ]
        # Remove preferred column if it doesn't exist
        columns = [c for c in summary_columns if c in lf.collect_schema().names()]
        summary_df = lf.select(columns).collect()
        summary_df.show(limit=None)

    frametime_ms = self.frametime_s * 1e3
    sample_x = np.arange(self.header.n_samples) - self.header.n_presamples

    def samples2ms(s: ArrayLike) -> ArrayLike:
        return np.asarray(s) * frametime_ms

    def ms2samples(ms: ArrayLike) -> ArrayLike:
        return np.asarray(ms) / frametime_ms

    upper_axis = axis.secondary_xaxis("top", functions=(samples2ms, ms2samples))
    upper_axis.set_xlabel("Time after trigger (ms)")
    plt.xlabel("Samples after trigger")

    plot_columns = (pulse_field, "pretrig_mean")
    df = lf.select(plot_columns).collect()
    N = len(df)
    pulses = df[pulse_field]
    ptmean = df["pretrig_mean"]
    for i in range(N):
        pulse = pulses[i].to_numpy()
        color = cmap(i / N)
        if subtract_baseline:
            pulse = pulse - ptmean[i]  # noqa: PLR6104
        if derivative:
            pulse = np.hstack((0, np.diff(pulse)))
        axis.plot(sample_x, pulse, color=color)

plot_scatter(x_col, y_col, cont_color_col=None, color_col=None, use_expr=pl.lit(True), use_good_expr=True, skip_none=True, axis=None, annotate=False, max_points=None, extended_title=True)

Generate a scatter plot of y_col vs x_col, optionally colored by color_col.

Parameters:
  • x_col (str) –

    Name of the column to put on the x axis

  • y_col (str) –

    Name of the column to put on the y axis

  • cont_color_col (str | None, default: None ) –

    Name of the column to use for continuously coloring points, by default None

  • color_col (str | None, default: None ) –

    Name of the column to discretely color points by (generally a low cardinality category like "state_label"), by default None. At least one of cont_color_col and color_col must be None

  • use_expr (Expr, default: lit(True) ) –

    An expression to select plottable points, by default pl.lit(True)

  • use_good_expr (bool, default: True ) –

    Whether to apply the object's good_expr before plotting, by default True

  • skip_none (bool, default: True ) –

    Whether to skip color categories with no name, by default True

  • axis (Axes | None, default: None ) –

    Axes to plot on, by default None

  • annotate (bool, default: False ) –

    Whether to annotate points that are hovered over or clicked on by the mouse, by default False

  • max_points (int | None, default: None ) –

    Maximum number of points allowed in scatter plot (or if None, no maximum). To ensure representative sampling of all portions of the data, only 1 of each consecutive N points will be plotted, with N chosen to be consistent with the max_points requirement.

  • extended_title (bool, default: True ) –

    Whether to represent the use and good expressions as lines 2-3 in the plot title, by default True

Source code in mass2/core/channel.py
def plot_scatter(  # noqa: PLR0917
    self,
    x_col: str,
    y_col: str,
    cont_color_col: str | None = None,
    color_col: str | None = None,
    use_expr: pl.Expr = pl.lit(True),
    use_good_expr: bool = True,
    skip_none: bool = True,
    axis: plt.Axes | None = None,
    annotate: bool = False,
    max_points: int | None = None,
    extended_title: bool = True,
) -> None:
    """Generate a scatter plot of `y_col` vs `x_col`, optionally colored by `color_col`.

    Parameters
    ----------
    x_col : str
        Name of the column to put on the x axis
    y_col : str
        Name of the column to put on the y axis
    cont_color_col : str | None, optional
        Name of the column to use for continuously coloring points, by default None
    color_col : str | None, optional
        Name of the column to discretely color points by (generally a low cardinality
        category like "state_label"), by default None
        At least one of `cont_color_col` and `color_col` must be None
    use_expr : pl.Expr, optional
        An expression to select plottable points, by default pl.lit(True)
    use_good_expr : bool, optional
        Whether to apply the object's `good_expr` before plotting, by default True
    skip_none : bool, optional
        Whether to skip color categories with no name, by default True
    axis : plt.Axes | None, optional
        Axes to plot on, by default None
    annotate : bool, optional
        Whether to annotate points that are hovered over or clicked on by the mouse, by default False
    max_points : int | None, optional
        Maximum number of points allowed in scatter plot (or if None, no maximum). To ensure representative
        sampling of all portions of the data, only 1 of each consecutive N points will be plotted, with N
        chosen to be consistent with the `max_points` requirement.
    extended_title : bool, optional
        Whether to represent the use and good expressions as lines 2-3 in the plot title, by default True
    """
    # You can't have both kinds of colors: you either use `color_col` for categorical coloring,
    # or cont_color_col for continuous coloring, or neither.
    assert color_col is None or cont_color_col is None

    if axis is None:
        fig = plt.figure()
        axis = plt.gca()
    plt.sca(axis)  # set current axis so I can use plt api
    fig = plt.gcf()
    filter_expr = use_expr
    if use_good_expr:
        filter_expr = self.good_expr.and_(use_expr)
    index_name = "pulse_idx"
    # Caused errors in Polars 1.35 if this was "index". See issue #85.

    # Plot only 1 data value out of every n, if max_points argument is an integer.
    # Compute n from the ratio of all points to max_points.
    plot_every_nth = 1
    if max_points is not None:
        if max_points < self.npulses:
            plot_every_nth = 1 + (self.npulses - 1) // max_points

    columns_to_keep = [x_col, y_col, index_name]
    if color_col is not None:
        columns_to_keep.append(color_col)
    if cont_color_col is not None:
        columns_to_keep.append(cont_color_col)
    df_small = (
        self.df.lazy()
        .with_row_index(name=index_name)
        .filter(filter_expr)
        .select(*columns_to_keep)
        .gather_every(plot_every_nth)
        .collect()
    )
    lines_pnums: list[tuple[plt.Line2D, pl.Series]] = []

    if cont_color_col is not None:
        line = plt.scatter(
            df_small.select(x_col).to_series(),
            df_small.select(y_col).to_series(),
            s=3,
            c=df_small.select(cont_color_col).to_series(),
        )

    else:
        for (name,), data in df_small.group_by(color_col, maintain_order=True):
            if name is None and skip_none and color_col is not None:
                continue
            (line,) = plt.plot(
                data.select(x_col).to_series(),
                data.select(y_col).to_series(),
                ".",
                label=name,
            )
            lines_pnums.append((line, data.select(index_name).to_series()))

    if annotate:
        annotation = axis.annotate(
            "",
            xy=(0, 0),
            xytext=(-20, 20),
            textcoords="offset points",
            bbox=dict(boxstyle="round", fc="w"),
            arrowprops=dict(arrowstyle="->"),
        )
        annotation.set_visible(False)

        def update_note(points: list) -> None:
            """Generate a matplotlib hovering note about the data point index

            Parameters
            ----------
            points : list
                List of the plotted data points that are hovered over
            """
            # TODO: this only works if the first line object has the pulse we want.
            line, pnum = lines_pnums[0]
            x, y = line.get_data()
            annotation.xy = (x[points[0]], y[points[0]])
            text2 = " ".join([str(pnum[int(n)]) for n in points])
            if len(points) > 1:
                text = f"Pulses [{text2}]"
            else:
                text = f"Pulse {text2}"
            annotation.set_text(text)
            annotation.get_bbox_patch().set_alpha(0.75)

        def hover(event: MouseEvent) -> None:
            """Callback to be used when mouse hovers near a plotted point

            Parameters
            ----------
            event : MouseEvent
                The mouse-related event; contains location information
            """
            vis = annotation.get_visible()
            if event.inaxes != axis:
                return
            cont, ind = line.contains(event)
            if cont:
                update_note(ind["ind"])
                annotation.set_visible(True)
                fig.canvas.draw_idle()
            elif vis:
                annotation.set_visible(False)
                fig.canvas.draw_idle()

        def click(event: MouseEvent) -> None:
            """Callback to be used when mouse clicks near a plotted point

            Parameters
            ----------
            event : MouseEvent
                The mouse-related event; contains location information
            """
            if event.inaxes != axis:
                return
            cont, ind = line.contains(event)
            if cont:
                pnum = lines_pnums[0][1]
                rownum = pnum[int(ind["ind"][0])]
                print(f"This is pulse# {rownum}")
                print(self.df.drop("pulse").row(rownum, named=True))

        fig.canvas.mpl_connect("motion_notify_event", hover)
        fig.canvas.mpl_connect("button_press_event", click)

    plt.xlabel(str(x_col))
    plt.ylabel(str(y_col))

    if extended_title:
        title_parts = [self.header.description]

        def truncated_str(s: str, max_len: int = 50) -> str:
            if len(s) <= max_len:
                return s
            return s[:max_len] + "..."

        if use_expr is not pl.lit(True):
            usestr = truncated_str("Use: " + str(use_expr))
            title_parts.append(usestr)
        title_parts.append("Good: " + truncated_str(str(self.good_expr)))
        title_str = "\n".join(title_parts)
    else:
        title_str = self.header.description
    plt.title(title_str)
    if color_col is not None:
        plt.legend(title=color_col)
    plt.tight_layout()
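The decimation rule behind `max_points` can be sketched as a small helper (the function name is hypothetical; `plot_scatter` inlines the same arithmetic): when `max_points` is set, only one of every N consecutive points is plotted, with N chosen so the plotted total stays at or below `max_points`.

```python
def plot_every_nth(npulses, max_points):
    """Decimation factor N: plot one of every N points (sketch of plot_scatter's rule)."""
    if max_points is None or max_points >= npulses:
        return 1
    return 1 + (npulses - 1) // max_points
```

For example, 10000 pulses with `max_points=1000` gives N=10, so exactly 1000 evenly spaced points are drawn.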

plot_summaries(use_expr_in=None, downsample=None, log=False)

Plot a summary of the data set, including time series and histograms of key pulse properties.

Parameters:
  • use_expr_in (Expr | None, default: None ) –

    A polars expression to determine valid pulses, by default None. If None, use self.good_expr

  • downsample (int | None, default: None ) –

    Plot only one of every downsample pulses in the scatter plots, by default None. If None, choose the smallest value so that no more than 10000 points appear

  • log (bool, default: False ) –

    Whether to make the histograms have a logarithmic y-scale, by default False.

Source code in mass2/core/channel.py
def plot_summaries(self, use_expr_in: pl.Expr | None = None, downsample: int | None = None, log: bool = False) -> None:
    """Plot a summary of the data set, including time series and histograms of key pulse properties.

    Parameters
    ----------
    use_expr_in: pl.Expr | None, optional
        A polars expression to determine valid pulses, by default None. If None, use `self.good_expr`
    downsample: int | None, optional
        Plot only one of every `downsample` pulses in the scatter plots, by default None.
        If None, choose the smallest value so that no more than 10000 points appear
    log: bool, optional
        Whether to make the histograms have a logarithmic y-scale, by default False.
    """
    plt.figure()
    tpi_microsec = (self.typical_peak_ind() - self.n_presamples) * (1e6 * self.frametime_s)
    plottables = (
        ("pulse_rms", "Pulse RMS", "#dd00ff", None),
        ("pulse_average", "Pulse Avg", "purple", None),
        ("peak_value", "Peak value", "blue", None),
        ("pretrig_rms", "Pretrig RMS", "green", [0, 4000]),
        ("pretrig_mean", "Pretrig Mean", "#00ff26", None),
        ("postpeak_deriv", "Max PostPk deriv", "gold", [0, 200]),
        ("rise_time_µs", "Rise time (µs)", "orange", [-0.3 * tpi_microsec, 2 * tpi_microsec]),
        ("peak_time_µs", "Peak time (µs)", "red", [-0.3 * tpi_microsec, 2 * tpi_microsec]),
    )

    use_expr = self.good_expr if use_expr_in is None else use_expr_in

    if downsample is None:
        downsample = self.npulses // 10000
    downsample = max(downsample, 1)

    df = self.df.lazy().gather_every(downsample)
    df = df.with_columns(((pl.col("peak_index") - self.n_presamples) * (1e6 * self.frametime_s)).alias("peak_time_µs"))
    df = df.with_columns((pl.col("rise_time") * 1e6).alias("rise_time_µs"))
    existing_columns = df.collect_schema().names()
    preserve = [p[0] for p in plottables if p[0] in existing_columns]
    preserve.append("timestamp")
    df2 = df.filter(use_expr).select(preserve).collect()

    # Plot time series relative to 0 = the last UT midnight during or before the run.
    timestamp = df2["timestamp"].to_numpy()
    last_midnight = timestamp[-1].astype("datetime64[D]")
    hour_rel = (timestamp - last_midnight).astype(float) / 3600e6

    for i, (column_name, label, color, limits) in enumerate(plottables):
        if column_name not in df2:
            continue
        y = df2[column_name].to_numpy()

        # Time series scatter plots (left-hand panels)
        plt.subplot(len(plottables), 2, 1 + i * 2)
        plt.ylabel(label)
        plt.plot(hour_rel, y, ".", ms=1, color=color)
        if i == len(plottables) - 1:
            plt.xlabel("Time since last UT midnight (hours)")

        # Histogram (right-hand panels)
        plt.subplot(len(plottables), 2, 2 + i * 2)
        contents, _, _ = plt.hist(y, 200, range=limits, log=log, histtype="stepfilled", fc=color, alpha=0.5)
        if log:
            plt.ylim(ymin=contents.min())
    print(f"Plotting {len(y)} out of {self.npulses} data points")

rough_cal(line_names, uncalibrated_col='filtValue', calibrated_col=None, use_expr=pl.lit(True), max_fractional_energy_error_3rd_assignment=0.1, min_gain_fraction_at_ph_30k=0.25, fwhm_pulse_height_units=75, n_extra_peaks=10, acceptable_rms_residual_e=10)

Learn a rough calibration by trying to assign the 3 brightest peaks, then fitting a line to those and looking for other peaks that fit that line.

Source code in mass2/core/channel.py
def rough_cal(  # noqa: PLR0917
    self,
    line_names: list[str | float],
    uncalibrated_col: str = "filtValue",
    calibrated_col: str | None = None,
    use_expr: pl.Expr = pl.lit(True),
    max_fractional_energy_error_3rd_assignment: float = 0.1,
    min_gain_fraction_at_ph_30k: float = 0.25,
    fwhm_pulse_height_units: float = 75,
    n_extra_peaks: int = 10,
    acceptable_rms_residual_e: float = 10,
) -> "Channel":
    """Learn a rough calibration by trying to assign the 3 brightest peaks,
    then fitting a line to those and looking for other peaks that fit that line.
    """
    step = mass2.core.RoughCalibrationStep.learn_3peak(
        self,
        line_names,
        uncalibrated_col,
        calibrated_col,
        use_expr,
        max_fractional_energy_error_3rd_assignment,
        min_gain_fraction_at_ph_30k,
        fwhm_pulse_height_units,
        n_extra_peaks,
        acceptable_rms_residual_e,
    )
    return self.with_step(step)
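The 3-peak strategy can be sketched outside Mass2 with plain NumPy: assign the three brightest peaks to known line energies, fit a straight line from pulse height to energy, and accept further peaks only if they land near that line. The peak positions and energies below are made up for illustration and are not Mass2's actual algorithm.

```python
import numpy as np

# Hypothetical uncalibrated peak positions (pulse-height units) and the
# known line energies (eV) we try to assign them to -- illustration only.
peak_ph = np.array([5890.0, 6490.0, 7060.0])        # 3 brightest peaks
line_energies = np.array([5898.8, 6490.5, 7058.2])  # assumed known lines

# Fit energy = gain * ph + offset through the 3 assignments.
gain, offset = np.polyfit(peak_ph, line_energies, deg=1)

def ph_to_energy(ph):
    return gain * ph + offset

# A 4th candidate peak is accepted if its predicted energy is close to a
# hypothetical known line; a large residual would reject the assignment.
candidate_ph = 8040.0
residual = abs(ph_to_energy(candidate_ph) - 8047.8)
print(f"gain={gain:.4f}, residual={residual:.1f} eV")
```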

rough_cal_combinatoric(line_names, uncalibrated_col, calibrated_col, ph_smoothing_fwhm, n_extra=3, use_expr=pl.lit(True))

Learn a rough calibration by trying all combinatorically possible peak assignments.

Source code in mass2/core/channel.py
def rough_cal_combinatoric(
    self,
    line_names: list[str],
    uncalibrated_col: str,
    calibrated_col: str,
    ph_smoothing_fwhm: float,
    n_extra: int = 3,
    use_expr: pl.Expr = pl.lit(True),
) -> "Channel":
    """Learn a rough calibration by trying all combinatorically possible peak assignments."""
    step = mass2.core.RoughCalibrationStep.learn_combinatoric(
        self,
        line_names,
        uncalibrated_col=uncalibrated_col,
        calibrated_col=calibrated_col,
        ph_smoothing_fwhm=ph_smoothing_fwhm,
        n_extra=n_extra,
        use_expr=use_expr,
    )
    return self.with_step(step)

rough_cal_combinatoric_height_info(line_names, line_heights_allowed, uncalibrated_col, calibrated_col, ph_smoothing_fwhm, n_extra=3, use_expr=pl.lit(True))

Learn a rough calibration by trying all combinatorically possible peak assignments, using known relative peak heights to limit the possibilities.

Source code in mass2/core/channel.py
def rough_cal_combinatoric_height_info(
    self,
    line_names: list[str],
    line_heights_allowed: list[list[int]],
    uncalibrated_col: str,
    calibrated_col: str,
    ph_smoothing_fwhm: float,
    n_extra: int = 3,
    use_expr: pl.Expr = pl.lit(True),
) -> "Channel":
    """Learn a rough calibration by trying all combinatorically possible peak assignments,
    using known relative peak heights to limit the possibilities."""
    step = mass2.core.RoughCalibrationStep.learn_combinatoric_height_info(
        self,
        line_names,
        line_heights_allowed,
        uncalibrated_col=uncalibrated_col,
        calibrated_col=calibrated_col,
        ph_smoothing_fwhm=ph_smoothing_fwhm,
        n_extra=n_extra,
        use_expr=use_expr,
    )
    return self.with_step(step)

save_recipes(filename)

Save the recipe steps to a pickle file, keyed by channel number.

Source code in mass2/core/channel.py
def save_recipes(self, filename: str) -> dict[int, Recipe]:
    """Save the recipe steps to a pickle file, keyed by channel number."""
    steps = {self.ch_num: self.steps}
    misc.pickle_object(steps, filename)
    return steps
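The same round trip can be sketched with the standard library's `pickle`; here plain strings stand in for the actual recipe-step objects that `save_recipes` stores.

```python
import os
import pickle
import tempfile

# save_recipes pickles a dict keyed by channel number; strings stand in
# for the real RecipeStep objects here.
steps_by_channel = {1: ["summarize", "rough_cal"], 3: ["summarize"]}

path = os.path.join(tempfile.mkdtemp(), "recipes.pkl")
with open(path, "wb") as f:
    pickle.dump(steps_by_channel, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)
print(loaded == steps_by_channel)
```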

step_plot(step_ind, **kwargs)

Make a debug plot for the given step index, supporting negative indices.

Source code in mass2/core/channel.py
def step_plot(self, step_ind: int, **kwargs: Any) -> plt.Axes:
    """Make a debug plot for the given step index, supporting negative indices."""
    step, step_ind = self.get_step(step_ind)
    if step_ind + 1 == len(self.df_history):
        df_after = self.df
    else:
        df_after = self.df_history[step_ind + 1]
    return step.dbg_plot(df_after, **kwargs)

step_summary()

Return a list of (step type name, elapsed time in seconds) for each step in the recipe.

Source code in mass2/core/channel.py
def step_summary(self) -> list[tuple[str, float]]:
    """Return a list of (step type name, elapsed time in seconds) for each step in the recipe."""
    return [(type(a).__name__, b) for (a, b) in zip(self.steps, self.steps_elapsed_s)]

summarize_pulses(col='pulse', pretrigger_ignore_samples=0, peak_index=None)

Summarize the pulses, adding columns for pulse height, pretrigger mean, etc.

Source code in mass2/core/channel.py
def summarize_pulses(self, col: str = "pulse", pretrigger_ignore_samples: int = 0, peak_index: int | None = None) -> "Channel":
    """Summarize the pulses, adding columns for pulse height, pretrigger mean, etc."""
    if peak_index is None:
        peak_index = self.typical_peak_ind(col)
    out_names = mass2.core.pulse_algorithms.result_dtype.names
    # mypy (incorrectly) thinks `out_names` might be None, and `list(None)` is forbidden. Assertion makes it happy again.
    assert out_names is not None
    outputs = list(out_names)
    step = SummarizeStep(
        inputs=[col],
        output=outputs,
        good_expr=self.good_expr,
        use_expr=pl.lit(True),
        frametime_s=self.frametime_s,
        peak_index=peak_index,
        pulse_col=col,
        pretrigger_ignore_samples=pretrigger_ignore_samples,
        n_presamples=self.n_presamples,
        transform_raw=self.transform_raw,
    )
    return self.with_step(step)

typical_peak_ind(col='pulse') cached

Return the typical peak index of the given column, using the median peak index for the first 100 pulses.

Source code in mass2/core/channel.py
@functools.cache
def typical_peak_ind(self, col: str = "pulse") -> int:
    """Return the typical peak index of the given column, using the median peak index for the first 100 pulses."""
    raw = self.df.limit(100)[col].to_numpy()
    if self.transform_raw is not None:
        raw = self.transform_raw(raw)
    return int(np.median(raw.argmax(axis=1)))
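The median-of-argmax logic is easy to mimic with NumPy on a synthetic raw array of shape (n_pulses, n_samples); this is a sketch of the idea, not Mass2's code path.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pulses, n_samples, true_peak = 100, 512, 300

# Synthetic records: a triangular pulse peaking near sample 300, plus noise.
t = np.arange(n_samples)
pulse_shape = np.maximum(0.0, 1.0 - np.abs(t - true_peak) / 50.0)
raw = pulse_shape + 0.05 * rng.standard_normal((n_pulses, n_samples))

# Same idea as typical_peak_ind: the median of each record's argmax is a
# robust estimate of where pulses typically peak.
typical_peak = int(np.median(raw.argmax(axis=1)))
print(typical_peak)
```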

with_categorize_step(category_condition_dict, output_col='category')

Add a recipe step that categorizes pulses based on the given conditions.

Source code in mass2/core/channel.py
def with_categorize_step(self, category_condition_dict: dict[str, pl.Expr], output_col: str = "category") -> "Channel":
    """Add a recipe step that categorizes pulses based on the given conditions."""
    # ensure the first condition is True, to be used as a fallback
    first_expr = next(iter(category_condition_dict.values()))
    if not first_expr.meta.eq(pl.lit(True)):
        category_condition_dict = {"fallback": pl.lit(True), **category_condition_dict}
    extract = mass2.misc.extract_column_names_from_polars_expr
    inputs: set[str] = set()
    for expr in category_condition_dict.values():
        inputs.update(extract(expr))
    step = mass2.core.recipe.CategorizeStep(
        inputs=list(inputs),
        output=[output_col],
        good_expr=self.good_expr,
        use_expr=pl.lit(True),
        category_condition_dict=category_condition_dict,
    )
    return self.with_step(step)
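The guaranteed-fallback idea can be echoed in plain Python with first-match-wins lookup; Mass2 builds polars expressions instead, and its precedence rules may differ in detail.

```python
# Named conditions, as in category_condition_dict; the lambdas stand in
# for polars expressions.
conditions = {
    "clean": lambda p: p["pretrig_rms"] < 10,
    "high_energy": lambda p: p["pulse_rms"] > 100,
}

def categorize(pulse):
    for name, cond in conditions.items():
        if cond(pulse):
            return name
    return "fallback"  # the always-true condition the step prepends

print(categorize({"pretrig_rms": 5, "pulse_rms": 50}))    # clean
print(categorize({"pretrig_rms": 50, "pulse_rms": 150}))  # high_energy
print(categorize({"pretrig_rms": 50, "pulse_rms": 50}))   # fallback
```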

with_column_map_step(input_col, output_col, f)

f should take a numpy array and return a numpy array with the same number of elements

Source code in mass2/core/channel.py
def with_column_map_step(self, input_col: str, output_col: str, f: Callable) -> "Channel":
    """f should take a numpy array and return a numpy array with the same number of elements"""
    step = mass2.core.recipe.ColumnAsNumpyMapStep([input_col], [output_col], good_expr=self.good_expr, use_expr=pl.lit(True), f=f)
    return self.with_step(step)
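The contract for `f` is simply array in, same-length array out. A hypothetical gain correction (the constant is made up) illustrates it:

```python
import numpy as np

# A hypothetical column-map function: multiply by a made-up gain factor.
def correct_gain(filt_value: np.ndarray) -> np.ndarray:
    return filt_value * 1.002

x = np.array([100.0, 200.0, 300.0])
y = correct_gain(x)
assert y.shape == x.shape  # the invariant with_column_map_step relies on
print(y)
```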

with_columns(df2)

Append columns from df2 to the existing dataframe, keeping all other attributes the same.

Source code in mass2/core/channel.py
def with_columns(self, df2: pl.DataFrame) -> "Channel":
    """Append columns from df2 to the existing dataframe, keeping all other attributes the same."""
    df3 = self.df.with_columns(df2)
    return self.with_replacement_df(df3)

with_experiment_state_df(df_es, force_timestamp_monotonic=False)

Add experiment states from an existing dataframe

Source code in mass2/core/channel.py
def with_experiment_state_df(self, df_es: pl.DataFrame, force_timestamp_monotonic: bool = False) -> "Channel":
    """Add experiment states from an existing dataframe"""

    # Make sure experiment state dataframe and self.df agree on time zones. If not, convert the former.
    times = df_es["timestamp"]
    expt_state_time_type = times.dtype
    self_time_type = self.df["timestamp"].dtype
    assert isinstance(expt_state_time_type, Datetime)
    assert isinstance(self_time_type, Datetime)
    desired_time_zone = self_time_type.time_zone
    if desired_time_zone is None:
        desired_time_zone = _local_timezone_name
    if expt_state_time_type.time_zone != desired_time_zone:
        times = times.dt.convert_time_zone(desired_time_zone)
        df_es = df_es.with_columns(timestamp=times)

    if not self.df["timestamp"].is_sorted():
        df = self.df.select(pl.col("timestamp").cum_max().alias("timestamp")).with_columns(self.df.select(pl.exclude("timestamp")))
        # print("WARNING: in with_experiment_state_df, timestamp is not monotonic, forcing it to be")
        # print("This is likely a BUG in DASTARD.")
    else:
        df = self.df
    df2 = df.join_asof(df_es, on="timestamp", strategy="backward")
    return self.with_replacement_df(df2)
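The `join_asof(..., strategy="backward")` used here pairs each pulse timestamp with the most recent experiment-state entry at or before it. The same lookup can be sketched with the standard library's `bisect` (state times and labels below are invented):

```python
import bisect

# Sorted experiment-state start times (arbitrary units) and their labels.
state_times = [0, 100, 250]
state_labels = ["START", "CAL", "SCIENCE"]

def state_at(timestamp):
    # Index of the last state time <= timestamp (a "backward" asof match).
    i = bisect.bisect_right(state_times, timestamp) - 1
    return state_labels[i] if i >= 0 else None

print(state_at(50), state_at(100), state_at(300))
```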

with_external_trigger_df(df_ext)

Add external trigger times from an existing dataframe

Source code in mass2/core/channel.py
def with_external_trigger_df(self, df_ext: pl.DataFrame) -> "Channel":
    """Add external trigger times from an existing dataframe"""
    df = self.df
    # Expect "subframecount" will be in the dataframe for LJH 2.2 files, but have to add it for OFF files:
    if "subframecount" not in df:
        df = self.df.with_columns(subframecount=pl.col("framecount") * self.subframediv)
    df2 = df.join_asof(df_ext, on="subframecount", strategy="backward", coalesce=False, suffix="_prev_ext_trig").join_asof(
        df_ext, on="subframecount", strategy="forward", coalesce=False, suffix="_next_ext_trig"
    )
    return self.with_replacement_df(df2)

with_good_expr(good_expr, replace=False)

Return a new Channel with the given good_expr, combined with the existing good_expr by "and", or by replacing it entirely if replace is True.

Source code in mass2/core/channel.py
def with_good_expr(self, good_expr: pl.Expr, replace: bool = False) -> "Channel":
    """Return a new Channel with the given good_expr, combined with the existing good_expr by "and",
    or by replacing it entirely if `replace` is True."""
    # the default value of self.good_expr is pl.lit(True)
    # and_(True) will just add visual noise when looking at good_expr and not affect behavior
    if not replace and good_expr is not True and not good_expr.meta.eq(pl.lit(True)):
        good_expr = good_expr.and_(self.good_expr)
    return dataclasses.replace(self, good_expr=good_expr)

with_good_expr_below_nsigma_outlier_resistant(col_nsigma_pairs, replace=False, use_prev_good_expr=True)

Set good_expr to exclude pulses with any of the given columns above outlier-resistant thresholds. Always sets the lower limit at 0, so don't use it for values that can be negative.

Source code in mass2/core/channel.py
def with_good_expr_below_nsigma_outlier_resistant(
    self, col_nsigma_pairs: Iterable[tuple[str, float]], replace: bool = False, use_prev_good_expr: bool = True
) -> "Channel":
    """Set good_expr to exclude pulses with any of the given columns above outlier-resistant thresholds.
    Always sets lower limit at 0, so don't use for values that can be negative
    """
    if use_prev_good_expr:
        df = self.df.lazy().select(pl.exclude("pulse")).filter(self.good_expr).collect()
    else:
        df = self.df
    for i, (col, nsigma) in enumerate(col_nsigma_pairs):
        max_for_col = misc.outlier_resistant_nsigma_above_mid(df[col].to_numpy(), nsigma=nsigma)
        this_iter_good_expr = pl.col(col).is_between(0, max_for_col)
        if i == 0:
            good_expr = this_iter_good_expr
        else:
            good_expr = good_expr.and_(this_iter_good_expr)
    return self.with_good_expr(good_expr, replace)
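One common outlier-resistant recipe estimates sigma from the interquartile range, which ignores tails; this is an assumption for illustration, not necessarily what `misc.outlier_resistant_nsigma_above_mid` does internally.

```python
import numpy as np

def nsigma_above_mid(values, nsigma=20.0):
    # Median as the midpoint; IQR-based sigma estimate is immune to a few
    # wild outliers (IQR of a Gaussian is 1.349 sigma).
    mid = np.median(values)
    q25, q75 = np.percentile(values, [25, 75])
    sigma = (q75 - q25) / 1.349
    return mid + nsigma * sigma

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(10, 1, 1000), [1e6]])  # one wild outlier
cut = nsigma_above_mid(x, nsigma=5)
print(cut)  # near 15, essentially unaffected by the 1e6 outlier
```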

with_good_expr_nsigma_range_outlier_resistant(col_nsigma_pairs, replace=False, use_prev_good_expr=True)

Set good_expr to exclude pulses with any of the given columns outside outlier-resistant thresholds. Unlike the below-nsigma variant, this sets both a lower and an upper limit around the midpoint, so it is safe for values that can be negative.

Source code in mass2/core/channel.py
def with_good_expr_nsigma_range_outlier_resistant(
    self, col_nsigma_pairs: Iterable[tuple[str, float]], replace: bool = False, use_prev_good_expr: bool = True
) -> "Channel":
    """Set good_expr to exclude pulses with any of the given columns outside outlier-resistant thresholds.
    Unlike the below-nsigma variant, this sets both a lower and an upper limit around the midpoint,
    so it is safe for values that can be negative.
    """
    if use_prev_good_expr:
        df = self.df.lazy().select(pl.exclude("pulse")).filter(self.good_expr).collect()
    else:
        df = self.df
    for i, (col, nsigma) in enumerate(col_nsigma_pairs):
        min_for_col, max_for_col = misc.outlier_resistant_nsigma_range_from_mid(df[col].to_numpy(), nsigma=nsigma)
        this_iter_good_expr = pl.col(col).is_between(min_for_col, max_for_col)
        if i == 0:
            good_expr = this_iter_good_expr
        else:
            good_expr = good_expr.and_(this_iter_good_expr)
    return self.with_good_expr(good_expr, replace)

with_good_expr_pretrig_rms_and_postpeak_deriv(n_sigma_pretrig_rms=20, n_sigma_postpeak_deriv=20, replace=False)

Set good_expr to exclude pulses with pretrigger RMS or postpeak derivative above outlier-resistant thresholds.

Source code in mass2/core/channel.py
def with_good_expr_pretrig_rms_and_postpeak_deriv(
    self, n_sigma_pretrig_rms: float = 20, n_sigma_postpeak_deriv: float = 20, replace: bool = False
) -> "Channel":
    """Set good_expr to exclude pulses with pretrigger RMS or postpeak derivative above outlier-resistant thresholds."""
    max_postpeak_deriv = misc.outlier_resistant_nsigma_above_mid(
        self.df["postpeak_deriv"].to_numpy(), nsigma=n_sigma_postpeak_deriv
    )
    max_pretrig_rms = misc.outlier_resistant_nsigma_above_mid(self.df["pretrig_rms"].to_numpy(), nsigma=n_sigma_pretrig_rms)
    good_expr = (pl.col("postpeak_deriv") < max_postpeak_deriv).and_(pl.col("pretrig_rms") < max_pretrig_rms)
    return self.with_good_expr(good_expr, replace)

with_range_around_median(col, range_up, range_down)

Set good_expr to exclude pulses with col outside the given range around its median.

Source code in mass2/core/channel.py
def with_range_around_median(self, col: str, range_up: float, range_down: float) -> "Channel":
    """Set good_expr to exclude pulses with `col` outside the given range around its median."""
    med = np.median(self.df[col].to_numpy())
    return self.with_good_expr(pl.col(col).is_between(med - range_down, med + range_up))
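The kept window is `[median - range_down, median + range_up]`; a plain NumPy mask shows the same selection (values below are invented):

```python
import numpy as np

filt_value = np.array([980.0, 1000.0, 1010.0, 1200.0])
med = np.median(filt_value)                 # 1005.0 for these values
range_up, range_down = 50.0, 30.0

# Equivalent of pl.col(col).is_between(med - range_down, med + range_up)
good = (filt_value >= med - range_down) & (filt_value <= med + range_up)
print(med, good)
```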

with_replacement_df(df2)

Replace the dataframe with a new one, keeping all other attributes the same.

Source code in mass2/core/channel.py
def with_replacement_df(self, df2: pl.DataFrame) -> "Channel":
    """Replace the dataframe with a new one, keeping all other attributes the same."""
    return dataclasses.replace(
        self,
        df=df2,
    )

with_select_step(col_expr_dict)

This step is meant for interactive exploration; it's basically like the df.select() method, but it's saved as a step.

Source code in mass2/core/channel.py
def with_select_step(self, col_expr_dict: dict[str, pl.Expr]) -> "Channel":
    """
    This step is meant for interactive exploration; it's basically like the df.select() method, but it's saved as a step.
    """
    extract = mass2.misc.extract_column_names_from_polars_expr
    inputs: set[str] = set()
    for expr in col_expr_dict.values():
        inputs.update(extract(expr))
    step = mass2.core.recipe.SelectStep(
        inputs=list(inputs),
        output=list(col_expr_dict.keys()),
        good_expr=self.good_expr,
        use_expr=pl.lit(True),
        col_expr_dict=col_expr_dict,
    )
    return self.with_step(step)

with_step(step)

Return a new Channel with the given step applied to generate new columns in the dataframe.

Source code in mass2/core/channel.py
def with_step(self, step: RecipeStep) -> "Channel":
    """Return a new Channel with the given step applied to generate new columns in the dataframe."""
    t_start = time.time()
    df2 = step.calc_from_df(self.df)
    elapsed_s = time.time() - t_start
    ch2 = dataclasses.replace(
        self,
        df=df2,
        good_expr=step.good_expr,
        df_history=self.df_history + [self.df],
        steps=self.steps.with_step(step),
        steps_elapsed_s=self.steps_elapsed_s + [elapsed_s],
    )
    return ch2

with_steps(steps)

Return a new Channel with the given steps applied to generate new columns in the dataframe.

Source code in mass2/core/channel.py
def with_steps(self, steps: Recipe) -> "Channel":
    """Return a new Channel with the given steps applied to generate new columns in the dataframe."""
    ch2 = self
    for step in steps:
        ch2 = ch2.with_step(step)
    return ch2
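`with_steps` is a left fold: each step consumes the previous Channel and produces a new one, never mutating in place. Plain functions standing in for `RecipeStep` sketch the pattern:

```python
# Each "step" is a function from state to new state, mirroring how
# with_step returns a fresh Channel via dataclasses.replace.
def apply_steps(state, steps):
    for step in steps:
        state = step(state)
    return state

steps = [
    lambda d: {**d, "summarized": True},
    lambda d: {**d, "calibrated": True},
]
final = apply_steps({"raw": True}, steps)
print(final)
```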

ChannelHeader dataclass

Metadata about a Channel, of the sort read from file header.

Source code in mass2/core/channel.py
@dataclass(frozen=True)
class ChannelHeader:
    """Metadata about a Channel, of the sort read from file header."""

    description: str  # filename or date/run number, etc
    data_source: str | None  # complete file path, if read from a file
    ch_num: int
    frametime_s: float
    n_presamples: int
    n_samples: int
    df: pl.DataFrame = field(repr=False)

    @classmethod
    def from_ljh_header_df(cls, df: pl.DataFrame) -> "ChannelHeader":
        """Construct from the LJH header dataframe as returned by LJHFile.to_polars()"""
        filepath = df["Filename"][0]
        return cls(
            description=os.path.split(filepath)[-1],
            data_source=filepath,
            ch_num=df["Channel"][0],
            frametime_s=df["Timebase"][0],
            n_presamples=df["Presamples"][0],
            n_samples=df["Total Samples"][0],
            df=df,
        )

from_ljh_header_df(df) classmethod

Construct from the LJH header dataframe as returned by LJHFile.to_polars()

Source code in mass2/core/channel.py
@classmethod
def from_ljh_header_df(cls, df: pl.DataFrame) -> "ChannelHeader":
    """Construct from the LJH header dataframe as returned by LJHFile.to_polars()"""
    filepath = df["Filename"][0]
    return cls(
        description=os.path.split(filepath)[-1],
        data_source=filepath,
        ch_num=df["Channel"][0],
        frametime_s=df["Timebase"][0],
        n_presamples=df["Presamples"][0],
        n_samples=df["Total Samples"][0],
        df=df,
    )

Data structures and methods for handling a group of microcalorimeter channels.

Channels dataclass

A collection of microcalorimeter channels, with methods to operate in parallel on all channels.

Source code in mass2/core/channels.py
@dataclass(frozen=True)  # noqa: PLR0904
class Channels:
    """A collection of microcalorimeter channels, with methods to operate in parallel on all channels."""

    channels: dict[int, Channel]
    description: str
    bad_channels: dict[int, BadChannel] = field(default_factory=dict)

    @property
    def ch0(self) -> Channel:
        """Return a representative Channel object for convenient exploration (the one with the lowest channel number)."""
        assert len(self.channels) > 0, "channels must be non-empty"
        return next(iter(self.channels.values()))

    def with_more_channels(self, more: "Channels") -> "Channels":
        """Return a Channels object with additional Channels in it.
        New channels with the same number will overrule existing ones.

        Parameters
        ----------
        more : Channels
            Another Channels object, to be added

        Returns
        -------
        Channels
            The replacement
        """
        channels = self.channels.copy()
        channels.update(more.channels)
        bad = self.bad_channels.copy()
        bad.update(more.bad_channels)
        descr = self.description + more.description + "\nWarning! created by with_more_channels()"
        return dataclasses.replace(self, channels=channels, bad_channels=bad, description=descr)

    @functools.cache
    def dfg(self, exclude: str = "pulse") -> pl.DataFrame:
        """Return a DataFrame containing good pulses from each channel. Excludes the given columns (default "pulse")."""
        # return a dataframe containing good pulses from each channel,
        # excluding "pulse" by default
        # and including column "ch_num"
        # the more common call should be to wrap this in a convenient plotter
        dfs = []
        for ch_num, channel in self.channels.items():
            df = channel.df.select(pl.exclude(exclude)).filter(channel.good_expr)
            # key_series = pl.Series("key", dtype=pl.Int64).extend_constant(key, len(df))
            assert ch_num == channel.header.ch_num
            ch_series = pl.Series("ch_num", dtype=pl.Int64).extend_constant(channel.header.ch_num, len(df))
            dfs.append(df.with_columns(ch_series))
        return pl.concat(dfs)

    def linefit(  # noqa: PLR0917
        self,
        line: float | str | SpectralLine | GenericLineModel,
        col: str,
        use_expr: pl.Expr = pl.lit(True),
        has_linear_background: bool = False,
        has_tails: bool = False,
        dlo: float = 50,
        dhi: float = 50,
        binsize: float = 0.5,
        params_update: lmfit.Parameters = lmfit.Parameters(),
    ) -> LineModelResult:
        """Perform a fit to one spectral line in the coadded histogram of the given column."""
        model = mass2.calibration.algorithms.get_model(line, has_linear_background=has_linear_background, has_tails=has_tails)
        pe = model.spect.peak_energy
        _bin_edges = np.arange(pe - dlo, pe + dhi, binsize)
        df_small = self.dfg().lazy().filter(use_expr).select(col).collect()
        bin_centers, counts = mass2.misc.hist_of_series(df_small[col], _bin_edges)
        params = model.guess(counts, bin_centers=bin_centers, dph_de=1)
        params["dph_de"].set(1.0, vary=False)
        print(f"before update {params=}")
        params = params.update(params_update)
        print(f"after update {params=}")
        result = model.fit(counts, params, bin_centers=bin_centers, minimum_bins_per_fwhm=3)
        result.set_label_hints(
            binsize=bin_centers[1] - bin_centers[0],
            ds_shortname=f"{len(self.channels)} channels, {self.description}",
            unit_str="eV",
            attr_str=col,
            states_hint=f"{use_expr=}",
            cut_hint="",
        )
        return result

    def plot_hist(self, col: str, bin_edges: ArrayLike, use_expr: pl.Expr = pl.lit(True), axis: plt.Axes | None = None) -> None:
        """Plot a histogram for the given column across all channels."""
        df_small = self.dfg().lazy().filter(use_expr).select(col).collect()
        ax = mass2.misc.plot_hist_of_series(df_small[col], bin_edges, axis)
        ax.set_title(f"{len(self.channels)} channels, {self.description}")

    def plot_hists(
        self,
        col: str,
        bin_edges: ArrayLike,
        group_by_col: bool,
        axis: plt.Axes | None = None,
        use_expr: pl.Expr | None = None,
        skip_none: bool = True,
    ) -> None:
        """
        Plots histograms for the given column, grouped by the specified column.

        Parameters:
        - col (str): The column name to plot.
        - bin_edges (array-like): The edges of the bins for the histogram.
        - group_by_col (str): The column name to group by. This is required.
        - axis (matplotlib.Axes, optional): The axis to plot on. If None, a new figure is created.
        """
        if axis is None:
            _, ax = plt.subplots()  # Create a new figure if no axis is provided
        else:
            ax = axis

        if use_expr is None:
            df_small = (self.dfg().lazy().select(col, group_by_col)).collect().sort(group_by_col, descending=False)
        else:
            df_small = (self.dfg().lazy().filter(use_expr).select(col, group_by_col)).collect().sort(group_by_col, descending=False)

        # Plot a histogram for each group
        for (group_name,), group_data in df_small.group_by(group_by_col, maintain_order=True):
            if group_name is None and skip_none:
                continue
            # Get the data for the column to plot
            values = group_data[col]
            # Plot the histogram for the current group
            if group_name == "EBIT":
                ax.hist(values, bins=bin_edges, alpha=0.9, color="k", label=str(group_name))
            else:
                ax.hist(values, bins=bin_edges, alpha=0.5, label=str(group_name))
            # bin_centers, counts = mass2.misc.hist_of_series(values, bin_edges)
            # plt.plot(bin_centers, counts, label=group_name)

        # Customize the plot
        ax.set_xlabel(str(col))
        ax.set_ylabel("Frequency")
        ax.set_title(f"Coadded Histogram of {col} grouped by {group_by_col}")

        # Add a legend to label the groups
        ax.legend(title=group_by_col)

        plt.tight_layout()

    def _limited_chan_list(self, limit: int | None = 20, channels: list[int] | None = None) -> list[int]:
        """A helper to get a list of channel numbers, limited to the given number if needed, and including only
        channel numbers from `channels` if not None."""
        limited_chan = list(self.channels.keys())
        if channels is not None:
            limited_chan = list(set(limited_chan).intersection(set(channels)))
            limited_chan.sort()
        if limit and len(limited_chan) > limit:
            limited_chan = limited_chan[:limit]
        return limited_chan

    def plot_filters(
        self,
        limit: int | None = 20,
        channels: list[int] | None = None,
        colormap: matplotlib.colors.Colormap = plt.cm.viridis,
        axis: plt.Axes | None = None,
    ) -> plt.Axes:
        """Plot the optimal filters for the channels in this Channels object.

        Parameters
        ----------
        limit : int | None, optional
            Plot at most this many filters if not None, by default 20
        channels : list[int] | None, optional
            Plot only channels with numbers in this list if not None, by default None
        colormap : matplotlib.colors.Colormap, optional
            The color scale to use, by default plt.cm.viridis
        axis : plt.Axes | None, optional
            A `plt.Axes` to plot on, or if None a new one, by default None

        Returns
        -------
        plt.Axes
            The `plt.Axes` containing the plot.
        """
        if axis is None:
            fig = plt.figure()
            axis = fig.subplots()

        plot_these_chan = self._limited_chan_list(limit, channels)
        n_expected = len(plot_these_chan)
        for i, ch_num in enumerate(plot_these_chan):
            ch = self.channels[ch_num]
            # The next line _assumes_ a 5-lag filter. Fix as needed.
            x = np.arange(ch.header.n_samples - 4) - ch.header.n_presamples + 2
            y = ch.last_filter
            if y is not None:
                plt.plot(x, y, color=colormap(i / n_expected), label=f"Chan {ch_num}")
        plt.legend()
        plt.xlabel("Samples after trigger")
        plt.title("Optimal filters")
        return axis

    def plot_avg_pulses(
        self,
        limit: int | None = 20,
        channels: list[int] | None = None,
        colormap: matplotlib.colors.Colormap = plt.cm.viridis,
        axis: plt.Axes | None = None,
    ) -> plt.Axes:
        """Plot the average pulses (the signal model) for the channels in this Channels object.

        Parameters
        ----------
        limit : int | None, optional
            Plot at most this many average pulses if not None, by default 20
        channels : list[int] | None, optional
            Plot only channels with numbers in this list if not None, by default None
        colormap : matplotlib.colors.Colormap, optional
            The color scale to use, by default plt.cm.viridis
        axis : plt.Axes | None, optional
            A `plt.Axes` to plot on, or if None a new one, by default None

        Returns
        -------
        plt.Axes
            The `plt.Axes` containing the plot.
        """
        if axis is None:
            fig = plt.figure()
            axis = fig.subplots()

        plot_these_chan = self._limited_chan_list(limit, channels)
        frametime_ms = self.channels[plot_these_chan[0]].header.frametime_s * 1e3

        def samples2ms(s: ArrayLike) -> ArrayLike:
            return np.asarray(s) * frametime_ms

        def ms2samples(ms: ArrayLike) -> ArrayLike:
            return np.asarray(ms) / frametime_ms

        upper_axis = axis.secondary_xaxis("top", functions=(samples2ms, ms2samples))
        upper_axis.set_xlabel("Time after trigger (ms)")

        n_expected = len(plot_these_chan)
        for i, ch_num in enumerate(plot_these_chan):
            ch = self.channels[ch_num]
            x = np.arange(ch.header.n_samples) - ch.header.n_presamples
            y = ch.last_avg_pulse
            if y is not None:
                plt.plot(x, y, color=colormap(i / n_expected), label=f"Chan {ch_num}")
        plt.legend()
        plt.xlabel("Samples after trigger")
        plt.title("Average pulses")
        plt.tight_layout()
        return axis

    def plot_noise_spectrum(
        self,
        limit: int | None = 20,
        channels: list[int] | None = None,
        colormap: matplotlib.colors.Colormap = plt.cm.viridis,
        axis: plt.Axes | None = None,
    ) -> plt.Axes:
        """Plot the noise power spectrum for the channels in this Channels object.

        Parameters
        ----------
        limit : int | None, optional
            Plot at most this many spectra if not None, by default 20
        channels : list[int] | None, optional
            Plot only channels with numbers in this list if not None, by default None
        colormap : matplotlib.colors.Colormap, optional
            The color scale to use, by default plt.cm.viridis
        axis : plt.Axes | None, optional
            A `plt.Axes` to plot on, or if None a new one, by default None

        Returns
        -------
        plt.Axes
            The `plt.Axes` containing the plot.
        """
        if axis is None:
            fig = plt.figure()
            axis = fig.subplots()

        plot_these_chan = self._limited_chan_list(limit, channels)
        n_expected = len(plot_these_chan)
        for i, ch_num in enumerate(plot_these_chan):
            ch = self.channels[ch_num]
            freqpsd = ch.last_noise_psd
            if freqpsd is not None:
                freq, psd = freqpsd
                plt.plot(freq, psd, color=colormap(i / n_expected), label=f"Chan {ch_num}")
        plt.legend()
        plt.loglog()
        plt.xlabel("Frequency (Hz)")
        plt.title("Noise power spectral density")
        return axis

    def plot_noise_autocorr(
        self,
        limit: int | None = 20,
        channels: list[int] | None = None,
        colormap: matplotlib.colors.Colormap = plt.cm.viridis,
        axis: plt.Axes | None = None,
    ) -> plt.Axes:
        """Plot the noise power autocorrelation for the channels in this Channels object.

        Parameters
        ----------
        limit : int | None, optional
            Plot at most this many autocorrelation curves if not None, by default 20
        channels : list[int] | None, optional
            Plot only channels with numbers in this list if not None, by default None
        colormap : matplotlib.colors.Colormap, optional
            The color scale to use, by default plt.cm.viridis
        axis : plt.Axes | None, optional
            A `plt.Axes` to plot on, or if None a new one, by default None

        Returns
        -------
        plt.Axes
            The `plt.Axes` containing the plot.
        """
        if axis is None:
            fig = plt.figure()
            axis = fig.subplots()

        plot_these_chan = self._limited_chan_list(limit, channels)
        n_expected = len(plot_these_chan)
        for i, ch_num in enumerate(plot_these_chan):
            ch = self.channels[ch_num]
            ac = ch.last_noise_autocorrelation
            if ac is not None:
                color = colormap(i / n_expected)
                plt.plot(ac, color=color, label=f"Chan {ch_num}")
                plt.plot(0, ac[0], "o", color=color)
        plt.legend()
        plt.xlabel("Lags")
        plt.title("Noise autocorrelation")
        return axis

    def map(self, f: Callable, allow_throw: bool = False) -> "Channels":
        """Map function `f` over all channels, returning a new Channels object containing the new Channel objects."""
        new_channels = {}
        new_bad_channels = {}
        for key, channel in self.channels.items():
            try:
                new_channels[key] = f(channel)
            except KeyboardInterrupt as kint:
                raise kint
            except Exception as ex:
                error_type: type = type(ex)
                error_message: str = str(ex)
                backtrace: str = traceback.format_exc()
                if allow_throw:
                    raise
                print(f"{key=} {channel=} failed the step {f}")
                print(f"{error_type=}")
                print(f"{error_message=}")
                new_bad_channels[key] = channel.as_bad(error_type, error_message, backtrace)
        new_bad_channels = mass2.misc.merge_dicts_ordered_by_keys(self.bad_channels, new_bad_channels)

        return Channels(new_channels, self.description, bad_channels=new_bad_channels)

    def set_bad(self, ch_num: int, msg: str, require_ch_num_exists: bool = True) -> "Channels":
        """Return a copy of this Channels object with the given channel number marked as bad."""
        new_channels = {}
        new_bad_channels = {}
        if require_ch_num_exists:
            assert ch_num in self.channels.keys(), f"{ch_num} can't be set bad because it does not exist"
        for key, channel in self.channels.items():
            if key == ch_num:
                new_bad_channels[key] = channel.as_bad(None, msg, None)
            else:
                new_channels[key] = channel
        return Channels(new_channels, self.description, bad_channels=new_bad_channels)

    def linefit_joblib(self, line: str, col: str, prefer: str = "threads", n_jobs: int = 4) -> list[LineModelResult]:
        """Fit the given spectral line to column `col` of every channel, in parallel via joblib."""

        def work(key: int) -> LineModelResult:
            """A unit of parallel work: fit line to one channel."""
            channel = self.channels[key]
            return channel.linefit(line, col)

        parallel = joblib.Parallel(n_jobs=n_jobs, prefer=prefer)  # it's not clear whether threads are better; what blocks the GIL?
        results = parallel(joblib.delayed(work)(key) for key in self.channels.keys())
        return results

    def __hash__(self) -> int:
        """Hash based on the object's id (identity)."""
        # needed to make functools.cache work
        # if self or self.anything is mutated, assumptions will be broken
        # and we may get nonsense results
        return hash(id(self))

    def __eq__(self, other: Any) -> bool:
        """Equality test based on object identity."""
        return id(self) == id(other)

    @classmethod
    def from_ljh_path_pairs(cls, pulse_noise_pairs: Iterable[tuple[str, str]], description: str) -> "Channels":
        """
        Create a :class:`Channels` instance from pairs of LJH files.

        Args:
            pulse_noise_pairs (List[Tuple[str, str]]):
                A list of `(pulse_path, noise_path)` tuples, where each entry contains
                the file path to a pulse LJH file and its corresponding noise LJH file.
            description (str):
                A human-readable description for the resulting Channels object.

        Returns:
            Channels:
                A Channels object with one :class:`Channel` per `(pulse_path, noise_path)` pair.

        Raises:
            AssertionError:
                If two input files correspond to the same channel number.

        Notes:
            Each channel is created via :meth:`Channel.from_ljh`.
            The channel number is taken from the LJH file header and used as the key
            in the returned Channels mapping.

        Examples:
            >>> pairs = [
            ...     ("datadir/run0000_ch0000.ljh", "datadir/run0001_ch0000.ljh"),
            ...     ("datadir/run0000_ch0001.ljh", "datadir/run0001_ch0001.ljh"),
            ... ]
            >>> channels = Channels.from_ljh_path_pairs(pairs, description="Test run")
            >>> list(channels.keys())
            [0, 1]
        """
        channels: dict[int, Channel] = {}
        for pulse_path, noise_path in pulse_noise_pairs:
            channel = Channel.from_ljh(pulse_path, noise_path)
            assert channel.header.ch_num not in channels.keys(), f"duplicate channel number {channel.header.ch_num}"
            channels[channel.header.ch_num] = channel
        return cls(channels, description)

    @classmethod
    def from_off_paths(cls, off_paths: Iterable[str | Path], description: str) -> "Channels":
        """Create an instance from a sequence of OFF-file paths"""
        channels = {}
        for path in off_paths:
            ch = Channel.from_off(mass2.core.OffFile(str(path)))
            channels[ch.header.ch_num] = ch
        return cls(channels, description)

    @classmethod
    def from_ljh_folder(
        cls,
        pulse_folder: str | Path,
        noise_folder: str | Path | None = None,
        limit: int | None = None,
        exclude_ch_nums: list[int] | None = None,
        include_ch_nums: list[int] | None = None,
    ) -> "Channels":
        """Create an instance from a directory of LJH files."""
        assert os.path.isdir(pulse_folder), f"{pulse_folder=} {noise_folder=}"
        pulse_folder = str(pulse_folder)
        if exclude_ch_nums is None:
            exclude_ch_nums = []
        if noise_folder is None:
            paths = ljhutil.find_ljh_files(pulse_folder, exclude_ch_nums=exclude_ch_nums, include_ch_nums=include_ch_nums)
            if limit is not None:
                paths = paths[:limit]
            pairs = [(path, "") for path in paths]
        else:
            assert os.path.isdir(noise_folder), f"{pulse_folder=} {noise_folder=}"
            noise_folder = str(noise_folder)
            pairs = ljhutil.match_files_by_channel(
                pulse_folder, noise_folder, limit=limit, exclude_ch_nums=exclude_ch_nums, include_ch_nums=include_ch_nums
            )
        description = f"from_ljh_folder {pulse_folder=} {noise_folder=}"
        print(f"{description}")
        print(f"   from_ljh_folder has {len(pairs)} pairs")
        data = cls.from_ljh_path_pairs(pairs, description)
        print(f"   and the Channels obj has {len(data.channels)} channels")
        return data

    def get_an_ljh_path(self) -> Path:
        """Return the path to a representative one of the LJH files used to create this Channels object."""
        return pathlib.Path(self.ch0.header.df["Filename"][0])

    def get_path_in_output_folder(self, filename: str | Path) -> Path:
        """Return a path in an output folder named like the run number, sibling to the LJH folder."""
        ljh_path = self.get_an_ljh_path()
        base_name, _ = ljh_path.name.split("_chan")
        date, run_num = base_name.split("_run")
        output_dir = ljh_path.parent.parent / f"{run_num}mass2_output"
        output_dir.mkdir(parents=True, exist_ok=True)
        return output_dir / filename

    def get_experiment_state_df(self, experiment_state_path: str | Path | None = None) -> pl.DataFrame:
        """Return a DataFrame containing experiment state information.

        Parameters
        ----------
        experiment_state_path : str | Path | None, optional
            load experiment state info from the given path or (if None) infer the path from an LJH file, by default None

        Returns
        -------
        pl.DataFrame
            A Data Frame with a table of experiment state labels and the corresponding start time (in Polars timestamp format).
        """
        if experiment_state_path is None:
            ljh_path = self.get_an_ljh_path()
            experiment_state_path = ljhutil.experiment_state_path_from_ljh_path(ljh_path)
        df = pl.read_csv(experiment_state_path, new_columns=["unixnano", "state_label"])
        df_es = df.select(pl.from_epoch("unixnano", time_unit="ns").dt.cast_time_unit("us").alias("timestamp"))

        # Strip leading/trailing whitespace from state labels. Convert string -> categorical
        df_labels = df.select(pl.col("state_label").str.strip_chars()).cast(pl.Categorical)
        df_es = df_es.with_columns(df_labels)
        return df_es

    def with_experiment_state_df(self, df_es: pl.DataFrame) -> "Channels":
        """Return a copy of this Channels object with experiment state information added to each Channel.

        Parameters
        ----------
        df_es : pl.DataFrame
            experiment state info

        Returns
        -------
        Channels
            An enhanced copy of self, with experiment state information added to each channel.
        """
        # TODO: the following is less performant than making a use_expr for each state,
        # and using .set_sorted on the timestamp column.
        ch2s = {}
        for ch_num, ch in self.channels.items():
            ch2s[ch_num] = ch.with_experiment_state_df(df_es)
        return Channels(ch2s, self.description)

    def with_experiment_state_by_path(self, experiment_state_path: str | None = None) -> "Channels":
        """Return a copy of this Channels object with experiment state information added to each Channel.

        Parameters
        ----------
        experiment_state_path : str | Path | None, optional
            load experiment state info from the given path or (if None) infer the path from an LJH file, by default None

        Returns
        -------
        Channels
            An enhanced copy of self, with experiment state information added to each channel.
        """
        df_es = self.get_experiment_state_df(experiment_state_path)
        return self.with_experiment_state_df(df_es)

    def with_external_trigger_by_path(self, path: str | pathlib.Path | None = None) -> "Channels":
        """Return a copy of this Channels object with external trigger information added, loaded from the
        given path. Inferring the path from an LJH file (when `path` is None) is not yet implemented."""
        if path is None:
            raise NotImplementedError("cannot infer external trigger path yet")
        with open(path, "rb") as _f:
            _header_line = _f.readline()  # read the one header line before opening the binary data
            external_trigger_subframe_count = np.fromfile(_f, "int64")
        df_ext = pl.DataFrame({
            "subframecount": external_trigger_subframe_count,
        })
        return self.with_external_trigger_df(df_ext)

    def with_external_trigger_df(self, df_ext: pl.DataFrame) -> "Channels":
        """Return a copy of this Channels object with external trigger information added to each Channel,
        found from the given DataFrame."""

        def with_etrig_df(channel: Channel) -> Channel:
            """Return a copy of one Channel object with external trigger information added to it"""
            return channel.with_external_trigger_df(df_ext)

        return self.map(with_etrig_df)

    def with_steps_dict(self, steps_dict: dict[int, Recipe]) -> "Channels":
        """Return a copy of this Channels object with the given Recipe objects added to each Channel."""

        def load_recipes(channel: Channel) -> Channel:
            """Return a copy of one Channel object with Recipe steps added to it"""
            try:
                steps = steps_dict[channel.header.ch_num]
            except KeyError:
                raise KeyError(f"steps_dict contains no steps for ch_num={channel.header.ch_num}")
            return channel.with_steps(steps)

        return self.map(load_recipes)

    def save_recipes(
        self, filename: str, required_fields: str | Iterable[str] | None = None, drop_debug: bool = True
    ) -> dict[int, Recipe]:
        """Pickle a dictionary (one entry per channel) of Recipe objects.

        If you want to save a "recipe", a minimal series of steps required to reproduce the required field(s),
        then set `required_fields` to be a list/tuple/set of DataFrame column names (or a single column name)
        whose production from raw data should be possible.

        Parameters
        ----------
        filename : str
            Filename to store recipe in, typically of the form "*.pkl"
        required_fields : str | Iterable[str] | None
            The field (str) or fields (Iterable[str]) that the recipe should be able to generate from a raw LJH file.
            Drop all steps that do not lead (directly or indirectly) to producing this field or these fields.
            If None, then preserve all steps (default None).
        drop_debug : bool
            Whether to remove debugging-related data from each `RecipeStep`, if the subclass supports this (via the
            `RecipeStep.drop_debug()` method).

        Returns
        -------
        dict
            Dictionary with keys=channel numbers, values=the (possibly trimmed and debug-dropped) Recipe objects.
        """
        steps = {}
        for channum, ch in self.channels.items():
            steps[channum] = ch.steps.trim_dead_ends(required_fields=required_fields, drop_debug=drop_debug)
        mass2.misc.pickle_object(steps, filename)
        return steps

    def load_recipes(self, filename: str) -> "Channels":
        """Return a copy of this Channels object with Recipe objects loaded from the given pickle file
        and applied to each Channel."""
        steps = mass2.misc.unpickle_object(filename)
        return self.with_steps_dict(steps)

    def parent_folder_path(self) -> pathlib.Path:
        """Return the parent folder of the LJH files used to create this Channels object. Specifically, the
        `self.ch0` channel's directory is used (normally the answer would be the same for all channels)."""
        parent_folder_path = pathlib.Path(self.ch0.header.df["Filename"][0]).parent.parent
        print(f"{parent_folder_path=}")
        return parent_folder_path

    def concat_data(self, other_data: "Channels") -> "Channels":
        """Return a new Channels object with data from this and the other Channels object concatenated together.
        Only channels that exist in both objects are included in the result."""
        # sorting here to show intention, but I think set is sorted by insertion order as
        # an implementation detail so this may not do anything
        ch_nums = sorted(list(set(self.channels.keys()).intersection(other_data.channels.keys())))
        new_channels = {}
        for ch_num in ch_nums:
            ch = self.channels[ch_num]
            other_ch = other_data.channels[ch_num]
            combined_df = mass2.core.misc.concat_dfs_with_concat_state(ch.df, other_ch.df)
            new_ch = ch.with_replacement_df(combined_df)
            new_channels[ch_num] = new_ch
        return mass2.Channels(new_channels, self.description + other_data.description)

    @classmethod
    def from_df(
        cls,
        df_in: pl.DataFrame,
        frametime_s: float,
        n_presamples: int,
        n_samples: int,
        description: str = "from Channels.channels_from_df",
    ) -> "Channels":
        """Create a Channels object from a single DataFrame that holds data from multiple channels."""
        # requires a column named "ch_num" containing the channel number
        keys_df: dict[tuple, pl.DataFrame] = df_in.partition_by(by=["ch_num"], as_dict=True)
        dfs: dict[int, pl.DataFrame] = {keys[0]: df for (keys, df) in keys_df.items()}
        channels: dict[int, Channel] = {}
        for ch_num, df in dfs.items():
            channels[ch_num] = Channel(
                df,
                header=ChannelHeader(
                    description="from df",
                    data_source=None,
                    ch_num=ch_num,
                    frametime_s=frametime_s,
                    n_presamples=n_presamples,
                    n_samples=n_samples,
                    df=df,
                ),
                npulses=len(df),
            )
        return Channels(channels, description)

    def save_analysis(
        self, zip_path: Path | str, overwrite: bool = False, trim_debug: bool = False, trim_timestamp_and_subframecount: bool = False
    ) -> None:
        """Save an analysis-in-progress completely to a zip file (so far tested only with LJH-backed channels).

        Parameters
        ----------
        zip_path : Path | str
            Path of the zip file to save work in; a ".zip" suffix is appended if absent. The parent directory must exist.
        overwrite : bool, optional
            If `path` exists, whether to overwrite it, by default False
        trim_debug : bool, optional
            Whether to make save file smaller (potentially) at the cost of breaking some debugging plots, by default False
        trim_timestamp_and_subframecount : bool, optional
            Whether to make the save file smaller at the cost of reduced information available when the LJH files are
            not accessible, e.g., when loading on another computer.
        """
        zip_path = pathlib.Path(zip_path)
        if zip_path.suffix != ".zip":
            zip_path = zip_path.with_suffix(".zip")

        if os.path.exists(zip_path) and not overwrite:
            raise ValueError(f"File exists; use `save_analysis(...overwrite=True)` to overwrite the existing {zip_path=}")

        def store_dataframe_to_parquet_and_return_pickleable_channel(ch: Channel, zf: ZipFile, parquet_path: str) -> Channel:
            """Store the `ch.df` to a parquet file of the given name in the ZipFile (open for writing).
            Prepare `ch` for pickling by removing its dataframe and dataframe history, and stripping debug info from `steps`

            Parameters
            ----------
            ch : Channel
                The channel to modify by storing dataframe to parquet and removing it.
            zf : ZipFile
                A ZipFile object currently open for writing.
            parquet_path : str
                The name to use for storing the parquet file within `zf`

            Returns
            -------
            Channel
                A copy of `ch` amenable to pickling with the dataframe and dataframe history removed and with trimmed steps.
            """
            # Don't store the memmapped LJH pulse info (if present) in the Parquet file
            if trim_timestamp_and_subframecount:
                df = ch.df.drop("pulse", "timestamp", "subframecount", strict=False)
            else:
                df = ch.df.drop("pulse", strict=False)
            buffer = io.BytesIO()
            df.write_parquet(buffer)
            zf.writestr(parquet_path, buffer.getvalue())
            if trim_debug:
                steps = ch.steps.trim_debug_info()
            else:
                steps = ch.steps
            return dataclasses.replace(ch, df=pl.DataFrame(), df_history=[], noise=None, steps=steps)

        with ZipFile(str(zip_path), "w") as zf:
            channels = {}
            bad_channels = {}
            for ch_num, ch in self.channels.items():
                parquet_path = f"data_chan{ch_num:04d}.parquet"
                channels[ch_num] = store_dataframe_to_parquet_and_return_pickleable_channel(ch, zf, parquet_path)
            for ch_num, badch in self.bad_channels.items():
                parquet_path = f"data_bad_chan{ch_num:04d}.parquet"
                ch = store_dataframe_to_parquet_and_return_pickleable_channel(badch.ch, zf, parquet_path)
                bad_channels[ch_num] = dataclasses.replace(badch, ch=ch)
            data = dataclasses.replace(self, channels=channels, bad_channels=bad_channels)
            pickle_file = "data_all.pkl"
            zf.writestr(pickle_file, dill.dumps(data))

    @staticmethod
    def load_analysis(path: Path | str) -> "Channels":
        """Load an analysis-in-progress from a zipfile

        Parameters
        ----------
        path : Path | str
            Zipfile that work was saved in.
        """
        path = pathlib.Path(path)
        assert path.exists() and path.is_file(), f"{path} is not an existing file"

        def _restore_dataframe(ch: Channel, df: pl.DataFrame) -> Channel:
            """Take a channel and replace its dataframe with the given one, loaded from a parquet file

            Parameters
            ----------
            ch : Channel
                A channel, loaded from a pickle file, with an empty dataframe
            df : DataFrame
                A replacement dataframe for the existing one (typically, the existing one is empty)

            Returns
            -------
            Channel
                The Channel `ch` but with `ch.df` updated, including any raw data backed by an LJH file
            """
            # If this channel was based on an LJH file, restore columns from the LJH file to the dataframe.
            if ch.header.data_source is not None:
                ljh_path = ch.header.data_source
                if ljh_path.endswith(".ljh") or ljh_path.endswith(".noi"):
                    ljh_backed_chan = Channel.from_ljh(ljh_path)
                    df = df.with_columns(ljh_backed_chan.df)
            # df_history is needed for some debug plots to work. This version has strictly more columns than required
            # at each history point. TODO: We could use the steps inputs and outputs to trim the appropriate columns.
            # For getting started, though, it's easier just to let each dataframe in history equal the final dataframe.
            df_history = [df] * len(ch.steps)
            return dataclasses.replace(ch, df=df, df_history=df_history)

        with ZipFile(path, "r") as zf:
            pickle_file = "data_all.pkl"
            pickle_bytes = zf.read(pickle_file)
            data: Channels = dill.loads(pickle_bytes)

            restored_channels = {}
            for ch_num, ch in data.channels.items():
                parquet_file = f"data_chan{ch_num:04d}.parquet"
                df = pl.read_parquet(zf.read(parquet_file))
                restored_channels[ch_num] = _restore_dataframe(ch, df)

            restored_bad_channels = {}
            for ch_num, badch in data.bad_channels.items():
                parquet_file = f"data_bad_chan{ch_num:04d}.parquet"
                df = pl.read_parquet(zf.read(parquet_file))
                ch = _restore_dataframe(badch.ch, df)
                restored_bad_channels[ch_num] = dataclasses.replace(badch, ch=ch)

            return dataclasses.replace(data, channels=restored_channels, bad_channels=restored_bad_channels)
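The `save_analysis`/`load_analysis` layout above — one binary table per channel plus a single pickled object, all inside one zip file — can be sketched with stdlib pieces alone. This is a minimal sketch, not the Mass2 implementation: `pickle` stands in for `dill`, and plain byte strings stand in for the Parquet payloads.

```python
import io
import pickle
import zipfile

# Write: one binary table blob per channel, plus one pickled "everything else" object.
buffer = io.BytesIO()
tables = {1: b"parquet-bytes-ch1", 3: b"parquet-bytes-ch3"}
metadata = {"description": "demo run", "channel_nums": sorted(tables)}
with zipfile.ZipFile(buffer, "w") as zf:
    for ch_num, blob in tables.items():
        zf.writestr(f"data_chan{ch_num:04d}.parquet", blob)
    zf.writestr("data_all.pkl", pickle.dumps(metadata))

# Read: unpickle the metadata first, then restore each channel's table by name.
buffer.seek(0)
with zipfile.ZipFile(buffer, "r") as zf:
    restored = pickle.loads(zf.read("data_all.pkl"))
    blobs = {ch: zf.read(f"data_chan{ch:04d}.parquet") for ch in restored["channel_nums"]}

assert restored["description"] == "demo run"
assert blobs == tables
```

The design choice mirrored here is that the pickled object carries everything needed to locate the per-channel entries by a predictable naming scheme, so the round trip needs no separate index file.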

ch0 property

Return a representative Channel object for convenient exploration (the one with the lowest channel number).

__eq__(other)

Equality test based on object identity.

Source code in mass2/core/channels.py
def __eq__(self, other: Any) -> bool:
    """Equality test based on object identity."""
    return id(self) == id(other)

__hash__()

Hash based on the object's id (identity).

Source code in mass2/core/channels.py
def __hash__(self) -> int:
    """Hash based on the object's id (identity)."""
    # needed to make functools.cache work
    # if self or self.anything is mutated, assumptions will be broken
    # and we may get nonsense results
    return hash(id(self))
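As the comments note, identity-based `__hash__`/`__eq__` exist so that `functools.cache` can memoize methods that take this object as an argument. A minimal sketch of the pattern, with a hypothetical `Container` class standing in for `Channels`:

```python
import functools


class Container:
    """Stands in for Channels: hashed by identity so functools.cache can key on an instance."""

    def __init__(self, items):
        self.items = items

    def __hash__(self):
        # Identity hash: two containers with equal contents hash differently,
        # but repeated calls on the same instance hit the cache.
        return hash(id(self))

    def __eq__(self, other):
        return id(self) == id(other)


call_count = 0


@functools.cache
def total(c: Container) -> int:
    global call_count
    call_count += 1  # count how many times the body actually runs
    return sum(c.items)


c = Container([1, 2, 3])
assert total(c) == 6
assert total(c) == 6   # served from the cache
assert call_count == 1  # the body ran only once
```

The same caveat as in the source applies: if the instance is mutated after a cached call, the cache silently returns stale results.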

concat_data(other_data)

Return a new Channels object with data from this and the other Channels object concatenated together. Only channels that exist in both objects are included in the result.

Source code in mass2/core/channels.py
def concat_data(self, other_data: "Channels") -> "Channels":
    """Return a new Channels object with data from this and the other Channels object concatenated together.
    Only channels that exist in both objects are included in the result."""
    # sorting here to show intention, but I think set is sorted by insertion order as
    # an implementation detail so this may not do anything
    ch_nums = sorted(list(set(self.channels.keys()).intersection(other_data.channels.keys())))
    new_channels = {}
    for ch_num in ch_nums:
        ch = self.channels[ch_num]
        other_ch = other_data.channels[ch_num]
        combined_df = mass2.core.misc.concat_dfs_with_concat_state(ch.df, other_ch.df)
        new_ch = ch.with_replacement_df(combined_df)
        new_channels[ch_num] = new_ch
    return mass2.Channels(new_channels, self.description + other_data.description)
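The key step in `concat_data` — keep only channel numbers present in both objects, then concatenate their tables — can be sketched with plain dicts. Here list addition stands in for the DataFrame concatenation, an illustrative assumption:

```python
def concat_common(a: dict[int, list], b: dict[int, list]) -> dict[int, list]:
    """Concatenate per-channel data, keeping only channel numbers present in both inputs."""
    ch_nums = sorted(set(a) & set(b))  # intersection, sorted for deterministic order
    return {ch: a[ch] + b[ch] for ch in ch_nums}


run1 = {1: [10, 11], 3: [30], 5: [50]}
run2 = {1: [12], 5: [51, 52], 7: [70]}
combined = concat_common(run1, run2)
assert combined == {1: [10, 11, 12], 5: [50, 51, 52]}  # channels 3 and 7 are dropped
```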

dfg(exclude='pulse') cached

Return a DataFrame containing good pulses from each channel. Excludes the given columns (default "pulse").

Source code in mass2/core/channels.py
@functools.cache
def dfg(self, exclude: str = "pulse") -> pl.DataFrame:
    """Return a DataFrame containing good pulses from each channel. Excludes the given columns (default "pulse")."""
    # return a dataframe containing good pulses from each channel,
    # excluding "pulse" by default
    # and including column "ch_num"
    # the more common call should be to wrap this in a convenient plotter
    dfs = []
    for ch_num, channel in self.channels.items():
        df = channel.df.select(pl.exclude(exclude)).filter(channel.good_expr)
        # key_series = pl.Series("key", dtype=pl.Int64).extend_constant(key, len(df))
        assert ch_num == channel.header.ch_num
        ch_series = pl.Series("ch_num", dtype=pl.Int64).extend_constant(channel.header.ch_num, len(df))
        dfs.append(df.with_columns(ch_series))
    return pl.concat(dfs)
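The shape of `dfg` — filter each channel's rows by that channel's own goodness criterion, tag them with a `ch_num` column, and stack the results — can be sketched without polars. The `good` predicate below is a hypothetical stand-in for `channel.good_expr`:

```python
def dfg_sketch(channels: dict[int, list[dict]], good) -> list[dict]:
    """Collect good rows from every channel, adding a ch_num column to each row."""
    rows = []
    for ch_num, table in channels.items():
        for row in table:
            if good(row):
                rows.append({**row, "ch_num": ch_num})  # tag each row with its channel
    return rows


channels = {
    1: [{"energy": 5900.0}, {"energy": -1.0}],  # second pulse fails the cut
    2: [{"energy": 6490.0}],
}
good_rows = dfg_sketch(channels, good=lambda r: r["energy"] > 0)
assert [r["ch_num"] for r in good_rows] == [1, 2]
assert all(r["energy"] > 0 for r in good_rows)
```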

from_df(df_in, frametime_s, n_presamples, n_samples, description='from Channels.channels_from_df') classmethod

Create a Channels object from a single DataFrame that holds data from multiple channels.

Source code in mass2/core/channels.py
@classmethod
def from_df(
    cls,
    df_in: pl.DataFrame,
    frametime_s: float,
    n_presamples: int,
    n_samples: int,
    description: str = "from Channels.channels_from_df",
) -> "Channels":
    """Create a Channels object from a single DataFrame that holds data from multiple channels."""
    # requires a column named "ch_num" containing the channel number
    keys_df: dict[tuple, pl.DataFrame] = df_in.partition_by(by=["ch_num"], as_dict=True)
    dfs: dict[int, pl.DataFrame] = {keys[0]: df for (keys, df) in keys_df.items()}
    channels: dict[int, Channel] = {}
    for ch_num, df in dfs.items():
        channels[ch_num] = Channel(
            df,
            header=ChannelHeader(
                description="from df",
                data_source=None,
                ch_num=ch_num,
                frametime_s=frametime_s,
                n_presamples=n_presamples,
                n_samples=n_samples,
                df=df,
            ),
            npulses=len(df),
        )
    return Channels(channels, description)
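The `partition_by(["ch_num"])` call splits one flat table into per-channel tables keyed by channel number. The equivalent grouping in plain Python:

```python
from collections import defaultdict


def partition_by_ch_num(rows: list[dict]) -> dict[int, list[dict]]:
    """Split a flat table into one sub-table per ch_num, like DataFrame.partition_by."""
    parts: dict[int, list[dict]] = defaultdict(list)
    for row in rows:
        parts[row["ch_num"]].append(row)  # group rows by their channel number
    return dict(parts)


rows = [
    {"ch_num": 1, "energy": 5900.0},
    {"ch_num": 2, "energy": 6400.0},
    {"ch_num": 1, "energy": 6490.0},
]
parts = partition_by_ch_num(rows)
assert sorted(parts) == [1, 2]
assert len(parts[1]) == 2 and len(parts[2]) == 1
```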

from_ljh_folder(pulse_folder, noise_folder=None, limit=None, exclude_ch_nums=None, include_ch_nums=None) classmethod

Create an instance from a directory of LJH files.

Source code in mass2/core/channels.py
@classmethod
def from_ljh_folder(
    cls,
    pulse_folder: str | Path,
    noise_folder: str | Path | None = None,
    limit: int | None = None,
    exclude_ch_nums: list[int] | None = None,
    include_ch_nums: list[int] | None = None,
) -> "Channels":
    """Create an instance from a directory of LJH files."""
    assert os.path.isdir(pulse_folder), f"{pulse_folder=} {noise_folder=}"
    pulse_folder = str(pulse_folder)
    if exclude_ch_nums is None:
        exclude_ch_nums = []
    if noise_folder is None:
        paths = ljhutil.find_ljh_files(pulse_folder, exclude_ch_nums=exclude_ch_nums, include_ch_nums=include_ch_nums)
        if limit is not None:
            paths = paths[:limit]
        pairs = [(path, "") for path in paths]
    else:
        assert os.path.isdir(noise_folder), f"{pulse_folder=} {noise_folder=}"
        noise_folder = str(noise_folder)
        pairs = ljhutil.match_files_by_channel(
            pulse_folder, noise_folder, limit=limit, exclude_ch_nums=exclude_ch_nums, include_ch_nums=include_ch_nums
        )
    description = f"from_ljh_folder {pulse_folder=} {noise_folder=}"
    print(f"{description}")
    print(f"   from_ljh_folder has {len(pairs)} pairs")
    data = cls.from_ljh_path_pairs(pairs, description)
    print(f"   and the Channels obj has {len(data.channels)} channels")
    return data
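
When a noise folder is given, pulse and noise files must be matched by channel number. The sketch below is a hypothetical stand-in for `ljhutil.match_files_by_channel`, assuming only that LJH filenames embed the channel as `_chanNNNN`; the real helper's behavior may differ:

```python
import re

def match_by_channel(pulse_paths, noise_paths):
    """Pair pulse and noise paths that share a channel number.

    A hypothetical stand-in for ljhutil.match_files_by_channel; it assumes
    LJH filenames embed the channel as '_chanNNNN'.
    """
    def chan(path: str) -> int:
        return int(re.search(r"_chan(\d+)", path).group(1))

    noise_by_chan = {chan(p): p for p in noise_paths}
    # Keep only pulse files that have a matching noise file, ordered by channel.
    return [(p, noise_by_chan[chan(p)])
            for p in sorted(pulse_paths, key=chan)
            if chan(p) in noise_by_chan]

pairs = match_by_channel(
    ["run0000_chan0003.ljh", "run0000_chan0001.ljh"],
    ["run0001_chan0001.ljh", "run0001_chan0003.ljh", "run0001_chan0007.ljh"],
)
assert pairs == [("run0000_chan0001.ljh", "run0001_chan0001.ljh"),
                 ("run0000_chan0003.ljh", "run0001_chan0003.ljh")]
```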

from_ljh_path_pairs(pulse_noise_pairs, description) classmethod

Create a :class:Channels instance from pairs of LJH files.

Args:
  • pulse_noise_pairs (List[Tuple[str, str]]) –

    A list of (pulse_path, noise_path) tuples, where each entry contains the file path to a pulse LJH file and its corresponding noise LJH file.

  • description (str) –

    A human-readable description for the resulting Channels object.

Returns:
  • Channels –

    A Channels object with one :class:Channel per (pulse_path, noise_path) pair.

Raises:
  • AssertionError –

    If two input files correspond to the same channel number.

Notes: Each channel is created via :meth:Channel.from_ljh. The channel number is taken from the LJH file header and used as the key in the returned Channels mapping.

Examples:

>>> pairs = [
...     ("datadir/run0000_ch0000.ljh", "datadir/run0001_ch0000.ljh"),
...     ("datadir/run0000_ch0001.ljh", "datadir/run0001_ch0001.ljh"),
... ]
>>> channels = Channels.from_ljh_path_pairs(pairs, description="Test run")
>>> list(channels.keys())
[0, 1]

Source code in mass2/core/channels.py
@classmethod
def from_ljh_path_pairs(cls, pulse_noise_pairs: Iterable[tuple[str, str]], description: str) -> "Channels":
    """
    Create a :class:`Channels` instance from pairs of LJH files.

    Args:
        pulse_noise_pairs (List[Tuple[str, str]]):
            A list of `(pulse_path, noise_path)` tuples, where each entry contains
            the file path to a pulse LJH file and its corresponding noise LJH file.
        description (str):
            A human-readable description for the resulting Channels object.

    Returns:
        Channels:
            A Channels object with one :class:`Channel` per `(pulse_path, noise_path)` pair.

    Raises:
        AssertionError:
            If two input files correspond to the same channel number.

    Notes:
        Each channel is created via :meth:`Channel.from_ljh`.
        The channel number is taken from the LJH file header and used as the key
        in the returned Channels mapping.

    Examples:
        >>> pairs = [
        ...     ("datadir/run0000_ch0000.ljh", "datadir/run0001_ch0000.ljh"),
        ...     ("datadir/run0000_ch0001.ljh", "datadir/run0001_ch0001.ljh"),
        ... ]
        >>> channels = Channels.from_ljh_path_pairs(pairs, description="Test run")
        >>> list(channels.keys())
        [0, 1]
    """
    channels: dict[int, Channel] = {}
    for pulse_path, noise_path in pulse_noise_pairs:
        channel = Channel.from_ljh(pulse_path, noise_path)
        assert channel.header.ch_num not in channels.keys()
        channels[channel.header.ch_num] = channel
    return cls(channels, description)

from_off_paths(off_paths, description) classmethod

Create an instance from a sequence of OFF-file paths

Source code in mass2/core/channels.py
@classmethod
def from_off_paths(cls, off_paths: Iterable[str | Path], description: str) -> "Channels":
    """Create an instance from a sequence of OFF-file paths"""
    channels = {}
    for path in off_paths:
        ch = Channel.from_off(mass2.core.OffFile(str(path)))
        channels[ch.header.ch_num] = ch
    return cls(channels, description)

get_an_ljh_path()

Return the path to a representative one of the LJH files used to create this Channels object.

Source code in mass2/core/channels.py
def get_an_ljh_path(self) -> Path:
    """Return the path to a representative one of the LJH files used to create this Channels object."""
    return pathlib.Path(self.ch0.header.df["Filename"][0])

get_experiment_state_df(experiment_state_path=None)

Return a DataFrame containing experiment state information.

Parameters:
  • experiment_state_path (str | Path | None, default: None ) –

    load experiment state info from the given path or (if None) infer the path from an LJH file, by default None

Returns:
  • DataFrame –

    A DataFrame with a table of experiment state labels and the corresponding start time (in Polars timestamp format).

Source code in mass2/core/channels.py
def get_experiment_state_df(self, experiment_state_path: str | Path | None = None) -> pl.DataFrame:
    """Return a DataFrame containing experiment state information.

    Parameters
    ----------
    experiment_state_path : str | Path | None, optional
        load experiment state info from the given path or (if None) infer the path from an LJH file, by default None

    Returns
    -------
    pl.DataFrame
        A DataFrame with a table of experiment state labels and the corresponding start time (in Polars timestamp format).
    """
    if experiment_state_path is None:
        ljh_path = self.get_an_ljh_path()
        experiment_state_path = ljhutil.experiment_state_path_from_ljh_path(ljh_path)
    df = pl.read_csv(experiment_state_path, new_columns=["unixnano", "state_label"])
    df_es = df.select(pl.from_epoch("unixnano", time_unit="ns").dt.cast_time_unit("us").alias("timestamp"))

    # Strip leading/trailing whitespace from state labels. Convert string -> categorical
    df_labels = df.select(pl.col("state_label").str.strip_chars()).cast(pl.Categorical)
    df_es = df_es.with_columns(df_labels)
    return df_es
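
The parsing steps can be mimicked without polars: experiment-state rows pair a nanosecond Unix timestamp with a label that may carry stray whitespace. A stdlib-only sketch of the same conversion (the sample rows are made up):

```python
from datetime import datetime, timezone

# Fake experiment-state rows: (unixnano, state_label), as read from the CSV.
rows = [
    (1_700_000_000_000_000_000, " START "),
    (1_700_000_060_000_000_000, "PAUSE"),
]

# Convert nanoseconds-since-epoch to timestamps and strip label whitespace,
# mirroring what get_experiment_state_df does with pl.from_epoch and str.strip_chars.
parsed = [
    (datetime.fromtimestamp(unixnano / 1e9, tz=timezone.utc), label.strip())
    for unixnano, label in rows
]

assert parsed[0][1] == "START"
assert parsed[1][0] > parsed[0][0]
```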

get_path_in_output_folder(filename)

Return a path in an output folder named like the run number, sibling to the LJH folder.

Source code in mass2/core/channels.py
def get_path_in_output_folder(self, filename: str | Path) -> Path:
    """Return a path in an output folder named like the run number, sibling to the LJH folder."""
    ljh_path = self.get_an_ljh_path()
    base_name, _ = ljh_path.name.split("_chan")
    date, run_num = base_name.split("_run")  # timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    output_dir = ljh_path.parent.parent / f"{run_num}mass2_output"
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir / filename
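
The path arithmetic is simple string splitting on the LJH naming convention. A standalone sketch with a hypothetical path:

```python
from pathlib import Path

# A hypothetical LJH path; the name splits on "_chan" and "_run",
# mirroring get_path_in_output_folder.
ljh_path = Path("/data/20231114/20231114_run0007/20231114_run0007_chan0001.ljh")
base_name, _ = ljh_path.name.split("_chan")   # "20231114_run0007"
date, run_num = base_name.split("_run")       # ("20231114", "0007")

# The output folder is a sibling of the LJH folder, named after the run number.
output_dir = ljh_path.parent.parent / f"{run_num}mass2_output"

assert output_dir == Path("/data/20231114/0007mass2_output")
```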

linefit(line, col, use_expr=pl.lit(True), has_linear_background=False, has_tails=False, dlo=50, dhi=50, binsize=0.5, params_update=lmfit.Parameters())

Perform a fit to one spectral line in the coadded histogram of the given column.

Source code in mass2/core/channels.py
def linefit(  # noqa: PLR0917
    self,
    line: float | str | SpectralLine | GenericLineModel,
    col: str,
    use_expr: pl.Expr = pl.lit(True),
    has_linear_background: bool = False,
    has_tails: bool = False,
    dlo: float = 50,
    dhi: float = 50,
    binsize: float = 0.5,
    params_update: lmfit.Parameters = lmfit.Parameters(),
) -> LineModelResult:
    """Perform a fit to one spectral line in the coadded histogram of the given column."""
    model = mass2.calibration.algorithms.get_model(line, has_linear_background=has_linear_background, has_tails=has_tails)
    pe = model.spect.peak_energy
    _bin_edges = np.arange(pe - dlo, pe + dhi, binsize)
    df_small = self.dfg().lazy().filter(use_expr).select(col).collect()
    bin_centers, counts = mass2.misc.hist_of_series(df_small[col], _bin_edges)
    params = model.guess(counts, bin_centers=bin_centers, dph_de=1)
    params["dph_de"].set(1.0, vary=False)
    print(f"before update {params=}")
    params = params.update(params_update)
    print(f"after update {params=}")
    result = model.fit(counts, params, bin_centers=bin_centers, minimum_bins_per_fwhm=3)
    result.set_label_hints(
        binsize=bin_centers[1] - bin_centers[0],
        ds_shortname=f"{len(self.channels)} channels, {self.description}",
        unit_str="eV",
        attr_str=col,
        states_hint=f"{use_expr=}",
        cut_hint="",
    )
    return result
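
Before fitting, `linefit` histograms the chosen column on a grid of width `binsize` spanning `peak_energy - dlo` to `peak_energy + dhi`. A sketch of just that binning step with synthetic data (the line energy shown is Mn Kα, used only as an example):

```python
import numpy as np

# Sketch of linefit's binning: edges span peak_energy ± (dlo, dhi) in binsize steps.
rng = np.random.default_rng(0)
pe, dlo, dhi, binsize = 5898.8, 50, 50, 0.5   # e.g. Mn K-alpha, in eV
energies = rng.normal(pe, 3.0, size=2000)     # fake fluorescence data

bin_edges = np.arange(pe - dlo, pe + dhi, binsize)
counts, _ = np.histogram(energies, bin_edges)
bin_centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])

# The most-populated bin should sit near the line energy.
assert abs(bin_centers[np.argmax(counts)] - pe) < 5.0
```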

linefit_joblib(line, col, prefer='threads', n_jobs=4)

No one but Galen understands this function.

Source code in mass2/core/channels.py
def linefit_joblib(self, line: str, col: str, prefer: str = "threads", n_jobs: int = 4) -> LineModelResult:
    """No one but Galen understands this function."""

    def work(key: int) -> LineModelResult:
        """A unit of parallel work: fit line to one channel."""
        channel = self.channels[key]
        return channel.linefit(line, col)

    parallel = joblib.Parallel(n_jobs=n_jobs, prefer=prefer)  # it's not clear if threads are better; what blocks the GIL?
    results = parallel(joblib.delayed(work)(key) for key in self.channels.keys())
    return results
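
The same fan-out pattern can be written with the standard library's `concurrent.futures` instead of joblib; this sketch replaces the per-channel fit with a trivial stand-in:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for self.channels: a dict mapping channel number to data.
channels = {1: [1.0, 2.0], 3: [3.0, 4.0]}

def work(key):
    """A unit of parallel work: 'fit' one channel (here, just sum it)."""
    return sum(channels[key])

# One task per channel key, results collected in key order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, channels.keys()))

assert results == [3.0, 7.0]
```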

load_analysis(path) staticmethod

Load an analysis-in-progress from a zipfile

Parameters:
  • path (Path | str) –

    Zipfile that work was saved in.

Source code in mass2/core/channels.py
@staticmethod
def load_analysis(path: Path | str) -> "Channels":
    """Load an analysis-in-progress from a zipfile

    Parameters
    ----------
    path : Path | str
        Zipfile that work was saved in.
    """
    path = pathlib.Path(path)
    assert path.exists() and path.is_file()

    def _restore_dataframe(ch: Channel, df: pl.DataFrame) -> Channel:
        """Take a channel and replace its dataframe with the given one, loaded from a parquet file

        Parameters
        ----------
        ch : Channel
            A channel, loaded from a pickle file, with an empty dataframe
        df : DataFrame
            A replacement dataframe for the existing one (typically, the existing one is empty)

        Returns
        -------
        Channel
            The Channel `ch` but with `ch.df` updated, including any raw data backed by an LJH file
        """
        # If this channel was based on an LJH file, restore columns from the LJH file to the dataframe.
        if ch.header.data_source is not None:
            ljh_path = ch.header.data_source
            if ljh_path.endswith(".ljh") or ljh_path.endswith(".noi"):
                ljh_backed_chan = Channel.from_ljh(ljh_path)
                df = df.with_columns(ljh_backed_chan.df)
        # df_history is needed for some debug plots to work. This version has strictly more columns than required
        # at each history point. TODO: We could use the steps inputs and outputs to trim the appropriate columns.
        # For getting started, though, it's easier just to let each dataframe in history equal the final dataframe.
        df_history = [df] * len(ch.steps)
        return dataclasses.replace(ch, df=df, df_history=df_history)

    with ZipFile(path, "r") as zf:
        pickle_file = "data_all.pkl"
        pickle_bytes = zf.read(pickle_file)
        data: Channels = dill.loads(pickle_bytes)

        restored_channels = {}
        for ch_num, ch in data.channels.items():
            parquet_file = f"data_chan{ch_num:04d}.parquet"
            df = pl.read_parquet(zf.read(parquet_file))
            restored_channels[ch_num] = _restore_dataframe(ch, df)

        restored_bad_channels = {}
        for ch_num, badch in data.bad_channels.items():
            parquet_file = f"data_bad_chan{ch_num:04d}.parquet"
            df = pl.read_parquet(zf.read(parquet_file))
            ch = _restore_dataframe(badch.ch, df)
            restored_bad_channels[ch_num] = dataclasses.replace(badch, ch=ch)

        return dataclasses.replace(data, channels=restored_channels, bad_channels=restored_bad_channels)
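
The zip layout is one parquet member per channel plus a single pickle; members are written with `ZipFile.writestr` and recovered with `ZipFile.read`. A stdlib-only sketch of that round trip with fake payload bytes:

```python
import io
from zipfile import ZipFile

# Write two members the way save_analysis does: raw bytes via writestr().
buffer = io.BytesIO()
with ZipFile(buffer, "w") as zf:
    zf.writestr("data_chan0001.parquet", b"fake parquet bytes")
    zf.writestr("data_all.pkl", b"fake pickle bytes")

# Read them back the way load_analysis does: bytes via read().
with ZipFile(buffer, "r") as zf:
    names = sorted(zf.namelist())
    payload = zf.read("data_chan0001.parquet")

assert names == ["data_all.pkl", "data_chan0001.parquet"]
assert payload == b"fake parquet bytes"
```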

load_recipes(filename)

Return a copy of this Channels object with Recipe objects loaded from the given pickle file and applied to each Channel.

Source code in mass2/core/channels.py
def load_recipes(self, filename: str) -> "Channels":
    """Return a copy of this Channels object with Recipe objects loaded from the given pickle file
    and applied to each Channel."""
    steps = mass2.misc.unpickle_object(filename)
    return self.with_steps_dict(steps)

map(f, allow_throw=False)

Map function f over all channels, returning a new Channels object containing the new Channel objects.

Source code in mass2/core/channels.py
def map(self, f: Callable, allow_throw: bool = False) -> "Channels":
    """Map function `f` over all channels, returning a new Channels object containing the new Channel objects."""
    new_channels = {}
    new_bad_channels = {}
    for key, channel in self.channels.items():
        try:
            new_channels[key] = f(channel)
        except KeyboardInterrupt as kint:
            raise kint
        except Exception as ex:
            error_type: type = type(ex)
            error_message: str = str(ex)
            backtrace: str = traceback.format_exc()
            if allow_throw:
                raise
            print(f"{key=} {channel=} failed the step {f}")
            print(f"{error_type=}")
            print(f"{error_message=}")
            new_bad_channels[key] = channel.as_bad(error_type, error_message, backtrace)
    new_bad_channels = mass2.misc.merge_dicts_ordered_by_keys(self.bad_channels, new_bad_channels)

    return Channels(new_channels, self.description, bad_channels=new_bad_channels)
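
The error-isolation pattern in `map` — catch everything except `KeyboardInterrupt`, record the failure, and keep going — can be shown in miniature:

```python
def safe_map(f, channels):
    """Apply f per key, collecting failures instead of aborting the batch."""
    good, bad = {}, {}
    for key, ch in channels.items():
        try:
            good[key] = f(ch)
        except KeyboardInterrupt:
            raise  # never swallow a user interrupt
        except Exception as ex:
            bad[key] = (type(ex), str(ex))
    return good, bad

# One "channel" fails; the other two still succeed.
good, bad = safe_map(lambda x: 10 / x, {1: 5, 2: 0, 3: 2})
assert good == {1: 2.0, 3: 5.0}
assert bad[2][0] is ZeroDivisionError
```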

parent_folder_path()

Return the parent folder of the LJH files used to create this Channels object. Specifically, the self.ch0 channel's directory is used (normally the answer would be the same for all channels).

Source code in mass2/core/channels.py
def parent_folder_path(self) -> pathlib.Path:
    """Return the parent folder of the LJH files used to create this Channels object. Specifically, the
    `self.ch0` channel's directory is used (normally the answer would be the same for all channels)."""
    parent_folder_path = pathlib.Path(self.ch0.header.df["Filename"][0]).parent.parent
    print(f"{parent_folder_path=}")
    return parent_folder_path

plot_avg_pulses(limit=20, channels=None, colormap=plt.cm.viridis, axis=None)

Plot the average pulses (the signal model) for the channels in this Channels object.

Parameters:
  • limit (int | None, default: 20 ) –

    Plot at most this many filters if not None, by default 20

  • channels (list[int] | None, default: None ) –

    Plot only channels with numbers in this list if not None, by default None

  • colormap (Colormap, default: viridis ) –

    The color scale to use, by default plt.cm.viridis

  • axis (Axes | None, default: None ) –

    A plt.Axes to plot on, or if None a new one, by default None

Returns:
  • Axes –

    The plt.Axes containing the plot.

Source code in mass2/core/channels.py
def plot_avg_pulses(
    self,
    limit: int | None = 20,
    channels: list[int] | None = None,
    colormap: matplotlib.colors.Colormap = plt.cm.viridis,
    axis: plt.Axes | None = None,
) -> plt.Axes:
    """Plot the average pulses (the signal model) for the channels in this Channels object.

    Parameters
    ----------
    limit : int | None, optional
        Plot at most this many filters if not None, by default 20
    channels : list[int] | None, optional
        Plot only channels with numbers in this list if not None, by default None
    colormap : matplotlib.colors.Colormap, optional
        The color scale to use, by default plt.cm.viridis
    axis : plt.Axes | None, optional
        A `plt.Axes` to plot on, or if None a new one, by default None

    Returns
    -------
    plt.Axes
        The `plt.Axes` containing the plot.
    """
    if axis is None:
        fig = plt.figure()
        axis = fig.subplots()

    plot_these_chan = self._limited_chan_list(limit, channels)
    frametime_ms = self.channels[plot_these_chan[0]].header.frametime_s * 1e3

    def samples2ms(s: ArrayLike) -> ArrayLike:
        return np.asarray(s) * frametime_ms

    def ms2samples(ms: ArrayLike) -> ArrayLike:
        return np.asarray(ms) / frametime_ms

    upper_axis = axis.secondary_xaxis("top", functions=(samples2ms, ms2samples))
    upper_axis.set_xlabel("Time after trigger (ms)")

    n_expected = len(plot_these_chan)
    for i, ch_num in enumerate(plot_these_chan):
        ch = self.channels[ch_num]
        x = np.arange(ch.header.n_samples) - ch.header.n_presamples
        y = ch.last_avg_pulse
        if y is not None:
            plt.plot(x, y, color=colormap(i / n_expected), label=f"Chan {ch_num}")
    plt.legend()
    plt.xlabel("Samples after trigger")
    plt.title("Average pulses")
    plt.tight_layout()
    return axis

plot_filters(limit=20, channels=None, colormap=plt.cm.viridis, axis=None)

Plot the optimal filters for the channels in this Channels object.

Parameters:
  • limit (int | None, default: 20 ) –

    Plot at most this many filters if not None, by default 20

  • channels (list[int] | None, default: None ) –

    Plot only channels with numbers in this list if not None, by default None

  • colormap (Colormap, default: viridis ) –

    The color scale to use, by default plt.cm.viridis

  • axis (Axes | None, default: None ) –

    A plt.Axes to plot on, or if None a new one, by default None

Returns:
  • Axes –

    The plt.Axes containing the plot.

Source code in mass2/core/channels.py
def plot_filters(
    self,
    limit: int | None = 20,
    channels: list[int] | None = None,
    colormap: matplotlib.colors.Colormap = plt.cm.viridis,
    axis: plt.Axes | None = None,
) -> plt.Axes:
    """Plot the optimal filters for the channels in this Channels object.

    Parameters
    ----------
    limit : int | None, optional
        Plot at most this many filters if not None, by default 20
    channels : list[int] | None, optional
        Plot only channels with numbers in this list if not None, by default None
    colormap : matplotlib.colors.Colormap, optional
        The color scale to use, by default plt.cm.viridis
    axis : plt.Axes | None, optional
        A `plt.Axes` to plot on, or if None a new one, by default None

    Returns
    -------
    plt.Axes
        The `plt.Axes` containing the plot.
    """
    if axis is None:
        fig = plt.figure()
        axis = fig.subplots()

    plot_these_chan = self._limited_chan_list(limit, channels)
    n_expected = len(plot_these_chan)
    for i, ch_num in enumerate(plot_these_chan):
        ch = self.channels[ch_num]
        # The next line _assumes_ a 5-lag filter. Fix as needed.
        x = np.arange(ch.header.n_samples - 4) - ch.header.n_presamples + 2
        y = ch.last_filter
        if y is not None:
            plt.plot(x, y, color=colormap(i / n_expected), label=f"Chan {ch_num}")
    plt.legend()
    plt.xlabel("Samples after trigger")
    plt.title("Optimal filters")
    return axis

plot_hist(col, bin_edges, use_expr=pl.lit(True), axis=None)

Plot a histogram for the given column across all channels.

Source code in mass2/core/channels.py
def plot_hist(self, col: str, bin_edges: ArrayLike, use_expr: pl.Expr = pl.lit(True), axis: plt.Axes | None = None) -> None:
    """Plot a histogram for the given column across all channels."""
    df_small = self.dfg().lazy().filter(use_expr).select(col).collect()
    ax = mass2.misc.plot_hist_of_series(df_small[col], bin_edges, axis)
    ax.set_title(f"{len(self.channels)} channels, {self.description}")

plot_hists(col, bin_edges, group_by_col, axis=None, use_expr=None, skip_none=True)

Plots histograms for the given column, grouped by the specified column.

Parameters:
  • col (str) –

    The column name to plot.

  • bin_edges (array-like) –

    The edges of the bins for the histogram.

  • group_by_col (str) –

    The column name to group by. This is required.

  • axis (matplotlib.Axes, optional) –

    The axis to plot on. If None, a new figure is created.

Source code in mass2/core/channels.py
def plot_hists(
    self,
    col: str,
    bin_edges: ArrayLike,
    group_by_col: str,
    axis: plt.Axes | None = None,
    use_expr: pl.Expr | None = None,
    skip_none: bool = True,
) -> None:
    """
    Plots histograms for the given column, grouped by the specified column.

    Parameters:
    - col (str): The column name to plot.
    - bin_edges (array-like): The edges of the bins for the histogram.
    - group_by_col (str): The column name to group by. This is required.
    - axis (matplotlib.Axes, optional): The axis to plot on. If None, a new figure is created.
    """
    if axis is None:
        _, ax = plt.subplots()  # Create a new figure if no axis is provided
    else:
        ax = axis

    if use_expr is None:
        df_small = (self.dfg().lazy().select(col, group_by_col)).collect().sort(group_by_col, descending=False)
    else:
        df_small = (self.dfg().lazy().filter(use_expr).select(col, group_by_col)).collect().sort(group_by_col, descending=False)

    # Plot a histogram for each group
    for (group_name,), group_data in df_small.group_by(group_by_col, maintain_order=True):
        if group_name is None and skip_none:
            continue
        # Get the data for the column to plot
        values = group_data[col]
        # Plot the histogram for the current group
        if group_name == "EBIT":
            ax.hist(values, bins=bin_edges, alpha=0.9, color="k", label=str(group_name))
        else:
            ax.hist(values, bins=bin_edges, alpha=0.5, label=str(group_name))
        # bin_centers, counts = mass2.misc.hist_of_series(values, bin_edges)
        # plt.plot(bin_centers, counts, label=group_name)

    # Customize the plot
    ax.set_xlabel(str(col))
    ax.set_ylabel("Frequency")
    ax.set_title(f"Coadded Histogram of {col} grouped by {group_by_col}")

    # Add a legend to label the groups
    ax.legend(title=group_by_col)

    plt.tight_layout()

plot_noise_autocorr(limit=20, channels=None, colormap=plt.cm.viridis, axis=None)

Plot the noise power autocorrelation for the channels in this Channels object.

Parameters:
  • limit (int | None, default: 20 ) –

    Plot at most this many filters if not None, by default 20

  • channels (list[int] | None, default: None ) –

    Plot only channels with numbers in this list if not None, by default None

  • colormap (Colormap, default: viridis ) –

    The color scale to use, by default plt.cm.viridis

  • axis (Axes | None, default: None ) –

    A plt.Axes to plot on, or if None a new one, by default None

Returns:
  • Axes –

    The plt.Axes containing the plot.

Source code in mass2/core/channels.py
def plot_noise_autocorr(
    self,
    limit: int | None = 20,
    channels: list[int] | None = None,
    colormap: matplotlib.colors.Colormap = plt.cm.viridis,
    axis: plt.Axes | None = None,
) -> plt.Axes:
    """Plot the noise power autocorrelation for the channels in this Channels object.

    Parameters
    ----------
    limit : int | None, optional
        Plot at most this many filters if not None, by default 20
    channels : list[int] | None, optional
        Plot only channels with numbers in this list if not None, by default None
    colormap : matplotlib.colors.Colormap, optional
        The color scale to use, by default plt.cm.viridis
    axis : plt.Axes | None, optional
        A `plt.Axes` to plot on, or if None a new one, by default None

    Returns
    -------
    plt.Axes
        The `plt.Axes` containing the plot.
    """
    if axis is None:
        fig = plt.figure()
        axis = fig.subplots()

    plot_these_chan = self._limited_chan_list(limit, channels)
    n_expected = len(plot_these_chan)
    for i, ch_num in enumerate(plot_these_chan):
        ch = self.channels[ch_num]
        ac = ch.last_noise_autocorrelation
        if ac is not None:
            color = colormap(i / n_expected)
            plt.plot(ac, color=color, label=f"Chan {ch_num}")
            plt.plot(0, ac[0], "o", color=color)
    plt.legend()
    plt.xlabel("Lags")
    plt.title("Noise autocorrelation")
    return axis

plot_noise_spectrum(limit=20, channels=None, colormap=plt.cm.viridis, axis=None)

Plot the noise power spectrum for the channels in this Channels object.

Parameters:
  • limit (int | None, default: 20 ) –

    Plot at most this many filters if not None, by default 20

  • channels (list[int] | None, default: None ) –

    Plot only channels with numbers in this list if not None, by default None

  • colormap (Colormap, default: viridis ) –

    The color scale to use, by default plt.cm.viridis

  • axis (Axes | None, default: None ) –

    A plt.Axes to plot on, or if None a new one, by default None

Returns:
  • Axes –

    The plt.Axes containing the plot.

Source code in mass2/core/channels.py
def plot_noise_spectrum(
    self,
    limit: int | None = 20,
    channels: list[int] | None = None,
    colormap: matplotlib.colors.Colormap = plt.cm.viridis,
    axis: plt.Axes | None = None,
) -> plt.Axes:
    """Plot the noise power spectrum for the channels in this Channels object.

    Parameters
    ----------
    limit : int | None, optional
        Plot at most this many filters if not None, by default 20
    channels : list[int] | None, optional
        Plot only channels with numbers in this list if not None, by default None
    colormap : matplotlib.colors.Colormap, optional
        The color scale to use, by default plt.cm.viridis
    axis : plt.Axes | None, optional
        A `plt.Axes` to plot on, or if None a new one, by default None

    Returns
    -------
    plt.Axes
        The `plt.Axes` containing the plot.
    """
    if axis is None:
        fig = plt.figure()
        axis = fig.subplots()

    plot_these_chan = self._limited_chan_list(limit, channels)
    n_expected = len(plot_these_chan)
    for i, ch_num in enumerate(plot_these_chan):
        ch = self.channels[ch_num]
        freqpsd = ch.last_noise_psd
        if freqpsd is not None:
            freq, psd = freqpsd
            plt.plot(freq, psd, color=colormap(i / n_expected), label=f"Chan {ch_num}")
    plt.legend()
    plt.loglog()
    plt.xlabel("Frequency (Hz)")
    plt.title("Noise power spectral density")
    return axis

save_analysis(zip_path, overwrite=False, trim_debug=False, trim_timestamp_and_subframecount=False)

Save an analysis-in-progress completely to a zip file. Only tested for LJH-backed channels so far.

Parameters:
  • zip_path (Path | str) –

    Zip file to save the work in. If the file doesn't exist, its parent directory should.

  • overwrite (bool, default: False ) –

    If zip_path exists, whether to overwrite it, by default False

  • trim_debug (bool, default: False ) –

    Whether to make save file smaller (potentially) at the cost of breaking some debugging plots, by default False

  • trim_timestamp_and_subframecount (bool, default: False ) –

    Whether to make the save file smaller at the cost of reduced information available when the LJH files are not available, e.g. when loading on another computer.

Source code in mass2/core/channels.py
def save_analysis(
    self, zip_path: Path | str, overwrite: bool = False, trim_debug: bool = False, trim_timestamp_and_subframecount: bool = False
) -> None:
    """Save an analysis-in-progress completely to a zip file. Only tested for LJH-backed channels so far.

    Parameters
    ----------
    zip_path : Path | str
        Zip file to save the work in. If the file doesn't exist, its parent directory should.
    overwrite : bool, optional
        If `zip_path` exists, whether to overwrite it, by default False
    trim_debug : bool, optional
        Whether to make save file smaller (potentially) at the cost of breaking some debugging plots, by default False
    trim_timestamp_and_subframecount : bool, optional
        Whether to make the save file smaller at the cost of reduced information available when the LJH files are
        not available, e.g. when loading on another computer.
    """
    zip_path = pathlib.Path(zip_path)
    if zip_path.suffix != ".zip":
        zip_path = zip_path.with_suffix(".zip")

    if os.path.exists(zip_path) and not overwrite:
        raise ValueError(f"File exists; use `save_analysis(...overwrite=True)` to overwrite the existing {zip_path=}")

    def store_dataframe_to_parquet_and_return_pickleable_channel(ch: Channel, zf: ZipFile, parquet_path: str) -> Channel:
        """Store the `ch.df` to a parquet file of the given name in the ZipFile (open for writing).
        Prepare `ch` for pickling by removing its dataframe and dataframe history, and stripping debug info from `steps`

        Parameters
        ----------
        ch : Channel
            The channel to modify by storing dataframe to parquet and removing it.
        zf : ZipFile
            A ZipFile object currently open for writing.
        parquet_file : str
            The name to use for storing the parquet file within `zf`

        Returns
        -------
        Channel
            A copy of `ch` amenable to pickling with the dataframe and dataframe history removed and with trimmed steps.
        """
        # Don't store the memmapped LJH pulse info (if present) in the Parquet file
        if trim_timestamp_and_subframecount:
            df = ch.df.drop("pulse", "timestamp", "subframecount", strict=False)
        else:
            df = ch.df.drop("pulse", strict=False)
        buffer = io.BytesIO()
        df.write_parquet(buffer)
        zf.writestr(parquet_path, buffer.getvalue())
        if trim_debug:
            steps = ch.steps.trim_debug_info()
        else:
            steps = ch.steps
        return dataclasses.replace(ch, df=pl.DataFrame(), df_history=[], noise=None, steps=steps)

    with ZipFile(str(zip_path), "w") as zf:
        channels = {}
        bad_channels = {}
        for ch_num, ch in self.channels.items():
            parquet_path = f"data_chan{ch_num:04d}.parquet"
            channels[ch_num] = store_dataframe_to_parquet_and_return_pickleable_channel(ch, zf, parquet_path)
        for ch_num, badch in self.bad_channels.items():
            parquet_path = f"data_bad_chan{ch_num:04d}.parquet"
            ch = store_dataframe_to_parquet_and_return_pickleable_channel(badch.ch, zf, parquet_path)
            bad_channels[ch_num] = dataclasses.replace(badch, ch=ch)
        data = dataclasses.replace(self, channels=channels, bad_channels=bad_channels)
        pickle_file = "data_all.pkl"
        zf.writestr(pickle_file, dill.dumps(data))
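The method above serializes each channel's DataFrame to an in-memory buffer and stores the bytes inside the open ZipFile with `writestr`. A minimal, self-contained sketch of that buffer-then-`writestr` round trip, using stdlib CSV bytes as a stand-in for Parquet (so it runs without polars; the archive name and row contents are invented):

```python
import csv
import io
import zipfile

def store_table_in_zip(zf: zipfile.ZipFile, name: str, rows: list) -> None:
    """Serialize rows to an in-memory buffer, then store the bytes in the archive."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=sorted(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    zf.writestr(name, buffer.getvalue())

# Round-trip through an in-memory archive
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    store_table_in_zip(zf, "data_chan0001.csv", [{"ph": 1.0, "ptm": 0.2}])
with zipfile.ZipFile(archive, "r") as zf:
    text = zf.read("data_chan0001.csv").decode()
```

The same pattern (serialize to a buffer, then `zf.writestr`) is what lets `save_analysis` pack one table per channel plus a single pickle into one archive.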

save_recipes(filename, required_fields=None, drop_debug=True)

Pickle a dictionary (one entry per channel) of Recipe objects.

If you want to save a "recipe", a minimal series of steps required to reproduce the required field(s), then set required_fields to be a list/tuple/set of DataFrame column names (or a single column name) whose production from raw data should be possible.

Parameters:
  • filename (str) –

    Filename to store recipe in, typically of the form "*.pkl"

  • required_fields (str | Iterable[str] | None, default: None ) –

    The field (str) or fields (Iterable[str]) that the recipe should be able to generate from a raw LJH file. Drop all steps that do not lead (directly or indirectly) to producing this field or these fields. If None, then preserve all steps (default None).

  • drop_debug (bool, default: True ) –

    Whether to remove debugging-related data from each RecipeStep, if the subclass supports this (via the `RecipeStep.drop_debug()` method).

Returns:
  • dict –

    Dictionary with keys=channel numbers, values=the (possibly trimmed and debug-dropped) Recipe objects.

Source code in mass2/core/channels.py, lines 616-646
def save_recipes(
    self, filename: str, required_fields: str | Iterable[str] | None = None, drop_debug: bool = True
) -> dict[int, Recipe]:
    """Pickle a dictionary (one entry per channel) of Recipe objects.

    If you want to save a "recipe", a minimal series of steps required to reproduce the required field(s),
    then set `required_fields` to be a list/tuple/set of DataFrame column names (or a single column name)
    whose production from raw data should be possible.

    Parameters
    ----------
    filename : str
        Filename to store recipe in, typically of the form "*.pkl"
    required_fields : str | Iterable[str] | None
        The field (str) or fields (Iterable[str]) that the recipe should be able to generate from a raw LJH file.
        Drop all steps that do not lead (directly or indirectly) to producing this field or these fields.
        If None, then preserve all steps (default None).
    drop_debug : bool
        Whether to remove debugging-related data from each `RecipeStep`, if the subclass supports this (via the
        `RecipeStep.drop_debug()` method).

    Returns
    -------
    dict
        Dictionary with keys=channel numbers, values=the (possibly trimmed and debug-dropped) Recipe objects.
    """
    steps = {}
    for channum, ch in self.channels.items():
        steps[channum] = ch.steps.trim_dead_ends(required_fields=required_fields, drop_debug=drop_debug)
    mass2.misc.pickle_object(steps, filename)
    return steps

set_bad(ch_num, msg, require_ch_num_exists=True)

Return a copy of this Channels object with the given channel number marked as bad.

Source code in mass2/core/channels.py, lines 389-400
def set_bad(self, ch_num: int, msg: str, require_ch_num_exists: bool = True) -> "Channels":
    """Return a copy of this Channels object with the given channel number marked as bad."""
    new_channels = {}
    new_bad_channels = {}
    if require_ch_num_exists:
        assert ch_num in self.channels.keys(), f"{ch_num} can't be set bad because it does not exist"
    for key, channel in self.channels.items():
        if key == ch_num:
            new_bad_channels[key] = channel.as_bad(None, msg, None)
        else:
            new_channels[key] = channel
    return Channels(new_channels, self.description, bad_channels=new_bad_channels)

with_experiment_state_by_path(experiment_state_path=None)

Return a copy of this Channels object with experiment state information added to each Channel.

Parameters:
  • experiment_state_path (str | Path | None, default: None ) –

    load experiment state info from the given path or (if None) infer the path from an LJH file, by default None

Returns:
  • Channels –

    An enhanced copy of self, with experiment state information added to each channel.

Source code in mass2/core/channels.py, lines 564-578
def with_experiment_state_by_path(self, experiment_state_path: str | pathlib.Path | None = None) -> "Channels":
    """Return a copy of this Channels object with experiment state information added to each Channel.

    Parameters
    ----------
    experiment_state_path : str | Path | None, optional
        load experiment state info from the given path or (if None) infer the path from an LJH file, by default None

    Returns
    -------
    Channels
        An enhanced copy of self, with experiment state information added to each channel.
    """
    df_es = self.get_experiment_state_df(experiment_state_path)
    return self.with_experiment_state_df(df_es)

with_experiment_state_df(df_es)

Return a copy of this Channels object with experiment state information added to each Channel.

Parameters:
  • df_es (DataFrame) –

    experiment state info

Returns:
  • Channels –

    An enhanced copy of self, with experiment state information added to each channel.

Source code in mass2/core/channels.py, lines 544-562
def with_experiment_state_df(self, df_es: pl.DataFrame) -> "Channels":
    """Return a copy of this Channels object with experiment state information added to each Channel.

    Parameters
    ----------
    df_es : pl.DataFrame
        experiment state info

    Returns
    -------
    Channels
        An enhanced copy of self, with experiment state information added to each channel.
    """
    # TODO: the following is less performant than making a use_expr for each state,
    # and using .set_sorted on the timestamp column.
    ch2s = {}
    for ch_num, ch in self.channels.items():
        ch2s[ch_num] = ch.with_experiment_state_df(df_es)
    return Channels(ch2s, self.description)

with_external_trigger_by_path(path=None)

Return a copy of this Channels object with external trigger information added, loaded from the given path. Inferring the path from an LJH file when `path` is None is planned but not yet implemented.

Source code in mass2/core/channels.py, lines 580-591
def with_external_trigger_by_path(self, path: str | pathlib.Path | None = None) -> "Channels":
    """Return a copy of this Channels object with external trigger information added, loaded
    from the given path or EVENTUALLY (if None) inferring it from an LJH file (not yet implemented)."""
    if path is None:
        raise NotImplementedError("cannot infer external trigger path yet")
    with open(path, "rb") as _f:
        _header_line = _f.readline()  # read the one header line before opening the binary data
        external_trigger_subframe_count = np.fromfile(_f, "int64")
    df_ext = pl.DataFrame({
        "subframecount": external_trigger_subframe_count,
    })
    return self.with_external_trigger_df(df_ext)

with_external_trigger_df(df_ext)

Return a copy of this Channels object with external trigger information added to each Channel, found from the given DataFrame.

Source code in mass2/core/channels.py, lines 593-601
def with_external_trigger_df(self, df_ext: pl.DataFrame) -> "Channels":
    """Return a copy of this Channels object with external trigger information added to each Channel,
    found from the given DataFrame."""

    def with_etrig_df(channel: Channel) -> Channel:
        """Return a copy of one Channel object with external trigger information added to it"""
        return channel.with_external_trigger_df(df_ext)

    return self.map(with_etrig_df)

with_more_channels(more)

Return a Channels object with additional Channels in it. New channels with the same number will overrule existing ones.

Parameters:
  • more (Channels) –

    Another Channels object, to be added

Returns:
Source code in mass2/core/channels.py, lines 47-66
def with_more_channels(self, more: "Channels") -> "Channels":
    """Return a Channels object with additional Channels in it.
    New channels with the same number will overrule existing ones.

    Parameters
    ----------
    more : Channels
        Another Channels object, to be added

    Returns
    -------
    Channels
        The replacement
    """
    channels = self.channels.copy()
    channels.update(more.channels)
    bad = self.bad_channels.copy()
    bad.update(more.bad_channels)
    descr = self.description + more.description + "\nWarning! created by with_more_channels()"
    return dataclasses.replace(self, channels=channels, bad_channels=bad, description=descr)
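The overrule behavior comes directly from `dict.update`: entries from `more` replace same-numbered entries in a copy of `self.channels`. A toy sketch with strings standing in for Channel objects:

```python
channels = {1: "ch1-old", 2: "ch2"}   # stand-ins for Channel objects
more = {1: "ch1-new", 3: "ch3"}

merged = channels.copy()   # leave the original mapping untouched
merged.update(more)        # same-numbered channels from `more` win
```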

with_steps_dict(steps_dict)

Return a copy of this Channels object with the given Recipe objects added to each Channel.

Source code in mass2/core/channels.py, lines 603-614
def with_steps_dict(self, steps_dict: dict[int, Recipe]) -> "Channels":
    """Return a copy of this Channels object with the given Recipe objects added to each Channel."""

    def load_recipes(channel: Channel) -> Channel:
        """Return a copy of one Channel object with Recipe steps added to it"""
        try:
            steps = steps_dict[channel.header.ch_num]
        except KeyError:
            raise Exception("steps dict did not contain steps for this ch_num")
        return channel.with_steps(steps)

    return self.map(load_recipes)

mass2.core.analysis_algorithms - main algorithms used in data analysis

Designed to abstract certain key algorithms out of the class MicrocalDataSet and be able to run them fast.

Created on Jun 9, 2014

@author: fowlerj

HistogramSmoother

Object that can repeatedly smooth histograms with the same bin count and width to the same Gaussian width. By pre-computing the smoothing kernel for that histogram, we can smooth multiple histograms with the same geometry.

Source code in mass2/core/analysis_algorithms.py, lines 181-221
class HistogramSmoother:
    """Object that can repeatedly smooth histograms with the same bin count and
    width to the same Gaussian width.  By pre-computing the smoothing kernel for
    that histogram, we can smooth multiple histograms with the same geometry.
    """

    def __init__(self, smooth_sigma: float, limits: ArrayLike):
        """Give the smoothing Gaussian's width as <smooth_sigma> and the
        [lower,upper] histogram limits as <limits>."""

        self.limits = tuple(np.asarray(limits, dtype=float))
        self.smooth_sigma = smooth_sigma

        # Choose a reasonable # of bins, at least 1024 and a power of 2
        stepsize = 0.4 * smooth_sigma
        dlimits = self.limits[1] - self.limits[0]
        nbins_guess = int(dlimits / stepsize + 0.5)
        min_nbins = 1024
        max_nbins = 32768  # 32k bins, 2**15

        # Clamp nbins_guess to at least min_nbins
        clamped_nbins = np.clip(nbins_guess, min_nbins, max_nbins)
        nbins_forced_to_power_of_2 = int(2 ** np.ceil(np.log2(clamped_nbins)))
        # if nbins_forced_to_power_of_2 == max_nbins:
        #     print(f"Warning: HistogramSmoother (for drift correct) Limiting histogram bins to {max_nbins} (requested {nbins_guess})")
        self.nbins = nbins_forced_to_power_of_2
        self.stepsize = dlimits / self.nbins

        # Compute the Fourier-space smoothing kernel
        kernel = np.exp(-0.5 * (np.arange(self.nbins) * self.stepsize / self.smooth_sigma) ** 2)
        kernel[1:] += kernel[-1:0:-1]  # Handle the negative frequencies
        kernel /= kernel.sum()
        self.kernel_ft = np.fft.rfft(kernel)

    def __call__(self, values: ArrayLike) -> NDArray:
        """Return a smoothed histogram of the data vector <values>"""
        contents, _ = np.histogram(values, self.nbins, self.limits)
        ftc = np.fft.rfft(contents)
        csmooth = np.fft.irfft(self.kernel_ft * ftc)
        csmooth[csmooth < 0] = 0
        return csmooth
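The smoother's trick is to build the wrap-around Gaussian kernel once, precompute its `rfft`, and then smooth every subsequent histogram with a single `rfft`/`irfft` pair. A sketch of that kernel construction (numpy assumed), smoothing a histogram that is a single 100-count spike:

```python
import numpy as np

nbins, stepsize, sigma = 1024, 0.1, 0.5

# Wrap-around Gaussian kernel, built as in HistogramSmoother.__init__
kernel = np.exp(-0.5 * (np.arange(nbins) * stepsize / sigma) ** 2)
kernel[1:] += kernel[-1:0:-1]    # fold in the negative-frequency half
kernel /= kernel.sum()
kernel_ft = np.fft.rfft(kernel)  # precomputed once, reused for every histogram

# Smooth a histogram that is a single spike of 100 counts in bin 200
contents = np.zeros(nbins)
contents[200] = 100.0
smooth = np.fft.irfft(kernel_ft * np.fft.rfft(contents))
```

Because the kernel sums to 1, the circular convolution preserves the total counts, and the smoothed peak stays centered on the original bin.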

__call__(values)

Return a smoothed histogram of the data vector `values`.

Source code in mass2/core/analysis_algorithms.py, lines 215-221
def __call__(self, values: ArrayLike) -> NDArray:
    """Return a smoothed histogram of the data vector <values>"""
    contents, _ = np.histogram(values, self.nbins, self.limits)
    ftc = np.fft.rfft(contents)
    csmooth = np.fft.irfft(self.kernel_ft * ftc)
    csmooth[csmooth < 0] = 0
    return csmooth

__init__(smooth_sigma, limits)

Give the smoothing Gaussian's width as `smooth_sigma` and the [lower, upper] histogram limits as `limits`.

Source code in mass2/core/analysis_algorithms.py, lines 187-213
def __init__(self, smooth_sigma: float, limits: ArrayLike):
    """Give the smoothing Gaussian's width as <smooth_sigma> and the
    [lower,upper] histogram limits as <limits>."""

    self.limits = tuple(np.asarray(limits, dtype=float))
    self.smooth_sigma = smooth_sigma

    # Choose a reasonable # of bins, at least 1024 and a power of 2
    stepsize = 0.4 * smooth_sigma
    dlimits = self.limits[1] - self.limits[0]
    nbins_guess = int(dlimits / stepsize + 0.5)
    min_nbins = 1024
    max_nbins = 32768  # 32k bins, 2**15

    # Clamp nbins_guess to at least min_nbins
    clamped_nbins = np.clip(nbins_guess, min_nbins, max_nbins)
    nbins_forced_to_power_of_2 = int(2 ** np.ceil(np.log2(clamped_nbins)))
    # if nbins_forced_to_power_of_2 == max_nbins:
    #     print(f"Warning: HistogramSmoother (for drift correct) Limiting histogram bins to {max_nbins} (requested {nbins_guess})")
    self.nbins = nbins_forced_to_power_of_2
    self.stepsize = dlimits / self.nbins

    # Compute the Fourier-space smoothing kernel
    kernel = np.exp(-0.5 * (np.arange(self.nbins) * self.stepsize / self.smooth_sigma) ** 2)
    kernel[1:] += kernel[-1:0:-1]  # Handle the negative frequencies
    kernel /= kernel.sum()
    self.kernel_ft = np.fft.rfft(kernel)

compute_max_deriv(pulse_data, ignore_leading, spike_reject=True, kernel=None)

Computes the maximum derivative in timeseries `pulse_data`. `pulse_data` can be a 2D array where each row is a different pulse record, in which case the return value will be an array as long as the number of rows in `pulse_data`.

Parameters:
  • pulse_data (ArrayLike) –

    A 1D pulse record, or a 2D array with one pulse record per row.

  • ignore_leading (int) –

    The number of leading samples to ignore in each record.

  • spike_reject (bool, default: True ) –

    Whether to use the spike-rejecting algorithm, which requires a large derivative to persist across overlapping kernel positions.

  • kernel (ArrayLike | str | None, default: None ) –

    The linear filter against which the signals will be convolved (CONVOLVED, not correlated, so reverse the filter as needed). If None, then the default kernel of [+.2, +.1, 0, -.1, -.2] will be used. If "SG", then the cubic 5-point Savitzky-Golay filter will be used (see below). Otherwise, kernel needs to be a (short) array which will be converted to a 1xN 2-dimensional np.ndarray.

Returns:
  • NDArray –

    An np.ndarray, dimension 1: the value of the maximum derivative (units of `pulse_data` units per sample).

When kernel == "SG", the derivative is estimated by Savitzky-Golay filtering (with 1 point before and 3 points after the point in question, fitting a polynomial of order 3), after first locating the right general area with a simple difference.

Source code in mass2/core/analysis_algorithms.py, lines 93-174
def compute_max_deriv(
    pulse_data: ArrayLike, ignore_leading: int, spike_reject: bool = True, kernel: ArrayLike | str | None = None
) -> NDArray:
    """Computes the maximum derivative in timeseries <pulse_data>.
    <pulse_data> can be a 2D array where each row is a different pulse record, in which case
    the return value will be an array as long as the number of rows in <pulse_data>.

    Args:
        pulse_data: A 1D pulse record, or a 2D array with one pulse record per row.
        ignore_leading: The number of leading samples to ignore in each record.
        spike_reject: Whether to use the spike-rejecting algorithm, which requires a
            large derivative to persist across overlapping kernel positions. (default True)
        kernel: the linear filter against which the signals will be convolved
            (CONVOLVED, not correlated, so reverse the filter as needed). If None,
            then the default kernel of [+.2 +.1 0 -.1 -.2] will be used. If
            "SG", then the cubic 5-point Savitzky-Golay filter will be used (see
            below). Otherwise, kernel needs to be a (short) array which will
            be converted to a 1xN 2-dimensional np.ndarray. (default None)

    Returns:
        An np.ndarray, dimension 1: the value of the maximum derivative (units of <pulse_data units> per sample).

    When kernel=="SG", then we estimate the derivative by Savitzky-Golay filtering
    (with 1 point before/3 points after the point in question and fitting polynomial
    of order 3).  Find the right general area by first doing a simple difference.
    """

    # If pulse_data is a 1D array, turn it into 2
    pulse_data = np.asarray(pulse_data)
    ndim = len(pulse_data.shape)
    if ndim > 2 or ndim < 1:
        raise ValueError("input pulse_data should be a 1d or 2d array.")
    if ndim == 1:
        pulse_data.shape = (1, pulse_data.shape[0])
    pulse_view = pulse_data[:, ignore_leading:]
    NPulse = pulse_view.shape[0]
    NSamp = pulse_view.shape[1]

    # The default filter:
    filter_coef = np.array([+0.2, +0.1, 0, -0.1, -0.2])
    if kernel == "SG":
        # This filter is the Savitzky-Golay filter of n_L=1, n_R=3 and M=3, to use the
        # language of Numerical Recipes 3rd edition.  It amounts to least-squares fitting
        # of an M=3rd order polynomial to the five points [-1,+3] and
        # finding the slope of the polynomial at 0.
        # Note that we reverse the order of coefficients because convolution will re-reverse
        filter_coef = np.array([-0.45238, -0.02381, 0.28571, 0.30952, -0.11905])[::-1]

    elif kernel is not None:
        filter_coef = np.array(kernel).ravel()

    f0, f1, f2, f3, f4 = filter_coef

    max_deriv = np.zeros(NPulse, dtype=np.float64)

    if spike_reject:
        for i in range(NPulse):
            pulses = pulse_view[i]
            t0 = f4 * pulses[0] + f3 * pulses[1] + f2 * pulses[2] + f1 * pulses[3] + f0 * pulses[4]
            t1 = f4 * pulses[1] + f3 * pulses[2] + f2 * pulses[3] + f1 * pulses[4] + f0 * pulses[5]
            t2 = f4 * pulses[2] + f3 * pulses[3] + f2 * pulses[4] + f1 * pulses[5] + f0 * pulses[6]
            t_max_deriv = t2 if t2 < t0 else t0

            for j in range(7, NSamp):
                t3 = f4 * pulses[j - 4] + f3 * pulses[j - 3] + f2 * pulses[j - 2] + f1 * pulses[j - 1] + f0 * pulses[j]
                t4 = t3 if t3 < t1 else t1
                t_max_deriv = max(t4, t_max_deriv)

                t0, t1, t2 = t1, t2, t3

            max_deriv[i] = t_max_deriv
    else:
        for i in range(NPulse):
            pulses = pulse_view[i]
            t0 = f4 * pulses[0] + f3 * pulses[1] + f2 * pulses[2] + f1 * pulses[3] + f0 * pulses[4]
            t_max_deriv = t0

            for j in range(5, NSamp):
                t0 = f4 * pulses[j - 4] + f3 * pulses[j - 3] + f2 * pulses[j - 2] + f1 * pulses[j - 1] + f0 * pulses[j]
                t_max_deriv = max(t0, t_max_deriv)
            max_deriv[i] = t_max_deriv

    return np.asarray(max_deriv, dtype=np.float32)
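For a pure linear ramp, the default 5-point kernel recovers the slope exactly, which makes a handy sanity check. A sketch of the `spike_reject=False` inner loop in plain Python (the ramp data are invented):

```python
# Default 5-point derivative kernel from compute_max_deriv
f0, f1, f2, f3, f4 = 0.2, 0.1, 0.0, -0.1, -0.2

def max_deriv(pulse):
    """Maximum kernel response over one record (the spike_reject=False inner loop)."""
    best = -float("inf")
    for j in range(4, len(pulse)):
        t = (f4 * pulse[j - 4] + f3 * pulse[j - 3] + f2 * pulse[j - 2]
             + f1 * pulse[j - 1] + f0 * pulse[j])
        best = max(best, t)
    return best

ramp = [0.5 * j for j in range(20)]   # a pure ramp with slope 0.5 per sample
```

On the ramp the kernel response is the slope at every position, so the maximum derivative equals 0.5 units per sample.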

correct_flux_jumps(vals, mask, flux_quant)

Remove 'flux jumps' from the pretrigger mean.

When using umux readout, if a pulse is recorded that has a very fast rising edge (e.g. a cosmic ray), the readout system will "slip" an integer number of flux quanta. This means that the baseline level returned to after the pulse will differ from the pretrigger value by an integer number of flux quanta. This causes the pretrigger mean summary quantity to jump around in a way that causes trouble for the rest of MASS. This function attempts to correct these jumps.

Parameters:
  • vals (ArrayLike) –

    Array of values to correct

  • mask (ArrayLike) –

    Mask identifying "good" pulses

  • flux_quant (float) –

    Size of one flux quantum

Returns:
  • NDArray –

    Array with the values corrected

Source code in mass2/core/analysis_algorithms.py, lines 403-422
def correct_flux_jumps(vals: ArrayLike, mask: ArrayLike, flux_quant: float) -> NDArray:
    """Remove 'flux' jumps' from pretrigger mean.

    When using umux readout, if a pulse is recorded that has a very fast rising
    edge (e.g. a cosmic ray), the readout system will "slip" an integer number
    of flux quanta. This means that the baseline level returned to after the
    pulse will different from the pretrigger value by an integer number of flux
    quanta. This causes that pretrigger mean summary quantity to jump around in
    a way that causes trouble for the rest of MASS. This function attempts to
    correct these jumps.

    Arguments:
    vals -- array of values to correct
    mask -- mask indentifying "good" pulses
    flux_quant -- size of 1 flux quanta

    Returns:
    Array with values corrected
    """
    return unwrap_n(vals, flux_quant, mask)

correct_flux_jumps_original(vals, mask, flux_quant)

Remove 'flux jumps' from the pretrigger mean.

When using umux readout, if a pulse is recorded that has a very fast rising edge (e.g. a cosmic ray), the readout system will "slip" an integer number of flux quanta. This means that the baseline level returned to after the pulse will differ from the pretrigger value by an integer number of flux quanta. This causes the pretrigger mean summary quantity to jump around in a way that causes trouble for the rest of MASS. This function attempts to correct these jumps.

Parameters:
  • vals (ArrayLike) –

    Array of values to correct

  • mask (ArrayLike) –

    Mask identifying "good" pulses

  • flux_quant (float) –

    Size of one flux quantum

Returns:
  • NDArray –

    Array with the values corrected

Source code in mass2/core/analysis_algorithms.py, lines 359-400
def correct_flux_jumps_original(vals: ArrayLike, mask: ArrayLike, flux_quant: float) -> NDArray:
    """Remove 'flux' jumps' from pretrigger mean.

    When using umux readout, if a pulse is recorded that has a very fast rising
    edge (e.g. a cosmic ray), the readout system will "slip" an integer number
    of flux quanta. This means that the baseline level returned to after the
    pulse will different from the pretrigger value by an integer number of flux
    quanta. This causes that pretrigger mean summary quantity to jump around in
    a way that causes trouble for the rest of MASS. This function attempts to
    correct these jumps.

    Arguments:
    vals -- array of values to correct
    mask -- mask indentifying "good" pulses
    flux_quant -- size of 1 flux quanta

    Returns:
    Array with values corrected
    """
    # The naive thing is to simply replace each value with its value mod
    # the flux quantum. But if the baseline value turns out to fluctuate
    # about an integer number of flux quanta, this will introduce new
    # jumps. I don't know the best way to handle this in general. For now,
    # if there are still jumps after the mod, I add 1/4 of a flux quanta
    # before modding, then mod, then subtract the 1/4 flux quantum and then
    # *add* a single flux quantum so that the values never go negative.
    #
    # To determine whether there are "still jumps after the mod" I look at the
    # difference between the largest and smallest values for "good" pulses. If
    # you don't exclude "bad" pulses, this check can be tricked in cases where
    # the pretrigger section contains a (sufficiently large) tail.
    vals = np.asarray(vals)
    mask = np.asarray(mask)
    if (np.amax(vals) - np.amin(vals)) >= flux_quant:
        corrected = vals % flux_quant
        if (np.amax(corrected[mask]) - np.amin(corrected[mask])) > 0.75 * flux_quant:
            corrected = (vals + flux_quant / 4) % (flux_quant)
            corrected = corrected - flux_quant / 4 + flux_quant
        corrected -= corrected[0] - vals[0]
        return corrected
    else:
        return vals
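A sketch of the naive mod-based step described in the comments above (the real function additionally guards against baselines that fluctuate about a quantum boundary by adding a quarter-quantum offset before modding; the trace values here are invented):

```python
flux_quant = 5.0
# Invented baseline trace that slips by exactly one flux quantum mid-run
vals = [0.4, 0.5, 0.45, 5.5, 5.45, 5.6]

corrected = [v % flux_quant for v in vals]      # the naive mod-based correction
spread_before = max(vals) - min(vals)
spread_after = max(corrected) - min(corrected)
```

After the mod, the one-quantum slip disappears and the baseline spread shrinks from more than a flux quantum to the underlying scatter.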

drift_correct(indicator, uncorrected, limit=None)

Compute a drift correction that minimizes the spectral entropy.

Parameters:
  • indicator (ArrayLike) –

    The "x-axis", which indicates the size of the correction.

  • uncorrected (ArrayLike) –

    A filtered pulse height vector, the same length as indicator. Assumed to have some gain that is linearly related to indicator.

  • limit (float | None, default: None ) –

    The upper limit of uncorrected values over which entropy is computed.

Generally indicator will be the pretrigger mean of the pulses, but you can experiment with other choices.

The entropy will be computed on corrected values only in the range [0, limit], so limit should be set to a characteristic large value of uncorrected. If limit is None (the default), then it will be computed as 25% larger than the 99th-percentile point of uncorrected.

The model is that the filtered pulse height PH should be scaled by (1 + a*PTM), where a is an arbitrary parameter computed here, and PTM is the difference between each record's pretrigger mean and the median value of all pretrigger means. (Or replace "pretrigger mean" with whatever quantity you passed in as `indicator`.)

Source code in mass2/core/analysis_algorithms.py, lines 245-291
def drift_correct(indicator: ArrayLike, uncorrected: ArrayLike, limit: float | None = None) -> tuple[float, dict]:
    """Compute a drift correction that minimizes the spectral entropy.

    Args:
        indicator: The "x-axis", which indicates the size of the correction.
        uncorrected: A filtered pulse height vector. Same length as indicator.
            Assumed to have some gain that is linearly related to indicator.
        limit: The upper limit of uncorrected values over which entropy is
            computed (default None).

    Generally indicator will be the pretrigger mean of the pulses, but you can
    experiment with other choices.

    The entropy will be computed on corrected values only in the range
    [0, limit], so limit should be set to a characteristic large value of
    uncorrected. If limit is None (the default), then it will be computed as
    25% larger than the 99%ile point of uncorrected.

    The model is that the filtered pulse height PH should be scaled by (1 +
    a*PTM) where a is an arbitrary parameter computed here, and PTM is the
    difference between each record's pretrigger mean and the median value of all
    pretrigger means. (Or replace "pretrigger mean" with whatever quantity you
    passed in as <indicator>.)
    """
    uncorrected = np.asarray(uncorrected)
    indicator = np.array(indicator)  # make a copy
    ptm_offset = np.median(indicator)
    indicator -= ptm_offset

    if limit is None:
        pct99 = np.percentile(uncorrected, 99)
        limit = 1.25 * pct99

    smoother = HistogramSmoother(0.5, [0, limit])
    assert smoother.nbins < 1e6, "will be crazy slow, should not be possible"

    def entropy(param: NDArray, indicator: NDArray, uncorrected: NDArray, smoother: HistogramSmoother) -> float:
        """Return the entropy of the drift-corrected values"""
        corrected = uncorrected * (1 + indicator * param)
        hsmooth = smoother(corrected)
        w = hsmooth > 0
        return -(np.log(hsmooth[w]) * hsmooth[w]).sum()

    drift_corr_param = sp.optimize.brent(entropy, (indicator, uncorrected, smoother), brack=[0, 0.001])

    drift_correct_info = {"type": "ptmean_gain", "slope": drift_corr_param, "median_pretrig_mean": ptm_offset}
    return drift_corr_param, drift_correct_info
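The essence of drift correction is a one-dimensional search for the gain slope that makes `uncorrected * (1 + indicator * param)` as sharp as possible. A sketch on synthetic data, substituting a grid search over the standard deviation for the Brent minimization of smoothed-histogram entropy used above (numpy assumed; the slope and data are invented):

```python
import numpy as np

# Synthetic data: gain varies linearly with the indicator (e.g. pretrigger mean)
true_slope = 0.002
indicator = np.linspace(-1.0, 1.0, 201)            # already median-subtracted
uncorrected = 1000.0 / (1.0 + true_slope * indicator)

# Scan candidate slopes; the spread of the corrected values stands in for
# the smoothed-histogram entropy that drift_correct minimizes with Brent's method
params = np.linspace(-0.01, 0.01, 2001)
spreads = [float(np.std(uncorrected * (1.0 + indicator * p))) for p in params]
best = float(params[int(np.argmin(spreads))])
```

The search recovers the slope that flattens the gain drift; the real function prefers entropy over standard deviation because spectra are multi-peaked, not Gaussian.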

estimateRiseTime(pulse_data, timebase, nPretrig)

Computes the rise time of timeseries `pulse_data`, where the time steps are `timebase`. `pulse_data` can be a 2D array where each row is a different pulse record, in which case the return value will be an array as long as the number of rows in `pulse_data`.

If nPretrig >= 4, then the samples pulse_data[:nPretrig] are averaged to estimate the baseline. Otherwise, the minimum of pulse_data is assumed to be the baseline.

Specifically, take the first and last of the rising points in the range of 10% to 90% of the peak value, interpolate a line between the two, and use its slope to find the time to rise from 0 to the peak.

Parameters:
  • pulse_data (ArrayLike) –

    An np.ndarray of dimension 1 (a single pulse record) or 2 (an array with each row being a pulse record).

  • timebase (float) –

    The sampling time.

  • nPretrig (int) –

    The number of samples that are recorded before the trigger.

Returns:
  • NDArray –

    An ndarray of dimension 1, giving the rise times.

Source code in mass2/core/analysis_algorithms.py, lines 30-90
@njit
def estimateRiseTime(pulse_data: ArrayLike, timebase: float, nPretrig: int) -> NDArray:
    """Computes the rise time of timeseries <pulse_data>, where the time steps are <timebase>.
    <pulse_data> can be a 2D array where each row is a different pulse record, in which case
    the return value will be an array as long as the number of rows in <pulse_data>.

    If nPretrig >= 4, then the samples pulse_data[:nPretrig] are averaged to estimate
    the baseline.  Otherwise, the minimum of pulse_data is assumed to be the baseline.

    Specifically, take the first and last of the rising points in the range of
    10% to 90% of the peak value, interpolate a line between the two, and use its
    slope to find the time to rise from 0 to the peak.

    Args:
        pulse_data: An np.ndarray of dimension 1 (a single pulse record) or 2 (an
            array with each row being a pulse record).
        timebase: The sampling time.
        nPretrig: The number of samples that are recorded before the trigger.

    Returns:
        An ndarray of dimension 1, giving the rise times.
    """
    MINTHRESH, MAXTHRESH = 0.1, 0.9

    # If pulse_data is a 1D array, turn it into 2
    pulse_data = np.asarray(pulse_data)
    ndim = len(pulse_data.shape)
    if ndim > 2 or ndim < 1:
        raise ValueError("input pulse_data should be a 1d or 2d array.")
    if ndim == 1:
        pulse_data.shape = (1, pulse_data.shape[0])

    # The following requires a lot of numpy foo to read. Sorry!
    if nPretrig >= 4:
        baseline_value = pulse_data[:, 0:nPretrig].mean(axis=1)
    else:
        baseline_value = pulse_data.min(axis=1)
        nPretrig = 0
    value_at_peak = pulse_data.max(axis=1) - baseline_value
    idx_last_pk = pulse_data.argmax(axis=1).max()

    npulses = pulse_data.shape[0]
    try:
        rising_data = (pulse_data[:, nPretrig : idx_last_pk + 1] - baseline_value[:, np.newaxis]) / value_at_peak[:, np.newaxis]
        # Find the last and first indices at which the data are in (0.1, 0.9] times the
        # peak value. Then make sure last is at least 1 past first.
        last_idx = (rising_data > MAXTHRESH).argmax(axis=1) - 1
        first_idx = (rising_data > MINTHRESH).argmax(axis=1)
        last_idx[last_idx < first_idx] = first_idx[last_idx < first_idx] + 1
        last_idx[last_idx == rising_data.shape[1]] = rising_data.shape[1] - 1

        pulsenum = np.arange(npulses)
        y_diff = np.asarray(rising_data[pulsenum, last_idx] - rising_data[pulsenum, first_idx], dtype=float)
        y_diff[y_diff < timebase] = timebase
        time_diff = timebase * (last_idx - first_idx)
        rise_time = time_diff / y_diff
        rise_time[y_diff <= 0] = -9.9e-6
        return rise_time

    except ValueError:
        return -9.9e-6 + np.zeros(npulses, dtype=float)

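The 10%–90% interpolation described above can be sketched for a single pulse in plain NumPy. This is a simplified, hypothetical re-implementation for illustration only, not the Mass2 function itself:

```python
import numpy as np

def rise_time_10_90(pulse, timebase, npretrig):
    """Estimate rise time by interpolating a line between the first sample
    above 10% of peak and the last sample below 90% of peak."""
    pulse = np.asarray(pulse, dtype=float)
    baseline = pulse[:npretrig].mean() if npretrig >= 4 else pulse.min()
    rising = (pulse - baseline) / (pulse.max() - baseline)
    first = np.argmax(rising > 0.1)       # first sample above 10% of peak
    last = np.argmax(rising > 0.9) - 1    # last sample at or below 90%
    last = max(last, first + 1)
    slope = (rising[last] - rising[first]) / (timebase * (last - first))
    return 1.0 / slope                    # time to rise from 0 to peak

# Synthetic pulse: flat baseline, then a rising exponential toward the peak.
t = np.arange(100)
pulse = np.where(t < 20, 0.0, 1.0 - np.exp(-(t - 20) / 5.0))
rt = rise_time_10_90(pulse, timebase=1.0, npretrig=20)
```

For an exponential rise with time constant 5 samples, the extrapolated 0-to-peak time comes out near 14 sample periods.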
filter_signal_lowpass(sig, fs, fcut)

Tophat lowpass filter using an FFT

Args:
  • sig – the signal to be filtered
  • fs – the sampling frequency of the signal
  • fcut – the frequency at which to cut off the signal

Returns: the filtered signal

Source code in mass2/core/analysis_algorithms.py
@njit
def filter_signal_lowpass(sig: NDArray, fs: float, fcut: float) -> NDArray:
    """Tophat lowpass filter using an FFT

    Args:
        sig - the signal to be filtered
        fs - the sampling frequency of the signal
        fcut - the frequency at which to cut off the signal

    Returns:
        the filtered signal
    """
    N = sig.shape[0]
    SIG = np.fft.fft(sig)
    freqs = (fs / N) * np.concatenate((np.arange(0, N / 2 + 1), np.arange(N / 2 - 1, 0, -1)))
    filt = np.zeros_like(SIG)
    filt[freqs < fcut] = 1.0
    sig_filt = np.fft.ifft(SIG * filt)
    return sig_filt

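Without Numba, the same tophat technique can be sketched with plain NumPy FFTs. Note two small differences this sketch introduces: it uses `np.fft.fftfreq` to build the two-sided frequency axis, and it takes the real part of the inverse FFT (which the listing above returns as complex):

```python
import numpy as np

def lowpass_tophat(sig, fs, fcut):
    """Zero out all Fourier components at or above fcut, then invert."""
    n = sig.shape[0]
    spectrum = np.fft.fft(sig)
    freqs = np.abs(np.fft.fftfreq(n, d=1.0 / fs))  # two-sided frequency axis
    spectrum[freqs >= fcut] = 0.0
    return np.fft.ifft(spectrum).real

fs = 1000.0
t = np.arange(1000) / fs
slow = np.sin(2 * np.pi * 5 * t)          # 5 Hz component (below fcut: kept)
fast = 0.5 * np.sin(2 * np.pi * 200 * t)  # 200 Hz component (above fcut: removed)
filtered = lowpass_tophat(slow + fast, fs, fcut=50.0)
```

With 1000 samples at 1000 Hz, both tones sit exactly on FFT bins, so the 200 Hz tone is removed to machine precision and `filtered` matches `slow`.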
make_smooth_histogram(values, smooth_sigma, limit, upper_limit=None)

Convert a vector of arbitrary values into a smoothed histogram by histogramming it and smoothing.

This is a convenience function using the HistogramSmoother class.

Args:
  • values – The vector of data to be histogrammed.
  • smooth_sigma – The smoothing Gaussian's width (FWHM).
  • limit, upper_limit – The histogram limits are [limit, upper_limit], or [0, limit] if upper_limit is None.

Returns: The smoothed histogram as an array.

Source code in mass2/core/analysis_algorithms.py
@njit
def make_smooth_histogram(values: ArrayLike, smooth_sigma: float, limit: float, upper_limit: float | None = None) -> NDArray:
    """Convert a vector of arbitrary <values> into a smoothed histogram by
    histogramming it and smoothing.

    This is a convenience function using the HistogramSmoother class.

    Args:
        values: The vector of data to be histogrammed.
        smooth_sigma: The smoothing Gaussian's width (FWHM)
        limit, upper_limit: The histogram limits are [limit,upper_limit] or
            [0,limit] if upper_limit is None.

    Returns:
        The smoothed histogram as an array.
    """
    if upper_limit is None:
        limit, upper_limit = 0, limit
    return HistogramSmoother(smooth_sigma, [limit, upper_limit])(values)

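The HistogramSmoother class is not reproduced on this page; a minimal stand-in using `np.histogram` plus direct convolution with a Gaussian kernel illustrates the idea. (One simplification: the docstring above specifies a FWHM, while this sketch treats `sigma` as a plain standard deviation.)

```python
import numpy as np

def smooth_histogram(values, sigma, lo, hi, nbins=200):
    """Histogram `values` on [lo, hi], then smooth with a Gaussian kernel."""
    counts, edges = np.histogram(values, bins=nbins, range=(lo, hi))
    binwidth = edges[1] - edges[0]
    # Build a Gaussian kernel a few sigma wide, in units of bins.
    halfwidth = int(np.ceil(4 * sigma / binwidth))
    x = np.arange(-halfwidth, halfwidth + 1) * binwidth
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()          # normalize so total counts are preserved
    return np.convolve(counts, kernel, mode="same")

rng = np.random.default_rng(0)
values = rng.normal(50.0, 2.0, size=5000)
hist = smooth_histogram(values, sigma=1.0, lo=0.0, hi=100.0)
```

Because the kernel is normalized and the data sit far from the histogram edges, the smoothed histogram preserves the total number of counts.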
nearest_arrivals(reference_times, other_times)

Find the external trigger time immediately before and after each pulse timestamp

Args:
  • reference_times – 1d array of pulse timestamps whose nearest neighbors need to be found.
  • other_times – 1d array of possible nearest neighbors (e.g., external trigger timestamps).

Returns: (before_times, after_times)

before_times is an ndarray of the same size as reference_times. before_times[i] contains the difference between reference_times[i] and the closest lesser time contained in other_times, or inf if there was no earlier time in other_times. Note that before_times is always a positive number, even though the neighbor it refers to comes earlier in time.

after_times is an ndarray of the same size as reference_times. after_times[i] contains the difference between the closest greater time contained in other_times and reference_times[i], or inf if there was no later time in other_times.

Source code in mass2/core/analysis_algorithms.py
@njit
def nearest_arrivals(reference_times: ArrayLike, other_times: ArrayLike) -> tuple[NDArray, NDArray]:
    """Find the external trigger time immediately before and after each pulse timestamp

    Args:
        reference_times - 1d array of pulse timestamps whose nearest neighbors
            need to be found.
        other_times - 1d array of possible nearest neighbors (e.g., external
            trigger timestamps).

    Returns:
        (before_times, after_times)

    before_times is an ndarray of the same size as reference_times.
    before_times[i] contains the difference between reference_times[i] and the
    closest lesser time contained in other_times, or inf if there was no
    earlier time in other_times. Note that before_times is always a positive
    number, even though the neighbor it refers to comes earlier in time.

    after_times is an ndarray of the same size as reference_times.
    after_times[i] contains the difference between the closest greater time
    contained in other_times and reference_times[i], or inf if there was no
    later time in other_times.
    """
    other_times = np.asarray(other_times)
    nearest_after_index = np.searchsorted(other_times, reference_times)
    # because both sets of arrival times should be sorted, there are faster algorithms than searchsorted
    # for example: https://github.com/kwgoodman/bottleneck/issues/47
    # we could use one if performance becomes an issue
    last_index = np.searchsorted(nearest_after_index, other_times.size, side="left")
    first_index = np.searchsorted(nearest_after_index, 1)

    nearest_before_index = np.copy(nearest_after_index)
    nearest_before_index[:first_index] = 1
    nearest_before_index -= 1
    before_times = reference_times - other_times[nearest_before_index]
    before_times[:first_index] = np.inf

    nearest_after_index[last_index:] = other_times.size - 1
    after_times = other_times[nearest_after_index] - reference_times
    after_times[last_index:] = np.inf

    return before_times, after_times

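The searchsorted logic can be demonstrated standalone. This is a simplified re-implementation for illustration (using boolean masks rather than the index-clamping tricks in the listing above); it returns the same (before, after) pair of time differences:

```python
import numpy as np

def nearest_arrivals_sketch(reference_times, other_times):
    """For each reference time, return (time since the previous other_time,
    time until the next other_time); np.inf where no such neighbor exists."""
    reference_times = np.asarray(reference_times, dtype=float)
    other_times = np.asarray(other_times, dtype=float)
    idx_after = np.searchsorted(other_times, reference_times)

    before = np.full(reference_times.shape, np.inf)
    has_before = idx_after > 0                     # a lesser neighbor exists
    before[has_before] = reference_times[has_before] - other_times[idx_after[has_before] - 1]

    after = np.full(reference_times.shape, np.inf)
    has_after = idx_after < other_times.size       # a greater neighbor exists
    after[has_after] = other_times[idx_after[has_after]] - reference_times[has_after]
    return before, after

before, after = nearest_arrivals_sketch([0.5, 2.5, 9.0], [1.0, 2.0, 3.0])
```

The first reference time has no earlier neighbor (before is inf), and the last has no later neighbor (after is inf).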
resample_one_pulse(pulse, shift)

Resample one pulse (in place). The data will be modified and returned.

Shift can be integer or not; if shift has a fractional part, linear interpolation is used.

Positive values of shift delay the values in a pulse record, and the initial value or values are padded by copying the first value of the pulse record. Negative values shift the pulse earlier in the record, and the final value is used to pad the end of the new record.

Parameters:
  • pulse (NDArray) –

    Pulse to resample in place.

  • shift (float | int) –

    The number of samples to shift each pulse. Value can be non-integer, producing a linear interpolation of the data.

Returns:
  • NDArray –

    The pulse input vector, which is modified by this operation.

Source code in mass2/core/analysis_algorithms.py
@njit
def resample_one_pulse(pulse: NDArray, shift: float | int) -> NDArray:
    """Resample one pulse (in place). The data will be modified and returned.

    Shift can be integer or not; if shift has a fractional part, linear interpolation is used.

    Positive values of `shift` delay the values in a pulse record, and the initial value or values
    are padded by copying the first value of the pulse record. Negative values shift the pulse earlier
    in the record, and the final value is used to pad the end of the new record.

    Parameters
    ----------
    pulse : NDArray
        Pulse to resample _in place_.
    shift : float | int
        The number of samples to shift each pulse. Value can be non-integer, producing a linear
        interpolation of the data.

    Returns
    -------
    NDArray
        The `pulse` input vector, which is modified by this operation.
    """
    Nsamples = len(pulse)
    if shift > 0.0:
        fullshift = int(shift)
        fracshift = shift - fullshift
        # integer shift
        if fullshift > 0:
            pulse[fullshift:] = pulse[:-fullshift]
        # linear interpolation for the fraction
        pulse[fullshift + 1 :] = np.rint((1.0 - fracshift) * pulse[fullshift + 1 :] + fracshift * pulse[fullshift:-1])
        # fill the initial values
        pulse[:fullshift] = pulse[fullshift]

    elif shift < 0.0:
        fullshift = -int(-shift)
        fracshift = fullshift - shift
        # integer shift
        if fullshift < 0:
            pulse[:fullshift] = pulse[-fullshift:]
        # linear interpolation for the fraction
        pulse[: Nsamples + fullshift - 1] = np.rint(
            (1 - fracshift) * pulse[: Nsamples + fullshift - 1] + fracshift * pulse[1 : Nsamples + fullshift]
        )
        # fill the final values
        pulse[Nsamples + fullshift :] = pulse[-1]
    # Edge case of shift == 0.0 is a no-op and can be ignored.
    return pulse

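The fractional-shift behavior can be illustrated with `np.interp`, which performs the same linear interpolation and, through its edge clamping, reproduces the first/last-sample padding described above. This is a simplified out-of-place sketch, not the Mass2 implementation; in particular it keeps float values, while the listing above rounds with `np.rint` for integer-typed records:

```python
import numpy as np

def resample_sketch(pulse, shift):
    """Return `pulse` delayed by `shift` samples (positive = later), with
    linear interpolation for fractional shifts and edge padding."""
    pulse = np.asarray(pulse, dtype=float)
    i = np.arange(pulse.size)
    # Sampling the pulse at positions i - shift delays the waveform by `shift`;
    # np.interp clamps out-of-range positions to the end values (the padding).
    return np.interp(i - shift, i, pulse)

pulse = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
delayed = resample_sketch(pulse, 1.5)    # peak moves from index 2 toward 3.5
advanced = resample_sketch(pulse, -1.0)  # peak moves back to index 1
```

A shift of +1.5 splits the unit spike evenly between samples 3 and 4; a shift of −1 moves it cleanly to sample 1, padding the end with the final value.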
resample_pulses(pulses, shifts)

Resample multiple pulses (in place). The data will be modified and returned.

Shifts can be integer or not; if shifts have a fractional part, linear interpolation is used.

Positive values of shifts delay the values in a pulse record, and the initial value or values are padded by copying the first value of the pulse record. Negative values shift the pulse earlier in the record, and the final value is used to pad the end of the new record.

Parameters:
  • pulses (NDArray) –

    Array of pulses to resample in place. Size (N,M) for N pulses, each of length M.

  • shifts (ArrayLike | float | int) –

    The number of samples to shift each pulse. Array of size N, or a scalar. If scalar, apply the same shift to all N pulses. Values can be non-integer, producing a linear interpolation of the data.

Returns:
  • NDArray –

    The pulses input array, which is modified by this operation.

Source code in mass2/core/analysis_algorithms.py
@njit
def resample_pulses(pulses: NDArray, shifts: ArrayLike | float | int) -> NDArray:
    """Resample multiple pulses (in place). The data will be modified and returned.

    Shifts can be integer or not; if shifts have a fractional part, linear interpolation is used.

    Positive values of `shifts` delay the values in a pulse record, and the initial value or values
    are padded by copying the first value of the pulse record. Negative values shift the pulse earlier
    in the record, and the final value is used to pad the end of the new record.

    Parameters
    ----------
    pulses : NDArray
        Array of pulses to resample _in place_. Size (N,M) for N pulses, each of length M.
    shifts : ArrayLike | float | int
        The number of samples to shift each pulse. Array of size `N`, or a scalar. If scalar,
        apply the same shift to all `N` pulses. Values can be non-integer, producing a linear
        interpolation of the data.

    Returns
    -------
    NDArray
        The `pulses` input array, which is modified by this operation.
    """
    # Positive shift means delay the record and pad the start.
    # Negative shift means rewind the record and pad the end.
    assert len(pulses.shape) == 2
    Npulses, Nsamples = pulses.shape
    if np.isscalar(shifts):
        shift_vec = np.zeros(Npulses, dtype=float)
        shift_vec.fill(shifts)
        return resample_pulses(pulses, shift_vec)

    shifts = np.asarray(shifts)
    assert len(shifts) == Npulses
    assert np.abs(shifts).max() < Nsamples

    for pulse, shift in zip(pulses, shifts):
        resample_one_pulse(pulse, shift)
    return pulses

time_drift_correct(time, uncorrected, w, sec_per_degree=2000, pulses_per_degree=2000, max_degrees=20, ndeg=None, limit=None)

Compute a time-based drift correction that minimizes the spectral entropy.

Args:
  • time – The "time-axis". Correction will be a low-order polynomial in this.
  • uncorrected – A filtered pulse height vector, the same length as time. Assumed to have some gain that drifts over time.
  • w – the kernel width for the Laplace KDE density estimator.
  • sec_per_degree – assign at most one polynomial degree per this many seconds.
  • pulses_per_degree – assign at most one polynomial degree per this many pulses.
  • max_degrees – never use more than this many degrees of Legendre polynomial.
  • ndeg – If not None, use this many degrees, regardless of the values of sec_per_degree, pulses_per_degree, and max_degrees. In this case, never downsample.
  • limit – The [lower, upper] limit of uncorrected values over which entropy is computed (default None).

The entropy will be computed on corrected values only in the range [limit[0], limit[1]], so limit should be set to a characteristic large value of uncorrected. If limit is None (the default), then it will be computed as 25% larger than the 99th-percentile point of uncorrected.

Possible improvements in the future:
  • Use Numba to speed up.
  • Allow the parameters to be function arguments with defaults: photons per degree of freedom, seconds per degree of freedom, and max degrees of freedom.
  • Figure out how to span the available time with more than one set of Legendre polynomials, so that we can have more than 20 d.o.f. eventually, for long runs.

Source code in mass2/core/analysis_algorithms.py
def time_drift_correct(  # noqa: PLR0914
    time: ArrayLike,
    uncorrected: ArrayLike,
    w: float,
    sec_per_degree: float = 2000,
    pulses_per_degree: int = 2000,
    max_degrees: int = 20,
    ndeg: int | None = None,
    limit: tuple[float, float] | None = None,
) -> dict[str, Any]:
    """Compute a time-based drift correction that minimizes the spectral entropy.

    Args:
        time: The "time-axis". Correction will be a low-order polynomial in this.
        uncorrected: A filtered pulse height vector. Same length as time.
            Assumed to have some gain that drifts over time.
        w: the kernel width for the Laplace KDE density estimator
        sec_per_degree: assign at most one polynomial degree per this many seconds
        pulses_per_degree: assign at most one polynomial degree per this many pulses
        max_degrees: never use more than this many degrees of Legendre polynomial.
        ndeg: If not None, use this many degrees, regardless of the values of
              sec_per_degree, pulses_per_degree, and max_degrees. In this case, never downsample.
        limit: The [lower, upper] limit of uncorrected values over which entropy is
            computed (default None).

    The entropy will be computed on corrected values only in the range
    [limit[0], limit[1]], so limit should be set to a characteristic large value
    of uncorrected. If limit is None (the default), then it will be computed as
    25% larger than the 99th-percentile point of uncorrected.

    Possible improvements in the future:
    * Use Numba to speed up.
    * Allow the parameters to be function arguments with defaults: photons per
      degree of freedom, seconds per degree of freedom, and max degrees of freedom.
    * Figure out how to span the available time with more than one set of legendre
      polynomials, so that we can have more than 20 d.o.f. eventually, for long runs.
    """
    time = np.asarray(time)
    uncorrected = np.asarray(uncorrected)
    if limit is None:
        pct99 = np.percentile(uncorrected, 99)
        limit = (0, 1.25 * pct99)

    use = np.logical_and(uncorrected > limit[0], uncorrected < limit[1])
    time = np.asarray(time[use])
    uncorrected = np.asarray(uncorrected[use])

    tmin, tmax = np.min(time), np.max(time)

    def normalize(t: NDArray) -> NDArray:
        """Rescale time to the range [-1,1]"""
        return (t - tmin) / (tmax - tmin) * 2 - 1

    info = {
        "tmin": tmin,
        "tmax": tmax,
        "normalize": normalize,
    }

    dtime = tmax - tmin
    N = len(time)
    if ndeg is None:
        ndeg = int(np.minimum(dtime / sec_per_degree, N / pulses_per_degree))
        ndeg = min(ndeg, max_degrees)
        ndeg = max(ndeg, 1)
        phot_per_degree = N / float(ndeg)
        if phot_per_degree >= 2 * pulses_per_degree:
            downsample = int(phot_per_degree / pulses_per_degree)
            time = time[::downsample]
            uncorrected = uncorrected[::downsample]
            N = len(time)
        else:
            downsample = 1
    else:
        downsample = 1

    LOG.info("Using %2d degrees for %6d photons (after %d downsample)", ndeg, N, downsample)
    LOG.info("That's %6.1f photons per degree, and %6.1f seconds per degree.", N / float(ndeg), dtime / ndeg)

    def model1(pi: NDArray, i: int, param: NDArray, basis: NDArray) -> NDArray:
        "The model function, with one parameter pi varied, others fixed."
        pcopy = np.array(param)
        pcopy[i] = pi
        return 1 + np.dot(basis.T, pcopy)

    def cost1(pi: NDArray, i: int, param: NDArray, y: NDArray, w: float, basis: NDArray) -> float:
        "The cost function (spectral entropy), with one parameter pi varied, others fixed."
        return laplace_entropy(y * model1(pi, i, param, basis), w=w)

    param = np.zeros(ndeg, dtype=float)
    xnorm = np.asarray(normalize(time), dtype=float)
    basis = np.vstack([sp.special.legendre(i + 1)(xnorm) for i in range(ndeg)])

    fc = 0
    model: Callable = np.poly1d([0])
    info["coefficients"] = np.zeros(ndeg, dtype=float)
    for i in range(ndeg):
        result, _fval, _iter, funcalls = sp.optimize.brent(
            cost1, (i, param, uncorrected, w, basis), [-0.001, 0.001], tol=1e-5, full_output=True
        )
        param[i] = result
        fc += funcalls
        model += sp.special.legendre(i + 1) * result
        info["coefficients"][i] = result
    info["funccalls"] = fc

    xk = np.linspace(-1, 1, 1 + 2 * ndeg)
    model2 = CubicSpline(xk, model(xk))
    H1 = laplace_entropy(uncorrected, w=w)
    H2 = laplace_entropy(uncorrected * (1 + model(xnorm)), w=w)
    H3 = laplace_entropy(uncorrected * (1 + model2(xnorm)), w=w)
    if H2 <= 0 or H2 - H1 > 0.0:
        model = np.poly1d([0])
    elif H3 <= 0 or H3 - H2 > 0.00001:
        model = model2

    info["entropies"] = (H1, H2, H3)
    info["model"] = model
    return info

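The correction model built above is a gain of the form 1 + Σᵢ pᵢ·Lᵢ₊₁(t_norm), with Legendre polynomials evaluated on time rescaled to [-1, 1]. Evaluating such a model can be sketched with NumPy's Legendre tools; the coefficients here are hypothetical stand-ins, and the entropy-minimizing fit itself is not reproduced:

```python
import numpy as np
from numpy.polynomial import legendre

time = np.linspace(100.0, 900.0, 5)
tmin, tmax = time.min(), time.max()
tnorm = (time - tmin) / (tmax - tmin) * 2 - 1   # rescale time to [-1, 1]

# Hypothetical fitted coefficients for Legendre degrees 1..3. Degree 0 is
# excluded (the constant gain is the leading 1), matching `i + 1` above.
param = np.array([0.01, -0.005, 0.002])
coef = np.concatenate(([0.0], param))            # prepend zero for degree 0
gain_model = 1 + legendre.legval(tnorm, coef)

uncorrected = np.full_like(time, 5000.0)
corrected = uncorrected * gain_model
```

At tnorm = 0, only the even-degree term contributes (L2(0) = -1/2), so the gain there is 1 + (-0.005)(-0.5) = 1.0025.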
unwrap_n(data, period, mask, n=3)

Unwrap data that has been restricted to a given period.

The algorithm iterates through each data point and compares it to the average of the previous n data points. It then offsets the data point by the multiple of the period that will minimize the difference from that n-point running average.

For the first n data points, there are not enough preceding points to average n of them, so the algorithm will average fewer points.

This code was written by Thomas Baker; integrated into MASS by Dan Becker. Sped up 300x by @njit.

Parameters:
  • data (array of data values) –
  • period (the range over which the data loops) –
  • n (how many preceding points to average, default: 3) –
  • mask (ArrayLike) – mask identifying "good" pulses
Source code in mass2/core/analysis_algorithms.py
@njit
def unwrap_n(data: NDArray[np.uint16], period: float, mask: ArrayLike, n: int = 3) -> NDArray:
    """Unwrap data that has been restricted to a given period.

    The algorithm iterates through each data point and compares
    it to the average of the previous n data points. It then
    offsets the data point by the multiple of the period that
    will minimize the difference from that n-point running average.

    For the first n data points, there are not enough preceding
    points to average n of them, so the algorithm will average
    fewer points.

    This code was written by Thomas Baker; integrated into MASS by Dan
    Becker. Sped up 300x by @njit.

    Parameters
    ----------
    data : array of data values
    period : the range over which the data loops
    n : how many preceding points to average
    mask : mask identifying "good" pulses
    """
    mask = np.asarray(mask)
    udata = np.copy(data)  # make a copy for output
    if n <= 0:
        return udata

    # Iterate through each data point and offset it by
    # an amount that will minimize the difference from the
    # rolling average
    nprior = 0
    firstgoodidx = np.argmax(mask)
    priorvalues = np.full(n, udata[firstgoodidx])
    for i in range(len(data)):
        # Take the average of the previous n data points (only those with mask[i]==True).
        # Offset the data point by the most reasonable multiple of period (make this point closest to the running average).
        if mask[i]:
            avg = np.mean(priorvalues)
            if nprior == 0:
                avg = float(priorvalues[0])
            elif nprior < n:
                avg = np.mean(priorvalues[:nprior])
        udata[i] -= np.round((udata[i] - avg) / period) * period
        if mask[i]:
            priorvalues[nprior % n] = udata[i]
            nprior += 1
    return udata

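The running-average unwrapping can be sketched in plain NumPy. This simplified version drops the mask and @njit machinery but applies the same rule: offset each point by the multiple of the period that brings it closest to the average of the previous n unwrapped points:

```python
import numpy as np

def unwrap_running(data, period, n=3):
    """Offset each point by the multiple of `period` that brings it closest
    to the running average of the previous `n` unwrapped points."""
    out = np.array(data, dtype=float)
    for i in range(1, len(out)):
        avg = out[max(0, i - n):i].mean()   # fewer points early on, as above
        out[i] -= np.round((out[i] - avg) / period) * period
    return out

# A ramp wrapped into [0, 10) should unwrap back to a straight line.
true = np.arange(0.0, 30.0, 1.0)
wrapped = true % 10.0
unwrapped = unwrap_running(wrapped, period=10.0)
```

Each time the wrapped ramp resets from 9 to 0, the running average pulls the point up by one period, recovering the original line.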
Provide DriftCorrectStep and DriftCorrection for correcting gain drifts that correlate with pretrigger mean.

DriftCorrectStep dataclass

Bases: RecipeStep

A RecipeStep to apply a linear drift correction to pulse data in a DataFrame.

Source code in mass2/core/drift_correction.py
@dataclass(frozen=True)
class DriftCorrectStep(RecipeStep):
    """A RecipeStep to apply a linear drift correction to pulse data in a DataFrame."""

    dc: typing.Any

    def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
        """Apply the drift correction to the input DataFrame and return a new DataFrame with results."""
        indicator_col, uncorrected_col = self.inputs
        slope, offset = self.dc.slope, self.dc.offset
        df2 = df.select((pl.col(uncorrected_col) * (1 + slope * (pl.col(indicator_col) - offset))).alias(self.output[0])).with_columns(
            df
        )
        return df2

    def dbg_plot(self, df_after: pl.DataFrame, **kwargs: Any) -> plt.Axes:
        """Plot the uncorrected and corrected values against the indicator for debugging purposes."""
        indicator_col, uncorrected_col = self.inputs
        # breakpoint()
        df_small = df_after.lazy().filter(self.good_expr).filter(self.use_expr).select(self.inputs + self.output).collect()
        mass2.misc.plot_a_vs_b_series(df_small[indicator_col], df_small[uncorrected_col])
        mass2.misc.plot_a_vs_b_series(
            df_small[indicator_col],
            df_small[self.output[0]],
            plt.gca(),
        )
        plt.legend()
        plt.tight_layout()
        return plt.gca()

    @classmethod
    def learn(
        cls, ch: "Channel", indicator_col: str, uncorrected_col: str, corrected_col: str | None, use_expr: pl.Expr
    ) -> "DriftCorrectStep":
        """Create a DriftCorrectStep by learning the correction from data in the given Channel."""
        if corrected_col is None:
            corrected_col = uncorrected_col + "_dc"
        indicator_s, uncorrected_s = ch.good_serieses([indicator_col, uncorrected_col], use_expr)
        dc = mass2.core.drift_correct(
            indicator=indicator_s.to_numpy(),
            uncorrected=uncorrected_s.to_numpy(),
        )
        step = cls(
            inputs=[indicator_col, uncorrected_col],
            output=[corrected_col],
            good_expr=ch.good_expr,
            use_expr=use_expr,
            dc=dc,
        )
        return step

calc_from_df(df)

Apply the drift correction to the input DataFrame and return a new DataFrame with results.

Source code in mass2/core/drift_correction.py
def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
    """Apply the drift correction to the input DataFrame and return a new DataFrame with results."""
    indicator_col, uncorrected_col = self.inputs
    slope, offset = self.dc.slope, self.dc.offset
    df2 = df.select((pl.col(uncorrected_col) * (1 + slope * (pl.col(indicator_col) - offset))).alias(self.output[0])).with_columns(
        df
    )
    return df2

dbg_plot(df_after, **kwargs)

Plot the uncorrected and corrected values against the indicator for debugging purposes.

Source code in mass2/core/drift_correction.py
def dbg_plot(self, df_after: pl.DataFrame, **kwargs: Any) -> plt.Axes:
    """Plot the uncorrected and corrected values against the indicator for debugging purposes."""
    indicator_col, uncorrected_col = self.inputs
    # breakpoint()
    df_small = df_after.lazy().filter(self.good_expr).filter(self.use_expr).select(self.inputs + self.output).collect()
    mass2.misc.plot_a_vs_b_series(df_small[indicator_col], df_small[uncorrected_col])
    mass2.misc.plot_a_vs_b_series(
        df_small[indicator_col],
        df_small[self.output[0]],
        plt.gca(),
    )
    plt.legend()
    plt.tight_layout()
    return plt.gca()

learn(ch, indicator_col, uncorrected_col, corrected_col, use_expr) classmethod

Create a DriftCorrectStep by learning the correction from data in the given Channel.

Source code in mass2/core/drift_correction.py
@classmethod
def learn(
    cls, ch: "Channel", indicator_col: str, uncorrected_col: str, corrected_col: str | None, use_expr: pl.Expr
) -> "DriftCorrectStep":
    """Create a DriftCorrectStep by learning the correction from data in the given Channel."""
    if corrected_col is None:
        corrected_col = uncorrected_col + "_dc"
    indicator_s, uncorrected_s = ch.good_serieses([indicator_col, uncorrected_col], use_expr)
    dc = mass2.core.drift_correct(
        indicator=indicator_s.to_numpy(),
        uncorrected=uncorrected_s.to_numpy(),
    )
    step = cls(
        inputs=[indicator_col, uncorrected_col],
        output=[corrected_col],
        good_expr=ch.good_expr,
        use_expr=use_expr,
        dc=dc,
    )
    return step

DriftCorrection dataclass

A linear correction used to attempt to remove any correlation between pretrigger mean and pulse height; it will also work with other quantities.

Source code in mass2/core/drift_correction.py
@dataclass
class DriftCorrection:
    """A linear correction used to attempt to remove any correlation between pretrigger mean and pulse height;
    it will also work with other quantities."""

    offset: float
    slope: float

    def __call__(self, indicator: ArrayLike, uncorrected: ArrayLike) -> NDArray:
        """Apply the drift correction to the given uncorrected values using the given indicator values."""
        indicator = np.asarray(indicator)
        uncorrected = np.asarray(uncorrected)
        return uncorrected * (1 + (indicator - self.offset) * self.slope)

__call__(indicator, uncorrected)

Apply the drift correction to the given uncorrected values using the given indicator values.

Source code in mass2/core/drift_correction.py
def __call__(self, indicator: ArrayLike, uncorrected: ArrayLike) -> NDArray:
    """Apply the drift correction to the given uncorrected values using the given indicator values."""
    indicator = np.asarray(indicator)
    uncorrected = np.asarray(uncorrected)
    return uncorrected * (1 + (indicator - self.offset) * self.slope)

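The correction formula `uncorrected * (1 + (indicator - offset) * slope)` can be demonstrated on synthetic data. This standalone sketch applies the same linear model without importing mass2; the offset and slope values are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(1)
offset, slope = 1000.0, -2e-4

indicator = rng.normal(offset, 50.0, size=2000)   # e.g. pretrigger mean
true_ph = 5000.0                                  # the true pulse height
# Simulate a gain that varies linearly with the indicator.
uncorrected = true_ph / (1 + (indicator - offset) * slope)

# Apply the DriftCorrection formula: the indicator dependence cancels.
corrected = uncorrected * (1 + (indicator - offset) * slope)
spread_before = uncorrected.std()
spread_after = corrected.std()
```

Because the simulated gain is exactly the inverse of the correction, the corrected pulse heights collapse back to a single value, while the uncorrected ones are spread by the drift.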
drift_correct_mass(indicator, uncorrected)

Determine drift correction parameters using mass2.core.analysis_algorithms.drift_correct.

Source code in mass2/core/drift_correction.py
def drift_correct_mass(indicator: ArrayLike, uncorrected: ArrayLike) -> "DriftCorrection":
    """Determine drift correction parameters using mass2.core.analysis_algorithms.drift_correct."""
    slope, dc_info = mass2.core.analysis_algorithms.drift_correct(indicator, uncorrected)
    offset = dc_info["median_pretrig_mean"]
    return DriftCorrection(slope=slope, offset=offset)

drift_correct_wip(indicator, uncorrected)

Work in progress to determine drift correction parameters directly (??).

Source code in mass2/core/drift_correction.py
def drift_correct_wip(indicator: ArrayLike, uncorrected: ArrayLike) -> "DriftCorrection":
    """Work in progress to determine drift correction parameters directly (??)."""
    opt_result, offset = mass2.core.rough_cal.minimize_entropy_linear(
        np.asarray(indicator),
        np.asarray(uncorrected),
        bin_edges=np.arange(0, 60000, 1),
        fwhm_in_bin_number_units=5,
    )
    return DriftCorrection(offset=float(offset), slope=opt_result.x.astype(np.float64))

Provide OptimalFilterStep, a step to apply an optimal filter to pulse data in a DataFrame.

OptimalFilterStep dataclass

Bases: RecipeStep

A step to apply an optimal filter to pulse data in a DataFrame.

Source code in mass2/core/filter_steps.py
@dataclass(frozen=True)
class OptimalFilterStep(RecipeStep):
    """A step to apply an optimal filter to pulse data in a DataFrame."""

    filter: Filter
    spectrum: NoiseResult | None
    filter_maker: "FilterMaker"
    transform_raw: Callable | None = None

    def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
        """Apply the optimal filter to the input DataFrame and return a new DataFrame with results."""
        dfs = []
        for df_iter in df.iter_slices(10000):
            raw = df_iter[self.inputs[0]].to_numpy()
            if self.transform_raw is not None:
                raw = self.transform_raw(raw)
            peak_y, peak_x = self.filter.filter_records(raw)
            dfs.append(pl.DataFrame({"peak_x": peak_x, "peak_y": peak_y}))
        df2 = pl.concat(dfs).with_columns(df)
        df2 = df2.rename({"peak_x": self.output[0], "peak_y": self.output[1]})
        return df2

    def dbg_plot(self, df_after: pl.DataFrame, **kwargs: Any) -> plt.Axes:
        """Plot the filter shape for debugging purposes."""
        plt.figure()
        axis = plt.subplot(111)
        self.filter.plot(axis)
        return axis

    def drop_debug(self) -> "OptimalFilterStep":
        """Return a copy of this step with debugging information (the NoiseResult) removed."""
        return dataclasses.replace(self, spectrum=None)

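At its core, `filter_records` applies a matched filter: each raw record is dotted with a weight vector, and one filtered value is produced per record. A minimal sketch of that dot-product step (illustrative only; the real Filter also estimates the sub-sample arrival time, which is why it returns both peak_y and peak_x):

```python
import numpy as np

def apply_filter(records, weights):
    """Dot each pulse record (one per row) with the filter weights,
    yielding one filtered amplitude per record."""
    return records @ weights

# Toy filter: weights normalized against the template, so a record that is a
# scaled copy of the template filters to exactly its scale factor.
template = np.array([0.0, 0.3, 1.0, 0.6, 0.2])
weights = template / np.dot(template, template)
records = np.vstack([3.0 * template, 7.5 * template])
peak_y = apply_filter(records, weights)
```

This normalization choice makes the filtered value directly interpretable as a pulse amplitude.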
calc_from_df(df)

Apply the optimal filter to the input DataFrame and return a new DataFrame with results.

Source code in mass2/core/filter_steps.py
def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
    """Apply the optimal filter to the input DataFrame and return a new DataFrame with results."""
    dfs = []
    for df_iter in df.iter_slices(10000):
        raw = df_iter[self.inputs[0]].to_numpy()
        if self.transform_raw is not None:
            raw = self.transform_raw(raw)
        peak_y, peak_x = self.filter.filter_records(raw)
        dfs.append(pl.DataFrame({"peak_x": peak_x, "peak_y": peak_y}))
    df2 = pl.concat(dfs).with_columns(df)
    df2 = df2.rename({"peak_x": self.output[0], "peak_y": self.output[1]})
    return df2

dbg_plot(df_after, **kwargs)

Plot the filter shape for debugging purposes.

Source code in mass2/core/filter_steps.py
def dbg_plot(self, df_after: pl.DataFrame, **kwargs: Any) -> plt.Axes:
    """Plot the filter shape for debugging purposes."""
    plt.figure()
    axis = plt.subplot(111)
    self.filter.plot(axis)
    return axis

drop_debug()

Return a copy of this step with debugging information (the NoiseResult) removed.

Source code in mass2/core/filter_steps.py
def drop_debug(self) -> "OptimalFilterStep":
    """Return a copy of this step with debugging information (the NoiseResult) removed."""
    return dataclasses.replace(self, spectrum=None)

Classes and functions for reading and handling LJH files.

LJHFile dataclass

Bases: ABC

Represents the header and binary information of a single LJH file.

Includes the complete ASCII header stored both as a dictionary and a string, and key attributes including the number of pulses, number of samples (and presamples) in each pulse record, client information stored by the LJH writer, and the filename.

Also includes a np.memmap to the raw binary data. This memmap always starts with pulse zero and extends to the last full pulse given the file size at the time of object creation. To extend the memmap for files that are growing, use LJHFile.reopen_binary() to return a new object with a possibly longer memmap.

Source code in mass2/core/ljhfiles.py
@dataclass(frozen=True)
class LJHFile(ABC):
    """Represents the header and binary information of a single LJH file.

    Includes the complete ASCII header stored both as a dictionary and a string, and
    key attributes including the number of pulses, number of samples (and presamples)
    in each pulse record, client information stored by the LJH writer, and the filename.

    Also includes a `np.memmap` to the raw binary data. This memmap always starts with
    pulse zero and extends to the last full pulse given the file size at the time of object
    creation. To extend the memmap for files that are growing, use `LJHFile.reopen_binary()`
    to return a new object with a possibly longer memmap.
    """

    filename: str
    channum: int
    dtype: np.dtype
    npulses: int
    timebase: float
    nsamples: int
    npresamples: int
    subframediv: int | None
    client: str
    header: dict
    header_string: str
    header_size: int
    binary_size: int
    _mmap: np.memmap
    ljh_version: Version
    max_pulses: int | None = None

    OVERLONG_HEADER: ClassVar[int] = 100

    def __repr__(self) -> str:
        """A string that can be evaluated to re-open this LJH file."""
        return f"""mass2.core.ljhfiles.LJHFile.open("{self.filename}")"""

    @classmethod
    def open(cls, filename: str | Path, max_pulses: int | None = None) -> "LJHFile":
        """Open an LJH file, parsing its header information and returning an LJHFile object."""
        filename = str(filename)
        header_dict, header_string, header_size = cls.read_header(filename)
        channum = header_dict["Channel"]
        timebase = header_dict["Timebase"]
        nsamples = header_dict["Total Samples"]
        npresamples = header_dict["Presamples"]
        client = header_dict.get("Software Version", "UNKNOWN")
        subframediv: int | None = None
        if "Subframe divisions" in header_dict:
            subframediv = header_dict["Subframe divisions"]
        elif "Number of rows" in header_dict:
            subframediv = header_dict["Number of rows"]

        ljh_version = Version(header_dict["Save File Format Version"])
        if ljh_version < Version("2.0.0"):
            raise NotImplementedError("LJH files version 1 are not supported")
        if ljh_version < Version("2.1.0"):
            dtype = np.dtype([
                ("internal_unused", np.uint16),
                ("internal_ms", np.uint32),
                ("data", np.uint16, nsamples),
            ])
            concrete_LJHFile_type: type[LJHFile] = LJHFile_2_0
        elif ljh_version < Version("2.2.0"):
            dtype = np.dtype([
                ("internal_us", np.uint8),
                ("internal_unused", np.uint8),
                ("internal_ms", np.uint32),
                ("data", np.uint16, nsamples),
            ])
            concrete_LJHFile_type = LJHFile_2_1
        else:
            dtype = np.dtype([
                ("subframecount", np.int64),
                ("posix_usec", np.int64),
                ("data", np.uint16, nsamples),
            ])
            concrete_LJHFile_type = LJHFile_2_2
        pulse_size_bytes = dtype.itemsize
        binary_size = os.path.getsize(filename) - header_size

        # Fix long-standing bug in LJH files made by MATTER or XCALDAQ_client:
        # It adds 3 to the "true value" of nPresamples. Assume only DASTARD clients have value correct.
        if "DASTARD" not in client:
            npresamples += 3

        npulses = binary_size // pulse_size_bytes
        if max_pulses is not None:
            npulses = min(max_pulses, npulses)
        mmap = np.memmap(filename, dtype, mode="r", offset=header_size, shape=(npulses,))

        return concrete_LJHFile_type(
            filename,
            channum,
            dtype,
            npulses,
            timebase,
            nsamples,
            npresamples,
            subframediv,
            client,
            header_dict,
            header_string,
            header_size,
            binary_size,
            mmap,
            ljh_version,
            max_pulses,
        )

    @classmethod
    def read_header(cls, filename: str) -> tuple[dict, str, int]:
        """Read in the text header of an LJH file. Return the header parsed into a dictionary,
        the complete header string (in case you want to generate a new LJH file from this one),
        and the size of the header in bytes. The file does not remain open after this method.

        Returns:
            (header_dict, header_string, header_size)

        Args:
            filename: path to the file to be opened.
        """
        # parse header into a dictionary
        header_dict: dict[str, Any] = {}
        with open(filename, "rb") as fp:
            i = 0
            lines = []
            while True:
                line = fp.readline().decode()
                lines.append(line)
                i += 1
                if line.startswith("#End of Header"):
                    break
                elif not line:
                    raise Exception("reached EOF before #End of Header")
                elif i > cls.OVERLONG_HEADER:
                    raise Exception(f"header is too long--seems not to contain '#End of Header'\nin file {filename}")
                # ignore lines without ":"
                elif ":" in line:
                    a, b = line.split(":", maxsplit=1)
                    a = a.strip()
                    b = b.strip()
                    header_dict[a] = b
            header_size = fp.tell()
        header_string = "".join(lines)

        # Convert values from header_dict into numeric types, when appropriate
        header_dict["Filename"] = filename
        for name, datatype in (
            ("Channel", int),
            ("Timebase", float),
            ("Total Samples", int),
            ("Presamples", int),
            ("Number of columns", int),
            ("Number of rows", int),
            ("Subframe divisions", int),
            ("Timestamp offset (s)", float),
        ):
            # Have to convert to float first, as some early LJH have "Channel: 1.0"
            header_dict[name] = datatype(float(header_dict.get(name, -1)))
        return header_dict, header_string, header_size

    @property
    def pulse_size_bytes(self) -> int:
        """The size in bytes of each binary pulse record (including the timestamps)"""
        return self.dtype.itemsize

    def reopen_binary(self, max_pulses: int | None = None) -> "LJHFile":
        """Reopen the underlying binary section of the LJH file, in case its size has changed,
        without re-reading the LJH header section.

        Parameters
        ----------
        max_pulses : Optional[int], optional
            A limit to the number of pulses to memory map or None for no limit, by default None

        Returns
        -------
        Self
            A new `LJHFile` object with the same header but a new memmap and number of pulses.
        """
        current_binary_size = os.path.getsize(self.filename) - self.header_size
        npulses = current_binary_size // self.pulse_size_bytes
        if max_pulses is not None:
            npulses = min(max_pulses, npulses)
        mmap = np.memmap(
            self.filename,
            self.dtype,
            mode="r",
            offset=self.header_size,
            shape=(npulses,),
        )
        return dataclasses.replace(
            self,
            npulses=npulses,
            _mmap=mmap,
            max_pulses=max_pulses,
            binary_size=current_binary_size,
        )

    @property
    def subframecount(self) -> NDArray:
        """Return a copy of the subframecount memory map.

        Old LJH versions don't have this: return zeros, unless overridden by derived class (LJHFile_2_2 will be the only one).

        Returns
        -------
        np.ndarray
            An array of subframecount values for each pulse record.
        """
        return np.zeros(self.npulses, dtype=np.int64)

    @property
    @abstractmethod
    def datatimes_raw(self) -> NDArray:
        """Return a copy of the raw timestamp (posix usec) memory map.

        In mass issue #337, we found that computing on the entire memory map at once was prohibitively
        expensive for large files. To prevent problems, copy chunks of no more than
        `MAXSEGMENT` records at once.

        Returns
        -------
        np.ndarray
            An array of timestamp values for each pulse record, in microseconds since the epoch (1970).
        """
        raise NotImplementedError("illegal: this is an abstract base class")

    @property
    def datatimes_float(self) -> NDArray:
        """Compute pulse record times in floating-point (seconds since the 1970 epoch).

        In mass issue #337, we found that computing on the entire memory map at once was prohibitively
        expensive for large files. To prevent problems, compute on chunks of no more than
        `MAXSEGMENT` records at once.

        Returns
        -------
        np.ndarray
            An array of pulse record times in floating-point (seconds since the 1970 epoch).
        """
        return self.datatimes_raw / 1e6

    def read_trace(self, i: int) -> NDArray:
        """Return a single pulse record from an LJH file.

        Parameters
        ----------
        i : int
            Pulse record number (0-indexed)

        Returns
        -------
        ArrayLike
            A view into the pulse record.
        """
        return self._mmap["data"][i]

    def read_trace_with_timing(self, i: int) -> tuple[int, int, NDArray]:
        """Return a single data trace as (subframecount, posix_usec, pulse_record)."""
        pulse_record = self.read_trace(i)
        return (self.subframecount[i], self.datatimes_raw[i], pulse_record)

    def to_polars(
        self,
        first_pulse: int = 0,
        keep_posix_usec: bool = False,
        force_continuous: bool = False,
    ) -> tuple[pl.DataFrame, pl.DataFrame]:
        """Convert this LJH file to two Polars dataframes: one for the binary data, one for the header.

        Parameters
        ----------
        first_pulse : int, optional
            The pulse dataframe starts with this pulse record number, by default 0
        keep_posix_usec : bool, optional
            Whether to keep the raw `posix_usec` field in the pulse dataframe, by default False
        force_continuous : bool
            Whether to claim that the data stream is actually continuous (because it cannot be learned from
            data for LJH files before version 2.2.0). Only relevant for noise data files.

        Returns
        -------
        tuple[pl.DataFrame, pl.DataFrame]
            (df, header_df)
            df: the dataframe containing raw pulse information, one row per pulse
            header_df: a one-row dataframe containing the information from the LJH file header

            The polars series "timestamp" will be converted to the time zone name
            returned by tzlocal.get_localzone_name()
        """
        data = {
            "pulse": self._mmap["data"][first_pulse:],
            "posix_usec": self.datatimes_raw[first_pulse:],
            "subframecount": self.subframecount[first_pulse:],
        }
        schema: pl._typing.SchemaDict = {
            "pulse": pl.Array(pl.UInt16, self.nsamples),
            "posix_usec": pl.UInt64,
            "subframecount": pl.UInt64,
        }
        df = pl.DataFrame(data, schema=schema)
        df = df.select(
            pl.from_epoch("posix_usec", time_unit="us").dt.convert_time_zone(_local_timezone_name).alias("timestamp")
        ).with_columns(df)
        if not keep_posix_usec:
            df = df.select(pl.exclude("posix_usec"))
        continuous = self.is_continuous or force_continuous
        header_df = pl.DataFrame(self.header).with_columns(continuous=continuous)
        return df, header_df

    def write_truncated_ljh(self, filename: str, npulses: int) -> None:
        """Write an LJH copy of this file, with a limited number of pulses.

        Parameters
        ----------
        filename : str
            The path where a new LJH file will be created (or replaced).
        npulses : int
            Number of pulse records to write
        """
        npulses = min(npulses, self.npulses)
        with open(filename, "wb") as f:
            f.write(self.header_string.encode("utf-8"))
            f.write(self._mmap[:npulses].tobytes())

    @property
    def is_continuous(self) -> bool:
        """Is this LJH file made of a perfectly continuous data stream?

        For pre-version 2.2 LJH files, we cannot discern from the data.
        So just claim False. (LJH_2_2 subtype will override by actually computing.)

        Returns
        -------
        bool
            Whether every record is strictly continuous with the ones before and after
        """
        return False

datatimes_float property

Compute pulse record times in floating-point (seconds since the 1970 epoch).

In mass issue #337, we found that computing on the entire memory map at once was prohibitively expensive for large files. To prevent problems, compute on chunks of no more than MAXSEGMENT records at once.

Returns:
  • ndarray –

    An array of pulse record times in floating-point (seconds since the 1970 epoch).

datatimes_raw abstractmethod property

Return a copy of the raw timestamp (posix usec) memory map.

In mass issue #337, we found that computing on the entire memory map at once was prohibitively expensive for large files. To prevent problems, copy chunks of no more than MAXSEGMENT records at once.

Returns:
  • ndarray –

    An array of timestamp values for each pulse record, in microseconds since the epoch (1970).

is_continuous property

Is this LJH file made of a perfectly continuous data stream?

For pre-version 2.2 LJH files, we cannot discern from the data. So just claim False. (LJH_2_2 subtype will override by actually computing.)

Returns:
  • bool –

    Whether every record is strictly continuous with the ones before and after

pulse_size_bytes property

The size in bytes of each binary pulse record (including the timestamps)

subframecount property

Return a copy of the subframecount memory map.

Old LJH versions don't have this: return zeros, unless overridden by derived class (LJHFile_2_2 will be the only one).

Returns:
  • ndarray –

    An array of subframecount values for each pulse record.

__repr__()

A string that can be evaluated to re-open this LJH file.

Source code in mass2/core/ljhfiles.py
def __repr__(self) -> str:
    """A string that can be evaluated to re-open this LJH file."""
    return f"""mass2.core.ljhfiles.LJHFile.open("{self.filename}")"""

open(filename, max_pulses=None) classmethod

Open an LJH file, parsing its header information and returning an LJHFile object.

Source code in mass2/core/ljhfiles.py
@classmethod
def open(cls, filename: str | Path, max_pulses: int | None = None) -> "LJHFile":
    """Open an LJH file, parsing its header information and returning an LJHFile object."""
    filename = str(filename)
    header_dict, header_string, header_size = cls.read_header(filename)
    channum = header_dict["Channel"]
    timebase = header_dict["Timebase"]
    nsamples = header_dict["Total Samples"]
    npresamples = header_dict["Presamples"]
    client = header_dict.get("Software Version", "UNKNOWN")
    subframediv: int | None = None
    if "Subframe divisions" in header_dict:
        subframediv = header_dict["Subframe divisions"]
    elif "Number of rows" in header_dict:
        subframediv = header_dict["Number of rows"]

    ljh_version = Version(header_dict["Save File Format Version"])
    if ljh_version < Version("2.0.0"):
        raise NotImplementedError("LJH files version 1 are not supported")
    if ljh_version < Version("2.1.0"):
        dtype = np.dtype([
            ("internal_unused", np.uint16),
            ("internal_ms", np.uint32),
            ("data", np.uint16, nsamples),
        ])
        concrete_LJHFile_type: type[LJHFile] = LJHFile_2_0
    elif ljh_version < Version("2.2.0"):
        dtype = np.dtype([
            ("internal_us", np.uint8),
            ("internal_unused", np.uint8),
            ("internal_ms", np.uint32),
            ("data", np.uint16, nsamples),
        ])
        concrete_LJHFile_type = LJHFile_2_1
    else:
        dtype = np.dtype([
            ("subframecount", np.int64),
            ("posix_usec", np.int64),
            ("data", np.uint16, nsamples),
        ])
        concrete_LJHFile_type = LJHFile_2_2
    pulse_size_bytes = dtype.itemsize
    binary_size = os.path.getsize(filename) - header_size

    # Fix long-standing bug in LJH files made by MATTER or XCALDAQ_client:
    # It adds 3 to the "true value" of nPresamples. Assume only DASTARD clients have value correct.
    if "DASTARD" not in client:
        npresamples += 3

    npulses = binary_size // pulse_size_bytes
    if max_pulses is not None:
        npulses = min(max_pulses, npulses)
    mmap = np.memmap(filename, dtype, mode="r", offset=header_size, shape=(npulses,))

    return concrete_LJHFile_type(
        filename,
        channum,
        dtype,
        npulses,
        timebase,
        nsamples,
        npresamples,
        subframediv,
        client,
        header_dict,
        header_string,
        header_size,
        binary_size,
        mmap,
        ljh_version,
        max_pulses,
    )
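
The memory-mapping step in `open()` can be sketched on a small synthetic binary file. The structured dtype below mirrors the one `open()` builds for LJH version >= 2.2; the file path and contents are invented for the example, and a real LJH file would additionally pass `offset=header_size` to skip the ASCII header.

```python
import os
import tempfile
import numpy as np

# Record layout for LJH version >= 2.2: two int64 timing fields,
# then nsamples uint16 samples per pulse.
nsamples = 4
dtype = np.dtype([
    ("subframecount", np.int64),
    ("posix_usec", np.int64),
    ("data", np.uint16, nsamples),
])

# Write three synthetic records to a temporary binary file
records = np.zeros(3, dtype=dtype)
records["data"] = np.arange(12, dtype=np.uint16).reshape(3, nsamples)
path = os.path.join(tempfile.mkdtemp(), "fake_pulses.bin")
records.tofile(path)

# Memory-map the records without loading them into RAM
npulses = os.path.getsize(path) // dtype.itemsize
mmap = np.memmap(path, dtype, mode="r", shape=(npulses,))
```

Field access like `mmap["data"]` then yields a (npulses, nsamples) view of the samples, which is how `read_trace` and `to_polars` pull pulse records out of the file.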

read_header(filename) classmethod

Read in the text header of an LJH file. Return the header parsed into a dictionary, the complete header string (in case you want to generate a new LJH file from this one), and the size of the header in bytes. The file does not remain open after this method.

Returns:
    (header_dict, header_string, header_size)

Args: filename: path to the file to be opened.

Source code in mass2/core/ljhfiles.py
@classmethod
def read_header(cls, filename: str) -> tuple[dict, str, int]:
    """Read in the text header of an LJH file. Return the header parsed into a dictionary,
    the complete header string (in case you want to generate a new LJH file from this one),
    and the size of the header in bytes. The file does not remain open after this method.

    Returns:
        (header_dict, header_string, header_size)

    Args:
        filename: path to the file to be opened.
    """
    # parse header into a dictionary
    header_dict: dict[str, Any] = {}
    with open(filename, "rb") as fp:
        i = 0
        lines = []
        while True:
            line = fp.readline().decode()
            lines.append(line)
            i += 1
            if line.startswith("#End of Header"):
                break
            elif not line:
                raise Exception("reached EOF before #End of Header")
            elif i > cls.OVERLONG_HEADER:
                raise Exception(f"header is too long--seems not to contain '#End of Header'\nin file {filename}")
            # ignore lines without ":"
            elif ":" in line:
                a, b = line.split(":", maxsplit=1)
                a = a.strip()
                b = b.strip()
                header_dict[a] = b
        header_size = fp.tell()
    header_string = "".join(lines)

    # Convert values from header_dict into numeric types, when appropriate
    header_dict["Filename"] = filename
    for name, datatype in (
        ("Channel", int),
        ("Timebase", float),
        ("Total Samples", int),
        ("Presamples", int),
        ("Number of columns", int),
        ("Number of rows", int),
        ("Subframe divisions", int),
        ("Timestamp offset (s)", float),
    ):
        # Have to convert to float first, as some early LJH have "Channel: 1.0"
        header_dict[name] = datatype(float(header_dict.get(name, -1)))
    return header_dict, header_string, header_size
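
The parsing loop in `read_header` can be exercised on a synthetic header held in memory; the header fields and values below are invented for the example.

```python
import io

# A minimal LJH-style header: "key: value" lines ending at "#End of Header"
raw_header = b"Channel: 1.0\r\nTimebase: 9.6e-6\r\n#End of Header\r\n"
fp = io.BytesIO(raw_header)
header_dict = {}
while True:
    line = fp.readline().decode()
    if line.startswith("#End of Header"):
        break
    if ":" in line:
        key, value = line.split(":", maxsplit=1)
        header_dict[key.strip()] = value.strip()
header_size = fp.tell()  # the binary pulse data would start at this byte offset

# Convert via float first, since some early LJH files store "Channel: 1.0"
channum = int(float(header_dict["Channel"]))
timebase = float(header_dict["Timebase"])
```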

read_trace(i)

Return a single pulse record from an LJH file.

Parameters:
  • i (int) –

    Pulse record number (0-indexed)

Returns:
  • ArrayLike –

    A view into the pulse record.

Source code in mass2/core/ljhfiles.py
def read_trace(self, i: int) -> NDArray:
    """Return a single pulse record from an LJH file.

    Parameters
    ----------
    i : int
        Pulse record number (0-indexed)

    Returns
    -------
    ArrayLike
        A view into the pulse record.
    """
    return self._mmap["data"][i]

read_trace_with_timing(i)

Return a single data trace as (subframecount, posix_usec, pulse_record).

Source code in mass2/core/ljhfiles.py
def read_trace_with_timing(self, i: int) -> tuple[int, int, NDArray]:
    """Return a single data trace as (subframecount, posix_usec, pulse_record)."""
    pulse_record = self.read_trace(i)
    return (self.subframecount[i], self.datatimes_raw[i], pulse_record)

reopen_binary(max_pulses=None)

Reopen the underlying binary section of the LJH file, in case its size has changed, without re-reading the LJH header section.

Parameters:
  • max_pulses (Optional[int], default: None ) –

    A limit to the number of pulses to memory map or None for no limit, by default None

Returns:
  • Self –

    A new LJHFile object with the same header but a new memmap and number of pulses.

Source code in mass2/core/ljhfiles.py
def reopen_binary(self, max_pulses: int | None = None) -> "LJHFile":
    """Reopen the underlying binary section of the LJH file, in case its size has changed,
    without re-reading the LJH header section.

    Parameters
    ----------
    max_pulses : Optional[int], optional
        A limit to the number of pulses to memory map or None for no limit, by default None

    Returns
    -------
    Self
        A new `LJHFile` object with the same header but a new memmap and number of pulses.
    """
    current_binary_size = os.path.getsize(self.filename) - self.header_size
    npulses = current_binary_size // self.pulse_size_bytes
    if max_pulses is not None:
        npulses = min(max_pulses, npulses)
    mmap = np.memmap(
        self.filename,
        self.dtype,
        mode="r",
        offset=self.header_size,
        shape=(npulses,),
    )
    return dataclasses.replace(
        self,
        npulses=npulses,
        _mmap=mmap,
        max_pulses=max_pulses,
        binary_size=current_binary_size,
    )
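
The reason `reopen_binary` exists can be demonstrated with a plain `np.memmap`: a memmap has a fixed length, so when the underlying file grows (as a live LJH file does during acquisition), a new memmap must be created to see the appended records. The file below is synthetic.

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "growing.bin")
dtype = np.dtype(np.uint16)
np.arange(4, dtype=np.uint16).tofile(path)
mmap = np.memmap(path, dtype, mode="r")  # maps the 4 existing records

# The writer appends more records, as a DAQ does to a live LJH file...
with open(path, "ab") as f:
    np.arange(4, 8, dtype=np.uint16).tofile(f)

# ...the old memmap still covers only the original records; re-map to extend.
npulses = os.path.getsize(path) // dtype.itemsize
mmap2 = np.memmap(path, dtype, mode="r", shape=(npulses,))
```

`reopen_binary` wraps exactly this re-mapping, recomputing `npulses` from the current file size and returning a new frozen `LJHFile` via `dataclasses.replace`.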

to_polars(first_pulse=0, keep_posix_usec=False, force_continuous=False)

Convert this LJH file to two Polars dataframes: one for the binary data, one for the header.

Parameters:
  • first_pulse (int, default: 0 ) –

    The pulse dataframe starts with this pulse record number, by default 0

  • keep_posix_usec (bool, default: False ) –

    Whether to keep the raw posix_usec field in the pulse dataframe, by default False

  • force_continuous (bool, default: False ) –

    Whether to claim that the data stream is actually continuous (because it cannot be learned from data for LJH files before version 2.2.0). Only relevant for noise data files.

Returns:
  • tuple[DataFrame, DataFrame] –

    (df, header_df)
    df: the dataframe containing raw pulse information, one row per pulse
    header_df: a one-row dataframe containing the information from the LJH file header

    The polars series "timestamp" will be converted to the time zone name returned by tzlocal.get_localzone_name()

Source code in mass2/core/ljhfiles.py
def to_polars(
    self,
    first_pulse: int = 0,
    keep_posix_usec: bool = False,
    force_continuous: bool = False,
) -> tuple[pl.DataFrame, pl.DataFrame]:
    """Convert this LJH file to two Polars dataframes: one for the binary data, one for the header.

    Parameters
    ----------
    first_pulse : int, optional
        The pulse dataframe starts with this pulse record number, by default 0
    keep_posix_usec : bool, optional
        Whether to keep the raw `posix_usec` field in the pulse dataframe, by default False
    force_continuous : bool
        Whether to claim that the data stream is actually continuous (because it cannot be learned from
        data for LJH files before version 2.2.0). Only relevant for noise data files.

    Returns
    -------
    tuple[pl.DataFrame, pl.DataFrame]
        (df, header_df)
        df: the dataframe containing raw pulse information, one row per pulse
        header_df: a one-row dataframe containing the information from the LJH file header

        The polars series "timestamp" will be converted to the time zone name
        returned by tzlocal.get_localzone_name()
    """
    data = {
        "pulse": self._mmap["data"][first_pulse:],
        "posix_usec": self.datatimes_raw[first_pulse:],
        "subframecount": self.subframecount[first_pulse:],
    }
    schema: pl._typing.SchemaDict = {
        "pulse": pl.Array(pl.UInt16, self.nsamples),
        "posix_usec": pl.UInt64,
        "subframecount": pl.UInt64,
    }
    df = pl.DataFrame(data, schema=schema)
    df = df.select(
        pl.from_epoch("posix_usec", time_unit="us").dt.convert_time_zone(_local_timezone_name).alias("timestamp")
    ).with_columns(df)
    if not keep_posix_usec:
        df = df.select(pl.exclude("posix_usec"))
    continuous = self.is_continuous or force_continuous
    header_df = pl.DataFrame(self.header).with_columns(continuous=continuous)
    return df, header_df

write_truncated_ljh(filename, npulses)

Write an LJH copy of this file, with a limited number of pulses.

Parameters:
  • filename (str) –

    The path where a new LJH file will be created (or replaced).

  • npulses (int) –

    Number of pulse records to write

Source code in mass2/core/ljhfiles.py
def write_truncated_ljh(self, filename: str, npulses: int) -> None:
    """Write an LJH copy of this file, with a limited number of pulses.

    Parameters
    ----------
    filename : str
        The path where a new LJH file will be created (or replaced).
    npulses : int
        Number of pulse records to write
    """
    npulses = min(npulses, self.npulses)
    with open(filename, "wb") as f:
        f.write(self.header_string.encode("utf-8"))
        f.write(self._mmap[:npulses].tobytes())

LJHFile_2_0 dataclass

Bases: LJHFile

LJH files version 2.0, which have internal_ms fields only, but no subframecount and no µs information.

Source code in mass2/core/ljhfiles.py
class LJHFile_2_0(LJHFile):
    """LJH files version 2.0, which have internal_ms fields only, but no subframecount and no µs information."""

    @property
    def datatimes_raw(self) -> NDArray:
        """Return a copy of the raw timestamp (posix usec) memory map.

        In mass issue #337, we found that computing on the entire memory map at once was prohibitively
        expensive for large files. To prevent problems, copy chunks of no more than
        `MAXSEGMENT` records at once.

        Returns
        -------
        np.ndarray
            An array of timestamp values for each pulse record, in microseconds since the epoch (1970).
        """
        usec = np.zeros(self.npulses, dtype=np.int64)
        mmap = self._mmap["internal_ms"]
        scale = 1000
        offset = round(self.header["Timestamp offset (s)"] * 1e6)

        MAXSEGMENT = 4096
        first = 0
        while first < self.npulses:
            last = min(first + MAXSEGMENT, self.npulses)
            usec[first:last] = mmap[first:last]
            first = last
        usec = usec * scale + offset

        return usec

    def to_polars(
        self, first_pulse: int = 0, keep_posix_usec: bool = False, force_continuous: bool = False
    ) -> tuple[pl.DataFrame, pl.DataFrame]:
        """Generate two Polars dataframes from this LJH file: one for the binary data, one for the header."""
        df, df_header = super().to_polars(first_pulse, keep_posix_usec, force_continuous=force_continuous)
        return df.select(pl.exclude("subframecount")), df_header
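
The chunked copy loop in `datatimes_raw` can be shown standalone: read a large (possibly memory-mapped) millisecond array in `MAXSEGMENT`-sized pieces, then scale ms to µs and add the file's epoch offset. The offset value of 1.5 s is a hypothetical "Timestamp offset (s)" header entry.

```python
import numpy as np

npulses = 10_000
internal_ms = np.arange(npulses, dtype=np.uint32)  # stand-in for self._mmap["internal_ms"]
usec = np.zeros(npulses, dtype=np.int64)
offset = round(1.5 * 1e6)  # hypothetical header "Timestamp offset (s)" in microseconds

# Copy in bounded chunks rather than touching the whole memmap at once
MAXSEGMENT = 4096
first = 0
while first < npulses:
    last = min(first + MAXSEGMENT, npulses)
    usec[first:last] = internal_ms[first:last]
    first = last
usec = usec * 1000 + offset  # milliseconds -> microseconds, plus offset
```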

datatimes_raw property

Return a copy of the raw timestamp (posix usec) memory map.

In mass issue #337, we found that computing on the entire memory map at once was prohibitively expensive for large files. To prevent problems, copy chunks of no more than MAXSEGMENT records at once.

Returns:
  • ndarray –

    An array of timestamp values for each pulse record, in microseconds since the epoch (1970).

to_polars(first_pulse=0, keep_posix_usec=False, force_continuous=False)

Generate two Polars dataframes from this LJH file: one for the binary data, one for the header.

Source code in mass2/core/ljhfiles.py
def to_polars(
    self, first_pulse: int = 0, keep_posix_usec: bool = False, force_continuous: bool = False
) -> tuple[pl.DataFrame, pl.DataFrame]:
    """Generate two Polars dataframes from this LJH file: one for the binary data, one for the header."""
    df, df_header = super().to_polars(first_pulse, keep_posix_usec, force_continuous=force_continuous)
    return df.select(pl.exclude("subframecount")), df_header

LJHFile_2_1 dataclass

Bases: LJHFile

LJH files version 2.1, which have internal_us and internal_ms fields, but no subframecount.

Source code in mass2/core/ljhfiles.py
class LJHFile_2_1(LJHFile):
    """LJH files version 2.1, which have internal_us and internal_ms fields, but no subframecount."""

    @property
    def datatimes_raw(self) -> NDArray:
        """Return a copy of the raw timestamp (posix usec) memory map.

        In mass issue #337, we found that computing on the entire memory map at once was prohibitively
        expensive for large files. To prevent problems, copy chunks of no more than
        `MAXSEGMENT` records at once.

        Returns
        -------
        np.ndarray
            An array of timestamp values for each pulse record, in microseconds since the epoch (1970).
        """
        usec = np.zeros(self.npulses, dtype=np.int64)
        mmap = self._mmap["internal_ms"]
        scale = 1000
        offset = round(self.header["Timestamp offset (s)"] * 1e6)

        MAXSEGMENT = 4096
        first = 0
        while first < self.npulses:
            last = min(first + MAXSEGMENT, self.npulses)
            usec[first:last] = mmap[first:last]
            first = last
        usec = usec * scale + offset

        # Add the 4 µs units found in LJH version 2.1
        assert self.dtype.names is not None and "internal_us" in self.dtype.names
        first = 0
        mmap = self._mmap["internal_us"]
        while first < self.npulses:
            last = min(first + MAXSEGMENT, self.npulses)
            usec[first:last] += mmap[first:last] * 4
            first = last

        return usec

    def to_polars(
        self, first_pulse: int = 0, keep_posix_usec: bool = False, force_continuous: bool = False
    ) -> tuple[pl.DataFrame, pl.DataFrame]:
        """Generate two Polars dataframes from this LJH file: one for the binary data, one for the header."""
        df, df_header = super().to_polars(first_pulse, keep_posix_usec, force_continuous=force_continuous)
        return df.select(pl.exclude("subframecount")), df_header

datatimes_raw property

Return a copy of the raw timestamp (posix usec) memory map.

In mass issue #337, we found that computing on the entire memory map at once was prohibitively expensive for large files. To prevent problems, copy chunks of no more than MAXSEGMENT records at once.

Returns:
  • ndarray –

    An array of timestamp values for each pulse record, in microseconds since the epoch (1970).

to_polars(first_pulse=0, keep_posix_usec=False, force_continuous=False)

Generate two Polars dataframes from this LJH file: one for the binary data, one for the header.

Source code in mass2/core/ljhfiles.py
def to_polars(
    self, first_pulse: int = 0, keep_posix_usec: bool = False, force_continuous: bool = False
) -> tuple[pl.DataFrame, pl.DataFrame]:
    """Generate two Polars dataframes from this LJH file: one for the binary data, one for the header."""
    df, df_header = super().to_polars(first_pulse, keep_posix_usec, force_continuous=force_continuous)
    return df.select(pl.exclude("subframecount")), df_header

LJHFile_2_2 dataclass

Bases: LJHFile

LJH files version 2.2 and later, which have subframecount and posix_usec fields.

Source code in mass2/core/ljhfiles.py
class LJHFile_2_2(LJHFile):
    """LJH files version 2.2 and later, which have subframecount and posix_usec fields."""

    @property
    def subframecount(self) -> NDArray:
        """Return a copy of the subframecount memory map.

        In mass issue #337, we found that computing on the entire memory map at once was prohibitively
        expensive for large files. To prevent problems, copy chunks of no more than
        `MAXSEGMENT` records at once.

        Returns
        -------
        np.ndarray
            An array of subframecount values for each pulse record.
        """
        subframecount = np.zeros(self.npulses, dtype=np.int64)
        mmap = self._mmap["subframecount"]
        MAXSEGMENT = 4096
        first = 0
        while first < self.npulses:
            last = min(first + MAXSEGMENT, self.npulses)
            subframecount[first:last] = mmap[first:last]
            first = last
        return subframecount

    @property
    def datatimes_raw(self) -> NDArray:
        """Return a copy of the raw timestamp (posix usec) memory map.

        In mass issue #337, we found that computing on the entire memory map at once was prohibitively
        expensive for large files. To prevent problems, copy chunks of no more than
        `MAXSEGMENT` records at once.

        Returns
        -------
        np.ndarray
            An array of timestamp values for each pulse record, in microseconds since the epoch (1970).
        """
        usec = np.zeros(self.npulses, dtype=np.int64)
        assert self.dtype.names is not None and "posix_usec" in self.dtype.names
        mmap = self._mmap["posix_usec"]

        MAXSEGMENT = 4096
        first = 0
        while first < self.npulses:
            last = min(first + MAXSEGMENT, self.npulses)
            usec[first:last] = mmap[first:last]
            first = last
        return usec

    @property
    def is_continuous(self) -> bool:
        """Is this LJH file made of a perfectly continuous data stream?

        We generally do take noise data in this mode, and it's useful to analyze the noise
        data by gluing many records together. This property says whether such gluing is valid.

        Returns
        -------
        bool
            Whether every record is strictly continuous with the ones before and after
        """
        if self.subframediv is None or self.npulses <= 1:
            return False
        expected_subframe_diff = self.nsamples * self.subframediv
        subframe = self._mmap["subframecount"]
        return np.max(np.diff(subframe)) <= expected_subframe_diff

datatimes_raw property

Return a copy of the raw timestamp (posix usec) memory map.

In mass issue #337, we found that computing on the entire memory map at once was prohibitively expensive for large files. To prevent problems, copy chunks of no more than MAXSEGMENT records at once.

Returns:
  • ndarray –

    An array of timestamp values for each pulse record, in microseconds since the epoch (1970).

is_continuous property

Is this LJH file made of a perfectly continuous data stream?

We generally do take noise data in this mode, and it's useful to analyze the noise data by gluing many records together. This property says whether such gluing is valid.

Returns:
  • bool –

    Whether every record is strictly continuous with the ones before and after

subframecount property

Return a copy of the subframecount memory map.

In mass issue #337, we found that computing on the entire memory map at once was prohibitively expensive for large files. To prevent problems, copy chunks of no more than MAXSEGMENT records at once.

Returns:
  • ndarray –

    An array of subframecount values for each pulse record.

Utility functions for handling and finding LJH files, and opening them as Channel or Channels objects.

experiment_state_path_from_ljh_path(ljh_path)

Find the experiment_state.txt file in the directory of the given ljh file.

Source code in mass2/core/ljhutil.py
def experiment_state_path_from_ljh_path(
    ljh_path: str | pathlib.Path,
) -> pathlib.Path:
    """Find the experiment_state.txt file in the directory of the given ljh file."""
    ljh_path = pathlib.Path(ljh_path)  # Convert to Path if it's a string
    base_name = ljh_path.name.split("_chan")[0]
    new_file_name = f"{base_name}_experiment_state.txt"
    return ljh_path.parent / new_file_name

external_trigger_bin_path_from_ljh_path(ljh_path)

Find the external_trigger.bin file in the directory of the given ljh file.

Source code in mass2/core/ljhutil.py
def external_trigger_bin_path_from_ljh_path(
    ljh_path: str | pathlib.Path,
) -> pathlib.Path:
    """Find the external_trigger.bin file in the directory of the given ljh file."""
    ljh_path = pathlib.Path(ljh_path)  # Convert to Path if it's a string
    base_name = ljh_path.name.split("_chan")[0]
    new_file_name = f"{base_name}_external_trigger.bin"
    return ljh_path.parent / new_file_name
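Both path helpers above share the same logic: strip the `_chanNNN.<ext>` tail from the LJH file name and append a fixed suffix in the same directory. A minimal inline sketch (the data path is hypothetical, and `sibling_path` is our name, not part of Mass2):

```python
from pathlib import PurePosixPath

def sibling_path(ljh_path: str, suffix: str) -> PurePosixPath:
    """Strip the '_chanNNN.<ext>' tail from an LJH file name and append
    a suffix in the same directory -- the logic shared by
    experiment_state_path_from_ljh_path and external_trigger_bin_path_from_ljh_path."""
    p = PurePosixPath(ljh_path)
    base = p.name.split("_chan")[0]
    return p.parent / f"{base}{suffix}"

# hypothetical acquisition file
state = sibling_path("/data/20230101_run1_chan3.ljh", "_experiment_state.txt")
trig = sibling_path("/data/20230101_run1_chan3.ljh", "_external_trigger.bin")
assert str(state) == "/data/20230101_run1_experiment_state.txt"
assert str(trig) == "/data/20230101_run1_external_trigger.bin"
```

`PurePosixPath` is used here only to make the example deterministic across platforms; the real helpers use `pathlib.Path`.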

extract_channel_number(file_path)

Extracts the channel number from the .ljh file name.

Parameters:
  • file_path (str) –

    The path to the .ljh file.

Returns:
  • int –

    The channel number.

Source code in mass2/core/ljhutil.py
def extract_channel_number(file_path: str) -> int:
    """
    Extracts the channel number from the .ljh file name.

    Args:
    - file_path (str): The path to the .ljh file.

    Returns:
    - int: The channel number.
    """
    match = re.search(r"_chan(\d+)\..*$", file_path)
    if match:
        return int(match.group(1))
    else:
        raise ValueError(f"File path does not match expected pattern: {file_path}")
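The channel-number regex is easiest to understand with a few concrete file names (all hypothetical):

```python
import re

PATTERN = r"_chan(\d+)\..*$"  # digits between "_chan" and the file extension

# typical pulse and noise file names
assert re.search(PATTERN, "20171116_152922_chan13.ljh").group(1) == "13"
assert re.search(PATTERN, "run7_chan1.noi").group(1) == "1"
# no "_chan" marker: no match, and extract_channel_number raises ValueError
assert re.search(PATTERN, "20171116_152922.ljh") is None
```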

filename_glob_expand(pattern)

Return the result of glob-expansion on the input pattern.

Parameters:
  • pattern (str) –

    A glob pattern to expand.

Returns:
  • list[str] –

    Matching filenames, sorted first by plain string sort, then by ljh_sort_filenames_numerically()

Source code in mass2/core/ljhutil.py
def filename_glob_expand(pattern: str) -> list[str]:
    """Return the result of glob-expansion on the input pattern.

    :param pattern: A glob pattern to expand.
    :type pattern: str
    :return: filenames; the result is sorted first by str.sort, then by ljh_sort_filenames_numerically()
    :rtype: list
    """
    result = glob.glob(pattern)
    return ljh_sort_filenames_numerically(result)

find_folders_with_extension(root_path, extensions)

Finds all folders within the root_path that contain at least one file with the given extension.

Parameters:
  • root_path (str) –

    The root directory to start the search from.

  • extensions (list[str]) –

    The file extensions to search for (e.g., ['.txt']).

Returns:
  • list[str] –

    A list of paths to directories containing at least one file with one of the given extensions.

Source code in mass2/core/ljhutil.py
def find_folders_with_extension(root_path: str, extensions: list[str]) -> list[str]:
    """
    Finds all folders within the root_path that contain at least one file with the given extension.

    Args:
    - root_path (str): The root directory to start the search from.
    - extensions (list[str]): The file extensions to search for (e.g., ['.txt']).

    Returns:
    - list[str]: A list of paths to directories containing at least one file with the given extension.
    """
    matching_folders = set()

    # Walk through the directory tree
    for dirpath, _, filenames in os.walk(root_path):
        # Check if any file in the current directory has the given extension
        for filename in filenames:
            for extension in extensions:
                if filename.endswith(extension):
                    matching_folders.add(dirpath)
                    break  # No need to check further, move to the next directory

    return list(matching_folders)

find_ljh_files(folder, ext='.ljh', search_subdirectories=False, exclude_ch_nums=[], include_ch_nums=None)

Finds all files of a specific file extension in the given folder and (optionally) its subfolders.

An optional list of channel numbers can be excluded from the results. Also optionally, the results can be restricted only to a specific list of channel numbers.

Parameters:
  • folder (str | Path) –

    Folder to search for data files

  • ext (str, default: '.ljh' ) –

    The filename extension to search for, by default ".ljh"

  • search_subdirectories (bool, default: False ) –

    Whether to search the subdirectories of folder recursively, by default False

  • exclude_ch_nums (list[int], default: [] ) –

    List of channel numbers to exclude from the results, by default []

  • include_ch_nums (list[int] | None, default: None ) –

    If not None, restrict results to files whose channel numbers appear in this list, by default None

Returns:
  • list[str] –

    A list of paths to .ljh files.

Raises:
  • ValueError –

    When the include_ch_nums list exists and contains one or more channels also in exclude_ch_nums.

Source code in mass2/core/ljhutil.py
def find_ljh_files(
    folder: str | pathlib.Path,
    ext: str = ".ljh",
    search_subdirectories: bool = False,
    exclude_ch_nums: list[int] = [],
    include_ch_nums: list[int] | None = None,
) -> list[str]:
    """Finds all files of a specific file extension in the given folder and (optionally) its subfolders.

    An optional list of channel numbers can be excluded from the results. Also optionally, the results
    can be restricted only to a specific list of channel numbers.

    Parameters
    ----------
    folder : str | pathlib.Path
        Folder to search for data files
    ext : str, optional
        The filename extension to search for, by default ".ljh"
    search_subdirectories : bool, optional
        Whether to search the subdirectories of `folder` recursively, by default False
    exclude_ch_nums : list[int], optional
        List of channel numbers to exclude from the results, by default []
    include_ch_nums : list[int] | None, optional
        If not None, restrict results to files whose channel numbers appear in this list, by default None

    Returns
    -------
    list[str]
        A list of paths to .ljh files.

    Raises
    ------
    ValueError
        When the `include_ch_nums` list exists and contains one or more channels also in `exclude_ch_nums`.
    """
    if include_ch_nums is not None:
        overlap = set(include_ch_nums).intersection(exclude_ch_nums)
        if len(overlap) > 0:
            raise ValueError(f"exclude and include lists should not overlap, but both include channels {overlap}")

    folder = str(folder)
    ljh_files = []
    if search_subdirectories:
        pathgen = os.walk(folder)
    else:
        pathgen = zip([folder], [[""]], [os.listdir(folder)])
    for dirpath, _, filenames in pathgen:
        for filename in filenames:
            if filename.endswith(ext):
                ch_num = extract_channel_number(filename)
                if ch_num in exclude_ch_nums:
                    continue
                if include_ch_nums is None or (ch_num in include_ch_nums):
                    ljh_files.append(os.path.join(dirpath, filename))
    return ljh_files
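The channel-selection rules of `find_ljh_files` can be shown on an in-memory list, with the directory walk omitted. This is an illustrative sketch with hypothetical file names; `filter_channels` and `channel_of` are our names, not Mass2 functions:

```python
import re

def channel_of(name: str) -> int:
    # same channel-extraction pattern as extract_channel_number
    return int(re.search(r"_chan(\d+)\.", name).group(1))

def filter_channels(files, exclude_ch_nums=(), include_ch_nums=None):
    """Apply find_ljh_files' exclude/include rules to a list of names."""
    if include_ch_nums is not None:
        overlap = set(include_ch_nums) & set(exclude_ch_nums)
        if overlap:
            raise ValueError(f"exclude and include lists overlap: {overlap}")
    kept = []
    for f in files:
        ch = channel_of(f)
        if ch in exclude_ch_nums:
            continue
        if include_ch_nums is None or ch in include_ch_nums:
            kept.append(f)
    return kept

files = ["run_chan1.ljh", "run_chan2.ljh", "run_chan3.ljh"]
assert filter_channels(files, exclude_ch_nums=[2]) == ["run_chan1.ljh", "run_chan3.ljh"]
assert filter_channels(files, include_ch_nums=[3]) == ["run_chan3.ljh"]
```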

helper_write_pulse(dest, src, i)

Write a single pulse from one LJHFile to another open file.

Source code in mass2/core/ljhutil.py
def helper_write_pulse(dest: BinaryIO, src: LJHFile, i: int) -> None:
    """Write a single pulse from one LJHFile to another open file."""
    subframecount, timestamp_usec, trace = src.read_trace_with_timing(i)
    prefix = struct.pack("<Q", int(subframecount))
    dest.write(prefix)
    prefix = struct.pack("<Q", int(timestamp_usec))
    dest.write(prefix)
    trace.tofile(dest, sep="")
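As `helper_write_pulse` shows, each LJH v2.2 record begins with two little-endian unsigned 64-bit integers (the subframe counter, then the posix timestamp in microseconds) followed by the trace samples. A small sketch of just the prefix packing, with made-up values:

```python
import struct

subframecount, timestamp_usec = 123456, 1_700_000_000_000_000
# two "<Q" fields: little-endian uint64 each, 16 bytes total
prefix = struct.pack("<Q", subframecount) + struct.pack("<Q", timestamp_usec)
assert len(prefix) == 16
# the header round-trips back out of the bytes
assert struct.unpack("<QQ", prefix) == (subframecount, timestamp_usec)
```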

ljh_append_traces(src_name, dest_name, pulses=None)

Append traces from one LJH file onto another. The destination file is assumed to be version 2.2.0.

Can be used to grab specific traces from some other ljh file, and append them onto an existing ljh file.

Args: src_name: the name of the source file dest_name: the name of the destination file pulses: indices of the pulses to copy (default: None, meaning copy all)

Source code in mass2/core/ljhutil.py
def ljh_append_traces(src_name: str, dest_name: str, pulses: range | None = None) -> None:
    """Append traces from one LJH file onto another. The destination file is
    assumed to be version 2.2.0.

    Can be used to grab specific traces from some other ljh file, and append them onto an existing ljh file.

    Args:
        src_name: the name of the source file
        dest_name: the name of the destination file
        pulses: indices of the pulses to copy (default: None, meaning copy all)
    """

    src = LJHFile.open(src_name)
    if pulses is None:
        pulses = range(src.npulses)
    with open(dest_name, "ab") as dest_fp:
        for i in pulses:
            helper_write_pulse(dest_fp, src, i)

ljh_merge(out_path, filenames, overwrite)

Merge a set of LJH files to a single output file.

Source code in mass2/core/ljhutil.py
def ljh_merge(out_path: str, filenames: list[str], overwrite: bool) -> None:
    """Merge a set of LJH files to a single output file."""
    if not overwrite and os.path.isfile(out_path):
        raise OSError(f"To overwrite destination {out_path}, use the --force flag")
    shutil.copy(filenames[0], out_path)
    f = LJHFile.open(out_path)
    channum = f.channum
    print(f"Combining {len(filenames)} LJH files from channel {channum}")
    print(f"<-- {filenames[0]}")

    for in_fname in filenames[1:]:
        f = LJHFile.open(in_fname)
        if f.channum != channum:
            raise RuntimeError(f"file '{in_fname}' channel={f.channum}, but want {channum}")
        print(f"<-- {in_fname}")
        ljh_append_traces(in_fname, out_path)

    size = os.stat(out_path).st_size
    print(f"--> {out_path}    size: {size} bytes.\n")

ljh_sort_filenames_numerically(fnames, inclusion_list=None)

Sort filenames of the form '*_chanXXX.*', according to the numerical value of channel number XXX.

Filenames are first sorted by the usual string comparisons, then by channel number. In this way, the standard sort is applied to all files with the same channel number.

Parameters:
  • fnames (list of str) –

    A sequence of filenames of the form '*_chan*.*'

  • inclusion_list (sequence of int, default: None ) –

    If not None, a container with channel numbers. All files whose channel numbers are not on this list will be omitted from the output.

Returns:
  • list –

    A list containing the same filenames, sorted according to the numerical value of channel number.

Source code in mass2/core/ljhutil.py
def ljh_sort_filenames_numerically(fnames: list[str], inclusion_list: list[int] | None = None) -> list[str]:
    """Sort filenames of the form '*_chanXXX.*', according to the numerical value of channel number XXX.

    Filenames are first sorted by the usual string comparisons, then by channel number. In this way,
    the standard sort is applied to all files with the same channel number.

    :param fnames: A sequence of filenames of the form '*_chan*.*'
    :type fnames: list of str
    :param inclusion_list: If not None, a container with channel numbers. All files
        whose channel numbers are not on this list will be omitted from the
        output, defaults to None
    :type inclusion_list: sequence of int, optional
    :return: A list containing the same filenames, sorted according to the numerical value of channel number.
    :rtype: list
    """
    if fnames is None or len(fnames) == 0:
        return []

    if inclusion_list is not None:
        fnames = list(filter(lambda n: extract_channel_number(n) in inclusion_list, fnames))

    # Sort the results first by raw filename, then sort numerically by LJH channel number.
    # Because string sort and the builtin `sorted` are both stable, we ensure that the first
    # sort is used to break ties in channel number.
    fnames.sort()
    return sorted(fnames, key=extract_channel_number)
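The two-pass sort relies on Python's stable sorting: the plain string sort runs first, and the second, stable sort by channel number preserves that string order within each channel. With hypothetical file names:

```python
import re

def channel_number(name: str) -> int:
    return int(re.search(r"_chan(\d+)\.", name).group(1))

fnames = [
    "runB_chan2.ljh",
    "runA_chan10.ljh",
    "runA_chan2.ljh",
    "runB_chan10.ljh",
]
fnames.sort()                                  # pass 1: string order
fnames = sorted(fnames, key=channel_number)    # pass 2: stable, by channel

# channel 2 precedes channel 10 (a naive string sort would reverse them),
# and within each channel the runA file precedes runB
assert fnames == [
    "runA_chan2.ljh", "runB_chan2.ljh",
    "runA_chan10.ljh", "runB_chan10.ljh",
]
```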

ljh_truncate(input_filename, output_filename, n_pulses=None, timestamp=None)

Truncate an LJH file.

Writes a new copy of an LJH file, with the same header but fewer raw data pulses.

Parameters:
  • input_filename (str) –

    name of file to truncate

  • output_filename (str) –

    filename for truncated file

  • n_pulses (int | None, default: None ) –

    truncate to include only this many pulses

  • timestamp (float | None, default: None ) –

    truncate to include only pulses with timestamp earlier than this number

Exactly one of n_pulses and timestamp must be specified.

Source code in mass2/core/ljhutil.py
def ljh_truncate(input_filename: str, output_filename: str, n_pulses: int | None = None, timestamp: float | None = None) -> None:
    """Truncate an LJH file.

    Writes a new copy of an LJH file, with the same header but fewer raw data pulses.

    Arguments:
    input_filename  -- name of file to truncate
    output_filename -- filename for truncated file
    n_pulses        -- truncate to include only this many pulses (default None)
    timestamp       -- truncate to include only pulses with timestamp earlier
                       than this number (default None)

    Exactly one of n_pulses and timestamp must be specified.
    """

    if (n_pulses is None and timestamp is None) or (n_pulses is not None and timestamp is not None):
        msg = "Must specify exactly one of n_pulses, timestamp."
        msg += f" Values were {str(n_pulses)}, {str(timestamp)}"
        raise Exception(msg)

    # Check for file problems, then open the input and output LJH files.
    if os.path.exists(output_filename):
        if os.path.samefile(input_filename, output_filename):
            msg = f"Input '{input_filename}' and output '{output_filename}' are the same file, which is not allowed"
            raise ValueError(msg)

    infile = LJHFile.open(input_filename)
    if infile.ljh_version < Version("2.2.0"):
        raise Exception(f"Don't know how to truncate this LJH version [{infile.ljh_version}]")

    with open(output_filename, "wb") as outfile:
        # write the header as a single string.
        for k, v in infile.header.items():
            outfile.write(bytes(f"{k}: {v}\n", encoding="utf-8"))
        outfile.write(b"#End of Header\n")

        # Write pulses.
        if n_pulses is None:
            n_pulses = infile.npulses
        for i in range(n_pulses):
            if timestamp is not None and infile.datatimes_float[i] > timestamp:
                break
            prefix = struct.pack("<Q", np.uint64(infile.subframecount[i]))
            outfile.write(prefix)
            prefix = struct.pack("<Q", np.uint64(infile.datatimes_raw[i]))
            outfile.write(prefix)
            trace = infile.read_trace(i)
            trace.tofile(outfile, sep="")

main_ljh_merge()

Merge all LJH files that match a pattern to a single output file.

The idea is that all such files come from a single TES and could have been (but were not) written as a single continuous file.

The pattern should be of the form "blah_blah_*_chan1.ljh" or something. The output will then be "merged_chan1.ljh" in the directory of the first file found (or alter the directory with the --outdir argument). It is not (currently) possible to merge data from LJH files that represent channels with different numbers.

Source code in mass2/core/ljhutil.py
def main_ljh_merge() -> None:
    """
    Merge all LJH files that match a pattern to a single output file.

    The idea is that all such files come from a single TES and could have been
    (but were not) written as a single continuous file.

    The pattern should be of the form "blah_blah_*_chan1.ljh" or something.
    The output will then be "merged_chan1.ljh" in the directory of the first file found
    (or alter the directory with the --outdir argument). It is not (currently) possible to
    merge data from LJH files that represent channels with different numbers.
    """
    parser = argparse.ArgumentParser(
        description="Merge a set of LJH files",
        epilog="Beware! Python glob does not perform brace-expansion, so braces must be expanded by the shell.",
    )
    parser.add_argument(
        "patterns", type=str, nargs="+", help='glob pattern of files to process, e.g. "20171116_*_chan1.ljh" (suggest double quotes)'
    )

    parser.add_argument(
        "-d",
        "--outdir",
        type=str,
        default="",
        help="directory to place output file (default: same as directory of first file to be merged)",
    )
    # TODO: add way to control the output _filename_
    parser.add_argument("-F", "--force", action="store_true", help="force overwrite of existing target? (default: False)")
    parser.add_argument("-v", "--verbose", action="store_true", help="list files found before merging (default: False)")
    parser.add_argument("-n", "--dry-run", action="store_true", help="list files found, then quit without merging (default: False)")

    args = parser.parse_args()

    filenames: list[str] = []
    for pattern in args.patterns:
        filenames.extend(filename_glob_expand(pattern))
    assert len(filenames) > 0
    if args.verbose or args.dry_run:
        print(f"Will expand the following {len(filenames)} files:")
        for f in filenames:
            print("  - ", f)
        if args.dry_run:
            return

    ljh = LJHFile.open(filenames[0])
    channum = ljh.channum

    out_dir = args.outdir
    if not out_dir:
        out_dir = os.path.split(filenames[0])[0]
    out_path = os.path.join(out_dir, f"merged_chan{channum}.ljh")

    overwrite: bool = args.force
    ljh_merge(out_path, filenames, overwrite=overwrite)

main_ljh_truncate()

A convenience script to truncate all LJH files that match a pattern, writing a new LJH file for each that contains only the first N pulse records.

Source code in mass2/core/ljhutil.py
def main_ljh_truncate() -> None:
    """
    A convenience script to truncate all LJH files that match a pattern, writing a new LJH file for each
    that contains only the first N pulse records.
    """
    parser = argparse.ArgumentParser(description="Truncate a set of LJH files")
    parser.add_argument("pattern", type=str, help="basename of files to process, e.g. 20171116_152922")
    parser.add_argument("out", type=str, help="string to append to basename when creating output filename")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--npulses", type=int, help="Number of pulses to keep")
    group.add_argument("--timestamp", type=float, help="Keep only pulses before this timestamp")
    args = parser.parse_args()

    pattern = f"{args.pattern}_chan*.ljh"

    filenames = filename_glob_expand(pattern)
    for in_fname in filenames:
        matches = re.search(r"chan(\d+)\.ljh", in_fname)
        if matches:
            ch = matches.groups()[0]
            out_fname = f"{args.pattern}_{args.out}_chan{ch}.ljh"
            ljh_truncate(in_fname, out_fname, n_pulses=args.npulses, timestamp=args.timestamp)

match_files_by_channel(folder1, folder2, limit=None, exclude_ch_nums=[], include_ch_nums=None)

Matches .ljh files from two folders by channel number.

Parameters:
  • folder1 (str) –

    The first root directory.

  • folder2 (str) –

    The second root directory.

  • limit (int | None, default: None ) –

    If not None, return at most this many pairs.

  • exclude_ch_nums (list[int], default: [] ) –

    List of channel numbers to exclude from the results.

  • include_ch_nums (list[int] | None, default: None ) –

    If not None, restrict results to these channel numbers.

Returns:
  • list[tuple[str, str]] –

    A list of (file1, file2) path pairs with matching channel numbers.

Raises:
  • ValueError –

    When the include_ch_nums list exists and contains one or more channels also in exclude_ch_nums.

Source code in mass2/core/ljhutil.py
def match_files_by_channel(
    folder1: str, folder2: str, limit: int | None = None, exclude_ch_nums: list[int] = [], include_ch_nums: list[int] | None = None
) -> list[tuple[str, str]]:
    """
    Matches .ljh files from two folders by channel number.

    Args:
    - folder1 (str): The first root directory.
    - folder2 (str): The second root directory.

    Returns:
    - list[tuple[str, str]]: A list of (file1, file2) path pairs with matching channel numbers.

    Raises
    ------
    ValueError
        When the `include_ch_nums` list exists and contains one or more channels also in `exclude_ch_nums`.
    """
    files1 = find_ljh_files(folder1, exclude_ch_nums=exclude_ch_nums, include_ch_nums=include_ch_nums)
    files2 = find_ljh_files(folder2, exclude_ch_nums=exclude_ch_nums, include_ch_nums=include_ch_nums)
    # print(f"in folder {folder1} found {len(files1)} files")
    # print(f"in folder {folder2} found {len(files2)} files")

    def collect_to_dict_error_on_repeat_channel(files: list[str]) -> dict:
        """
        Collects files into a dictionary by channel number, raising an error if a channel number is repeated.
        """
        files_by_channel: dict[int, str] = {}
        for file in files:
            channel = extract_channel_number(file)
            if channel in files_by_channel.keys():
                existing_file = files_by_channel[channel]
                raise ValueError(f"Duplicate channel number found: {channel} in file {file} and already in {existing_file}")
            files_by_channel[channel] = file
        return files_by_channel

    # we could have repeat channels even in the same folder, so we should error on that
    files1_by_channel = collect_to_dict_error_on_repeat_channel(files1)
    files2_by_channel = collect_to_dict_error_on_repeat_channel(files2)

    matching_pairs = []
    for channel in sorted(files1_by_channel.keys()):
        if channel in files2_by_channel.keys():
            matching_pairs.append((files1_by_channel[channel], files2_by_channel[channel]))
    if limit is not None:
        matching_pairs = matching_pairs[:limit]
    return matching_pairs
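The pairing logic reduces to: build one channel-to-file dict per folder (erroring on duplicates), then walk the first dict's channels in sorted order and keep those present in both. A sketch over hypothetical file lists (`match_by_channel` is our name, not part of Mass2):

```python
def match_by_channel(files1, files2, channel_of):
    """Pair files from two lists that share a channel number, raising on
    duplicate channels within either list."""
    def to_dict(files):
        d = {}
        for f in files:
            ch = channel_of(f)
            if ch in d:
                raise ValueError(f"Duplicate channel {ch}: {f} and {d[ch]}")
            d[ch] = f
        return d
    d1, d2 = to_dict(files1), to_dict(files2)
    return [(d1[ch], d2[ch]) for ch in sorted(d1) if ch in d2]

pulse = ["pulse_chan1.ljh", "pulse_chan3.ljh"]
noise = ["noise_chan3.ljh", "noise_chan5.ljh"]
pairs = match_by_channel(pulse, noise,
                         lambda f: int(f.split("_chan")[1].split(".")[0]))
# only channel 3 appears in both folders
assert pairs == [("pulse_chan3.ljh", "noise_chan3.ljh")]
```

This is the common pattern for pairing pulse files with their corresponding noise files.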

Miscellaneous utility functions used in mass2 for plotting, pickling, statistics, and DataFrame manipulation.

alwaysTrue()

alwaysTrue: a factory function to generate a new copy of polars literal True for class construction

Returns:
  • Expr –

    Literal True

Source code in mass2/core/misc.py
def alwaysTrue() -> pl.Expr:
    """alwaysTrue: a factory function to generate a new copy of polars literal True for class construction

    Returns
    -------
    pl.Expr
        Literal True
    """
    return pl.lit(True)

concat_dfs_with_concat_state(df1, df2, concat_state_col='concat_state')

Concatenate two DataFrames vertically, adding a column concat_state (or named according to concat_state_col) to indicate which DataFrame each row came from.

Source code in mass2/core/misc.py
177
178
179
180
181
182
183
184
185
186
187
188
189
190
def concat_dfs_with_concat_state(df1: pl.DataFrame, df2: pl.DataFrame, concat_state_col: str = "concat_state") -> pl.DataFrame:
    """Concatenate two DataFrames vertically, adding a column `concat_state` (or named according to `concat_state_col`)
    to indicate which DataFrame each row came from."""
    if concat_state_col in df1.columns:
        # Continue incrementing from the last known concat_state
        max_state = df1[concat_state_col][-1]
        df2 = df2.with_columns(pl.lit(max_state + 1).alias(concat_state_col))
    else:
        # Fresh concat: label first as 0, second as 1
        df1 = df1.with_columns(pl.lit(0).alias(concat_state_col))
        df2 = df2.with_columns(pl.lit(1).alias(concat_state_col))

    df_out = pl.concat([df1, df2], how="vertical")
    return df_out
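The `concat_state` labeling rule can be illustrated with a plain-Python sketch over lists of row dicts (a stand-in for DataFrames): a fresh concatenation labels the two frames 0 and 1, and a subsequent concatenation continues counting from the last known state:

```python
def concat_with_state(rows1, rows2, col="concat_state"):
    """Pure-Python sketch of the labeling rule used by concat_dfs_with_concat_state."""
    if rows1 and col in rows1[0]:
        # Continue incrementing from the last known state in the first frame.
        next_state = rows1[-1][col] + 1
        rows2 = [dict(r, **{col: next_state}) for r in rows2]
    else:
        # Fresh concat: label the first frame 0 and the second 1.
        rows1 = [dict(r, **{col: 0}) for r in rows1]
        rows2 = [dict(r, **{col: 1}) for r in rows2]
    return rows1 + rows2

a = [{"x": 1}, {"x": 2}]
b = [{"x": 3}]
once = concat_with_state(a, b)
twice = concat_with_state(once, [{"x": 4}])
print([r["concat_state"] for r in twice])  # [0, 0, 1, 2]
```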

extract_column_names_from_polars_expr(expr)

Recursively extract all column names from a Polars expression.

Source code in mass2/core/misc.py
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
def extract_column_names_from_polars_expr(expr: pl.Expr) -> list[str]:
    """Recursively extract all column names from a Polars expression."""
    names = set()
    if hasattr(expr, "meta"):
        meta = expr.meta
        if hasattr(meta, "root_names"):
            # For polars >=0.19.0
            names.update(meta.root_names())
        elif hasattr(meta, "output_name"):
            # For older polars
            names.add(meta.output_name())
        if hasattr(meta, "inputs"):
            for subexpr in meta.inputs():
                names.update(extract_column_names_from_polars_expr(subexpr))
    return list(names)

good_series(df, col, good_expr, use_expr)

Return a Series from the given DataFrame column, filtered by the given good_expr and use_expr.

Source code in mass2/core/misc.py
50
51
52
53
54
55
56
57
def good_series(df: pl.DataFrame, col: str, good_expr: pl.Expr, use_expr: bool | pl.Expr) -> pl.Series:
    """Return a Series from the given DataFrame column, filtered by the given good_expr and use_expr."""
    # This uses lazy before filtering. We hope this will allow polars to only access the data needed to filter
    # and the data needed to output what we want.
    good_df = df.lazy().filter(good_expr)
    if use_expr is not True:
        good_df = good_df.filter(use_expr)
    return good_df.select(pl.col(col)).collect().to_series()

hist_of_series(series, bin_edges)

Return the bin centers and counts of a histogram of the given Series using the given bin edges.

Source code in mass2/core/misc.py
 96
 97
 98
 99
100
101
def hist_of_series(series: pl.Series, bin_edges: ArrayLike) -> tuple[NDArray, NDArray]:
    """Return the bin centers and counts of a histogram of the given Series using the given bin edges."""
    bin_edges = np.asarray(bin_edges)
    bin_centers, _ = midpoints_and_step_size(bin_edges)
    counts = series.rename("count").hist(list(bin_edges), include_category=False, include_breakpoint=False)
    return bin_centers, counts.to_numpy().T[0]
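What `hist_of_series` computes can be sketched with the standard library alone. This stand-in uses half-open bins `[e[i], e[i+1])`, which may differ from polars' bin-closure convention at the edges:

```python
import bisect

def hist_counts(values, bin_edges):
    """Count values into half-open bins [e[i], e[i+1]) defined by sorted bin_edges."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        i = bisect.bisect_right(bin_edges, v) - 1
        if 0 <= i < len(counts):  # values outside the edges are dropped
            counts[i] += 1
    return counts

edges = [0.0, 1.0, 2.0, 3.0]
centers = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]
print(centers, hist_counts([0.2, 0.9, 1.5, 2.5, 2.6, 5.0], edges))
```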

launch_examples()

Launch marimo edit in the examples folder.

Source code in mass2/core/misc.py
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
def launch_examples() -> None:
    """Launch marimo edit in the examples folder."""
    examples_folder = pathlib.Path(__file__).parent.parent.parent / "examples"
    # use relative path to avoid this bug: https://github.com/marimo-team/marimo/issues/1895
    examples_folder_relative = str(examples_folder.relative_to(pathlib.Path.cwd()))
    # Prepare the command
    command = ["marimo", "edit", examples_folder_relative] + sys.argv[1:]

    # Execute the command
    print(f"launching marimo edit in {examples_folder_relative}")
    try:
        # Execute the command and directly forward stdout and stderr
        process = subprocess.Popen(command, stdout=sys.stdout, stderr=sys.stderr)
        process.communicate()

    except KeyboardInterrupt:
        # Handle cleanup on Ctrl-C
        try:
            process.terminate()
        except OSError:
            pass
        process.wait()
        sys.exit(1)

    # Check if the command was successful
    if process.returncode != 0:
        sys.exit(process.returncode)

median_absolute_deviation(x)

Return the median absolute deviation of the input, unnormalized.

Source code in mass2/core/misc.py
60
61
62
63
def median_absolute_deviation(x: ArrayLike) -> float:
    """Return the median absolute deviation of the input, unnormalized."""
    x = np.asarray(x)
    return float(np.median(np.abs(x - np.median(x))))

merge_dicts_ordered_by_keys(dict1, dict2)

Merge two dictionaries and return a new dictionary with items ordered by key.

Source code in mass2/core/misc.py
163
164
165
166
167
168
169
170
171
172
173
174
def merge_dicts_ordered_by_keys(dict1: dict[int, Any], dict2: dict[int, Any]) -> dict[int, Any]:
    """Merge two dictionaries and return a new dictionary with items ordered by key."""
    # Combine both dictionaries' items (key, value) into a list of tuples
    combined_items = list(dict1.items()) + list(dict2.items())

    # Sort the combined list of tuples by key
    combined_items.sort(key=lambda item: item[0])

    # Convert the sorted list of tuples back into a dictionary
    merged_dict: dict[int, Any] = {key: value for key, value in combined_items}

    return merged_dict
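Since this helper is pure Python, a usage sketch is straightforward; note that the resulting dict's insertion order follows the sorted keys:

```python
def merge_dicts_ordered_by_keys(dict1, dict2):
    # Same logic as above: combine the items, sort by key, rebuild the dict.
    combined = sorted(list(dict1.items()) + list(dict2.items()), key=lambda kv: kv[0])
    return dict(combined)

merged = merge_dicts_ordered_by_keys({3: "c", 1: "a"}, {2: "b"})
print(list(merged))  # [1, 2, 3] -- iteration order follows the sorted keys
```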

midpoints_and_step_size(x)

Return the midpoints and step size for the bin edges x.

Source code in mass2/core/misc.py
87
88
89
90
91
92
93
def midpoints_and_step_size(x: ArrayLike) -> tuple[NDArray, float]:
    """return midpoints, step_size for bin edges x"""
    x = np.asarray(x)
    d = np.diff(x)
    step_size = float(d[0])
    assert np.allclose(d, step_size, atol=1e-9), f"{d=}"
    return x[:-1] + step_size / 2, step_size
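A plain-Python sketch computing true bin midpoints for uniformly spaced edges; the midpoint of bin `[e[i], e[i+1])` is `e[i] + step/2`:

```python
def midpoints_and_step(edges):
    """Midpoints and step size for uniformly spaced bin edges (stdlib sketch)."""
    steps = [b - a for a, b in zip(edges[:-1], edges[1:])]
    step = steps[0]
    assert all(abs(s - step) < 1e-9 for s in steps), "bin edges must be uniform"
    return [e + step / 2 for e in edges[:-1]], step

mids, step = midpoints_and_step([0.0, 0.5, 1.0, 1.5])
print(mids, step)  # [0.25, 0.75, 1.25] 0.5
```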

outlier_resistant_nsigma_above_mid(x, nsigma=5)

Return the value that is nsigma median absolute deviations (MADs) above the median of the input.

Source code in mass2/core/misc.py
72
73
74
75
76
def outlier_resistant_nsigma_above_mid(x: ArrayLike, nsigma: float = 5) -> float:
    """RReturn the value that is `nsigma` median absolute deviations (MADs) above the median of the input."""
    x = np.asarray(x)
    mid = np.median(x)
    return mid + nsigma * sigma_mad(x)

outlier_resistant_nsigma_range_from_mid(x, nsigma=5)

Return the values that are nsigma median absolute deviations (MADs) below and above the median of the input.

Source code in mass2/core/misc.py
79
80
81
82
83
84
def outlier_resistant_nsigma_range_from_mid(x: ArrayLike, nsigma: float = 5) -> tuple[float, float]:
    """Return the values that are `nsigma` median absolute deviations (MADs) below and above the median of the input"""
    x = np.asarray(x)
    mid = np.median(x)
    smad = sigma_mad(x)
    return mid - nsigma * smad, mid + nsigma * smad
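The MAD family of helpers (`median_absolute_deviation`, `sigma_mad`, and the two `outlier_resistant_*` functions) can be sketched with the standard library. A single gross outlier barely moves the robust range, whereas it would dominate a mean/standard-deviation estimate:

```python
from statistics import median

def median_absolute_deviation(x):
    m = median(x)
    return median(abs(v - m) for v in x)

def sigma_mad(x):
    # 1.4826 rescales the MAD to estimate the Gaussian standard deviation.
    return median_absolute_deviation(x) * 1.4826

def robust_range(x, nsigma=5):
    """Stdlib sketch of outlier_resistant_nsigma_range_from_mid."""
    mid, smad = median(x), sigma_mad(x)
    return mid - nsigma * smad, mid + nsigma * smad

data = [10.0, 11.0, 12.0, 13.0, 14.0, 1000.0]  # one gross outlier
lo, hi = robust_range(data)
print(lo, hi)  # roughly (1.4, 23.6): the outlier barely widens the range
```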

pickle_object(obj, filename)

Pickle the given object to the given filename using dill. Mass2 Recipe objects are compatible with dill but not with the standard pickle module.

Source code in mass2/core/misc.py
25
26
27
28
29
def pickle_object(obj: Any, filename: str | Path) -> None:
    """Pickle the given object to the given filename using dill.
    Mass2 Recipe objects are compatible with `dill` but _not_ with the standard `pickle` module."""
    with open(filename, "wb") as file:
        dill.dump(obj, file)

plot_a_vs_b_series(a, b, axis=None, **plotkwarg)

Plot the two given Series as a scatterplot on the given axis (or a new one if None).

Source code in mass2/core/misc.py
118
119
120
121
122
123
124
125
def plot_a_vs_b_series(a: pl.Series, b: pl.Series, axis: plt.Axes | None = None, **plotkwarg: dict) -> None:
    """Plot the two given Series as a scatterplot on the given axis (or a new one if None)."""
    if axis is None:
        plt.figure()
        axis = plt.gca()
    axis.plot(a, b, ".", label=b.name, **plotkwarg)
    axis.set_xlabel(a.name)
    axis.set_ylabel(b.name)

plot_hist_of_series(series, bin_edges, axis=None, **plotkwarg)

Plot a histogram of the given Series using the given bin edges on the given axis (or a new one if None).

Source code in mass2/core/misc.py
104
105
106
107
108
109
110
111
112
113
114
115
def plot_hist_of_series(series: pl.Series, bin_edges: ArrayLike, axis: plt.Axes | None = None, **plotkwarg: dict) -> plt.Axes:
    """Plot a histogram of the given Series using the given bin edges on the given axis (or a new one if None)."""
    if axis is None:
        plt.figure()
        axis = plt.gca()
    bin_edges = np.asarray(bin_edges)
    bin_centers, step_size = midpoints_and_step_size(bin_edges)
    hist = series.rename("count").hist(list(bin_edges), include_category=False, include_breakpoint=False)
    axis.plot(bin_centers, hist, label=series.name, **plotkwarg)
    axis.set_xlabel(series.name)
    axis.set_ylabel(f"counts per {step_size:.2f} unit bin")
    return axis

root_mean_squared(x, axis=None)

Return the root mean square of the input along the given axis or axes. Does not subtract the mean first.

Source code in mass2/core/misc.py
157
158
159
160
def root_mean_squared(x: ArrayLike, axis: int | tuple[int] | None = None) -> float:
    """Return the root mean square of the input along the given axis or axes.
    Does _not_ subtract the mean first."""
    return np.sqrt(np.mean(np.asarray(x) ** 2, axis))

show(fig=None)

Create a Marimo interactive view of the given Matplotlib figure (or the current figure if None).

Source code in mass2/core/misc.py
18
19
20
21
22
def show(fig: plt.Figure | None = None) -> mo.Html:
    """Create a Marimo interactive view of the given Matplotlib figure (or the current figure if None)."""
    if fig is None:
        fig = plt.gcf()
    return mo.mpl.interactive(fig)

sigma_mad(x)

Return the normalized median absolute deviation of the input, rescaled to give the standard deviation if the distribution is Gaussian. This method is more robust to outliers than calculating the standard deviation directly.

Source code in mass2/core/misc.py
66
67
68
69
def sigma_mad(x: ArrayLike) -> float:
    """Return the nomrlized median absolute deviation of the input, rescaled to give the standard deviation
    if distribution is Gaussian. This method is more robust to outliers than calculating the standard deviation directly."""
    return median_absolute_deviation(x) * 1.4826

smallest_positive_real(arr)

Return the smallest positive real number in the given array-like object.

Source code in mass2/core/misc.py
39
40
41
42
43
44
45
46
47
def smallest_positive_real(arr: ArrayLike) -> float:
    """Return the smallest positive real number in the given array-like object."""

    def is_positive_real(x: Any) -> bool:
        "Is `x` a positive real number?"
        return x > 0 and np.isreal(x)

    positive_real_numbers = np.array(list(filter(is_positive_real, np.asarray(arr))))
    return np.min(positive_real_numbers)
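A stdlib sketch: Python numbers all expose `.real` and `.imag`, so one filter handles ints, floats, and complex values alike (complex entries with nonzero imaginary part are excluded). A helper like this is useful for, e.g., picking the physical root of a polynomial:

```python
def smallest_positive_real(values):
    # Keep strictly positive, purely real entries; return the smallest as a float.
    return float(min(x.real for x in values if x.imag == 0 and x.real > 0))

print(smallest_positive_real([-2.0, 3.5, 1 + 2j, 0.25]))  # 0.25
```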

unpickle_object(filename)

Unpickle an object from the given filename using dill.

Source code in mass2/core/misc.py
32
33
34
35
36
def unpickle_object(filename: str | Path) -> Any:
    """Unpickle an object from the given filename using dill."""
    with open(filename, "rb") as file:
        obj = dill.load(file)
        return obj
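The standard-library `pickle` module shares dill's `dump`/`load` interface, so the round-trip pattern looks the same; swap in `dill` when the object (such as a Mass2 Recipe) contains things plain pickle cannot serialize:

```python
import os
import pickle
import tempfile

# Round-trip an object to disk. `dill.dump`/`dill.load` are drop-in
# replacements for the pickle calls below.
obj = {"channel": 13, "good_fraction": 0.97}
path = os.path.join(tempfile.mkdtemp(), "state.pkl")
with open(path, "wb") as file:
    pickle.dump(obj, file)
with open(path, "rb") as file:
    restored = pickle.load(file)
print(restored == obj)  # True
```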

Tools for fitting multiple spectral lines in a single pass.

FitSpec dataclass

Specification of a single line fit within a MultiFit.

Source code in mass2/core/multifit.py
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
@dataclass(frozen=True)
class FitSpec:
    """Specification of a single line fit within a MultiFit."""

    model: GenericLineModel
    bin_edges: np.ndarray
    use_expr: pl.Expr
    params_update: lmfit.parameter.Parameters

    def params(self, bin_centers: NDArray, counts: NDArray) -> lmfit.Parameters:
        """Return a reasonable guess at the parameters given the spectrum to be fit."""
        params = self.model.make_params()
        params = self.model.guess(counts, bin_centers=bin_centers, dph_de=1)
        params["dph_de"].set(1.0, vary=False)
        params = params.update(self.params_update)
        return params

    def fit_series_without_use_expr(self, series: pl.Series) -> LineModelResult:
        """Fit the given Series without applying a use_expr filter."""
        bin_centers, counts = mass2.misc.hist_of_series(series, self.bin_edges)
        params = self.params(bin_centers, counts)
        bin_centers, bin_size = mass2.misc.midpoints_and_step_size(self.bin_edges)
        result = self.model.fit(counts, params, bin_centers=bin_centers)
        result.set_label_hints(
            binsize=bin_size,
            ds_shortname="??",
            unit_str="eV",
            attr_str=series.name,
            states_hint=f"{self.use_expr}",
            cut_hint="",
        )
        return result

    def fit_df(self, df: pl.DataFrame, col: str, good_expr: pl.Expr) -> LineModelResult:
        """Fit the given DataFrame column `col` after applying good_expr and use_expr filters."""
        series = mass2.misc.good_series(df, col, good_expr, use_expr=self.use_expr)
        return self.fit_series_without_use_expr(series)

    def fit_ch(self, ch: "Channel", col: str) -> LineModelResult:
        """Fit the given Channel's DataFrame column `col` after applying the Channel's good_expr and
        this FitSpec's use_expr filters."""
        return self.fit_df(ch.df, col, ch.good_expr)

fit_ch(ch, col)

Fit the given Channel's DataFrame column col after applying the Channel's good_expr and this FitSpec's use_expr filters.

Source code in mass2/core/multifit.py
76
77
78
79
def fit_ch(self, ch: "Channel", col: str) -> LineModelResult:
    """Fit the given Channel's DataFrame column `col` after applying the Channel's good_expr and
    this FitSpec's use_expr filters."""
    return self.fit_df(ch.df, col, ch.good_expr)

fit_df(df, col, good_expr)

Fit the given DataFrame column col after applying good_expr and use_expr filters.

Source code in mass2/core/multifit.py
71
72
73
74
def fit_df(self, df: pl.DataFrame, col: str, good_expr: pl.Expr) -> LineModelResult:
    """Fit the given DataFrame column `col` after applying good_expr and use_expr filters."""
    series = mass2.misc.good_series(df, col, good_expr, use_expr=self.use_expr)
    return self.fit_series_without_use_expr(series)

fit_series_without_use_expr(series)

Fit the given Series without applying a use_expr filter.

Source code in mass2/core/multifit.py
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
def fit_series_without_use_expr(self, series: pl.Series) -> LineModelResult:
    """Fit the given Series without applying a use_expr filter."""
    bin_centers, counts = mass2.misc.hist_of_series(series, self.bin_edges)
    params = self.params(bin_centers, counts)
    bin_centers, bin_size = mass2.misc.midpoints_and_step_size(self.bin_edges)
    result = self.model.fit(counts, params, bin_centers=bin_centers)
    result.set_label_hints(
        binsize=bin_size,
        ds_shortname="??",
        unit_str="eV",
        attr_str=series.name,
        states_hint=f"{self.use_expr}",
        cut_hint="",
    )
    return result

params(bin_centers, counts)

Return a reasonable guess at the parameters given the spectrum to be fit.

Source code in mass2/core/multifit.py
47
48
49
50
51
52
53
def params(self, bin_centers: NDArray, counts: NDArray) -> lmfit.Parameters:
    """Return a reasonable guess at the parameters given the spectrum to be fit."""
    params = self.model.make_params()
    params = self.model.guess(counts, bin_centers=bin_centers, dph_de=1)
    params["dph_de"].set(1.0, vary=False)
    params = params.update(self.params_update)
    return params

MultiFit dataclass

Specification of multiple emission-line fits to be done in one pass.

Source code in mass2/core/multifit.py
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
@dataclass(frozen=True)
class MultiFit:
    """Specification of multiple emission-line fits to be done in one pass."""

    default_fit_width: float = 50
    default_bin_size: float = 0.5
    default_use_expr: pl.Expr = field(default_factory=alwaysTrue)
    default_params_update: dict = field(default_factory=lmfit.Parameters)
    fitspecs: list[FitSpec] = field(default_factory=list)
    results: list | None = None

    def with_line(
        self,
        line: GenericLineModel | SpectralLine | str | float,
        dlo: float | None = None,
        dhi: float | None = None,
        bin_size: float | None = None,
        use_expr: pl.Expr | None = None,
        params_update: lmfit.Parameters | None = None,
    ) -> "MultiFit":
        """Return a copy of this MultiFit with an additional FitSpec for the given line."""
        model = get_model(line)
        peak_energy = model.spect.peak_energy
        dlo = handle_none(dlo, self.default_fit_width / 2)
        dhi = handle_none(dhi, self.default_fit_width / 2)
        bin_size = handle_none(bin_size, self.default_bin_size)
        params_update = handle_none(params_update, self.default_params_update)
        use_expr = handle_none(use_expr, self.default_use_expr)
        bin_edges = np.arange(-dlo, dhi + bin_size, bin_size) + peak_energy
        fitspec = FitSpec(model, bin_edges, use_expr, params_update)
        return self.with_fitspec(fitspec)

    def with_fitspec(self, fitspec: FitSpec) -> "MultiFit":
        """Return a copy of this MultiFit with a new FitSpec added."""
        # make sure they're always sorted by energy
        newfitspecs = sorted(self.fitspecs + [fitspec], key=lambda x: x.model.spect.peak_energy)
        return dataclasses.replace(self, fitspecs=newfitspecs)

    def with_results(self, results: list) -> "MultiFit":
        """Return a copy of this MultiFit with the given results added."""
        return dataclasses.replace(self, results=results)

    def results_params_as_df(self) -> pl.DataFrame:
        """Return a DataFrame made from the fit parameters from the results."""
        assert self.results is not None
        result = self.results[0]
        param_names = result.params.keys()
        d: dict[str, list] = {}
        d["line"] = [fitspec.model.spect.shortname for fitspec in self.fitspecs]
        d["peak_energy_ref"] = [fitspec.model.spect.peak_energy for fitspec in self.fitspecs]
        d["peak_energy_ref_err"] = []
        # for quickline, position_uncertainty is a string
        # translate that into a large value for uncertainty so we can proceed without crashing
        for fitspec in self.fitspecs:
            if isinstance(fitspec.model.spect.position_uncertainty, str):
                v = 0.1 * fitspec.model.spect.peak_energy  # 10% error is large!
            else:
                v = fitspec.model.spect.position_uncertainty
            d["peak_energy_ref_err"].append(v)
        for param_name in param_names:
            d[param_name] = [result.params[param_name].value for result in self.results]
            d[param_name + "_stderr"] = [result.params[param_name].stderr for result in self.results]
        return pl.DataFrame(d)

    def fit_series_without_use_expr(self, series: pl.Series) -> "MultiFit":
        "Fit all the FitSpecs in this MultiFit to the given Series without applying any use_expr filter."
        results = [fitspec.fit_series_without_use_expr(series) for fitspec in self.fitspecs]
        return self.with_results(results)

    def fit_df(self, df: pl.DataFrame, col: str, good_expr: pl.Expr) -> "MultiFit":
        """Fit all the FitSpecs in this MultiFit to the given DataFrame column `col` after applying good_expr filter."""
        results = []
        for fitspec in self.fitspecs:
            result = fitspec.fit_df(df, col, good_expr)
            results.append(result)
        return self.with_results(results)

    def fit_ch(self, ch: "Channel", col: str) -> "MultiFit":
        """Fit all the FitSpecs in this MultiFit to the given Channel's DataFrame column `col`
        after applying the Channel's good_expr filter."""
        return self.fit_df(ch.df, col, ch.good_expr)

    def plot_results(self, n_extra_axes: int = 0) -> tuple[plt.Figure, plt.Axes]:
        """Plot all the fit results in subplots, with n_extra_axes empty subplots included at the end."""
        assert self.results is not None
        n = len(self.results) + n_extra_axes
        cols = min(3, n)
        rows = math.ceil(n / cols)
        fig, axes = plt.subplots(rows, cols, figsize=(cols * 4, rows * 4))  # Adjust figure size as needed

        # If there's only one subplot, axes is not a list but a single Axes object.
        if rows == 1 and cols == 1:
            axes = [axes]
        elif rows == 1 or cols == 1:
            axes = axes.flatten()
        else:
            axes = axes.ravel()

        for result, ax in zip(self.results, axes):
            result.plotm(ax=ax)

        # Hide any remaining empty subplots
        for ax in axes[n:]:
            ax.axis("off")

        plt.tight_layout()
        return fig, axes

    def plot_results_and_pfit(self, uncalibrated_name: str, previous_energy2ph: Callable, n_extra_axes: int = 0) -> plt.Axes:
        """Plot all the fit results in subplots, and also plot the gain curve on an extra axis."""
        assert self.results is not None
        _fig, axes = self.plot_results(n_extra_axes=1 + n_extra_axes)
        ax = axes[len(self.results)]
        multifit_df = self.results_params_as_df()
        peaks_in_energy_rough_cal = multifit_df["peak_ph"].to_numpy()
        peaks_uncalibrated = previous_energy2ph(peaks_in_energy_rough_cal)
        peaks_in_energy_reference = multifit_df["peak_energy_ref"].to_numpy()
        pfit_gain, rms_residual_energy = self.to_pfit_gain(previous_energy2ph)
        plt.sca(ax)
        x = np.linspace(0, np.amax(peaks_uncalibrated), 100)
        plt.plot(x, pfit_gain(x), "k", label="fit")
        gain = peaks_uncalibrated / peaks_in_energy_reference
        plt.plot(peaks_uncalibrated, gain, "o")
        plt.xlabel(uncalibrated_name)
        plt.ylabel("gain")
        plt.title(f"{rms_residual_energy=:.3f}")
        for name, x, y in zip(multifit_df["line"], peaks_uncalibrated, gain):
            ax.annotate(str(name), (x, y))
        return axes

    def to_pfit_gain(self, previous_energy2ph: Callable) -> tuple[np.polynomial.Polynomial, float]:
        """Return a best-fit 2nd degree polynomial for gain (ph/energy) vs uncalibrated ph,
        and the rms residual in energy after applying that gain correction."""
        multifit_df = self.results_params_as_df()
        peaks_in_energy_rough_cal = multifit_df["peak_ph"].to_numpy()
        peaks_uncalibrated = np.array([previous_energy2ph(e) for e in peaks_in_energy_rough_cal]).ravel()
        peaks_in_energy_reference = multifit_df["peak_energy_ref"].to_numpy()
        gain = peaks_uncalibrated / peaks_in_energy_reference
        pfit_gain = np.polynomial.Polynomial.fit(peaks_uncalibrated, gain, deg=2)

        def ph2energy(ph: NDArray) -> NDArray:
            "Given an array `ph` of pulse heights, return the corresponding energies as an array."
            gain = pfit_gain(ph)
            return ph / gain

        e_predicted = ph2energy(peaks_uncalibrated)
        rms_residual_energy = mass2.misc.root_mean_squared(e_predicted - peaks_in_energy_reference)
        return pfit_gain, rms_residual_energy

    def to_mass_cal(
        self, previous_energy2ph: Callable, curvetype: Curvetypes = Curvetypes.GAIN, approximate: bool = False
    ) -> EnergyCalibration:
        """Return a calibration object made from the fit results in this MultiFit."""
        df = self.results_params_as_df()
        maker = EnergyCalibrationMaker(
            ph=np.array([previous_energy2ph(x) for x in df["peak_ph"].to_numpy()]),
            energy=df["peak_energy_ref"].to_numpy(),
            dph=df["peak_ph_stderr"].to_numpy(),
            de=df["peak_energy_ref_err"].to_numpy(),
            names=[name for name in df["line"]],
        )
        cal = maker.make_calibration(curvename=curvetype, approximate=approximate)
        return cal
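The bin-edge construction in `with_line` can be reproduced in plain Python; here with a hypothetical line at 6404 eV and the class defaults (50 eV fit width split evenly between dlo and dhi, 0.5 eV bins):

```python
# Hypothetical line near Fe K-alpha; the defaults give dlo = dhi = 25 eV.
peak_energy, dlo, dhi, bin_size = 6404.0, 25.0, 25.0, 0.5

# Same construction as with_line: edges run from peak-dlo to peak+dhi in bin_size steps.
n = int(round((dlo + dhi) / bin_size))
bin_edges = [peak_energy - dlo + i * bin_size for i in range(n + 1)]
print(bin_edges[0], bin_edges[-1], len(bin_edges))  # 6379.0 6429.0 101
```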

fit_ch(ch, col)

Fit all the FitSpecs in this MultiFit to the given Channel's DataFrame column col after applying the Channel's good_expr filter.

Source code in mass2/core/multifit.py
159
160
161
162
def fit_ch(self, ch: "Channel", col: str) -> "MultiFit":
    """Fit all the FitSpecs in this MultiFit to the given Channel's DataFrame column `col`
    after applying the Channel's good_expr filter."""
    return self.fit_df(ch.df, col, ch.good_expr)

fit_df(df, col, good_expr)

Fit all the FitSpecs in this MultiFit to the given DataFrame column col after applying the good_expr filter.

Source code in mass2/core/multifit.py
151
152
153
154
155
156
157
def fit_df(self, df: pl.DataFrame, col: str, good_expr: pl.Expr) -> "MultiFit":
    """Fit all the FitSpecs in this MultiFit to the given DataFrame column `col` after applying good_expr filter."""
    results = []
    for fitspec in self.fitspecs:
        result = fitspec.fit_df(df, col, good_expr)
        results.append(result)
    return self.with_results(results)

fit_series_without_use_expr(series)

Fit all the FitSpecs in this MultiFit to the given Series without applying any use_expr filter.

Source code in mass2/core/multifit.py
146
147
148
149
def fit_series_without_use_expr(self, series: pl.Series) -> "MultiFit":
    "Fit all the FitSpecs in this MultiFit to the given Series without applying any use_expr filter."
    results = [fitspec.fit_series_without_use_expr(series) for fitspec in self.fitspecs]
    return self.with_results(results)

plot_results(n_extra_axes=0)

Plot all the fit results in subplots, with n_extra_axes empty subplots included at the end.

Source code in mass2/core/multifit.py
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
def plot_results(self, n_extra_axes: int = 0) -> tuple[plt.Figure, plt.Axes]:
    """Plot all the fit results in subplots, with n_extra_axes empty subplots included at the end."""
    assert self.results is not None
    n = len(self.results) + n_extra_axes
    cols = min(3, n)
    rows = math.ceil(n / cols)
    fig, axes = plt.subplots(rows, cols, figsize=(cols * 4, rows * 4))  # Adjust figure size as needed

    # If there's only one subplot, axes is not a list but a single Axes object.
    if rows == 1 and cols == 1:
        axes = [axes]
    elif rows == 1 or cols == 1:
        axes = axes.flatten()
    else:
        axes = axes.ravel()

    for result, ax in zip(self.results, axes):
        result.plotm(ax=ax)

    # Hide any remaining empty subplots
    for ax in axes[n:]:
        ax.axis("off")

    plt.tight_layout()
    return fig, axes

plot_results_and_pfit(uncalibrated_name, previous_energy2ph, n_extra_axes=0)

Plot all the fit results in subplots, and also plot the gain curve on an extra axis.

Source code in mass2/core/multifit.py
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
def plot_results_and_pfit(self, uncalibrated_name: str, previous_energy2ph: Callable, n_extra_axes: int = 0) -> plt.Axes:
    """Plot all the fit results in subplots, and also plot the gain curve on an extra axis."""
    assert self.results is not None
    _fig, axes = self.plot_results(n_extra_axes=1 + n_extra_axes)
    ax = axes[len(self.results)]
    multifit_df = self.results_params_as_df()
    peaks_in_energy_rough_cal = multifit_df["peak_ph"].to_numpy()
    peaks_uncalibrated = previous_energy2ph(peaks_in_energy_rough_cal)
    peaks_in_energy_reference = multifit_df["peak_energy_ref"].to_numpy()
    pfit_gain, rms_residual_energy = self.to_pfit_gain(previous_energy2ph)
    plt.sca(ax)
    x = np.linspace(0, np.amax(peaks_uncalibrated), 100)
    plt.plot(x, pfit_gain(x), "k", label="fit")
    gain = peaks_uncalibrated / peaks_in_energy_reference
    plt.plot(peaks_uncalibrated, gain, "o")
    plt.xlabel(uncalibrated_name)
    plt.ylabel("gain")
    plt.title(f"{rms_residual_energy=:.3f}")
    for name, x, y in zip(multifit_df["line"], peaks_uncalibrated, gain):
        ax.annotate(str(name), (x, y))
    return axes

results_params_as_df()

Return a DataFrame made from the fit parameters from the results.

Source code in mass2/core/multifit.py
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
def results_params_as_df(self) -> pl.DataFrame:
    """Return a DataFrame made from the fit parameters from the results."""
    assert self.results is not None
    result = self.results[0]
    param_names = result.params.keys()
    d: dict[str, list] = {}
    d["line"] = [fitspec.model.spect.shortname for fitspec in self.fitspecs]
    d["peak_energy_ref"] = [fitspec.model.spect.peak_energy for fitspec in self.fitspecs]
    d["peak_energy_ref_err"] = []
    # for quickline, position_uncertainty is a string
    # translate that into a large value for uncertainty so we can proceed without crashing
    for fitspec in self.fitspecs:
        if isinstance(fitspec.model.spect.position_uncertainty, str):
            v = 0.1 * fitspec.model.spect.peak_energy  # 10% error is large!
        else:
            v = fitspec.model.spect.position_uncertainty
        d["peak_energy_ref_err"].append(v)
    for param_name in param_names:
        d[param_name] = [result.params[param_name].value for result in self.results]
        d[param_name + "_stderr"] = [result.params[param_name].stderr for result in self.results]
    return pl.DataFrame(d)

to_mass_cal(previous_energy2ph, curvetype=Curvetypes.GAIN, approximate=False)

Return a calibration object made from the fit results in this MultiFit.

Source code in mass2/core/multifit.py
231
232
233
234
235
236
237
238
239
240
241
242
243
244
def to_mass_cal(
    self, previous_energy2ph: Callable, curvetype: Curvetypes = Curvetypes.GAIN, approximate: bool = False
) -> EnergyCalibration:
    """Return a calibration object made from the fit results in this MultiFit."""
    df = self.results_params_as_df()
    maker = EnergyCalibrationMaker(
        ph=np.array([previous_energy2ph(x) for x in df["peak_ph"].to_numpy()]),
        energy=df["peak_energy_ref"].to_numpy(),
        dph=df["peak_ph_stderr"].to_numpy(),
        de=df["peak_energy_ref_err"].to_numpy(),
        names=[name for name in df["line"]],
    )
    cal = maker.make_calibration(curvename=curvetype, approximate=approximate)
    return cal

to_pfit_gain(previous_energy2ph)

Return a best-fit 2nd degree polynomial for gain (ph/energy) vs uncalibrated ph, and the rms residual in energy after applying that gain correction.

Source code in mass2/core/multifit.py
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
def to_pfit_gain(self, previous_energy2ph: Callable) -> tuple[np.polynomial.Polynomial, float]:
    """Return a best-fit 2nd degree polynomial for gain (ph/energy) vs uncalibrated ph,
    and the rms residual in energy after applying that gain correction."""
    multifit_df = self.results_params_as_df()
    peaks_in_energy_rough_cal = multifit_df["peak_ph"].to_numpy()
    peaks_uncalibrated = np.array([previous_energy2ph(e) for e in peaks_in_energy_rough_cal]).ravel()
    peaks_in_energy_reference = multifit_df["peak_energy_ref"].to_numpy()
    gain = peaks_uncalibrated / peaks_in_energy_reference
    pfit_gain = np.polynomial.Polynomial.fit(peaks_uncalibrated, gain, deg=2)

    def ph2energy(ph: NDArray) -> NDArray:
        "Given an array `ph` of pulse heights, return the corresponding energies as an array."
        gain = pfit_gain(ph)
        return ph / gain

    e_predicted = ph2energy(peaks_uncalibrated)
    rms_residual_energy = mass2.misc.root_mean_squared(e_predicted - peaks_in_energy_reference)
    return pfit_gain, rms_residual_energy

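As an illustrative sketch (not library code), the gain-polynomial idea behind to_pfit_gain can be reproduced with plain numpy on synthetic peak positions; the peak values below are assumed, and mass2.misc.root_mean_squared is replaced by an explicit numpy expression:

```python
import numpy as np

# Synthetic "truth": energy = ph / gain(ph) with a known quadratic gain
true_gain = np.polynomial.Polynomial([1.2, -1e-5, 2e-10])
peaks_uncalibrated = np.array([3000.0, 5000.0, 6500.0, 8000.0])
peaks_energy_reference = peaks_uncalibrated / true_gain(peaks_uncalibrated)

# Mirror to_pfit_gain: fit gain = ph/energy as a 2nd-degree polynomial in ph
gain = peaks_uncalibrated / peaks_energy_reference
pfit_gain = np.polynomial.Polynomial.fit(peaks_uncalibrated, gain, deg=2)

def ph2energy(ph):
    return ph / pfit_gain(ph)

e_predicted = ph2energy(peaks_uncalibrated)
rms_residual_energy = np.sqrt(np.mean((e_predicted - peaks_energy_reference) ** 2))
```

Because the synthetic gain is exactly quadratic, the fit recovers it and the rms residual is at machine precision; with real fitted peaks the residual quantifies how well a quadratic gain curve describes the detector.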
with_fitspec(fitspec)

Return a copy of this MultiFit with a new FitSpec added.

Source code in mass2/core/multifit.py
def with_fitspec(self, fitspec: FitSpec) -> "MultiFit":
    """Return a copy of this MultiFit with a new FitSpec added."""
    # make sure they're always sorted by energy
    newfitspecs = sorted(self.fitspecs + [fitspec], key=lambda x: x.model.spect.peak_energy)
    return dataclasses.replace(self, fitspecs=newfitspecs)

with_line(line, dlo=None, dhi=None, bin_size=None, use_expr=None, params_update=None)

Return a copy of this MultiFit with an additional FitSpec for the given line.

Source code in mass2/core/multifit.py
def with_line(
    self,
    line: GenericLineModel | SpectralLine | str | float,
    dlo: float | None = None,
    dhi: float | None = None,
    bin_size: float | None = None,
    use_expr: pl.Expr | None = None,
    params_update: lmfit.Parameters | None = None,
) -> "MultiFit":
    """Return a copy of this MultiFit with an additional FitSpec for the given line."""
    model = get_model(line)
    peak_energy = model.spect.peak_energy
    dlo = handle_none(dlo, self.default_fit_width / 2)
    dhi = handle_none(dhi, self.default_fit_width / 2)
    bin_size = handle_none(bin_size, self.default_bin_size)
    params_update = handle_none(params_update, self.default_params_update)
    use_expr = handle_none(use_expr, self.default_use_expr)
    bin_edges = np.arange(-dlo, dhi + bin_size, bin_size) + peak_energy
    fitspec = FitSpec(model, bin_edges, use_expr, params_update)
    return self.with_fitspec(fitspec)

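The bin-edge construction in with_line can be previewed on its own; the peak energy, window, and bin size below are assumed, illustrative values:

```python
import numpy as np

# Assumed, illustrative values: a line near 5898.8 eV, a 100 eV fit window
# split evenly on both sides, and 0.5 eV bins
peak_energy = 5898.8
dlo = dhi = 100.0 / 2
bin_size = 0.5

# Mirrors the bin-edge construction in with_line
bin_edges = np.arange(-dlo, dhi + bin_size, bin_size) + peak_energy
```

This yields 201 edges (200 bins) covering peak_energy ± 50 eV.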
with_results(results)

Return a copy of this MultiFit with the given results added.

Source code in mass2/core/multifit.py
def with_results(self, results: list) -> "MultiFit":
    """Return a copy of this MultiFit with the given results added."""
    return dataclasses.replace(self, results=results)

MultiFitMassCalibrationStep dataclass

Bases: RecipeStep

A RecipeStep to apply a mass-style calibration derived from a MultiFit.

Source code in mass2/core/multifit.py
@dataclass(frozen=True)
class MultiFitMassCalibrationStep(RecipeStep):
    """A RecipeStep to apply a mass-style calibration derived from a MultiFit."""

    cal: EnergyCalibration
    multifit: MultiFit | None

    def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
        """Calibrate energy and return a new DataFrame with results."""
        # only works with in memory data, but just takes it as numpy data and calls function
        # is much faster than map_elements approach, but wouldn't work with out of core data without some extra book keeping
        inputs_np = [df[input].to_numpy() for input in self.inputs]
        out = self.ph2energy(inputs_np[0])
        df2 = pl.DataFrame({self.output[0]: out}).with_columns(df)
        return df2

    def drop_debug(self) -> "MultiFitMassCalibrationStep":
        "For slimmer object pickling, return a copy of self with the fat debugging info removed"
        return dataclasses.replace(self, multifit=None)

    def dbg_plot(self, df_after: pl.DataFrame, **kwargs: Any) -> plt.Axes:
        "Plot the fit results and gain curve for debugging purposes."
        assert self.multifit is not None
        axes = self.multifit.plot_results_and_pfit(
            uncalibrated_name=self.inputs[0],
            previous_energy2ph=self.energy2ph,
        )
        return axes

    def ph2energy(self, ph: ArrayLike) -> NDArray:
        "The quadratic gain calibration curve: ph -> energy"
        ph = np.asarray(ph)
        return self.cal.ph2energy(ph)

    def energy2ph(self, energy: ArrayLike) -> NDArray:
        """The inverse of the quadratic gain calibration curve: energy -> ph"""
        energy = np.asarray(energy)
        return self.cal.energy2ph(energy)

    @classmethod
    def learn(
        cls,
        ch: "Channel",
        multifit_spec: MultiFit,
        previous_cal_step_index: int,
        calibrated_col: str,
        use_expr: pl.Expr = pl.lit(True),
    ) -> "MultiFitMassCalibrationStep":
        """multifit then make a mass calibration object with curve_type=Curvetypes.GAIN and approx=False
        TODO: support more options"""
        previous_cal_step = ch.steps[previous_cal_step_index]
        assert hasattr(previous_cal_step, "energy2ph")
        rough_energy_col = previous_cal_step.output[0]
        uncalibrated_col = previous_cal_step.inputs[0]

        multifit_with_results = multifit_spec.fit_ch(ch, col=rough_energy_col)
        cal = multifit_with_results.to_mass_cal(previous_cal_step.energy2ph)
        step = cls(
            [uncalibrated_col],
            [calibrated_col],
            ch.good_expr,
            use_expr,
            cal,
            multifit_with_results,
        )
        return step

calc_from_df(df)

Calibrate energy and return a new DataFrame with results.

Source code in mass2/core/multifit.py
def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
    """Calibrate energy and return a new DataFrame with results."""
    # only works with in memory data, but just takes it as numpy data and calls function
    # is much faster than map_elements approach, but wouldn't work with out of core data without some extra book keeping
    inputs_np = [df[input].to_numpy() for input in self.inputs]
    out = self.ph2energy(inputs_np[0])
    df2 = pl.DataFrame({self.output[0]: out}).with_columns(df)
    return df2

dbg_plot(df_after, **kwargs)

Plot the fit results and gain curve for debugging purposes.

Source code in mass2/core/multifit.py
def dbg_plot(self, df_after: pl.DataFrame, **kwargs: Any) -> plt.Axes:
    "Plot the fit results and gain curve for debugging purposes."
    assert self.multifit is not None
    axes = self.multifit.plot_results_and_pfit(
        uncalibrated_name=self.inputs[0],
        previous_energy2ph=self.energy2ph,
    )
    return axes

drop_debug()

For slimmer object pickling, return a copy of self with the fat debugging info removed

Source code in mass2/core/multifit.py
def drop_debug(self) -> "MultiFitMassCalibrationStep":
    "For slimmer object pickling, return a copy of self with the fat debugging info removed"
    return dataclasses.replace(self, multifit=None)

energy2ph(energy)

The inverse of the quadratic gain calibration curve: energy -> ph

Source code in mass2/core/multifit.py
def energy2ph(self, energy: ArrayLike) -> NDArray:
    """The inverse of the quadratic gain calibration curve: energy -> ph"""
    energy = np.asarray(energy)
    return self.cal.energy2ph(energy)

learn(ch, multifit_spec, previous_cal_step_index, calibrated_col, use_expr=pl.lit(True)) classmethod

Perform a multifit, then make a mass calibration object with curvetype=Curvetypes.GAIN and approximate=False. TODO: support more options

Source code in mass2/core/multifit.py
@classmethod
def learn(
    cls,
    ch: "Channel",
    multifit_spec: MultiFit,
    previous_cal_step_index: int,
    calibrated_col: str,
    use_expr: pl.Expr = pl.lit(True),
) -> "MultiFitMassCalibrationStep":
    """multifit then make a mass calibration object with curve_type=Curvetypes.GAIN and approx=False
    TODO: support more options"""
    previous_cal_step = ch.steps[previous_cal_step_index]
    assert hasattr(previous_cal_step, "energy2ph")
    rough_energy_col = previous_cal_step.output[0]
    uncalibrated_col = previous_cal_step.inputs[0]

    multifit_with_results = multifit_spec.fit_ch(ch, col=rough_energy_col)
    cal = multifit_with_results.to_mass_cal(previous_cal_step.energy2ph)
    step = cls(
        [uncalibrated_col],
        [calibrated_col],
        ch.good_expr,
        use_expr,
        cal,
        multifit_with_results,
    )
    return step

ph2energy(ph)

The quadratic gain calibration curve: ph -> energy

Source code in mass2/core/multifit.py
def ph2energy(self, ph: ArrayLike) -> NDArray:
    "The quadratic gain calibration curve: ph -> energy"
    ph = np.asarray(ph)
    return self.cal.ph2energy(ph)

MultiFitQuadraticGainStep dataclass

Bases: RecipeStep

A RecipeStep to apply a quadratic gain curve, after fitting multiple emission lines.

Source code in mass2/core/multifit.py
@dataclass(frozen=True)
class MultiFitQuadraticGainStep(RecipeStep):
    """A RecipeStep to apply a quadratic gain curve, after fitting multiple emission lines."""

    pfit_gain: np.polynomial.Polynomial
    multifit: MultiFit | None
    rms_residual_energy: float

    def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
        """Calibrate energy and return a new DataFrame with results."""
        # only works with in memory data, but just takes it as numpy data and calls function
        # is much faster than map_elements approach, but wouldn't work with out of core data without some extra book keeping
        inputs_np = [df[input].to_numpy() for input in self.inputs]
        out = self.ph2energy(inputs_np[0])
        df2 = pl.DataFrame({self.output[0]: out}).with_columns(df)
        return df2

    def dbg_plot(self, df_after: pl.DataFrame, **kwargs: Any) -> plt.Axes:
        """Plot the fit results and gain curve for debugging purposes."""
        if self.multifit is not None:
            self.multifit.plot_results_and_pfit(uncalibrated_name=self.inputs[0], previous_energy2ph=self.energy2ph)
        return plt.gca()

    def drop_debug(self) -> "MultiFitQuadraticGainStep":
        "For slimmer object pickling, return a copy of self with the fat debugging info removed"
        return dataclasses.replace(self, multifit=None)

    def ph2energy(self, ph: ArrayLike) -> NDArray:
        "The quadratic gain calibration curve: ph -> energy"
        ph = np.asarray(ph)
        gain = self.pfit_gain(ph)
        return ph / gain

    def energy2ph(self, energy: ArrayLike) -> NDArray:
        """The inverse of the quadratic gain calibration curve: energy -> ph"""
        # ph2energy is equivalent to this with y=energy, x=ph
        # y = x/(c + b*x + a*x^2)
        # so
        # y*c + (y*b-1)*x + a*x^2 = 0
        # and given that we've selected for well formed calibrations,
        # we know which root we want
        energy = np.asarray(energy)
        cba = self.pfit_gain.convert().coef
        c, bb, a = cba * energy
        b = bb - 1
        ph = (-b - np.sqrt(b**2 - 4 * a * c)) / (2 * a)
        assert math.isclose(self.ph2energy(ph), energy, rel_tol=1e-6, abs_tol=1e-3)
        return ph

    @classmethod
    def learn(
        cls,
        ch: "Channel",
        multifit_spec: MultiFit,
        previous_cal_step_index: int,
        calibrated_col: str,
        use_expr: pl.Expr = pl.lit(True),
    ) -> "MultiFitQuadraticGainStep":
        """Perform a multifit then make a quadratic gain calibration object."""
        previous_cal_step = ch.steps[previous_cal_step_index]
        assert hasattr(previous_cal_step, "energy2ph")
        rough_energy_col = previous_cal_step.output[0]
        uncalibrated_col = previous_cal_step.inputs[0]

        multifit_with_results = multifit_spec.fit_ch(ch, col=rough_energy_col)
        # multifit_df = multifit_with_results.results_params_as_df()
        pfit_gain, rms_residual_energy = multifit_with_results.to_pfit_gain(previous_cal_step.energy2ph)
        step = cls(
            [uncalibrated_col],
            [calibrated_col],
            ch.good_expr,
            use_expr,
            pfit_gain,
            multifit_with_results,
            rms_residual_energy,
        )
        return step

calc_from_df(df)

Calibrate energy and return a new DataFrame with results.

Source code in mass2/core/multifit.py
def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
    """Calibrate energy and return a new DataFrame with results."""
    # only works with in memory data, but just takes it as numpy data and calls function
    # is much faster than map_elements approach, but wouldn't work with out of core data without some extra book keeping
    inputs_np = [df[input].to_numpy() for input in self.inputs]
    out = self.ph2energy(inputs_np[0])
    df2 = pl.DataFrame({self.output[0]: out}).with_columns(df)
    return df2

dbg_plot(df_after, **kwargs)

Plot the fit results and gain curve for debugging purposes.

Source code in mass2/core/multifit.py
def dbg_plot(self, df_after: pl.DataFrame, **kwargs: Any) -> plt.Axes:
    """Plot the fit results and gain curve for debugging purposes."""
    if self.multifit is not None:
        self.multifit.plot_results_and_pfit(uncalibrated_name=self.inputs[0], previous_energy2ph=self.energy2ph)
    return plt.gca()

drop_debug()

For slimmer object pickling, return a copy of self with the fat debugging info removed

Source code in mass2/core/multifit.py
def drop_debug(self) -> "MultiFitQuadraticGainStep":
    "For slimmer object pickling, return a copy of self with the fat debugging info removed"
    return dataclasses.replace(self, multifit=None)

energy2ph(energy)

The inverse of the quadratic gain calibration curve: energy -> ph

Source code in mass2/core/multifit.py
def energy2ph(self, energy: ArrayLike) -> NDArray:
    """The inverse of the quadratic gain calibration curve: energy -> ph"""
    # ph2energy is equivalent to this with y=energy, x=ph
    # y = x/(c + b*x + a*x^2)
    # so
    # y*c + (y*b-1)*x + a*x^2 = 0
    # and given that we've selected for well formed calibrations,
    # we know which root we want
    energy = np.asarray(energy)
    cba = self.pfit_gain.convert().coef
    c, bb, a = cba * energy
    b = bb - 1
    ph = (-b - np.sqrt(b**2 - 4 * a * c)) / (2 * a)
    assert math.isclose(self.ph2energy(ph), energy, rel_tol=1e-6, abs_tol=1e-3)
    return ph

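The root selection in energy2ph can be checked with a standalone numpy sketch (the gain coefficients below are assumed, not from any real detector). For a well-formed calibration the physical pulse height is the smaller root of the quadratic:

```python
import numpy as np

# Assumed, illustrative gain coefficients: gain(ph) = c + b*ph + a*ph^2
pfit_gain = np.polynomial.Polynomial([1.2, -1e-5, 2e-10])

def ph2energy(ph):
    return ph / pfit_gain(ph)

def energy2ph(energy):
    # Solve energy = ph / (c + b*ph + a*ph^2) for ph:
    #   (a*energy)*ph^2 + (b*energy - 1)*ph + (c*energy) = 0
    c, bb, a = pfit_gain.convert().coef * energy
    b = bb - 1
    # The smaller root is the physical pulse height
    return (-b - np.sqrt(b**2 - 4 * a * c)) / (2 * a)
```

A round trip ph -> energy -> ph recovers the original pulse height to high precision.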
learn(ch, multifit_spec, previous_cal_step_index, calibrated_col, use_expr=pl.lit(True)) classmethod

Perform a multifit then make a quadratic gain calibration object.

Source code in mass2/core/multifit.py
@classmethod
def learn(
    cls,
    ch: "Channel",
    multifit_spec: MultiFit,
    previous_cal_step_index: int,
    calibrated_col: str,
    use_expr: pl.Expr = pl.lit(True),
) -> "MultiFitQuadraticGainStep":
    """Perform a multifit then make a quadratic gain calibration object."""
    previous_cal_step = ch.steps[previous_cal_step_index]
    assert hasattr(previous_cal_step, "energy2ph")
    rough_energy_col = previous_cal_step.output[0]
    uncalibrated_col = previous_cal_step.inputs[0]

    multifit_with_results = multifit_spec.fit_ch(ch, col=rough_energy_col)
    # multifit_df = multifit_with_results.results_params_as_df()
    pfit_gain, rms_residual_energy = multifit_with_results.to_pfit_gain(previous_cal_step.energy2ph)
    step = cls(
        [uncalibrated_col],
        [calibrated_col],
        ch.good_expr,
        use_expr,
        pfit_gain,
        multifit_with_results,
        rms_residual_energy,
    )
    return step

ph2energy(ph)

The quadratic gain calibration curve: ph -> energy

Source code in mass2/core/multifit.py
def ph2energy(self, ph: ArrayLike) -> NDArray:
    "The quadratic gain calibration curve: ph -> energy"
    ph = np.asarray(ph)
    gain = self.pfit_gain(ph)
    return ph / gain

handle_none(val, default)

If val is None, return a copy of default, else return val.

Source code in mass2/core/multifit.py
def handle_none(val: T | None, default: T) -> T:
    "If val is None, return a copy of default, else return val."
    if val is None:
        return copy.copy(default)
    return val

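The copy in handle_none matters when the default is mutable (e.g. a list or an lmfit.Parameters): callers that mutate the returned value will not corrupt the shared default. A self-contained sketch:

```python
import copy

def handle_none(val, default):
    "If val is None, return a copy of default, else return val."
    if val is None:
        return copy.copy(default)
    return val

default_bins = [0.0, 0.5, 1.0]
chosen = handle_none(None, default_bins)   # a copy, safe to mutate
```

Here chosen equals default_bins but is a distinct object, while a non-None value is passed through unchanged.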
Algorithms to analyze noise data.

NoiseResult dataclass

A dataclass to hold the results of noise analysis, both power-spectral density and (optionally) autocorrelation.

Source code in mass2/core/noise_algorithms.py
@dataclass
class NoiseResult:
    """A dataclass to hold the results of noise analysis, both power-spectral density and (optionally) autocorrelation."""

    psd: np.ndarray
    autocorr_vec: np.ndarray | None
    frequencies: np.ndarray

    def plot(
        self,
        axis: plt.Axes | None = None,
        arb_to_unit_scale_and_label: tuple[int, str] = (1, "arb"),
        sqrt_psd: bool = True,
        loglog: bool = True,
        **plotkwarg: dict,
    ) -> None:
        """Plot the power spectral density."""
        if axis is None:
            plt.figure()
            axis = plt.gca()
        arb_to_unit_scale, unit_label = arb_to_unit_scale_and_label
        psd = self.psd[1:] * (arb_to_unit_scale**2)
        freq = self.frequencies[1:]
        if sqrt_psd:
            axis.plot(freq, np.sqrt(psd), **plotkwarg)
            axis.set_ylabel(f"Amplitude Spectral Density ({unit_label}$/\\sqrt{{Hz}}$)")
        else:
            axis.plot(freq, psd, **plotkwarg)
            axis.set_ylabel(f"Power Spectral Density ({unit_label}$^2$ Hz$^{{-1}}$)")
        if loglog:
            plt.loglog()
        axis.grid()
        axis.set_xlabel("Frequency (Hz)")
        plt.title(f"noise from records of length {len(self.frequencies) * 2 - 2}")
        axis.figure.tight_layout()

    def plot_log_rebinned(
        self,
        bins_per_decade: int = 10,
        axis: plt.Axes | None = None,
        arb_to_unit_scale_and_label: tuple[int, str] = (1, "arb"),
        sqrt_psd: bool = True,
        **plotkwarg: dict,
    ) -> None:
        """Plot PSD rebinned into logarithmically spaced frequency bins."""
        if axis is None:
            plt.figure()
            axis = plt.gca()

        arb_to_unit_scale, unit_label = arb_to_unit_scale_and_label
        psd = self.psd[1:] * (arb_to_unit_scale**2)
        freq = self.frequencies[1:]

        # define logarithmically spaced bin edges
        fmin, fmax = freq[0], freq[-1]
        n_decades = np.log10(fmax / fmin)
        n_bins = int(bins_per_decade * n_decades)
        bin_edges = np.logspace(np.log10(fmin), np.log10(fmax), n_bins + 1)

        # digitize frequencies into bins
        inds = np.digitize(freq, bin_edges)

        # average PSD per bin
        binned_freqs = np.zeros(n_bins, dtype=float)
        binned_psd = np.zeros(n_bins, dtype=float)
        for i in range(1, len(bin_edges)):
            mask = inds == i
            if np.any(mask):
                binned_freqs[i - 1] = np.exp(np.mean(np.log(freq[mask])))  # geometric mean
                binned_psd[i - 1] = np.mean(psd[mask])

        if sqrt_psd:
            axis.plot(binned_freqs, np.sqrt(binned_psd), **plotkwarg)
            axis.set_ylabel(f"Amplitude Spectral Density ({unit_label}$/\\sqrt{{Hz}}$)")
        else:
            axis.plot(binned_freqs, binned_psd, **plotkwarg)
            axis.set_ylabel(f"Power Spectral Density ({unit_label}$^2$ Hz$^{{-1}}$)")

        axis.set_xscale("log")
        axis.set_yscale("log")
        axis.grid(True, which="both")
        axis.set_xlabel("Frequency (Hz)")
        axis.set_title(f"Log-rebinned noise from {len(self.frequencies) * 2 - 2} samples")
        axis.figure.tight_layout()

plot(axis=None, arb_to_unit_scale_and_label=(1, 'arb'), sqrt_psd=True, loglog=True, **plotkwarg)

Plot the power spectral density.

Source code in mass2/core/noise_algorithms.py
def plot(
    self,
    axis: plt.Axes | None = None,
    arb_to_unit_scale_and_label: tuple[int, str] = (1, "arb"),
    sqrt_psd: bool = True,
    loglog: bool = True,
    **plotkwarg: dict,
) -> None:
    """Plot the power spectral density."""
    if axis is None:
        plt.figure()
        axis = plt.gca()
    arb_to_unit_scale, unit_label = arb_to_unit_scale_and_label
    psd = self.psd[1:] * (arb_to_unit_scale**2)
    freq = self.frequencies[1:]
    if sqrt_psd:
        axis.plot(freq, np.sqrt(psd), **plotkwarg)
        axis.set_ylabel(f"Amplitude Spectral Density ({unit_label}$/\\sqrt{{Hz}}$)")
    else:
        axis.plot(freq, psd, **plotkwarg)
        axis.set_ylabel(f"Power Spectral Density ({unit_label}$^2$ Hz$^{{-1}}$)")
    if loglog:
        plt.loglog()
    axis.grid()
    axis.set_xlabel("Frequency (Hz)")
    plt.title(f"noise from records of length {len(self.frequencies) * 2 - 2}")
    axis.figure.tight_layout()

plot_log_rebinned(bins_per_decade=10, axis=None, arb_to_unit_scale_and_label=(1, 'arb'), sqrt_psd=True, **plotkwarg)

Plot PSD rebinned into logarithmically spaced frequency bins.

Source code in mass2/core/noise_algorithms.py
def plot_log_rebinned(
    self,
    bins_per_decade: int = 10,
    axis: plt.Axes | None = None,
    arb_to_unit_scale_and_label: tuple[int, str] = (1, "arb"),
    sqrt_psd: bool = True,
    **plotkwarg: dict,
) -> None:
    """Plot PSD rebinned into logarithmically spaced frequency bins."""
    if axis is None:
        plt.figure()
        axis = plt.gca()

    arb_to_unit_scale, unit_label = arb_to_unit_scale_and_label
    psd = self.psd[1:] * (arb_to_unit_scale**2)
    freq = self.frequencies[1:]

    # define logarithmically spaced bin edges
    fmin, fmax = freq[0], freq[-1]
    n_decades = np.log10(fmax / fmin)
    n_bins = int(bins_per_decade * n_decades)
    bin_edges = np.logspace(np.log10(fmin), np.log10(fmax), n_bins + 1)

    # digitize frequencies into bins
    inds = np.digitize(freq, bin_edges)

    # average PSD per bin
    binned_freqs = np.zeros(n_bins, dtype=float)
    binned_psd = np.zeros(n_bins, dtype=float)
    for i in range(1, len(bin_edges)):
        mask = inds == i
        if np.any(mask):
            binned_freqs[i - 1] = np.exp(np.mean(np.log(freq[mask])))  # geometric mean
            binned_psd[i - 1] = np.mean(psd[mask])

    if sqrt_psd:
        axis.plot(binned_freqs, np.sqrt(binned_psd), **plotkwarg)
        axis.set_ylabel(f"Amplitude Spectral Density ({unit_label}$/\\sqrt{{Hz}}$)")
    else:
        axis.plot(binned_freqs, binned_psd, **plotkwarg)
        axis.set_ylabel(f"Power Spectral Density ({unit_label}$^2$ Hz$^{{-1}}$)")

    axis.set_xscale("log")
    axis.set_yscale("log")
    axis.grid(True, which="both")
    axis.set_xlabel("Frequency (Hz)")
    axis.set_title(f"Log-rebinned noise from {len(self.frequencies) * 2 - 2} samples")
    axis.figure.tight_layout()

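The rebinning logic of plot_log_rebinned (logarithmic bin edges, geometric-mean frequency per bin, arithmetic-mean PSD per bin) can be exercised without any plotting; the 1/f spectrum below is synthetic:

```python
import numpy as np

# Synthetic spectrum: 1/f power law sampled on a linear frequency grid
freq = np.linspace(1.0, 1000.0, 5000)
psd = 1.0 / freq

bins_per_decade = 10
fmin, fmax = freq[0], freq[-1]
n_bins = int(bins_per_decade * np.log10(fmax / fmin))  # 30 bins over 3 decades
bin_edges = np.logspace(np.log10(fmin), np.log10(fmax), n_bins + 1)
inds = np.digitize(freq, bin_edges)

binned_freqs = np.zeros(n_bins)
binned_psd = np.zeros(n_bins)
for i in range(1, n_bins + 1):
    mask = inds == i
    if np.any(mask):
        binned_freqs[i - 1] = np.exp(np.mean(np.log(freq[mask])))  # geometric mean
        binned_psd[i - 1] = np.mean(psd[mask])
```

The geometric mean places each bin's representative frequency correctly on a log axis, and the rebinned spectrum preserves the monotonic 1/f falloff.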
calc_autocorrelation_times(n, dt)

Compute the timesteps for an autocorrelation function

Parameters:
  • n (int) –

    Number of lags

  • dt (float) –

    Sample time

Returns:
  • NDArray –

    The time delays for each lag

Source code in mass2/core/noise_algorithms.py
def calc_autocorrelation_times(n: int, dt: float) -> NDArray:
    """Compute the timesteps for an autocorrelation function

    Parameters
    ----------
    n : int
        Number of lags
    dt : float
        Sample time

    Returns
    -------
    NDArray
        The time delays for each lag
    """
    return np.arange(n) * dt

calc_continuous_autocorrelation(data, n_lags, max_excursion=1000)

Calculate the autocorrelation of the input data, assuming the entire array is continuous.

Parameters:
  • data (ArrayLike) –

    Data to be autocorrelated. Arrays of 2+ dimensions will be converted to a 1D array via .ravel().

  • n_lags (int) –

    Compute the autocorrelation for lags in the range [0, n_lags-1].

  • max_excursion (int, default: 1000 ) –

    Chunks of data with max absolute excursion from the mean this large will be excluded from the calculation, by default 1000

Returns:
  • NDArray –

    The autocorrelation array

Raises:
  • ValueError –

    If the data are too short to provide the requested number of lags, or the data contain apparent pulses.

Source code in mass2/core/noise_algorithms.py
def calc_continuous_autocorrelation(data: ArrayLike, n_lags: int, max_excursion: int = 1000) -> NDArray:
    """Calculate the autocorrelation of the input data, assuming the entire array is continuous.

    Parameters
    ----------
    data : ArrayLike
        Data to be autocorrelated. Arrays of 2+ dimensions will be converted to a 1D array via `.ravel()`.
    n_lags : int
        Compute the autocorrelation for lags in the range `[0, n_lags-1]`.
    max_excursion : int, optional
        Chunks of data with max absolute excursion from the mean this large will be excluded from the calculation, by default 1000

    Returns
    -------
    NDArray
        The autocorrelation array

    Raises
    ------
    ValueError
        If the data are too short to provide the requested number of lags, or the data contain apparent pulses.
    """
    data = np.asarray(data).ravel()
    n_data = len(data)
    assert n_lags < n_data

    def padded_length(n: int) -> int:
        """Return a sensible number in the range [n, 2n] which is not too
        much larger than n, yet is good for FFTs.

        Returns:
            A number: (1, 3, or 5)*(a power of two), whichever is smallest.
        """
        pow2 = np.round(2 ** np.ceil(np.log2(n)))
        if n == pow2:
            return int(n)
        elif n > 0.75 * pow2:
            return int(pow2)
        elif n > 0.625 * pow2:
            return int(np.round(0.75 * pow2))
        else:
            return int(np.round(0.625 * pow2))

    # When there are 10 million data points and only 10,000 lags wanted,
    # it's hugely inefficient to compute the full autocorrelation, especially
    # in memory.  Instead, compute it on chunks several times the length of the desired
    # correlation, and average.
    CHUNK_MULTIPLE = 15
    if n_data < CHUNK_MULTIPLE * n_lags:
        msg = f"There are too few data values ({n_data=}) to compute at least {n_lags} lags."
        raise ValueError(msg)

    # Be sure to pad chunksize samples by AT LEAST n_lags zeros, to prevent
    # unwanted wraparound in the autocorrelation.
    # padded_data is what we do DFT/InvDFT on; ac is the unnormalized output.
    chunksize = CHUNK_MULTIPLE * n_lags
    padsize = n_lags
    padded_data = np.zeros(padded_length(padsize + chunksize), dtype=float)

    ac = np.zeros(n_lags, dtype=float)

    entries = 0

    Nchunks = n_data // chunksize
    datachunks = data[: Nchunks * chunksize].reshape(Nchunks, chunksize)
    for data in datachunks:
        padded_data[:chunksize] = data - np.asarray(data).mean()
        if np.abs(padded_data).max() > max_excursion:
            continue

        ft = np.fft.rfft(padded_data)
        ft[0] = 0  # this redundantly removes the mean of the data set
        power = (ft * ft.conj()).real
        acsum = np.fft.irfft(power)
        ac += acsum[:n_lags]
        entries += 1

    if entries == 0:
        raise ValueError("Apparently all 'noise' chunks had large excursions from baseline, so no autocorrelation was computed")

    ac /= entries
    ac /= np.arange(chunksize, chunksize - n_lags + 0.5, -1.0, dtype=float)
    return ac

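The key trick in calc_continuous_autocorrelation is zero-padding each chunk by at least n_lags samples so the circular FFT correlation has no wraparound, then dividing each lag by the number of terms that contributed to it. A minimal sketch, checked against direct summation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_lags = 256, 32
x = rng.standard_normal(n)
x -= x.mean()

# Pad by at least n_lags zeros so the circular FFT correlation has no wraparound
padded = np.zeros(n + n_lags)
padded[:n] = x
ft = np.fft.rfft(padded)
ac_fft = np.fft.irfft((ft * ft.conj()).real)[:n_lags]
ac_fft /= np.arange(n, n - n_lags, -1.0)  # number of terms contributing to each lag

# Direct (slow) computation for comparison
ac_direct = np.array([np.dot(x[: n - k], x[k:]) / (n - k) for k in range(n_lags)])
```

The FFT route is what makes large n_data feasible; the library additionally rounds the padded length up to an FFT-friendly size and averages over many chunks.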
calc_discontinuous_autocorrelation(data, max_excursion=1000)

Calculate the autocorrelation of the input data, assuming the rows of the array are NOT continuous in time.

Parameters:
  • data (ArrayLike) –

    A 2D array of noise data. Shape is (ntraces, nsamples).

  • max_excursion (int, default: 1000 ) –

    Traces whose maximum absolute excursion from the mean exceeds this value are excluded from the calculation, by default 1000

Returns:
  • NDArray –

    The mean autocorrelation of the rows ("traces") in the input data, from lags [0, nsamples-1].

Source code in mass2/core/noise_algorithms.py
def calc_discontinuous_autocorrelation(data: ArrayLike, max_excursion: int = 1000) -> NDArray:
    """Calculate the autocorrelation of the input data, assuming the rows of the array are NOT
    continuous in time.

    Parameters
    ----------
    data : ArrayLike
        A 2D array of noise data. Shape is `(ntraces, nsamples)`.
    max_excursion : int, optional
        Traces whose maximum absolute excursion from the mean exceeds this value are excluded from the calculation, by default 1000

    Returns
    -------
    NDArray
        The mean autocorrelation of the rows ("traces") in the input `data`, from lags `[0, nsamples-1]`.
    """
    data = np.asarray(data)
    ntraces, nsamples = data.shape
    ac = np.zeros(nsamples, dtype=float)

    traces_used = 0
    for i in range(ntraces):
        pulse = data[i, :] - data[i, :].mean()
        if np.abs(pulse).max() > max_excursion:
            continue
        ac += np.correlate(pulse, pulse, "full")[nsamples - 1 :]
        traces_used += 1

    ac /= traces_used
    ac /= nsamples - np.arange(nsamples, dtype=float)
    return ac

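A minimal numpy sketch of the same per-trace computation on synthetic white noise (the excursion cut is omitted here): for white noise of standard deviation sigma, lag 0 approaches sigma**2 and all other lags approach zero.

```python
import numpy as np

rng = np.random.default_rng(2)
ntraces, nsamples, sigma = 200, 64, 3.0
data = sigma * rng.standard_normal((ntraces, nsamples))

ac = np.zeros(nsamples)
for trace in data:
    pulse = trace - trace.mean()
    # np.correlate(..., "full") places lag 0 at index nsamples-1
    ac += np.correlate(pulse, pulse, "full")[nsamples - 1 :]
ac /= ntraces
ac /= nsamples - np.arange(nsamples, dtype=float)  # terms contributing to each lag
```

Note that the highest lags average over very few sample pairs per trace, so they are the noisiest estimates.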
calc_noise_result(data, dt, continuous, window=None, skip_autocorr_if_length_over=100000)

Analyze the noise as Mass has always done.

  • Compute autocorrelation with a lower noise at longer lags when data are known to be continuous
  • Subtract the mean before computing the power spectrum

Parameters:
  • data (ArrayLike) –

    A 2d array of noise data, of shape (npulses, nsamples)

  • dt (float) –

    Periodic sampling time, in seconds

  • continuous (bool) –

    Whether the "pulses" in the data array are continuous in time

  • window (callable, default: None ) –

    A function to compute a data window (or if None, no windowing), by default None

Returns:
  • NoiseResult –

    The derived noise spectrum and autocorrelation

Source code in mass2/core/noise_algorithms.py
def calc_noise_result(
    data: ArrayLike, dt: float, continuous: bool, window: Callable | None = None, skip_autocorr_if_length_over: int = 100_000
) -> "NoiseResult":
    """Analyze the noise as Mass has always done.

    * Compute autocorrelation with a lower noise at longer lags when data are known to be continuous
    * Subtract the mean before computing the power spectrum

    Parameters
    ----------
    data : ArrayLike
        A 2d array of noise data, of shape `(npulses, nsamples)`
    dt : float
        Periodic sampling time, in seconds
    continuous : bool
        Whether the "pulses" in the `data` array are continuous in time
    window : callable, optional
        A function to compute a data window (or if None, no windowing), by default None
    skip_autocorr_if_length_over : int, optional
        Skip the autocorrelation calculation when `nsamples` exceeds this value, by default 100_000

    Returns
    -------
    NoiseResult
        The derived noise spectrum and autocorrelation
    """
    data = np.asarray(data)
    data_zeromean = data - np.mean(data)
    (n_pulses, nsamples) = data_zeromean.shape
    # see test_ravel_behavior to be sure this is written correctly
    f_mass, psd_mass = mass2.mathstat.power_spectrum.computeSpectrum(data_zeromean.ravel(), segfactor=n_pulses, dt=dt, window=window)
    if nsamples <= skip_autocorr_if_length_over:
        if continuous:
            autocorr_vec = calc_continuous_autocorrelation(data_zeromean.ravel(), n_lags=nsamples)
        else:
            autocorr_vec = calc_discontinuous_autocorrelation(data_zeromean)
    else:
        print(
            """warning: noise_psd_mass skipping autocorrelation calculation for long traces,
            use skip_autocorr_if_length_over argument to override this"""
        )
        autocorr_vec = None
    return NoiseResult(psd=psd_mass, autocorr_vec=autocorr_vec, frequencies=f_mass)

noise_psd_periodogram(data, dt, window='boxcar', detrend=False)

Compute the noise power spectral density using scipy's periodogram function and the autocorrelation.

Source code in mass2/core/noise_algorithms.py
def noise_psd_periodogram(data: ndarray, dt: float, window: ArrayLike | str = "boxcar", detrend: bool = False) -> "NoiseResult":
    """Compute the noise power spectral density using scipy's periodogram function and the autocorrelation."""
    f, Pxx = sp.signal.periodogram(data, fs=1 / dt, window=window, axis=-1, detrend=detrend)
    # len(f) = data.shape[1]//2+1
    # Pxx[i, j] is the PSD at frequency f[j] for the i‑th trace data[i, :]
    Pxx_mean = np.mean(Pxx, axis=0)
    # Pxx_mean[j] is the averaged PSD at frequency f[j] over all traces
    autocorr_vec = calc_discontinuous_autocorrelation(data)
    return NoiseResult(psd=Pxx_mean, autocorr_vec=autocorr_vec, frequencies=f)
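
The periodogram-averaging step can be checked in isolation. This sketch (independent of Mass2; the data and sampling rate are made up) averages scipy periodograms over traces as `noise_psd_periodogram` does, then verifies Parseval's theorem: integrating the one-sided PSD over frequency recovers the mean square power.

```python
import numpy as np
import scipy.signal

rng = np.random.default_rng(1)
dt = 1e-5                      # 100 kHz sampling (illustrative value)
data = rng.normal(size=(50, 1024))

# One periodogram per trace (axis=-1), boxcar window, no detrending,
# then average the PSD estimates over all traces at each frequency.
f, Pxx = scipy.signal.periodogram(data, fs=1.0 / dt, window="boxcar", detrend=False, axis=-1)
Pxx_mean = Pxx.mean(axis=0)

# Parseval check: sum(PSD) * df equals the mean square of the input.
df = f[1] - f[0]
total_power = Pxx_mean.sum() * df
mean_square = np.mean(data**2)
```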

This module holds a class to represent a channel with noise data only, and to analyze its noise characteristics.

NoiseChannel dataclass

A class to represent a channel with noise data only, and to analyze its noise characteristics.

Source code in mass2/core/noise_channel.py
@dataclass(frozen=True)
class NoiseChannel:
    """A class to represent a channel with noise data only, and to analyze its noise characteristics."""

    df: pl.DataFrame  # DO NOT MUTATE THIS!!!
    header_df: pl.DataFrame  # DO NOT MUTATE THIS!!
    frametime_s: float

    # @functools.cache
    def calc_max_excursion(
        self, trace_col_name: str = "pulse", n_limit: int = 10000, excursion_nsigma: float = 5
    ) -> tuple[pl.DataFrame, float]:
        """Compute the maximum excursion from the median for each noise record, and store in dataframe."""

        def excursion2d(noise_trace: NDArray) -> float:
            """Return the excursion (max - min) for each trace in a 2D array of traces."""
            return np.amax(noise_trace, axis=1) - np.amin(noise_trace, axis=1)

        noise_traces = self.df.limit(n_limit)[trace_col_name].to_numpy()
        excursion = excursion2d(noise_traces)
        max_excursion = mass2.misc.outlier_resistant_nsigma_above_mid(excursion, nsigma=excursion_nsigma)
        df_noise2 = self.df.limit(n_limit).with_columns(excursion=excursion)
        return df_noise2, max_excursion

    def get_records_2d(
        self,
        trace_col_name: str = "pulse",
        n_limit: int = 10000,
        excursion_nsigma: float = 5,
        trunc_front: int = 0,
        trunc_back: int = 0,
    ) -> NDArray:
        """
        Return a 2D NumPy array of cleaned noise traces from the specified column.

        This method identifies noise traces with excursions below a threshold and
        optionally truncates the beginning and/or end of each trace.

        Parameters:
        ----------
        trace_col_name : str, optional
            Name of the column containing trace data. Default is "pulse".
        n_limit : int, optional
            Maximum number of traces to analyze. Default is 10000.
        excursion_nsigma : float, optional
            Threshold for maximum excursion in units of noise sigma. Default is 5.
        trunc_front : int, optional
            Number of samples to truncate from the front of each trace. Default is 0.
        trunc_back : int, optional
            Number of samples to truncate from the back of each trace. Must be >= 0. Default is 0.

        Returns:
        -------
        np.ndarray
            A 2D array of cleaned and optionally truncated noise traces.

            Shape: (n_pulses, len(pulse))
        """
        df_noise2, max_excursion = self.calc_max_excursion(trace_col_name, n_limit, excursion_nsigma)
        noise_traces_clean = df_noise2.filter(pl.col("excursion") <= max_excursion)["pulse"].to_numpy()
        if trunc_back == 0:
            noise_traces_clean2 = noise_traces_clean[:, trunc_front:]
        elif trunc_back > 0:
            noise_traces_clean2 = noise_traces_clean[:, trunc_front:-trunc_back]
        else:
            raise ValueError("trunc_back must be >= 0")
        assert noise_traces_clean2.shape[0] > 0
        return noise_traces_clean2

    # @functools.cache
    def spectrum(
        self,
        trace_col_name: str = "pulse",
        n_limit: int = 10000,
        excursion_nsigma: float = 5,
        trunc_front: int = 0,
        trunc_back: int = 0,
        skip_autocorr_if_length_over: int = 100_000,
    ) -> NoiseResult:
        """Compute and return the noise result from the noise traces."""
        records = self.get_records_2d(trace_col_name, n_limit, excursion_nsigma, trunc_front, trunc_back)
        spectrum = mass2.core.noise_algorithms.calc_noise_result(
            records, continuous=self.is_continuous, dt=self.frametime_s, skip_autocorr_if_length_over=skip_autocorr_if_length_over
        )
        return spectrum

    def __hash__(self) -> int:
        """A hash function based on the object's id."""
        # needed to make functools.cache work
        # if self or self.anything is mutated, assumptions will be broken
        # and we may get nonsense results
        return hash(id(self))

    def __eq__(self, other: Any) -> bool:
        """Equality based on object identity."""
        return id(self) == id(other)

    @property
    def is_continuous(self) -> bool:
        "Whether this channel is continuous data (True) or triggered records with arbitrary gaps (False)."
        if "continuous" in self.header_df:
            return self.header_df["continuous"][0]
        return False

    @classmethod
    def from_ljh(cls, path: str | Path) -> "NoiseChannel":
        """Create a NoiseChannel by loading data from the given LJH file path."""
        ljh = mass2.LJHFile.open(path)
        df, header_df = ljh.to_polars()
        noise_channel = cls(df, header_df, header_df["Timebase"][0])
        return noise_channel

is_continuous property

Whether this channel is continuous data (True) or triggered records with arbitrary gaps (False).

__eq__(other)

Equality based on object identity.

Source code in mass2/core/noise_channel.py
def __eq__(self, other: Any) -> bool:
    """Equality based on object identity."""
    return id(self) == id(other)

__hash__()

A hash function based on the object's id.

Source code in mass2/core/noise_channel.py
def __hash__(self) -> int:
    """A hash function based on the object's id."""
    # needed to make functools.cache work
    # if self or self.anything is mutated, assumptions will be broken
    # and we may get nonsense results
    return hash(id(self))
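
The identity-based `__hash__`/`__eq__` pair exists so that `functools.cache` can memoize methods on a frozen dataclass whose fields (DataFrames) are unhashable. A minimal sketch of the same pattern, with illustrative names not taken from Mass2:

```python
import functools
from dataclasses import dataclass

@dataclass(frozen=True)
class CachedHolder:
    """Illustrative: a frozen dataclass wrapping an unhashable payload (a list,
    standing in for a DataFrame), made cacheable via identity-based hashing."""
    payload: list  # DO NOT MUTATE: identity hashing assumes the instance never changes

    def __hash__(self) -> int:
        return hash(id(self))          # hash by object identity, not by contents

    def __eq__(self, other) -> bool:
        return id(self) == id(other)   # equal only to itself

    @functools.cache
    def total(self) -> float:
        # stands in for an expensive computation, done once per instance
        return sum(self.payload)

h = CachedHolder([1.0, 2.0, 3.0])
first = h.total()
second = h.total()  # served from the cache on the second call
```

As the comments in `__hash__` warn, this is only sound if the instance (and everything it references) is never mutated; two instances with identical contents still compare unequal.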

calc_max_excursion(trace_col_name='pulse', n_limit=10000, excursion_nsigma=5)

Compute the maximum excursion from the median for each noise record, and store in dataframe.

Source code in mass2/core/noise_channel.py
def calc_max_excursion(
    self, trace_col_name: str = "pulse", n_limit: int = 10000, excursion_nsigma: float = 5
) -> tuple[pl.DataFrame, float]:
    """Compute the maximum excursion from the median for each noise record, and store in dataframe."""

    def excursion2d(noise_trace: NDArray) -> float:
        """Return the excursion (max - min) for each trace in a 2D array of traces."""
        return np.amax(noise_trace, axis=1) - np.amin(noise_trace, axis=1)

    noise_traces = self.df.limit(n_limit)[trace_col_name].to_numpy()
    excursion = excursion2d(noise_traces)
    max_excursion = mass2.misc.outlier_resistant_nsigma_above_mid(excursion, nsigma=excursion_nsigma)
    df_noise2 = self.df.limit(n_limit).with_columns(excursion=excursion)
    return df_noise2, max_excursion

from_ljh(path) classmethod

Create a NoiseChannel by loading data from the given LJH file path.

Source code in mass2/core/noise_channel.py
@classmethod
def from_ljh(cls, path: str | Path) -> "NoiseChannel":
    """Create a NoiseChannel by loading data from the given LJH file path."""
    ljh = mass2.LJHFile.open(path)
    df, header_df = ljh.to_polars()
    noise_channel = cls(df, header_df, header_df["Timebase"][0])
    return noise_channel

get_records_2d(trace_col_name='pulse', n_limit=10000, excursion_nsigma=5, trunc_front=0, trunc_back=0)

Return a 2D NumPy array of cleaned noise traces from the specified column.

This method identifies noise traces with excursions below a threshold and optionally truncates the beginning and/or end of each trace.

Parameters:

trace_col_name : str, optional
    Name of the column containing trace data. Default is "pulse".
n_limit : int, optional
    Maximum number of traces to analyze. Default is 10000.
excursion_nsigma : float, optional
    Threshold for maximum excursion in units of noise sigma. Default is 5.
trunc_front : int, optional
    Number of samples to truncate from the front of each trace. Default is 0.
trunc_back : int, optional
    Number of samples to truncate from the back of each trace. Must be >= 0. Default is 0.

Returns:

np.ndarray
    A 2D array of cleaned and optionally truncated noise traces, of shape (n_pulses, len(pulse)).
Source code in mass2/core/noise_channel.py
def get_records_2d(
    self,
    trace_col_name: str = "pulse",
    n_limit: int = 10000,
    excursion_nsigma: float = 5,
    trunc_front: int = 0,
    trunc_back: int = 0,
) -> NDArray:
    """
    Return a 2D NumPy array of cleaned noise traces from the specified column.

    This method identifies noise traces with excursions below a threshold and
    optionally truncates the beginning and/or end of each trace.

    Parameters:
    ----------
    trace_col_name : str, optional
        Name of the column containing trace data. Default is "pulse".
    n_limit : int, optional
        Maximum number of traces to analyze. Default is 10000.
    excursion_nsigma : float, optional
        Threshold for maximum excursion in units of noise sigma. Default is 5.
    trunc_front : int, optional
        Number of samples to truncate from the front of each trace. Default is 0.
    trunc_back : int, optional
        Number of samples to truncate from the back of each trace. Must be >= 0. Default is 0.

    Returns:
    -------
    np.ndarray
        A 2D array of cleaned and optionally truncated noise traces.

        Shape: (n_pulses, len(pulse))
    """
    df_noise2, max_excursion = self.calc_max_excursion(trace_col_name, n_limit, excursion_nsigma)
    noise_traces_clean = df_noise2.filter(pl.col("excursion") <= max_excursion)["pulse"].to_numpy()
    if trunc_back == 0:
        noise_traces_clean2 = noise_traces_clean[:, trunc_front:]
    elif trunc_back > 0:
        noise_traces_clean2 = noise_traces_clean[:, trunc_front:-trunc_back]
    else:
        raise ValueError("trunc_back must be >= 0")
    assert noise_traces_clean2.shape[0] > 0
    return noise_traces_clean2
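
The excursion cut (per-trace max minus min, thresholded by a robust statistic) and truncation logic can be sketched in plain NumPy. The MAD-based threshold below is a simplified stand-in for `mass2.misc.outlier_resistant_nsigma_above_mid`, not its actual implementation, and all data here are made up.

```python
import numpy as np

def clean_traces(traces, excursion_nsigma=5.0, trunc_front=0, trunc_back=0):
    """Drop traces with outlier excursions, then optionally truncate each trace.
    The threshold is a simplified stand-in for Mass2's robust helper."""
    traces = np.asarray(traces, dtype=float)
    if trunc_back < 0:
        raise ValueError("trunc_back must be >= 0")
    excursion = traces.max(axis=1) - traces.min(axis=1)   # per-trace max - min
    # robust threshold: median plus nsigma times a MAD-based sigma estimate
    med = np.median(excursion)
    sigma = 1.4826 * np.median(np.abs(excursion - med))
    keep = excursion <= med + excursion_nsigma * sigma
    clean = traces[keep]
    stop = clean.shape[1] - trunc_back if trunc_back else clean.shape[1]
    return clean[:, trunc_front:stop]

rng = np.random.default_rng(2)
traces = rng.normal(size=(100, 32))
traces[7, 10] += 1000.0    # inject one huge spike, so trace 7 gets cut
clean = clean_traces(traces, trunc_front=2, trunc_back=2)
```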

spectrum(trace_col_name='pulse', n_limit=10000, excursion_nsigma=5, trunc_front=0, trunc_back=0, skip_autocorr_if_length_over=100000)

Compute and return the noise result from the noise traces.

Source code in mass2/core/noise_channel.py
def spectrum(
    self,
    trace_col_name: str = "pulse",
    n_limit: int = 10000,
    excursion_nsigma: float = 5,
    trunc_front: int = 0,
    trunc_back: int = 0,
    skip_autocorr_if_length_over: int = 100_000,
) -> NoiseResult:
    """Compute and return the noise result from the noise traces."""
    records = self.get_records_2d(trace_col_name, n_limit, excursion_nsigma, trunc_front, trunc_back)
    spectrum = mass2.core.noise_algorithms.calc_noise_result(
        records, continuous=self.is_continuous, dt=self.frametime_s, skip_autocorr_if_length_over=skip_autocorr_if_length_over
    )
    return spectrum

Code copied from Mass version 1. Not up to date with latest style. Sorry.

Supported OFF versions:
  • 0.1.0 has projectors and basis in JSON with base64 encoding
  • 0.2.0 has projectors and basis after the JSON as binary
  • 0.3.0 adds the pretriggerDelta field

OffFile

Working with an OFF file:

    off = OffFile("filename")
    print(off.dtype)       # show the fields available
    off[0]                 # get record 0
    off[0]["coefs"]        # get the model coefs for record 0
    x, y = off.recordXY(0)
    plot(x, y)             # plot record 0

Source code in mass2/core/offfiles.py
class OffFile:
    """
    Working with an OFF file:
    off = OffFile("filename")
    print(off.dtype)  # show the fields available
    off[0] # get record 0
    off[0]["coefs"] # get the model coefs for record 0
    x,y = off.recordXY(0)
    plot(x,y) # plot record 0
    """

    def __init__(self, filename: str):
        self.filename = filename
        with open(self.filename, "rb") as f:
            self.headerString = readJsonString(f)
            # self.headerStringLength = f.tell() # doesn't work on windows because readline uses a readahead buffer
            self.headerStringLength = len(self.headerString)
        self.header = json.loads(self.headerString)
        self.dtype = recordDtype(self.header["FileFormatVersion"], self.header["NumberOfBases"])
        self._dtype_non_descriptive = recordDtype(
            self.header["FileFormatVersion"], self.header["NumberOfBases"], descriptive_coefs_names=False
        )
        self.framePeriodSeconds = float(self.header["FramePeriodSeconds"])

        # Estimate subframe division rate. If explicitly given in ReadoutInfo, use that.
        # Otherwise, if it's a Lancero-TDM system, then we know it's equal to the # of rows.
        # Otherwise, we don't know any way to estimate subframe divisions. (But add it if you think of any!)
        self.subframediv: int | None = None
        try:
            self.subframediv = self.header["ReadoutInfo"]["Subframedivisions"]
        except KeyError:
            try:
                if self.header["CreationInfo"]["SourceName"] == "Lancero":
                    self.subframediv = self.header["ReadoutInfo"]["NumberOfRows"]
            except KeyError:
                self.subframediv = None

        self.validateHeader()
        self._mmap: np.memmap | None = None
        self.projectors: NDArray | None = None
        self.basis: NDArray | None = None
        self._decodeModelInfo()  # calculates afterHeaderPos used by _updateMmap
        self._updateMmap()

    def close(self) -> None:
        """Close the memory map and projectors and basis memmaps"""
        del self._mmap
        del self.projectors
        del self.basis

    def validateHeader(self) -> None:
        "Check that the header looks like we expect, with a valid version code"
        with open(self.filename, "rb") as f:
            f.seek(self.headerStringLength - 2)
            if not f.readline().decode("utf-8") == "}\n":
                raise Exception("failed to find end of header")
        if self.header["FileFormat"] != "OFF":
            raise Exception("FileFormat is {}, want OFF".format(self.header["FileFormat"]))

    def _updateMmap(self, _nRecords: int | None = None) -> None:
        """Memory map an OFF file's data.
        `_nRecords` maps only a subset--designed for testing only
        """
        fileSize = os.path.getsize(self.filename)
        recordSize = fileSize - self.afterHeaderPos
        if _nRecords is None:
            self.nRecords = recordSize // self.dtype.itemsize
        else:  # for testing only
            self.nRecords = _nRecords
        self._mmap = np.memmap(self.filename, self.dtype, mode="r", offset=self.afterHeaderPos, shape=(self.nRecords,))
        self.shape = self._mmap.shape

    def __getitem__(self, *args: Any, **kwargs: Any) -> Any:
        "Make indexing into the off the same as indexing into the memory mapped array"
        assert self._mmap is not None
        return self._mmap.__getitem__(*args, **kwargs)

    def __len__(self) -> int:
        """Number of records in the OFF file"""
        assert self._mmap is not None
        return len(self._mmap)

    def __sizeof__(self) -> int:
        """Size of the memory mapped array in bytes"""
        assert self._mmap is not None
        return self._mmap.__sizeof__()

    def _decodeModelInfo(self) -> None:
        """Decode the model info (projectors and basis) from the OFF file, either from base64 in json
        or a later proprietary, binary format"""
        if (
            "RowMajorFloat64ValuesBase64" in self.header["ModelInfo"]["Projectors"]
            and "RowMajorFloat64ValuesBase64" in self.header["ModelInfo"]["Basis"]
        ):
            # should only be in version 0.1.0 files
            self._decodeModelInfoBase64()
        else:
            self._decodeModelInfoMmap()

    def _decodeModelInfoBase64(self) -> None:
        """Decode the model info (projectors and basis) from the OFF file, from base64 in json."""
        projectorsData = decodebytes(self.header["ModelInfo"]["Projectors"]["RowMajorFloat64ValuesBase64"].encode())
        projectorsRows = int(self.header["ModelInfo"]["Projectors"]["Rows"])
        projectorsCols = int(self.header["ModelInfo"]["Projectors"]["Cols"])
        self.projectors = np.frombuffer(projectorsData, np.float64)
        self.projectors = self.projectors.reshape((projectorsRows, projectorsCols))
        basisData = decodebytes(self.header["ModelInfo"]["Basis"]["RowMajorFloat64ValuesBase64"].encode())
        basisRows = int(self.header["ModelInfo"]["Basis"]["Rows"])
        basisCols = int(self.header["ModelInfo"]["Basis"]["Cols"])
        self.basis = np.frombuffer(basisData, np.float64)
        self.basis = self.basis.reshape((basisRows, basisCols))
        if basisRows != projectorsCols or basisCols != projectorsRows or self.header["NumberOfBases"] != projectorsRows:
            raise Exception(
                "basis shape should be transpose of projectors shape. have basis "
                f"({basisCols},{basisRows}), projectors ({projectorsCols},{projectorsRows}), "
                f"NumberOfBases {self.header['NumberOfBases']}"
            )
        self.afterHeaderPos = self.headerStringLength

    def _decodeModelInfoMmap(self) -> None:
        """Decode the model info (projectors and basis) from the OFF file, from binary."""
        projectorsRows = int(self.header["ModelInfo"]["Projectors"]["Rows"])
        projectorsCols = int(self.header["ModelInfo"]["Projectors"]["Cols"])
        basisRows = int(self.header["ModelInfo"]["Basis"]["Rows"])
        basisCols = int(self.header["ModelInfo"]["Basis"]["Cols"])
        # 8 for float64, basis and projectors have the same number of elements and therefore of bytes
        nBytes = basisCols * basisRows * 8
        projectorsPos = self.headerStringLength
        basisPos = projectorsPos + nBytes
        self.afterHeaderPos = basisPos + nBytes
        self.projectors = np.memmap(self.filename, np.float64, mode="r", offset=projectorsPos, shape=(projectorsRows, projectorsCols))
        self.basis = np.memmap(self.filename, np.float64, mode="r", offset=basisPos, shape=(basisRows, basisCols))
        if basisRows != projectorsCols or basisCols != projectorsRows or self.header["NumberOfBases"] != projectorsRows:
            raise Exception(
                "basis shape should be transpose of projectors shape. have basis "
                f"({basisCols},{basisRows}), projectors ({projectorsCols},{projectorsRows}), "
                f"NumberOfBases {self.header['NumberOfBases']}"
            )

    def __repr__(self) -> str:
        """Return a string representation of the OffFile object."""
        return "<OFF file> {}, {} records, {} length basis\n".format(self.filename, self.nRecords, self.header["NumberOfBases"])

    def sampleTimes(self, i: int) -> NDArray:
        """return a vector of sample times for record i, appropriate for plotting"""
        recordSamples = self[i]["recordSamples"]
        recordPreSamples = self[i]["recordPreSamples"]
        return np.arange(-recordPreSamples, recordSamples - recordPreSamples) * self.framePeriodSeconds

    def modeledPulse(self, i: int) -> NDArray:
        """return a vector of the modeled pulse samples, the best available value of the actual raw samples"""
        # projectors has size (n,z) where it is (rows,cols)
        # basis has size (z,n)
        # coefs has size (n,1)
        # coefs (n,1) = projectors (n,z) * data (z,1)
        # modelData (z,1) = basis (z,n) * coefs (n,1)
        # n = number of basis (eg 3)
        # z = record length (eg 4)

        # .view(self._dtype_non_descriptive) should be a copy-free way of changing
        # the dtype so we can access the coefs all together
        assert self.basis is not None
        allVals = np.matmul(self.basis, self._mmap_with_coefs[i]["coefs"])
        return np.asarray(allVals)

    def recordXY(self, i: int) -> tuple[NDArray, NDArray]:
        """return (x,y) for record i, where x is time and y is modeled pulse"""
        return self.sampleTimes(i), self.modeledPulse(i)

    @property
    def _mmap_with_coefs(self) -> NDArray:
        """Return a view of the memmap with the coefs all together in one field"""
        assert self._mmap is not None
        return self._mmap.view(self._dtype_non_descriptive)

    def view(self, *args: Any) -> NDArray:
        """Return a view of the memmap with the given args"""
        assert self._mmap is not None
        return self._mmap.view(*args)

__getitem__(*args, **kwargs)

Make indexing into the off the same as indexing into the memory mapped array

Source code in mass2/core/offfiles.py
def __getitem__(self, *args: Any, **kwargs: Any) -> Any:
    "Make indexing into the off the same as indexing into the memory mapped array"
    assert self._mmap is not None
    return self._mmap.__getitem__(*args, **kwargs)

__len__()

Number of records in the OFF file

Source code in mass2/core/offfiles.py
def __len__(self) -> int:
    """Number of records in the OFF file"""
    assert self._mmap is not None
    return len(self._mmap)

__repr__()

Return a string representation of the OffFile object.

Source code in mass2/core/offfiles.py
def __repr__(self) -> str:
    """Return a string representation of the OffFile object."""
    return "<OFF file> {}, {} records, {} length basis\n".format(self.filename, self.nRecords, self.header["NumberOfBases"])

__sizeof__()

Size of the memory mapped array in bytes

Source code in mass2/core/offfiles.py
def __sizeof__(self) -> int:
    """Size of the memory mapped array in bytes"""
    assert self._mmap is not None
    return self._mmap.__sizeof__()

close()

Close the memory map and projectors and basis memmaps

Source code in mass2/core/offfiles.py
def close(self) -> None:
    """Close the memory map and projectors and basis memmaps"""
    del self._mmap
    del self.projectors
    del self.basis

modeledPulse(i)

return a vector of the modeled pulse samples, the best available value of the actual raw samples

Source code in mass2/core/offfiles.py
def modeledPulse(self, i: int) -> NDArray:
    """return a vector of the modeled pulse samples, the best available value of the actual raw samples"""
    # projectors has size (n,z) where it is (rows,cols)
    # basis has size (z,n)
    # coefs has size (n,1)
    # coefs (n,1) = projectors (n,z) * data (z,1)
    # modelData (z,1) = basis (z,n) * coefs (n,1)
    # n = number of basis (eg 3)
    # z = record length (eg 4)

    # .view(self._dtype_non_descriptive) should be a copy-free way of changing
    # the dtype so we can access the coefs all together
    assert self.basis is not None
    allVals = np.matmul(self.basis, self._mmap_with_coefs[i]["coefs"])
    return np.asarray(allVals)
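
The comment block in `modeledPulse` describes the projector/basis algebra. A standalone NumPy sketch of the same shapes follows; the dimensions are made up, and building projectors as the pseudoinverse of the basis is one common choice, not necessarily how Mass2 constructs them.

```python
import numpy as np

rng = np.random.default_rng(3)
z, n = 16, 3                        # z = record length, n = number of bases
basis = rng.normal(size=(z, n))     # columns span the pulse-model subspace
projectors = np.linalg.pinv(basis)  # shape (n, z); one way to build projectors

# coefs (n,) = projectors (n,z) @ data (z,)
# model (z,) = basis (z,n) @ coefs (n,)
data = basis @ np.array([2.0, -1.0, 0.5]) + 0.01 * rng.normal(size=z)
coefs = projectors @ data
model = basis @ coefs

# The residual is the part of the data outside the model subspace;
# for small added noise it stays small.
residual = data - model
```

Because `basis` has full column rank, `projectors @ basis` is the n-by-n identity, which is what makes `coefs` an unbiased readout of the model amplitudes.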

recordXY(i)

return (x,y) for record i, where x is time and y is modeled pulse

Source code in mass2/core/offfiles.py
def recordXY(self, i: int) -> tuple[NDArray, NDArray]:
    """return (x,y) for record i, where x is time and y is modeled pulse"""
    return self.sampleTimes(i), self.modeledPulse(i)

sampleTimes(i)

return a vector of sample times for record i, appropriate for plotting

Source code in mass2/core/offfiles.py
def sampleTimes(self, i: int) -> NDArray:
    """return a vector of sample times for record i, appropriate for plotting"""
    recordSamples = self[i]["recordSamples"]
    recordPreSamples = self[i]["recordPreSamples"]
    return np.arange(-recordPreSamples, recordSamples - recordPreSamples) * self.framePeriodSeconds
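
The sample-time arithmetic in isolation: with the pretrigger samples counted off before the trigger, the first post-trigger sample lands at t = 0. The frame period and record sizes below are made-up values for illustration.

```python
import numpy as np

frame_period_s = 1e-5      # 10 microsecond frame period (illustrative)
record_samples = 12
record_presamples = 4

# Times run from -presamples*dt up to (samples - presamples - 1)*dt,
# so index `record_presamples` is the trigger sample at t = 0.
times = np.arange(-record_presamples, record_samples - record_presamples) * frame_period_s
```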

validateHeader()

Check that the header looks like we expect, with a valid version code

Source code in mass2/core/offfiles.py
def validateHeader(self) -> None:
    "Check that the header looks like we expect, with a valid version code"
    with open(self.filename, "rb") as f:
        f.seek(self.headerStringLength - 2)
        if not f.readline().decode("utf-8") == "}\n":
            raise Exception("failed to find end of header")
    if self.header["FileFormat"] != "OFF":
        raise Exception("FileFormat is {}, want OFF".format(self.header["FileFormat"]))

view(*args)

Return a view of the memmap with the given args

Source code in mass2/core/offfiles.py
def view(self, *args: Any) -> NDArray:
    """Return a view of the memmap with the given args"""
    assert self._mmap is not None
    return self._mmap.view(*args)

readJsonString(f)

look in file f for a line "}\n" and return all contents up to that point. For an OFF file this can be parsed by json.loads, and all remaining data is records.

Source code in mass2/core/offfiles.py
def readJsonString(f: io.BufferedReader) -> str:
    """look in file f for a line "}\\n" and return all contents up to that point
    for an OFF file this can be parsed by json.loads
    and all remaining data is records"""
    lines: list[str] = []
    while True:
        line = f.readline().decode("utf-8")
        lines += line
        if line == "}\n":
            return "".join(lines)
        if len(line) == 0:
            raise Exception("""reached end of file without finding an end-of-JSON line "}\\n" """)
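
The header-scan logic can be demonstrated on an in-memory file: read lines until the lone `}` closes the JSON, leaving the file positioned at the start of the binary records. The payload below is a fake OFF-like stream, not real OFF data.

```python
import io
import json

def read_json_header(f):
    """Collect lines until a line equal to '}\\n' ends the JSON header.
    Mirrors the readJsonString logic above for any binary file object."""
    lines = []
    while True:
        line = f.readline().decode("utf-8")
        lines.append(line)
        if line == "}\n":
            return "".join(lines)
        if len(line) == 0:
            raise Exception('reached end of file without finding an end-of-JSON line "}\\n"')

# Fake OFF-like stream: a JSON header terminated by '}\n', then binary records.
payload = b'{\n"FileFormat": "OFF",\n"NumberOfBases": 3\n}\n' + b"\x01\x02\x03binary-records"
f = io.BytesIO(payload)
header = json.loads(read_json_header(f))
remaining = f.read()   # everything after the header is record data
```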

recordDtype(offVersion, nBasis, descriptive_coefs_names=True)

Return a np.dtype matching the record datatype for the given offVersion and nBasis. descriptive_coefs_names determines how the modeled pulse coefficients are named; you usually want True. For True, the names will be pulseMean, derivativeLike, filtValue, and if nBasis>3, also extraCoefs. For False, they will all have the single name coefs. False makes it easier to implement methods like recordXY that want access to all coefs simultaneously.

Source code in mass2/core/offfiles.py
def recordDtype(offVersion: str, nBasis: int, descriptive_coefs_names: bool = True) -> np.dtype:
    """return a np.dtype matching the record datatype for the given offVersion and nBasis
    descriptive_coefs_names - determines how the modeled pulse coefficients are named, you usually want True
    For True, the names will be `pulseMean`, `derivativeLike`, `filtValue`, and if nBasis>3, also `extraCoefs`
    For False, they will all have the single name `coefs`. False is to make implementing recordXY and other
    methods that want access to all coefs simultaneously easier"""
    if offVersion in {"0.1.0", "0.2.0"}:
        # start of the dtype is identical for all cases
        dt_list: list[tuple[str, type] | tuple[str, type, int]] = [
            ("recordSamples", np.int32),
            ("recordPreSamples", np.int32),
            ("framecount", np.int64),
            ("unixnano", np.int64),
            ("pretriggerMean", np.float32),
            ("residualStdDev", np.float32),
        ]
    elif offVersion == "0.3.0":
        dt_list = [
            ("recordSamples", np.int32),
            ("recordPreSamples", np.int32),
            ("framecount", np.int64),
            ("unixnano", np.int64),
            ("pretriggerMean", np.float32),
            ("pretriggerDelta", np.float32),
            ("residualStdDev", np.float32),
        ]
    else:
        raise Exception(f"dtype for OFF version {offVersion} not implemented")

    if descriptive_coefs_names:
        dt_list += [("pulseMean", np.float32), ("derivativeLike", np.float32), ("filtValue", np.float32)]
        if nBasis > 3:
            dt_list += [("extraCoefs", np.float32, (nBasis - 3))]
    else:
        dt_list += [("coefs", np.float32, (nBasis))]
    return np.dtype(dt_list)
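
The structured-dtype layout can be checked directly. This sketch rebuilds the version 0.3.0 field list from the listing above for nBasis = 5 and shows that the descriptive and "flat" layouts occupy exactly the same bytes, which is what makes the copy-free `.view()` between them safe.

```python
import numpy as np

# Field layout for OFF version 0.3.0 (from recordDtype above), nBasis = 5.
n_basis = 5
common = [
    ("recordSamples", np.int32),
    ("recordPreSamples", np.int32),
    ("framecount", np.int64),
    ("unixnano", np.int64),
    ("pretriggerMean", np.float32),
    ("pretriggerDelta", np.float32),
    ("residualStdDev", np.float32),
]
descriptive = np.dtype(
    common
    + [("pulseMean", np.float32), ("derivativeLike", np.float32), ("filtValue", np.float32)]
    + [("extraCoefs", np.float32, (n_basis - 3,))]
)
flat = np.dtype(common + [("coefs", np.float32, (n_basis,))])

# Because both layouts are byte-compatible, a view reinterprets the same memory:
# the three named coefs plus extraCoefs line up with coefs[0..4] in order.
rec = np.zeros(1, dtype=descriptive)
rec["filtValue"] = 7.0
coefs = rec.view(flat)["coefs"][0]   # filtValue is coefs[2] in the flat layout
```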

Classes to create time-domain and Fourier-domain optimal filters.

Filter dataclass

Bases: ABC

A single optimal filter, possibly with optimal estimators of the Delta-t and of the DC level.

Returns:
  • Filter –

    A set of optimal filter values.

    These values are chosen with the following specifics:
      • one model of the pulses and of the noise, including pulse record length,
      • a first-order arrival-time detection filter is (optionally) computed,
      • filtering model (1-lag, or other odd # of lags),
      • low-pass smoothing of the filter itself,
      • a fixed number of samples "cut" (given zero weight) at the start and/or end of records.

    The object also stores the pulse shape and (optionally) the delta-T shape used to generate it, and the low-pass filter's fmax or f_3db (cutoff or rolloff) frequency.

    It also stores the predicted variance due to noise and the resulting predicted_v_over_dv, the ratio of the filtered pulse height to the (FWHM) noise, in pulse height units. Both of these values assume pulses of the same size as that used to generate the filter: nominal_peak.

Source code in mass2/core/optimal_filtering.py
@dataclass(frozen=True)
class Filter(ABC):
    """A single optimal filter, possibly with optimal estimators of the Delta-t and of the DC level.

    Returns
    -------
    Filter
        A set of optimal filter values.

        These values are chosen with the following specifics:
        * one model of the pulses and of the noise, including pulse record length,
        * a first-order arrival-time detection filter is (optionally) computed
        * filtering model (1-lag, or other odd # of lags),
        * low-pass smoothing of the filter itself,
        * a fixed number of samples "cut" (given zero weight) at the start and/or end of records.

        The object also stores the pulse shape and (optionally) the delta-T shape used to generate it,
        and the low-pass filter's fmax or f_3db (cutoff or rolloff) frequency.

        It also stores the predicted `variance` due to noise and the resulting `predicted_v_over_dv`,
        the ratio of the filtered pulse height to the (FWHM) noise, in pulse height units. Both
        of these values assume pulses of the same size as that used to generate the filter: `nominal_peak`.

    """

    values: np.ndarray
    nominal_peak: float
    variance: float
    predicted_v_over_dv: float
    n_pretrigger: int
    dt_values: np.ndarray | None
    const_values: np.ndarray | None
    signal_model: np.ndarray | None
    dt_model: np.ndarray | None
    convolution_lags: int = 1
    fmax: float | None = None
    f_3db: float | None = None
    cut_pre: int = 0
    cut_post: int = 0

    @property
    @abstractmethod
    def is_arrival_time_safe(self) -> bool:
        """Is this an arrival-time-safe filter?"""
        return False

    @property
    @abstractmethod
    def _filter_type(self) -> str:
        """The name for this filter type"""
        return "illegal: this is supposed to be an abstract base class"

    def plot(self, axis: plt.Axes | None = None, **kwargs: Any) -> None:
        """Make a plot of the filter

        Parameters
        ----------
        axis : plt.Axes, optional
            A pre-existing axis to plot on, by default None
        """
        if axis is None:
            plt.clf()
            axis = plt.subplot(111)
        t = np.arange(len(self.values)) - self.n_pretrigger
        axis.plot(t, self.values, label="mass 5lag filter", **kwargs)
        axis.grid()
        axis.set_title(f"Filter type={self._filter_type} V/dV={self.predicted_v_over_dv:.2f}")
        axis.set_ylabel("filter value")
        axis.set_xlabel("Samples")
        plt.gcf().tight_layout()

    def report(self, std_energy: float = 5898.8) -> None:
        """Report on estimated V/dV for the filter.

        Parameters
        ----------
        std_energy : float, optional
            Energy (in eV) of a "standard" pulse.  Resolution will be given in eV at this energy,
                assuming linear devices, by default 5898.8
        """
        var = self.variance
        v_dv = self.predicted_v_over_dv
        fwhm_eV = std_energy / v_dv
        print(f"v/\u03b4v: {v_dv: .2f}, variance: {var:.2f} \u03b4E: {fwhm_eV:.2f} eV (FWHM), assuming standard E={std_energy:.2f} eV")

    @abstractmethod
    def filter_records(self, x: ArrayLike) -> tuple[np.ndarray, np.ndarray]:
        """Filter one microcalorimeter record or an array of records."""
        pass

is_arrival_time_safe abstractmethod property

Is this an arrival-time-safe filter?

filter_records(x) abstractmethod

Filter one microcalorimeter record or an array of records.

Source code in mass2/core/optimal_filtering.py
@abstractmethod
def filter_records(self, x: ArrayLike) -> tuple[np.ndarray, np.ndarray]:
    """Filter one microcalorimeter record or an array of records."""
    pass

plot(axis=None, **kwargs)

Make a plot of the filter

Parameters:
  • axis (Axes, default: None ) –

    A pre-existing axis to plot on, by default None

Source code in mass2/core/optimal_filtering.py
def plot(self, axis: plt.Axes | None = None, **kwargs: Any) -> None:
    """Make a plot of the filter

    Parameters
    ----------
    axis : plt.Axes, optional
        A pre-existing axis to plot on, by default None
    """
    if axis is None:
        plt.clf()
        axis = plt.subplot(111)
    t = np.arange(len(self.values)) - self.n_pretrigger
    axis.plot(t, self.values, label="mass 5lag filter", **kwargs)
    axis.grid()
    axis.set_title(f"Filter type={self._filter_type} V/dV={self.predicted_v_over_dv:.2f}")
    axis.set_ylabel("filter value")
    axis.set_xlabel("Samples")
    plt.gcf().tight_layout()

report(std_energy=5898.8)

Report on estimated V/dV for the filter.

Parameters:
  • std_energy (float, default: 5898.8 ) –

    Energy (in eV) of a "standard" pulse. Resolution will be given in eV at this energy, assuming linear devices, by default 5898.8

Source code in mass2/core/optimal_filtering.py
def report(self, std_energy: float = 5898.8) -> None:
    """Report on estimated V/dV for the filter.

    Parameters
    ----------
    std_energy : float, optional
        Energy (in eV) of a "standard" pulse.  Resolution will be given in eV at this energy,
            assuming linear devices, by default 5898.8
    """
    var = self.variance
    v_dv = self.predicted_v_over_dv
    fwhm_eV = std_energy / v_dv
    print(f"v/\u03b4v: {v_dv: .2f}, variance: {var:.2f} \u03b4E: {fwhm_eV:.2f} eV (FWHM), assuming standard E={std_energy:.2f} eV")
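The report is simple arithmetic: the predicted FWHM energy resolution is the "standard" pulse energy divided by the predicted V/dV. For example (the V/dV value below is illustrative, not from any real channel):

```python
# FWHM resolution implied by a predicted V/dV, as in Filter.report().
std_energy = 5898.8           # eV; the default "standard" pulse (Mn K-alpha)
predicted_v_over_dv = 2000.0  # illustrative value, not from real data
fwhm_eV = std_energy / predicted_v_over_dv
print(f"predicted resolution: {fwhm_eV:.2f} eV FWHM")  # predicted resolution: 2.95 eV FWHM
```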

Filter1Lag dataclass

Bases: Filter

Represent an optimal filter, specifically one intended for single-lag convolution with data

Returns:
  • Filter1Lag –

    An optimal filter, for single-lag convolution (i.e., a dot product) with the data

Source code in mass2/core/optimal_filtering.py
@dataclass(frozen=True)
class Filter1Lag(Filter):
    """Represent an optimal filter, specifically one intended for single-lag convolution with data

    Returns
    -------
    Filter1Lag
        An optimal filter, for single-lag convolution (i.e., a dot product) with the data
    """

    def __post_init__(self) -> None:
        """Post-init checks that this filter, indeed, is a 1-lag one"""
        assert self.convolution_lags == 1

    @property
    def is_arrival_time_safe(self) -> bool:
        """Is this an arrival-time-safe filter?"""
        return False

    @property
    def _filter_type(self) -> str:
        """Name for this filter type"""
        return "1lag"

    def filter_records(self, x: ArrayLike) -> tuple[np.ndarray, np.ndarray]:
        """Filter one microcalorimeter record or an array of records.

        Parameters
        ----------
        x : ArrayLike
            A 1-d array, a single pulse record, or a 2-d array, where `x[i, :]` is pulse record number `i`.

        Returns
        -------
        tuple[np.ndarray, np.ndarray]
            1. The optimally filtered value, or an array (one per row) if the input is a 2-d array.
            2. The phase, or arrival-time estimate in samples. Same shape as the filtered value.

        Raises
        ------
        AssertionError
            If the input array is the wrong length
        """
        x = np.asarray(x)
        if x.ndim == 1:
            x = x.reshape((1, len(x)))
        _, nsamp = x.shape
        assert nsamp == len(self.values)
        dotproduct = np.dot(x, self.values)

        peak_x = np.zeros_like(x)
        peak_y = dotproduct
        return peak_y, peak_x

is_arrival_time_safe property

Is this an arrival-time-safe filter?

__post_init__()

Post-init checks that this filter, indeed, is a 1-lag one

Source code in mass2/core/optimal_filtering.py
def __post_init__(self) -> None:
    """Post-init checks that this filter, indeed, is a 1-lag one"""
    assert self.convolution_lags == 1

filter_records(x)

Filter one microcalorimeter record or an array of records.

Parameters:
  • x (ArrayLike) –

    A 1-d array, a single pulse record, or a 2-d array, where x[i, :] is pulse record number i.

Returns:
  • tuple[ndarray, ndarray] –
    1. The optimally filtered value, or an array (one per row) if the input is a 2-d array.
    2. The phase, or arrival-time estimate in samples. Same shape as the filtered value.
Raises:
  • AssertionError –

    If the input array is the wrong length

Source code in mass2/core/optimal_filtering.py
def filter_records(self, x: ArrayLike) -> tuple[np.ndarray, np.ndarray]:
    """Filter one microcalorimeter record or an array of records.

    Parameters
    ----------
    x : ArrayLike
        A 1-d array, a single pulse record, or a 2-d array, where `x[i, :]` is pulse record number `i`.

    Returns
    -------
    tuple[np.ndarray, np.ndarray]
        1. The optimally filtered value, or an array (one per row) if the input is a 2-d array.
        2. The phase, or arrival-time estimate in samples. Same shape as the filtered value.

    Raises
    ------
    AssertionError
        If the input array is the wrong length
    """
    x = np.asarray(x)
    if x.ndim == 1:
        x = x.reshape((1, len(x)))
    _, nsamp = x.shape
    assert nsamp == len(self.values)
    dotproduct = np.dot(x, self.values)

    peak_x = np.zeros_like(x)
    peak_y = dotproduct
    return peak_y, peak_x
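
At its core, 1-lag filtering is a single dot product per record: a 2-d array of records matrix-multiplied by the filter values. A minimal standalone sketch of that step, with random stand-in data rather than a constructed `Filter1Lag` instance:

```python
import numpy as np

rng = np.random.default_rng(0)
nsamp = 8
values = rng.normal(size=nsamp)          # stand-in filter weights
records = rng.normal(size=(100, nsamp))  # 100 fake pulse records, one per row

# Each filtered value is the dot product of one record with the filter,
# exactly as in Filter1Lag.filter_records above.
filt_values = records @ values
assert filt_values.shape == (100,)
assert np.allclose(filt_values[0], np.dot(records[0], values))
```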

Filter5Lag dataclass

Bases: Filter

Represent an optimal filter, specifically one intended for 5-lag convolution with data

The traditional 5-lag filter used by default until 2015.

Returns:
  • Filter5Lag –

    An optimal filter, for convolution with data (at 5 lags, obviously)

Source code in mass2/core/optimal_filtering.py
@dataclass(frozen=True)
class Filter5Lag(Filter):
    """Represent an optimal filter, specifically one intended for 5-lag convolution with data

    The traditional 5-lag filter used by default until 2015.

    Returns
    -------
    Filter5Lag
        An optimal filter, for convolution with data (at 5 lags, obviously)
    """

    def __post_init__(self) -> None:
        """Post-init checks that this filter, indeed, is a 5-lag one"""
        assert self.convolution_lags == 5

    @property
    def is_arrival_time_safe(self) -> bool:
        """Is this an arrival-time-safe filter?"""
        return False

    @property
    def _filter_type(self) -> str:
        """Name for this filter type"""
        return "5lag"

    # These parameters fit a parabola to any 5 evenly-spaced points
    FIVELAG_FITTER = (
        np.array(
            (
                (-6, 24, 34, 24, -6),
                (-14, -7, 0, 7, 14),
                (10, -5, -10, -5, 10),
            ),
            dtype=float,
        )
        / 70.0
    )

    def filter_records(self, x: ArrayLike) -> tuple[np.ndarray, np.ndarray]:
        """Filter one microcalorimeter record or an array of records.

        Parameters
        ----------
        x : ArrayLike
            A 1-d array, a single pulse record, or a 2-d array, where `x[i, :]` is pulse record number `i`.

        Returns
        -------
        tuple[np.ndarray, np.ndarray]
            1. The optimally filtered value, or an array (one per row) if the input is a 2-d array.
            2. The phase, or arrival-time estimate in samples. Same shape as the filtered value.

        Raises
        ------
        AssertionError
            If the input array is the wrong length
        """
        x = np.asarray(x)
        if x.ndim == 1:
            x = x.reshape((1, len(x)))
        nrec, nsamp = x.shape
        nlags = self.convolution_lags
        assert nsamp == len(self.values) + nlags - 1
        nrec = x.shape[0]
        conv = np.zeros((nlags, nrec), dtype=float)
        for i in range(nlags - 1):
            conv[i, :] = np.dot(x[:, i : i + 1 - nlags], self.values)
        conv[nlags - 1, :] = np.dot(x[:, nlags - 1 :], self.values)

        # Least-squares fit of 5 values to a parabola.
        # Order is row 0 = constant ... row 2 = quadratic coefficients.
        if nlags != 5:
            raise NotImplementedError("Currently require 5 lags to estimate peak x, y")
        param = np.dot(self.FIVELAG_FITTER, conv)
        peak_x = -0.5 * param[1, :] / param[2, :]
        peak_y = param[0, :] - 0.25 * param[1, :] ** 2 / param[2, :]
        return peak_y, peak_x

is_arrival_time_safe property

Is this an arrival-time-safe filter?

__post_init__()

Post-init checks that this filter, indeed, is a 5-lag one

Source code in mass2/core/optimal_filtering.py
def __post_init__(self) -> None:
    """Post-init checks that this filter, indeed, is a 5-lag one"""
    assert self.convolution_lags == 5

filter_records(x)

Filter one microcalorimeter record or an array of records.

Parameters:
  • x (ArrayLike) –

    A 1-d array, a single pulse record, or a 2-d array, where x[i, :] is pulse record number i.

Returns:
  • tuple[ndarray, ndarray] –
    1. The optimally filtered value, or an array (one per row) if the input is a 2-d array.
    2. The phase, or arrival-time estimate in samples. Same shape as the filtered value.
Raises:
  • AssertionError –

    If the input array is the wrong length

Source code in mass2/core/optimal_filtering.py
def filter_records(self, x: ArrayLike) -> tuple[np.ndarray, np.ndarray]:
    """Filter one microcalorimeter record or an array of records.

    Parameters
    ----------
    x : ArrayLike
        A 1-d array, a single pulse record, or a 2-d array, where `x[i, :]` is pulse record number `i`.

    Returns
    -------
    tuple[np.ndarray, np.ndarray]
        1. The optimally filtered value, or an array (one per row) if the input is a 2-d array.
        2. The phase, or arrival-time estimate in samples. Same shape as the filtered value.

    Raises
    ------
    AssertionError
        If the input array is the wrong length
    """
    x = np.asarray(x)
    if x.ndim == 1:
        x = x.reshape((1, len(x)))
    nrec, nsamp = x.shape
    nlags = self.convolution_lags
    assert nsamp == len(self.values) + nlags - 1
    nrec = x.shape[0]
    conv = np.zeros((nlags, nrec), dtype=float)
    for i in range(nlags - 1):
        conv[i, :] = np.dot(x[:, i : i + 1 - nlags], self.values)
    conv[nlags - 1, :] = np.dot(x[:, nlags - 1 :], self.values)

    # Least-squares fit of 5 values to a parabola.
    # Order is row 0 = constant ... row 2 = quadratic coefficients.
    if nlags != 5:
        raise NotImplementedError("Currently require 5 lags to estimate peak x, y")
    param = np.dot(self.FIVELAG_FITTER, conv)
    peak_x = -0.5 * param[1, :] / param[2, :]
    peak_y = param[0, :] - 0.25 * param[1, :] ** 2 / param[2, :]
    return peak_y, peak_x
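
The 5-lag peak estimate works because `FIVELAG_FITTER` is the exact least-squares fit of a parabola to five evenly spaced points: row 0 recovers the constant, row 1 the linear, and row 2 the quadratic coefficient. A standalone check on a known parabola:

```python
import numpy as np

# Same matrix as Filter5Lag.FIVELAG_FITTER above.
FIVELAG_FITTER = np.array(
    (
        (-6, 24, 34, 24, -6),
        (-14, -7, 0, 7, 14),
        (10, -5, -10, -5, 10),
    ),
    dtype=float,
) / 70.0

# Sample the downward parabola y = 5 + 2*t - t**2 at the five lags t = -2..2.
t = np.arange(-2, 3, dtype=float)
conv = 5.0 + 2.0 * t - t**2

const, lin, quad = FIVELAG_FITTER @ conv          # recovers (5, 2, -1)
peak_x = -0.5 * lin / quad                        # vertex location: t = 1
peak_y = const - 0.25 * lin**2 / quad             # value at the vertex: 6
assert np.isclose(peak_x, 1.0)
assert np.isclose(peak_y, 6.0)
```

Because the fit is exact for quadratic data, the vertex formulas in `filter_records` give a sub-sample arrival-time estimate (`peak_x`) and an interpolated peak filtered value (`peak_y`).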

FilterATS dataclass

Bases: Filter

Represent an optimal filter according to the arrival-time-safe, single-lag design of 2015.

Returns:
  • FilterATS –

    An optimal filter, for single-lag convolution (i.e., a dot product) with the data

Source code in mass2/core/optimal_filtering.py
@dataclass(frozen=True)
class FilterATS(Filter):
    """Represent an optimal filter according to the arrival-time-safe, single-lag design of 2015.

    Returns
    -------
    FilterATS
        An optimal filter, for single-lag convolution (i.e., a dot product) with the data
    """

    def __post_init__(self) -> None:
        """Post-init checks that this filter expects one lag and has a dt_values array"""
        assert self.convolution_lags == 1
        assert self.dt_values is not None

    @property
    def is_arrival_time_safe(self) -> bool:
        """Is this an arrival-time-safe filter?"""
        return True

    @property
    def _filter_type(self) -> str:
        """Return the name for this filter type"""
        return "ats"

    def filter_records(self, x: ArrayLike) -> tuple[np.ndarray, np.ndarray]:
        """Filter one microcalorimeter record or an array of records.

        Parameters
        ----------
        x : ArrayLike
            A 1-d array, a single pulse record, or a 2-d array, each row a pulse record.

        Returns
        -------
        tuple[np.ndarray, np.ndarray]
            1. The optimally filtered value, or an array (one per row) if the input is a 2-d array.
            2. The phase, or arrival-time estimate in samples. Same shape as the filtered value.

        Raises
        ------
        AssertionError
            If the input array is the wrong length
        """
        x = np.asarray(x)
        if x.ndim == 1:
            x = x.reshape((1, len(x)))
        _, nsamp = x.shape

        assert nsamp == len(self.values)
        return _filter_records_ats(x, self.values, self.dt_values)

is_arrival_time_safe property

Is this an arrival-time-safe filter?

__post_init__()

Post-init checks that this filter expects one lag and has a dt_values array

Source code in mass2/core/optimal_filtering.py
def __post_init__(self) -> None:
    """Post-init checks that this filter expects one lag and has a dt_values array"""
    assert self.convolution_lags == 1
    assert self.dt_values is not None

filter_records(x)

Filter one microcalorimeter record or an array of records.

Parameters:
  • x (ArrayLike) –

    A 1-d array, a single pulse record, or a 2-d array, each row a pulse record.

Returns:
  • tuple[ndarray, ndarray] –
    1. The optimally filtered value, or an array (one per row) if the input is a 2-d array.
    2. The phase, or arrival-time estimate in samples. Same shape as the filtered value.
Raises:
  • AssertionError –

    If the input array is the wrong length

Source code in mass2/core/optimal_filtering.py
def filter_records(self, x: ArrayLike) -> tuple[np.ndarray, np.ndarray]:
    """Filter one microcalorimeter record or an array of records.

    Parameters
    ----------
    x : ArrayLike
        A 1-d array, a single pulse record, or a 2-d array, each row a pulse record.

    Returns
    -------
    tuple[np.ndarray, np.ndarray]
        1. The optimally filtered value, or an array (one per row) if the input is a 2-d array.
        2. The phase, or arrival-time estimate in samples. Same shape as the filtered value.

    Raises
    ------
    AssertionError
        If the input array is the wrong length
    """
    x = np.asarray(x)
    if x.ndim == 1:
        x = x.reshape((1, len(x)))
    _, nsamp = x.shape

    assert nsamp == len(self.values)
    return _filter_records_ats(x, self.values, self.dt_values)
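
The private helper `_filter_records_ats` is not shown on this page. As a rough intuition only (an assumption about the 2015 arrival-time-safe design, not the helper's actual implementation), the ATS approach takes two dot products per record, one with the main filter and one with the derivative-like `dt_values` filter, and estimates arrival time from their ratio:

```python
import numpy as np

def filter_records_ats_sketch(x, values, dt_values):
    """Illustrative guess at ATS filtering: two dot products per record.
    Hypothetical stand-in; NOT the actual _filter_records_ats."""
    pv = x @ values       # pulse-height estimate
    pdt = x @ dt_values   # derivative-filter output
    phase = pdt / pv      # arrival-time estimate in samples (approximate)
    return pv, phase

rng = np.random.default_rng(1)
x = rng.normal(size=(10, 16))         # 10 fake records of 16 samples
values = rng.normal(size=16)          # stand-in main filter
dt_values = rng.normal(size=16)       # stand-in derivative filter
peak_y, peak_x = filter_records_ats_sketch(x, values, dt_values)
assert peak_y.shape == peak_x.shape == (10,)
```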

FilterMaker dataclass

An object capable of creating optimal filters based on a single signal and noise set.

Parameters:
  • signal_model (ArrayLike) –

    The average signal shape. Filters will be rescaled so that the output upon putting this signal into the filter equals the peak value of this filter (that is, peak value relative to the baseline level).

  • n_pretrigger (int) –

    The number of leading samples in the average signal that are considered to be pre-trigger samples. The avg_signal in this section is replaced by its constant averaged value before creating filters. Also, one filter (filt_baseline_pretrig) is designed to infer the baseline using only n_pretrigger samples at the start of a record.

  • noise_autocorr (Optional[ArrayLike], default: None ) –

    The autocorrelation function of the noise, where the lag spacing is assumed to be the same as the sample period of avg_signal.

  • noise_psd (Optional[ArrayLike], default: None ) –

    The noise power spectral density. If not None, then it must be of length (2N+1), where N is the length of avg_signal, and its values are assumed to cover the non-negative frequencies from 0, 1/Delta, 2/Delta,.... up to the Nyquist frequency. If None, then method compute_fourier() will not work.

  • whitener (Optional[ToeplitzWhitener], default: None ) –

    An optional function object which, when called, whitens a vector or the columns of a matrix. Supersedes noise_autocorr if both are given.

  • sample_time_sec (float, default: 0.0 ) –

    The time step between samples in avg_signal and noise_autocorr (in seconds). This must be given if fmax or f_3db are ever to be used.

  • peak (float, default: 0.0 ) –

    The peak amplitude of the standard signal

Notes
  • If both noise_autocorr and whitener are None, then methods compute_5lag and compute_ats will both fail, as they require a time-domain characterization of the noise.

  • The units of noise_autocorr are the square of the units used in signal_model and/or peak. The units of whitener are the inverse of the signal units. Any rescaling of the noise autocorrelation or whitener does not affect any filter values, but only the predicted signal/noise ratios.

  • The units of noise_psd are square signal units, per Hertz.

Returns:
  • FilterMaker –

    An object that can make a variety of optimal filters, assuming a single signal and noise analysis.

Source code in mass2/core/optimal_filtering.py
@dataclass(frozen=True)
class FilterMaker:
    """An object capable of creating optimal filters based on a single signal and noise set.

    Parameters
    ----------
    signal_model : ArrayLike
        The average signal shape.  Filters will be rescaled so that the output
        upon putting this signal into the filter equals the *peak value* of this
        filter (that is, peak value relative to the baseline level).
    n_pretrigger : int
        The number of leading samples in the average signal that are considered
        to be pre-trigger samples.  The avg_signal in this section is replaced by
        its constant averaged value before creating filters.  Also, one filter
        (filt_baseline_pretrig) is designed to infer the baseline using only
        `n_pretrigger` samples at the start of a record.
    noise_autocorr : Optional[ArrayLike]
        The autocorrelation function of the noise, where the lag spacing is
        assumed to be the same as the sample period of `avg_signal`.
    noise_psd : Optional[ArrayLike]
        The noise power spectral density.  If not None, then it must be of length (2N+1),
        where N is the length of `avg_signal`, and its values are assumed to cover the
        non-negative frequencies from 0, 1/Delta, 2/Delta,.... up to the Nyquist frequency.
        If None, then method `compute_fourier()` will not work.
    whitener : Optional[ToeplitzWhitener]
        An optional function object which, when called, whitens a vector or the
        columns of a matrix. Supersedes `noise_autocorr` if both are given.
    sample_time_sec : float
        The time step between samples in `avg_signal` and `noise_autocorr` (in seconds).
        This must be given if `fmax` or `f_3db` are ever to be used.
    peak : float
        The peak amplitude of the standard signal


    Notes
    -----

    * If both `noise_autocorr` and `whitener` are None, then methods `compute_5lag` and
    `compute_ats` will both fail, as they require a time-domain characterization of the
    noise.

    * The units of `noise_autocorr` are the square of the units used in `signal_model` and/or
    `peak`. The units of `whitener` are the inverse of the signal units.  Any rescaling of the
    noise autocorrelation or whitener does not affect any filter values, but only
    the predicted signal/noise ratios.

    * The units of `noise_psd` are square signal units, per Hertz.

    Returns
    -------
    FilterMaker
        An object that can make a variety of optimal filters, assuming a single signal and noise analysis.
    """

    signal_model: NDArray
    n_pretrigger: int
    noise_autocorr: NDArray | None = None
    noise_psd: NDArray | None = None
    dt_model: NDArray | None = None
    whitener: ToeplitzWhitener | None = None
    sample_time_sec: float = 0.0
    peak: float = 0.0

    def compute_constrained_1lag(
        self,
        constraints: ArrayLike | None = None,
        fmax: float | None = None,
        f_3db: float | None = None,
        cut_pre: int = 0,
        cut_post: int = 0,
    ) -> Filter:
        """Compute a single constrained optimal filter, with optional low-pass filtering, and with optional zero
        weights at the pre-trigger or post-trigger end of the filter. Can be used with 0-lag "convolution" only,
        so cannot estimate arrival time.

        Either or both of `fmax` and `f_3db` are allowed.

        Parameters
        ----------
        constraints: ndarray, optional
            The vector or vectors to which the filter should be orthogonal. If a 2d array, each _row_
            is a constraint, and the number of columns should equal `len(self.signal_model)`
            minus `(cut_pre + cut_post)`.
        fmax : Optional[float], optional
            The strict maximum frequency to be passed in all filters, by default None
        f_3db : Optional[float], optional
            The 3 dB point for a one-pole low-pass filter to be applied to all filters, by default None
        cut_pre : int
            The number of initial samples to be given zero weight, by default 0
        cut_post : int
            The number of samples at the end of a record to be given zero weight, by default 0

        Returns
        -------
        Filter
            A 1-lag optimal filter.

        Raises
        ------
        ValueError
            Under various conditions where arguments are inconsistent with the data
        """

        if self.sample_time_sec <= 0 and not (fmax is None and f_3db is None):
            raise ValueError("FilterMaker must have a sample_time_sec if it's to be smoothed with fmax or f_3db")
        if cut_pre < 0 or cut_post < 0:
            raise ValueError(f"(cut_pre,cut_post)=({cut_pre},{cut_post}), but neither can be negative")

        if self.noise_autocorr is None and self.whitener is None:
            raise ValueError("FilterMaker must have noise_autocorr or whitener arguments to generate 1-lag filters")
        noise_autocorr = self._compute_autocorr(cut_pre, cut_post)
        avg_signal, peak, _ = self._normalize_signal(cut_pre, cut_post)

        n = len(avg_signal)
        assert len(noise_autocorr) >= n, "Noise autocorrelation vector is too short for signal size"
        pulse_model = np.vstack((avg_signal, np.ones_like(avg_signal)))
        if constraints is not None:
            pulse_model = np.vstack((pulse_model, constraints))
        assert pulse_model.shape[1] == n

        noise_corr = noise_autocorr[:n]
        TS = ToeplitzSolver(noise_corr, symmetric=True)
        Rinv_model = np.vstack([TS(r) for r in pulse_model])
        A = pulse_model.dot(Rinv_model.T)
        all_filters = np.linalg.solve(A, Rinv_model)
        filt_noconst = all_filters[0]

        band_limit(filt_noconst, self.sample_time_sec, fmax, f_3db)

        self._normalize_filter(filt_noconst, avg_signal)
        variance = bracketR(filt_noconst, noise_corr)

        # Set weights in the cut_pre and cut_post windows to 0
        if cut_pre > 0 or cut_post > 0:
            filt_noconst = np.hstack([np.zeros(cut_pre), filt_noconst, np.zeros(cut_post)])

        if variance <= 0:
            vdv = np.inf
        else:
            vdv = peak / (8 * np.log(2) * variance) ** 0.5
        return Filter1Lag(
            filt_noconst,
            peak,
            variance,
            vdv,
            self.n_pretrigger,
            None,
            None,
            avg_signal,
            None,
            1,
            fmax,
            f_3db,
            cut_pre,
            cut_post,
        )

    def compute_1lag(self, fmax: float | None = None, f_3db: float | None = None, cut_pre: int = 0, cut_post: int = 0) -> Filter:
        """Compute a single filter, with optional low-pass filtering, and with optional zero
        weights at the pre-trigger or post-trigger end of the filter. Can be used with 0-lag "convolution" only,
        so cannot estimate arrival time.

        Either or both of `fmax` and `f_3db` are allowed.

        Parameters
        ----------
        fmax : Optional[float], optional
            The strict maximum frequency to be passed in all filters, by default None
        f_3db : Optional[float], optional
            The 3 dB point for a one-pole low-pass filter to be applied to all filters, by default None
        cut_pre : int
            The number of initial samples to be given zero weight, by default 0
        cut_post : int
            The number of samples at the end of a record to be given zero weight, by default 0

        Returns
        -------
        Filter
            A 1-lag optimal filter.

        Raises
        ------
        ValueError
            Under various conditions where arguments are inconsistent with the data
        """
        return self.compute_constrained_1lag(None, fmax=fmax, f_3db=f_3db, cut_pre=cut_pre, cut_post=cut_post)

    def compute_constrained_5lag(
        self,
        constraints: ArrayLike | None = None,
        fmax: float | None = None,
        f_3db: float | None = None,
        cut_pre: int = 0,
        cut_post: int = 0,
    ) -> Filter:
        """Compute a single constrained optimal filter, with optional low-pass filtering, and with optional zero
        weights at the pre-trigger or post-trigger end of the filter.

        Either or both of `fmax` and `f_3db` are allowed.

        Parameters
        ----------
        constraints: ndarray, optional
            The vector or vectors to which the filter should be orthogonal. If a 2d array, each _row_
            is a constraint, and the number of columns should equal `len(self.signal_model)`
            minus `(4 + cut_pre + cut_post)`, because 5-lag filters are shortened by 2 samples on each end.
        fmax : Optional[float], optional
            The strict maximum frequency to be passed in all filters, by default None
        f_3db : Optional[float], optional
            The 3 dB point for a one-pole low-pass filter to be applied to all filters, by default None
        cut_pre : int
            The number of initial samples to be given zero weight, by default 0
        cut_post : int
            The number of samples at the end of a record to be given zero weight, by default 0

        Returns
        -------
        Filter
            A 5-lag optimal filter.

        Raises
        ------
        ValueError
            Under various conditions where arguments are inconsistent with the data
        """

        if self.sample_time_sec <= 0 and not (fmax is None and f_3db is None):
            raise ValueError("FilterMaker must have a sample_time_sec if it's to be smoothed with fmax or f_3db")
        if cut_pre < 0 or cut_post < 0:
            raise ValueError(f"(cut_pre,cut_post)=({cut_pre},{cut_post}), but neither can be negative")

        if self.noise_autocorr is None and self.whitener is None:
            raise ValueError("FilterMaker must have noise_autocorr or whitener arguments to generate 5-lag filters")
        noise_autocorr = self._compute_autocorr(cut_pre, cut_post)
        avg_signal, peak, _ = self._normalize_signal(cut_pre, cut_post)

        shorten = 2  # for 5-lag convolution
        truncated_signal = avg_signal[shorten:-shorten]
        n = len(truncated_signal)
        assert len(noise_autocorr) >= n, "Noise autocorrelation vector is too short for signal size"
        pulse_model = np.vstack((truncated_signal, np.ones_like(truncated_signal)))
        if constraints is not None:
            pulse_model = np.vstack((pulse_model, constraints))
        assert pulse_model.shape[1] == n

        noise_corr = noise_autocorr[:n]
        TS = ToeplitzSolver(noise_corr, symmetric=True)
        Rinv_model = np.vstack([TS(r) for r in pulse_model])
        A = pulse_model.dot(Rinv_model.T)
        all_filters = np.linalg.solve(A, Rinv_model)
        filt_noconst = all_filters[0]

        band_limit(filt_noconst, self.sample_time_sec, fmax, f_3db)

        self._normalize_5lag_filter(filt_noconst, avg_signal)
        variance = bracketR(filt_noconst, noise_corr)

        # Set weights in the cut_pre and cut_post windows to 0
        if cut_pre > 0 or cut_post > 0:
            filt_noconst = np.hstack([np.zeros(cut_pre), filt_noconst, np.zeros(cut_post)])

        if variance <= 0:
            vdv = np.inf
        else:
            vdv = peak / (8 * np.log(2) * variance) ** 0.5
        return Filter5Lag(
            filt_noconst,
            peak,
            variance,
            vdv,
            self.n_pretrigger - 2,
            None,
            None,
            avg_signal,
            None,
            1 + 2 * shorten,
            fmax,
            f_3db,
            cut_pre,
            cut_post,
        )

    def compute_5lag(self, fmax: float | None = None, f_3db: float | None = None, cut_pre: int = 0, cut_post: int = 0) -> Filter:
        """Compute a single filter, with optional low-pass filtering, and with optional zero
        weights at the pre-trigger or post-trigger end of the filter.

        Either or both of `fmax` and `f_3db` are allowed.

        Parameters
        ----------
        fmax : Optional[float], optional
            The strict maximum frequency to be passed in all filters, by default None
        f_3db : Optional[float], optional
            The 3 dB point for a one-pole low-pass filter to be applied to all filters, by default None
        cut_pre : int
            The number of initial samples to be given zero weight, by default 0
        cut_post : int
            The number of samples at the end of a record to be given zero weight, by default 0

        Returns
        -------
        Filter
            A 5-lag optimal filter.

        Raises
        ------
        ValueError
            Under various conditions where arguments are inconsistent with the data
        """
        return self.compute_constrained_5lag(None, fmax=fmax, f_3db=f_3db, cut_pre=cut_pre, cut_post=cut_post)

    def compute_5lag_noexp(
        self, exp_time_seconds: float, fmax: float | None = None, f_3db: float | None = None, cut_pre: int = 0, cut_post: int = 0
    ) -> Filter:
        """Compute a single filter, with optional low-pass filtering, and with optional zero
        weights at the pre-trigger or post-trigger end of the filter.

        Either or both of `fmax` and `f_3db` are allowed.

        Parameters
        ----------
        exp_time_seconds: float
            Generate a filter orthogonal to decaying exponentials of this time constant (must be positive)
        fmax : Optional[float], optional
            The strict maximum frequency to be passed in all filters, by default None
        f_3db : Optional[float], optional
            The 3 dB point for a one-pole low-pass filter to be applied to all filters, by default None
        cut_pre : int
            The number of initial samples to be given zero weight, by default 0
        cut_post : int
            The number of samples at the end of a record to be given zero weight, by default 0

        Returns
        -------
        Filter
            A 5-lag optimal filter.

        Raises
        ------
        ValueError
            Under various conditions where arguments are inconsistent with the data
        """
        assert exp_time_seconds > 0
        n = len(self.signal_model) - 4 - (cut_pre + cut_post)
        log_per_sample = self.sample_time_sec / exp_time_seconds
        constraint = np.exp(-np.arange(n) * log_per_sample)
        return self.compute_constrained_5lag(constraint, fmax=fmax, f_3db=f_3db, cut_pre=cut_pre, cut_post=cut_post)

    def compute_fourier(self, fmax: float | None = None, f_3db: float | None = None, cut_pre: int = 0, cut_post: int = 0) -> Filter:
        """Compute a single Fourier-domain filter, with optional low-pass filtering, and with optional
        zero weights at the pre-trigger or post-trigger end of the filter. Fourier domain calculation
        implicitly assumes periodic boundary conditions.

        Either or both of `fmax` and `f_3db` are allowed.

        Parameters
        ----------
        fmax : Optional[float], optional
            The strict maximum frequency to be passed in all filters, by default None
        f_3db : Optional[float], optional
            The 3 dB point for a one-pole low-pass filter to be applied to all filters, by default None
        cut_pre : int
            The number of initial samples to be given zero weight, by default 0
        cut_post : int
            The number of samples at the end of a record to be given zero weight, by default 0

        Returns
        -------
        Filter
            A 5-lag optimal filter, computed in the Fourier domain.

        Raises
        ------
        ValueError
            Under various conditions where arguments are inconsistent with the data
        """
        # Make sure we have either a noise PSD or an autocorrelation or a whitener
        if self.noise_psd is None:
            raise ValueError("FilterMaker must have noise_psd to generate a Fourier filter")
        if cut_pre < 0 or cut_post < 0:
            raise ValueError(f"(cut_pre,cut_post)=({cut_pre},{cut_post}), but neither can be negative")

        avg_signal, peak, _ = self._normalize_signal(cut_pre, cut_post)
        noise_psd = np.asarray(self.noise_psd)

        # Terminology: the `avg_signal` vector will be "shortened" by `shorten` samples on each end.
        # That's to permit 5-lag filtering (where we step the filter by ±2 lags either direction from 0 lag).
        # The `avg_signal` was already "reduced" in length by (cut_pre, cut_post), for a total
        # `reduction` of `2 * shorten + (cut_pre + cut_post)`.
        shorten = 2  # to use in 5-lag style
        reduction = 2 * shorten + (cut_pre + cut_post)

        truncated_avg_signal = avg_signal[shorten:-shorten]
        len_reduced_psd = len(noise_psd) - (reduction + 1) // 2
        window = 1.0
        sig_ft = np.fft.rfft(truncated_avg_signal * window)

        if len(sig_ft) != len_reduced_psd:
            raise ValueError(f"signal real DFT and noise PSD are not the same length ({len(sig_ft)} and {len_reduced_psd})")

        # Careful with PSD: "shorten" it by converting into a real space autocorrelation,
        # truncating the middle, and going back to Fourier space
        if reduction > 0:
            noise_autocorr = np.fft.irfft(noise_psd)
            noise_autocorr = np.hstack((noise_autocorr[: len_reduced_psd - 1], noise_autocorr[-len_reduced_psd:]))
            noise_psd = np.abs(np.fft.rfft(noise_autocorr))
        sig_ft_weighted = sig_ft / noise_psd

        # Band-limit
        if fmax is not None or f_3db is not None:
            f_nyquist = 0.5 / self.sample_time_sec
            freq = np.linspace(0, f_nyquist, len_reduced_psd, dtype=float)
            if fmax is not None:
                sig_ft_weighted[freq > fmax] = 0.0
            if f_3db is not None:
                sig_ft_weighted /= 1 + (freq * 1.0 / f_3db) ** 2

        sig_ft_weighted[0] = 0.0
        filt_fourier = np.fft.irfft(sig_ft_weighted) / window
        self._normalize_5lag_filter(filt_fourier, avg_signal)

        # How we compute the uncertainty depends on whether there's a noise autocorrelation result
        if self.noise_autocorr is None:
            noise_ft_squared = (len(noise_psd) - 1) / self.sample_time_sec * noise_psd
            kappa = (np.abs(sig_ft) ** 2 / noise_ft_squared)[1:].sum()
            variance_fourier = 1.0 / kappa
        else:
            ac = np.array(self.noise_autocorr)[: len(filt_fourier)]
            variance_fourier = bracketR(filt_fourier, ac)
        vdv = peak / (8 * np.log(2) * variance_fourier) ** 0.5
        return Filter5Lag(
            filt_fourier,
            peak,
            variance_fourier,
            vdv,
            self.n_pretrigger - 2,
            None,
            None,
            truncated_avg_signal,
            None,
            1 + 2 * shorten,
            fmax,
            f_3db,
            cut_pre,
            cut_post,
        )

    def compute_ats(self, fmax: float | None = None, f_3db: float | None = None, cut_pre: int = 0, cut_post: int = 0) -> Filter:  # noqa: PLR0914
        """Compute a single "arrival-time-safe" filter, with optional low-pass filtering,
        and with optional zero weights at the pre-trigger or post-trigger end of the filter.

        Either or both of `fmax` and `f_3db` are allowed.

        Parameters
        ----------
        fmax : Optional[float], optional
            The strict maximum frequency to be passed in all filters, by default None
        f_3db : Optional[float], optional
            The 3 dB point for a one-pole low-pass filter to be applied to all filters, by default None
        cut_pre : int
            The number of initial samples to be given zero weight, by default 0
        cut_post : int
            The number of samples at the end of a record to be given zero weight, by default 0

        Returns
        -------
        Filter
            An arrival-time-safe optimal filter.

        Raises
        ------
        ValueError
            Under various conditions where arguments are inconsistent with the data
        """
        if self.noise_autocorr is None and self.whitener is None:
            raise ValueError("FilterMaker must have noise_autocorr or whitener arguments to generate ATS filters")
        if self.dt_model is None:
            raise ValueError("FilterMaker must have dt_model to generate ATS filters")
        if self.sample_time_sec is None and not (fmax is None and f_3db is None):
            raise ValueError("FilterMaker must have a sample_time_sec if it's to be smoothed with fmax or f_3db")

        noise_autocorr = self._compute_autocorr(cut_pre, cut_post)
        avg_signal, peak, dt_model = self._normalize_signal(cut_pre, cut_post)

        ns = len(avg_signal)
        assert ns == len(dt_model)
        if cut_pre < 0 or cut_post < 0:
            raise ValueError(f"(cut_pre,cut_post)=({cut_pre},{cut_post}), but neither can be negative")
        if cut_pre + cut_post >= ns:
            raise ValueError(f"cut_pre+cut_post = {cut_pre + cut_post} but should be < {ns}")

        MT = np.vstack((avg_signal, dt_model, np.ones(ns)))

        if self.whitener is not None:
            WM = self.whitener(MT.T)
            A = np.dot(WM.T, WM)
            Ainv = np.linalg.inv(A)
            WtWM = self.whitener.applyWT(WM)
            filt = np.dot(Ainv, WtWM.T)

        else:
            assert len(noise_autocorr) >= ns
            noise_corr = noise_autocorr[:ns]
            TS = ToeplitzSolver(noise_corr, symmetric=True)

            RinvM = np.vstack([TS(r) for r in MT]).T
            A = np.dot(MT, RinvM)
            Ainv = np.linalg.inv(A)
            filt = np.dot(Ainv, RinvM.T)

        band_limit(filt.T, self.sample_time_sec, fmax, f_3db)

        if cut_pre > 0 or cut_post > 0:
            nfilt = filt.shape[0]
            filt = np.hstack([np.zeros((nfilt, cut_pre), dtype=float), filt, np.zeros((nfilt, cut_post), dtype=float)])

        filt_noconst = filt[0]
        filt_dt = filt[1]
        filt_baseline = filt[2]

        variance = bracketR(filt_noconst, noise_autocorr)
        vdv = peak / (np.log(2) * 8 * variance) ** 0.5
        return FilterATS(
            filt_noconst,
            peak,
            variance,
            vdv,
            self.n_pretrigger,
            filt_dt,
            filt_baseline,
            avg_signal,
            dt_model,
            1,
            fmax,
            f_3db,
            cut_pre,
            cut_post,
        )

    def _compute_autocorr(self, cut_pre: int = 0, cut_post: int = 0) -> np.ndarray:
        """Return the noise autocorrelation, if any, cut down by the requested number of values at the start and end.

        Parameters
        ----------
        cut_pre : int, optional
            How many samples to remove from the start of each pulse record, by default 0
        cut_post : int, optional
            How many samples to remove from the end of each pulse record, by default 0

        Returns
        -------
        np.ndarray
            The noise autocorrelation, truncated to the appropriate length, or a length-0 array if none is known.
        """
        # If there's an autocorrelation, cut it down to length.
        if self.noise_autocorr is None:
            return np.array([], dtype=float)
        N = len(np.asarray(self.signal_model))
        return np.asarray(self.noise_autocorr)[: N - (cut_pre + cut_post)]

    def _normalize_signal(self, cut_pre: int = 0, cut_post: int = 0) -> tuple[np.ndarray, float, np.ndarray]:
        """Compute the normalized signal, peak value, and first-order arrival-time model.

        Parameters
        ----------
        cut_pre : int, optional
            How many samples to remove from the start of each pulse record, by default 0
        cut_post : int, optional
            How many samples to remove from the end of each pulse record, by default 0

        Returns
        -------
        tuple[np.ndarray, float, np.ndarray]
            (sig, pk, dsig), where `sig` is the nominal signal model (normalized to unit amplitude), `pk` is the
            peak value of the nominal signal, and `dsig` is the difference between signal models that differ by
            one sample in arrival time. The `dsig` will be an empty array if no arrival-time model is known.

        Raises
        ------
        ValueError
            If negative numbers of samples are to be cut, or the entire record is to be cut.
        """
        avg_signal = np.array(self.signal_model)
        ns = len(avg_signal)
        pre_avg = avg_signal[cut_pre : self.n_pretrigger - 1].mean()

        if cut_pre < 0 or cut_post < 0:
            raise ValueError(f"(cut_pre,cut_post)=({cut_pre},{cut_post}), but neither can be negative")
        if cut_pre + cut_post >= ns:
            raise ValueError(f"cut_pre+cut_post = {cut_pre + cut_post} but should be < {ns}")

        # Unless passed in, find the signal's peak value. This is normally peak=(max-pretrigger).
        # If signal is negative-going, however, then peak=(pretrigger-min).
        if self.peak > 0.0:
            peak_signal = self.peak
        else:
            a = avg_signal[cut_pre : ns - cut_post].min()
            b = avg_signal[cut_pre : ns - cut_post].max()
            is_negative = pre_avg - a > b - pre_avg
            if is_negative:
                peak_signal = a - pre_avg
            else:
                peak_signal = b - pre_avg

        # avg_signal: normalize to have unit peak
        avg_signal -= pre_avg

        rescale = 1 / np.max(avg_signal)
        avg_signal *= rescale
        avg_signal[: self.n_pretrigger] = 0.0
        avg_signal = avg_signal[cut_pre : ns - cut_post]
        if self.dt_model is None:
            dt_model = np.array([], dtype=float)
        else:
            dt_model = self.dt_model * rescale
            dt_model = dt_model[cut_pre : ns - cut_post]
        return avg_signal, peak_signal, dt_model

    @staticmethod
    def _normalize_5lag_filter(f: np.ndarray, avg_signal: np.ndarray) -> None:
        """Rescale 5-lag filter `f` in-place so that it gives unit response to avg_signal

        Parameters
        ----------
        f : np.ndarray
            Optimal filter values, which need to be renormalized
        avg_signal : np.ndarray
            The signal to which filter `f` should give unit response
        """
        assert len(f) <= len(avg_signal) - 4
        conv = np.zeros(5, dtype=float)
        for i in range(5):
            conv[i] = np.dot(f, avg_signal[i : i + len(f)])
        x = np.linspace(-2, 2, 5)
        fit = np.polyfit(x, conv, 2)
        fit_ctr = -0.5 * fit[1] / fit[0]
        fit_peak = np.polyval(fit, fit_ctr)
        f *= 1.0 / fit_peak

    @staticmethod
    def _normalize_filter(f: np.ndarray, avg_signal: np.ndarray) -> None:
        """Rescale single-lag filter `f` in-place so that it gives unit response to avg_signal

        Parameters
        ----------
        f : np.ndarray
            Optimal filter values, which need to be renormalized
        avg_signal : np.ndarray
            The signal to which filter `f` should give unit response
        """
        assert len(f) == len(avg_signal)
        f *= 1 / np.dot(f, avg_signal)

compute_1lag(fmax=None, f_3db=None, cut_pre=0, cut_post=0)

Compute a single filter, with optional low-pass filtering, and with optional zero weights at the pre-trigger or post-trigger end of the filter. Can be used with 0-lag "convolution" only, so cannot estimate arrival time.

Either or both of fmax and f_3db are allowed.

Parameters:
  • fmax (Optional[float], default: None ) –

    The strict maximum frequency to be passed in all filters, by default None

  • f_3db (Optional[float], default: None ) –

    The 3 dB point for a one-pole low-pass filter to be applied to all filters, by default None

  • cut_pre (int, default: 0 ) –

    The number of initial samples to be given zero weight, by default 0

  • cut_post (int, default: 0 ) –

    The number of samples at the end of a record to be given zero weight, by default 0

Returns:
  • Filter –

    A 1-lag optimal filter.

Raises:
  • ValueError –

    Under various conditions where arguments are inconsistent with the data

Source code in mass2/core/optimal_filtering.py
def compute_1lag(self, fmax: float | None = None, f_3db: float | None = None, cut_pre: int = 0, cut_post: int = 0) -> Filter:
    """Compute a single filter, with optional low-pass filtering, and with optional zero
    weights at the pre-trigger or post-trigger end of the filter. Can be used with 0-lag "convolution" only,
    so cannot estimate arrival time.

    Either or both of `fmax` and `f_3db` are allowed.

    Parameters
    ----------
    fmax : Optional[float], optional
        The strict maximum frequency to be passed in all filters, by default None
    f_3db : Optional[float], optional
        The 3 dB point for a one-pole low-pass filter to be applied to all filters, by default None
    cut_pre : int
        The number of initial samples to be given zero weight, by default 0
    cut_post : int
        The number of samples at the end of a record to be given zero weight, by default 0

    Returns
    -------
    Filter
        A 1-lag optimal filter.

    Raises
    ------
    ValueError
        Under various conditions where arguments are inconsistent with the data
    """
    return self.compute_constrained_1lag(None, fmax=fmax, f_3db=f_3db, cut_pre=cut_pre, cut_post=cut_post)
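The linear algebra behind the 1-lag filter can be sketched with plain NumPy on invented toy data (a made-up pulse shape and an AR(1)-style noise autocorrelation; the real method uses a `ToeplitzSolver` rather than forming the covariance matrix explicitly, but the result is the same):

```python
import numpy as np

# Toy stand-ins for FilterMaker's signal_model and noise_autocorr.
n = 64
t = np.arange(n, dtype=float)
avg_signal = np.exp(-t / 20.0) - np.exp(-t / 4.0)
avg_signal /= avg_signal.max()                 # unit peak, as _normalize_signal does
noise_corr = 0.9 ** t                          # AR(1)-like autocorrelation

# Dense noise covariance R (the library solves the Toeplitz system instead).
R = noise_corr[np.abs(np.subtract.outer(t, t)).astype(int)]

# Pulse model: the signal row plus a constant row, so the filter is
# insensitive to the baseline level.
M = np.vstack([avg_signal, np.ones(n)])

Rinv_M = np.linalg.solve(R, M.T)               # R^{-1} M^T, shape (n, 2)
A = M @ Rinv_M                                 # M R^{-1} M^T, shape (2, 2)
filt = np.linalg.solve(A, Rinv_M.T)[0]         # first row is the signal filter

filt /= filt @ avg_signal                      # unit response to the model pulse
variance = filt @ R @ filt                     # what bracketR computes
vdv = 1.0 / np.sqrt(8 * np.log(2) * variance)  # peak is 1 for this toy pulse
```

By construction the filter gives unit response to the model pulse and zero response to any constant baseline.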

compute_5lag(fmax=None, f_3db=None, cut_pre=0, cut_post=0)

Compute a single filter, with optional low-pass filtering, and with optional zero weights at the pre-trigger or post-trigger end of the filter.

Either or both of fmax and f_3db are allowed.

Parameters:
  • fmax (Optional[float], default: None ) –

    The strict maximum frequency to be passed in all filters, by default None

  • f_3db (Optional[float], default: None ) –

    The 3 dB point for a one-pole low-pass filter to be applied to all filters, by default None

  • cut_pre (int, default: 0 ) –

    The number of initial samples to be given zero weight, by default 0

  • cut_post (int, default: 0 ) –

    The number of samples at the end of a record to be given zero weight, by default 0

Returns:
  • Filter –

    A 5-lag optimal filter.

Raises:
  • ValueError –

    Under various conditions where arguments are inconsistent with the data

Source code in mass2/core/optimal_filtering.py
def compute_5lag(self, fmax: float | None = None, f_3db: float | None = None, cut_pre: int = 0, cut_post: int = 0) -> Filter:
    """Compute a single filter, with optional low-pass filtering, and with optional zero
    weights at the pre-trigger or post-trigger end of the filter.

    Either or both of `fmax` and `f_3db` are allowed.

    Parameters
    ----------
    fmax : Optional[float], optional
        The strict maximum frequency to be passed in all filters, by default None
    f_3db : Optional[float], optional
        The 3 dB point for a one-pole low-pass filter to be applied to all filters, by default None
    cut_pre : int
        The number of initial samples to be given zero weight, by default 0
    cut_post : int
        The number of samples at the end of a record to be given zero weight, by default 0

    Returns
    -------
    Filter
        A 5-lag optimal filter.

    Raises
    ------
    ValueError
        Under various conditions where arguments are inconsistent with the data
    """
    return self.compute_constrained_5lag(None, fmax=fmax, f_3db=f_3db, cut_pre=cut_pre, cut_post=cut_post)
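The 5-lag normalization step (`_normalize_5lag_filter` in the class listing above) dots the filter against the average pulse at lags -2 through +2, fits a parabola, and rescales by the parabola's interpolated peak, so a pulse arriving between samples still reports unit amplitude. A toy sketch with invented data:

```python
import numpy as np

# Invented pulse; the filter is 2 samples shorter on each end ("shorten = 2").
n = 40
t = np.arange(n, dtype=float)
avg_signal = np.exp(-t / 12.0) - np.exp(-t / 3.0)
f = avg_signal[2:-2].copy()                    # stand-in unnormalized filter

# Response at the 5 lags, parabola fit, rescale by the interpolated peak.
conv = np.array([f @ avg_signal[i:i + len(f)] for i in range(5)])
x = np.linspace(-2, 2, 5)
fit = np.polyfit(x, conv, 2)
fit_ctr = -0.5 * fit[1] / fit[0]               # vertex of the parabola
fit_peak = np.polyval(fit, fit_ctr)
f *= 1.0 / fit_peak                            # peak response is now exactly 1
```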

compute_5lag_noexp(exp_time_seconds, fmax=None, f_3db=None, cut_pre=0, cut_post=0)

Compute a single filter, with optional low-pass filtering, and with optional zero weights at the pre-trigger or post-trigger end of the filter.

Either or both of fmax and f_3db are allowed.

Parameters:
  • exp_time_seconds (float) –

    Generate a filter orthogonal to decaying exponentials of this time constant (must be positive)

  • fmax (Optional[float], default: None ) –

    The strict maximum frequency to be passed in all filters, by default None

  • f_3db (Optional[float], default: None ) –

    The 3 dB point for a one-pole low-pass filter to be applied to all filters, by default None

  • cut_pre (int, default: 0 ) –

    The number of initial samples to be given zero weight, by default 0

  • cut_post (int, default: 0 ) –

    The number of samples at the end of a record to be given zero weight, by default 0

Returns:
  • Filter –

    A 5-lag optimal filter.

Raises:
  • ValueError –

    Under various conditions where arguments are inconsistent with the data

Source code in mass2/core/optimal_filtering.py
def compute_5lag_noexp(
    self, exp_time_seconds: float, fmax: float | None = None, f_3db: float | None = None, cut_pre: int = 0, cut_post: int = 0
) -> Filter:
    """Compute a single filter, with optional low-pass filtering, and with optional zero
    weights at the pre-trigger or post-trigger end of the filter.

    Either or both of `fmax` and `f_3db` are allowed.

    Parameters
    ----------
    exp_time_seconds: float
        Generate a filter orthogonal to decaying exponentials of this time constant (must be positive)
    fmax : Optional[float], optional
        The strict maximum frequency to be passed in all filters, by default None
    f_3db : Optional[float], optional
        The 3 dB point for a one-pole low-pass filter to be applied to all filters, by default None
    cut_pre : int
        The number of initial samples to be given zero weight, by default 0
    cut_post : int
        The number of samples at the end of a record to be given zero weight, by default 0

    Returns
    -------
    Filter
        A 5-lag optimal filter.

    Raises
    ------
    ValueError
        Under various conditions where arguments are inconsistent with the data
    """
    assert exp_time_seconds > 0
    n = len(self.signal_model) - 4 - (cut_pre + cut_post)
    log_per_sample = self.sample_time_sec / exp_time_seconds
    constraint = np.exp(-np.arange(n) * log_per_sample)
    return self.compute_constrained_5lag(constraint, fmax=fmax, f_3db=f_3db, cut_pre=cut_pre, cut_post=cut_post)
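The constraint row that `compute_5lag_noexp` builds is just a sampled decaying exponential, shortened to match the 5-lag filter length. A sketch with assumed numbers (a real `FilterMaker` supplies `signal_model` and `sample_time_sec`):

```python
import numpy as np

sample_time_sec = 1e-5          # 10 µs sampling, assumed for illustration
exp_time_seconds = 2e-4         # 200 µs decay constant to reject, assumed
n_signal, cut_pre, cut_post = 512, 0, 0

n = n_signal - 4 - (cut_pre + cut_post)   # 5-lag filters are 4 samples shorter
decay_per_sample = sample_time_sec / exp_time_seconds
constraint = np.exp(-np.arange(n) * decay_per_sample)
# compute_constrained_5lag then makes the filter orthogonal to this vector,
# so a pulse riding on such an exponential tail is estimated without bias.
```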

compute_ats(fmax=None, f_3db=None, cut_pre=0, cut_post=0)

Compute a single "arrival-time-safe" filter, with optional low-pass filtering, and with optional zero weights at the pre-trigger or post-trigger end of the filter.

Either or both of fmax and f_3db are allowed.

Parameters:
  • fmax (Optional[float], default: None ) –

    The strict maximum frequency to be passed in all filters, by default None

  • f_3db (Optional[float], default: None ) –

    The 3 dB point for a one-pole low-pass filter to be applied to all filters, by default None

  • cut_pre (int, default: 0 ) –

    The number of initial samples to be given zero weight, by default 0

  • cut_post (int, default: 0 ) –

    The number of samples at the end of a record to be given zero weight, by default 0

Returns:
  • Filter –

    An arrival-time-safe optimal filter.

Raises:
  • ValueError –

    Under various conditions where arguments are inconsistent with the data

Source code in mass2/core/optimal_filtering.py
def compute_ats(self, fmax: float | None = None, f_3db: float | None = None, cut_pre: int = 0, cut_post: int = 0) -> Filter:  # noqa: PLR0914
    """Compute a single "arrival-time-safe" filter, with optional low-pass filtering,
    and with optional zero weights at the pre-trigger or post-trigger end of the filter.

    Either or both of `fmax` and `f_3db` are allowed.

    Parameters
    ----------
    fmax : Optional[float], optional
        The strict maximum frequency to be passed in all filters, by default None
    f_3db : Optional[float], optional
        The 3 dB point for a one-pole low-pass filter to be applied to all filters, by default None
    cut_pre : int
        The number of initial samples to be given zero weight, by default 0
    cut_post : int
        The number of samples at the end of a record to be given zero weight, by default 0

    Returns
    -------
    Filter
        An arrival-time-safe optimal filter.

    Raises
    ------
    ValueError
        Under various conditions where arguments are inconsistent with the data
    """
    if self.noise_autocorr is None and self.whitener is None:
        raise ValueError("FilterMaker must have noise_autocorr or whitener arguments to generate ATS filters")
    if self.dt_model is None:
        raise ValueError("FilterMaker must have dt_model to generate ATS filters")
    if self.sample_time_sec is None and not (fmax is None and f_3db is None):
        raise ValueError("FilterMaker must have a sample_time_sec if it's to be smoothed with fmax or f_3db")

    noise_autocorr = self._compute_autocorr(cut_pre, cut_post)
    avg_signal, peak, dt_model = self._normalize_signal(cut_pre, cut_post)

    ns = len(avg_signal)
    assert ns == len(dt_model)
    if cut_pre < 0 or cut_post < 0:
        raise ValueError(f"(cut_pre,cut_post)=({cut_pre},{cut_post}), but neither can be negative")
    if cut_pre + cut_post >= ns:
        raise ValueError(f"cut_pre+cut_post = {cut_pre + cut_post} but should be < {ns}")

    MT = np.vstack((avg_signal, dt_model, np.ones(ns)))

    if self.whitener is not None:
        WM = self.whitener(MT.T)
        A = np.dot(WM.T, WM)
        Ainv = np.linalg.inv(A)
        WtWM = self.whitener.applyWT(WM)
        filt = np.dot(Ainv, WtWM.T)

    else:
        assert len(noise_autocorr) >= ns
        noise_corr = noise_autocorr[:ns]
        TS = ToeplitzSolver(noise_corr, symmetric=True)

        RinvM = np.vstack([TS(r) for r in MT]).T
        A = np.dot(MT, RinvM)
        Ainv = np.linalg.inv(A)
        filt = np.dot(Ainv, RinvM.T)

    band_limit(filt.T, self.sample_time_sec, fmax, f_3db)

    if cut_pre > 0 or cut_post > 0:
        nfilt = filt.shape[0]
        filt = np.hstack([np.zeros((nfilt, cut_pre), dtype=float), filt, np.zeros((nfilt, cut_post), dtype=float)])

    filt_noconst = filt[0]
    filt_dt = filt[1]
    filt_baseline = filt[2]

    variance = bracketR(filt_noconst, noise_autocorr)
    vdv = peak / (np.log(2) * 8 * variance) ** 0.5
    return FilterATS(
        filt_noconst,
        peak,
        variance,
        vdv,
        self.n_pretrigger,
        filt_dt,
        filt_baseline,
        avg_signal,
        dt_model,
        1,
        fmax,
        f_3db,
        cut_pre,
        cut_post,
    )
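
The linear algebra in the whitener-free branch above is ordinary generalized least squares: the three rows of `MT` (pulse shape, arrival-time model, constant) are fit simultaneously under the noise covariance R. A minimal sketch with toy data, using a dense `np.linalg.solve` as a stand-in for `ToeplitzSolver`, demonstrates the defining property that each filter responds with unit gain to its own model component and zero gain to the others:

```python
import numpy as np

ns = 64
avg_signal = np.exp(-np.arange(ns) / 10.0) - np.exp(-np.arange(ns) / 3.0)  # toy pulse shape
dt_model = np.gradient(avg_signal)          # toy stand-in for the arrival-time model
MT = np.vstack((avg_signal, dt_model, np.ones(ns)))   # rows: pulse, dt, baseline

# Dense GLS version of the ToeplitzSolver branch: filt = inv(M' R^-1 M) M' R^-1
noise_autocorr = 0.5 ** np.arange(ns)       # toy AR(1)-like autocorrelation
R = noise_autocorr[np.abs(np.subtract.outer(np.arange(ns), np.arange(ns)))]
RinvM = np.linalg.solve(R, MT.T)            # R^-1 M, one column per model component
A = MT @ RinvM                              # M' R^-1 M
filt = np.linalg.solve(A, RinvM.T)          # rows: filt_noconst, filt_dt, filt_baseline

# Unbiasedness: each filter returns 1 for its own component, 0 for the others.
assert np.allclose(filt @ MT.T, np.eye(3))
```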

compute_constrained_1lag(constraints=None, fmax=None, f_3db=None, cut_pre=0, cut_post=0)

Compute a single constrained optimal filter, with optional low-pass filtering, and with optional zero weights at the pre-trigger or post-trigger end of the filter. Can be used with 0-lag "convolution" only, so cannot estimate arrival time.

Either or both of fmax and f_3db are allowed.

Parameters:
  • constraints (ArrayLike | None, default: None ) –

    The vector or vectors to which the filter should be orthogonal. If a 2d array, each row is a constraint, and the number of columns should be equal to len(self.signal_model) minus (cut_pre+cut_post).

  • fmax (Optional[float], default: None ) –

    The strict maximum frequency to be passed in all filters, by default None

  • f_3db (Optional[float], default: None ) –

    The 3 dB point for a one-pole low-pass filter to be applied to all filters, by default None

  • cut_pre (int, default: 0 ) –

    The number of initial samples to be given zero weight, by default 0

  • cut_post (int, default: 0 ) –

    The number of samples at the end of a record to be given zero weight, by default 0

Returns:
  • Filter –

    A 1-lag optimal filter.

Raises:
  • ValueError –

    Under various conditions where arguments are inconsistent with the data

Source code in mass2/core/optimal_filtering.py
def compute_constrained_1lag(
    self,
    constraints: ArrayLike | None = None,
    fmax: float | None = None,
    f_3db: float | None = None,
    cut_pre: int = 0,
    cut_post: int = 0,
) -> Filter:
    """Compute a single constrained optimal filter, with optional low-pass filtering, and with optional zero
    weights at the pre-trigger or post-trigger end of the filter. Can be used with 0-lag "convolution" only,
    so cannot estimate arrival time.

    Either or both of `fmax` and `f_3db` are allowed.

    Parameters
    ----------
    constraints: ndarray, optional
        The vector or vectors to which the filter should be orthogonal. If a 2d array, each _row_
        is a constraint, and the number of columns should be equal to the len(self.signal_model)
        minus `(cut_pre+cut_post)`.
    fmax : Optional[float], optional
        The strict maximum frequency to be passed in all filters, by default None
    f_3db : Optional[float], optional
        The 3 dB point for a one-pole low-pass filter to be applied to all filters, by default None
    cut_pre : int
        The number of initial samples to be given zero weight, by default 0
    cut_post : int
        The number of samples at the end of a record to be given zero weight, by default 0

    Returns
    -------
    Filter
        A 1-lag optimal filter.

    Raises
    ------
    ValueError
        Under various conditions where arguments are inconsistent with the data
    """

    if self.sample_time_sec <= 0 and not (fmax is None and f_3db is None):
        raise ValueError("FilterMaker must have a sample_time_sec if it's to be smoothed with fmax or f_3db")
    if cut_pre < 0 or cut_post < 0:
        raise ValueError(f"(cut_pre,cut_post)=({cut_pre},{cut_post}), but neither can be negative")

    if self.noise_autocorr is None and self.whitener is None:
        raise ValueError("FilterMaker must have noise_autocorr or whitener arguments to generate 1-lag filters")
    noise_autocorr = self._compute_autocorr(cut_pre, cut_post)
    avg_signal, peak, _ = self._normalize_signal(cut_pre, cut_post)

    n = len(avg_signal)
    assert len(noise_autocorr) >= n, "Noise autocorrelation vector is too short for signal size"
    pulse_model = np.vstack((avg_signal, np.ones_like(avg_signal)))
    if constraints is not None:
        pulse_model = np.vstack((pulse_model, constraints))
    assert pulse_model.shape[1] == n

    noise_corr = noise_autocorr[:n]
    TS = ToeplitzSolver(noise_corr, symmetric=True)
    Rinv_model = np.vstack([TS(r) for r in pulse_model])
    A = pulse_model.dot(Rinv_model.T)
    all_filters = np.linalg.solve(A, Rinv_model)
    filt_noconst = all_filters[0]

    band_limit(filt_noconst, self.sample_time_sec, fmax, f_3db)

    self._normalize_filter(filt_noconst, avg_signal)
    variance = bracketR(filt_noconst, noise_corr)

    # Set weights in the cut_pre and cut_post windows to 0
    if cut_pre > 0 or cut_post > 0:
        filt_noconst = np.hstack([np.zeros(cut_pre), filt_noconst, np.zeros(cut_post)])

    if variance <= 0:
        vdv = np.inf
    else:
        vdv = peak / (8 * np.log(2) * variance) ** 0.5
    return Filter1Lag(
        filt_noconst,
        peak,
        variance,
        vdv,
        self.n_pretrigger,
        None,
        None,
        avg_signal,
        None,
        1,
        fmax,
        f_3db,
        cut_pre,
        cut_post,
    )
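
The effect of the `constraints` argument is that the returned filter is orthogonal to each constraint row while keeping unit response to the pulse shape. A minimal sketch with toy data, specialized to white noise (R = I) so the `ToeplitzSolver` step reduces to the identity:

```python
import numpy as np

ns = 50
avg_signal = np.exp(-np.arange(ns) / 8.0) - np.exp(-np.arange(ns) / 2.0)  # toy pulse
constraint = np.linspace(-1, 1, ns)         # e.g., reject any linear drift
pulse_model = np.vstack((avg_signal, np.ones(ns), constraint))

# White-noise special case: R = I, so R^-1 M = M.
Rinv_model = pulse_model
A = pulse_model @ Rinv_model.T
all_filters = np.linalg.solve(A, Rinv_model)
filt = all_filters[0]

assert np.isclose(filt @ avg_signal, 1.0)   # unit response to the pulse shape
assert np.isclose(filt @ np.ones(ns), 0.0)  # insensitive to baseline offsets
assert np.isclose(filt @ constraint, 0.0)   # orthogonal to the constraint vector
```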

compute_constrained_5lag(constraints=None, fmax=None, f_3db=None, cut_pre=0, cut_post=0)

Compute a single constrained optimal filter, with optional low-pass filtering, and with optional zero weights at the pre-trigger or post-trigger end of the filter.

Either or both of fmax and f_3db are allowed.

Parameters:
  • constraints (ArrayLike | None, default: None ) –

    The vector or vectors to which the filter should be orthogonal. If a 2d array, each row is a constraint, and the number of columns should be equal to len(self.signal_model) minus (cut_pre+cut_post).

  • fmax (Optional[float], default: None ) –

    The strict maximum frequency to be passed in all filters, by default None

  • f_3db (Optional[float], default: None ) –

    The 3 dB point for a one-pole low-pass filter to be applied to all filters, by default None

  • cut_pre (int, default: 0 ) –

    The number of initial samples to be given zero weight, by default 0

  • cut_post (int, default: 0 ) –

    The number of samples at the end of a record to be given zero weight, by default 0

Returns:
  • Filter –

    A 5-lag optimal filter.

Raises:
  • ValueError –

    Under various conditions where arguments are inconsistent with the data

Source code in mass2/core/optimal_filtering.py
def compute_constrained_5lag(
    self,
    constraints: ArrayLike | None = None,
    fmax: float | None = None,
    f_3db: float | None = None,
    cut_pre: int = 0,
    cut_post: int = 0,
) -> Filter:
    """Compute a single constrained optimal filter, with optional low-pass filtering, and with optional zero
    weights at the pre-trigger or post-trigger end of the filter.

    Either or both of `fmax` and `f_3db` are allowed.

    Parameters
    ----------
    constraints: ndarray, optional
        The vector or vectors to which the filter should be orthogonal. If a 2d array, each _row_
        is a constraint, and the number of columns should be equal to the len(self.signal_model)
        minus `(cut_pre+cut_post)`.
    fmax : Optional[float], optional
        The strict maximum frequency to be passed in all filters, by default None
    f_3db : Optional[float], optional
        The 3 dB point for a one-pole low-pass filter to be applied to all filters, by default None
    cut_pre : int
        The number of initial samples to be given zero weight, by default 0
    cut_post : int
        The number of samples at the end of a record to be given zero weight, by default 0

    Returns
    -------
    Filter
        A 5-lag optimal filter.

    Raises
    ------
    ValueError
        Under various conditions where arguments are inconsistent with the data
    """

    if self.sample_time_sec <= 0 and not (fmax is None and f_3db is None):
        raise ValueError("FilterMaker must have a sample_time_sec if it's to be smoothed with fmax or f_3db")
    if cut_pre < 0 or cut_post < 0:
        raise ValueError(f"(cut_pre,cut_post)=({cut_pre},{cut_post}), but neither can be negative")

    if self.noise_autocorr is None and self.whitener is None:
        raise ValueError("FilterMaker must have noise_autocorr or whitener arguments to generate 5-lag filters")
    noise_autocorr = self._compute_autocorr(cut_pre, cut_post)
    avg_signal, peak, _ = self._normalize_signal(cut_pre, cut_post)

    shorten = 2  # for 5-lag convolution
    truncated_signal = avg_signal[shorten:-shorten]
    n = len(truncated_signal)
    assert len(noise_autocorr) >= n, "Noise autocorrelation vector is too short for signal size"
    pulse_model = np.vstack((truncated_signal, np.ones_like(truncated_signal)))
    if constraints is not None:
        pulse_model = np.vstack((pulse_model, constraints))
    assert pulse_model.shape[1] == n

    noise_corr = noise_autocorr[:n]
    TS = ToeplitzSolver(noise_corr, symmetric=True)
    Rinv_model = np.vstack([TS(r) for r in pulse_model])
    A = pulse_model.dot(Rinv_model.T)
    all_filters = np.linalg.solve(A, Rinv_model)
    filt_noconst = all_filters[0]

    band_limit(filt_noconst, self.sample_time_sec, fmax, f_3db)

    self._normalize_5lag_filter(filt_noconst, avg_signal)
    variance = bracketR(filt_noconst, noise_corr)

    # Set weights in the cut_pre and cut_post windows to 0
    if cut_pre > 0 or cut_post > 0:
        filt_noconst = np.hstack([np.zeros(cut_pre), filt_noconst, np.zeros(cut_post)])

    if variance <= 0:
        vdv = np.inf
    else:
        vdv = peak / (8 * np.log(2) * variance) ** 0.5
    return Filter5Lag(
        filt_noconst,
        peak,
        variance,
        vdv,
        self.n_pretrigger - 2,
        None,
        None,
        avg_signal,
        None,
        1 + 2 * shorten,
        fmax,
        f_3db,
        cut_pre,
        cut_post,
    )

compute_fourier(fmax=None, f_3db=None, cut_pre=0, cut_post=0)

Compute a single Fourier-domain filter, with optional low-pass filtering, and with optional zero weights at the pre-trigger or post-trigger end of the filter. Fourier domain calculation implicitly assumes periodic boundary conditions.

Either or both of fmax and f_3db are allowed.

Parameters:
  • fmax (Optional[float], default: None ) –

    The strict maximum frequency to be passed in all filters, by default None

  • f_3db (Optional[float], default: None ) –

    The 3 dB point for a one-pole low-pass filter to be applied to all filters, by default None

  • cut_pre (int, default: 0 ) –

    The number of initial samples to be given zero weight, by default 0

  • cut_post (int, default: 0 ) –

    The number of samples at the end of a record to be given zero weight, by default 0

Returns:
  • Filter –

    A 5-lag optimal filter, computed in the Fourier domain.

Raises:
  • ValueError –

    Under various conditions where arguments are inconsistent with the data

Source code in mass2/core/optimal_filtering.py
def compute_fourier(self, fmax: float | None = None, f_3db: float | None = None, cut_pre: int = 0, cut_post: int = 0) -> Filter:
    """Compute a single Fourier-domain filter, with optional low-pass filtering, and with optional
    zero weights at the pre-trigger or post-trigger end of the filter. Fourier domain calculation
    implicitly assumes periodic boundary conditions.

    Either or both of `fmax` and `f_3db` are allowed.

    Parameters
    ----------
    fmax : Optional[float], optional
        The strict maximum frequency to be passed in all filters, by default None
    f_3db : Optional[float], optional
        The 3 dB point for a one-pole low-pass filter to be applied to all filters, by default None
    cut_pre : int
        The number of initial samples to be given zero weight, by default 0
    cut_post : int
        The number of samples at the end of a record to be given zero weight, by default 0

    Returns
    -------
    Filter
        A 5-lag optimal filter, computed in the Fourier domain.

    Raises
    ------
    ValueError
        Under various conditions where arguments are inconsistent with the data
    """
    # Make sure we have either a noise PSD or an autocorrelation or a whitener
    if self.noise_psd is None:
        raise ValueError("FilterMaker must have noise_psd to generate a Fourier filter")
    if cut_pre < 0 or cut_post < 0:
        raise ValueError(f"(cut_pre,cut_post)=({cut_pre},{cut_post}), but neither can be negative")

    avg_signal, peak, _ = self._normalize_signal(cut_pre, cut_post)
    noise_psd = np.asarray(self.noise_psd)

    # Terminology: the `avg_signal` vector will be "shortened" by `shorten` on each end.
    # That's to permit 5-lag filtering (where we step the filter by ±2 lags either direction from 0 lag).
    # The `avg_signal` was already "reduced" in length by (cut_pre, cut_post), for a total
    # `reduction` of `2 * shorten + (cut_pre + cut_post)`.
    shorten = 2  # to use in 5-lag style
    reduction = 2 * shorten + (cut_pre + cut_post)

    truncated_avg_signal = avg_signal[shorten:-shorten]
    len_reduced_psd = len(noise_psd) - (reduction + 1) // 2
    window = 1.0
    sig_ft = np.fft.rfft(truncated_avg_signal * window)

    if len(sig_ft) != len_reduced_psd:
        raise ValueError(f"signal real DFT and noise PSD are not the same length ({len(sig_ft)} and {len_reduced_psd})")

    # Careful with PSD: "shorten" it by converting into a real space autocorrelation,
    # truncating the middle, and going back to Fourier space
    if reduction > 0:
        noise_autocorr = np.fft.irfft(noise_psd)
        noise_autocorr = np.hstack((noise_autocorr[: len_reduced_psd - 1], noise_autocorr[-len_reduced_psd:]))
        noise_psd = np.abs(np.fft.rfft(noise_autocorr))
    sig_ft_weighted = sig_ft / noise_psd

    # Band-limit
    if fmax is not None or f_3db is not None:
        f_nyquist = 0.5 / self.sample_time_sec
        freq = np.linspace(0, f_nyquist, len_reduced_psd, dtype=float)
        if fmax is not None:
            sig_ft_weighted[freq > fmax] = 0.0
        if f_3db is not None:
            sig_ft_weighted /= 1 + (freq * 1.0 / f_3db) ** 2

    sig_ft_weighted[0] = 0.0
    filt_fourier = np.fft.irfft(sig_ft_weighted) / window
    self._normalize_5lag_filter(filt_fourier, avg_signal)

    # How we compute the uncertainty depends on whether there's a noise autocorrelation result
    if self.noise_autocorr is None:
        noise_ft_squared = (len(noise_psd) - 1) / self.sample_time_sec * noise_psd
        kappa = (np.abs(sig_ft) ** 2 / noise_ft_squared)[1:].sum()
        variance_fourier = 1.0 / kappa
    else:
        ac = np.array(self.noise_autocorr)[: len(filt_fourier)]
        variance_fourier = bracketR(filt_fourier, ac)
    vdv = peak / (8 * np.log(2) * variance_fourier) ** 0.5
    return Filter5Lag(
        filt_fourier,
        peak,
        variance_fourier,
        vdv,
        self.n_pretrigger - 2,
        None,
        None,
        truncated_avg_signal,
        None,
        1 + 2 * shorten,
        fmax,
        f_3db,
        cut_pre,
        cut_post,
    )
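
In the special case of a flat (white) noise PSD with no band limiting, the weighting step above is just a constant scale, and zeroing the DC bin is equivalent to subtracting the signal's mean. A toy sketch of that core of the calculation (ignoring the 5-lag shortening and PSD truncation handled by the full method):

```python
import numpy as np

n = 128
signal = np.exp(-np.arange(n) / 20.0) - np.exp(-np.arange(n) / 5.0)  # toy pulse
noise_psd = np.ones(n // 2 + 1)             # flat (white) noise PSD

sig_ft_weighted = np.fft.rfft(signal) / noise_psd
sig_ft_weighted[0] = 0.0                    # remove sensitivity to the baseline
filt = np.fft.irfft(sig_ft_weighted, n=n)

# With white noise, the Fourier filter is the mean-subtracted signal.
assert np.allclose(filt, signal - signal.mean())
```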

ToeplitzWhitener dataclass

An object that can perform approximate noise whitening.

For an ARMA(p,q) noise model, multiply by (or solve) the matrix W (or its transpose), where W is the Toeplitz approximation to the whitening matrix V. A whitening matrix V means that if R is the ARMA noise covariance matrix, then VRV' = I. While W only approximately satisfies this, it has some handy properties that make it a useful replacement. (In particular, it has the time-transpose property that if you zero-pad the beginning of vector v and shift the remaining elements, then the same is done to Wv.)

The callable function object returns Wv or WM if called with vector v or matrix M. Other methods:

  • tw.whiten(v) returns Wv; it is equivalent to tw(v)
  • tw.solveWT(v) returns inv(W')*v
  • tw.applyWT(v) returns W'v
  • tw.solveW(v) returns inv(W)*v
Parameters:
  • theta (ndarray) –

    The moving-average (MA) process coefficients

  • phi (ndarray) –

    The autoregressive (AR) process coefficients

Returns:
  • ToeplitzWhitener –

    Object that can perform approximate, time-invariant noise whitening.

Raises:
  • ValueError –

    If the operative methods are passed an array of dimension higher than 2.
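
A minimal, self-contained sketch of the whitening recursion with toy ARMA(1,1) coefficients (the two functions are re-implemented from the source below rather than imported), checking the fast banded algorithm against the dense matrix W = inv(MA) @ AR:

```python
import numpy as np

def whiten(theta, phi, v):
    """W @ v: multiply by the AR Toeplitz matrix, then forward-solve the
    banded lower-triangular MA Toeplitz matrix."""
    p, q = len(phi) - 1, len(theta) - 1
    y = phi[0] * v.astype(float)
    for i in range(1, 1 + p):               # multiply by the AR part
        y[i:] += phi[i] * v[:-i]
    for i in range(len(v)):                 # forward-solve the MA part
        for j in range(max(0, i - q), i):
            y[i] -= y[j] * theta[i - j]
        y[i] /= theta[0]
    return y

def dense_W(theta, phi, N):
    """The full matrix W = inv(MA) @ AR, as in ToeplitzWhitener.W."""
    p, q = len(phi) - 1, len(theta) - 1
    AR = np.zeros((N, N))
    MA = np.zeros((N, N))
    for i in range(N):
        for j in range(max(0, i - p), i + 1):
            AR[i, j] = phi[i - j]
        for j in range(max(0, i - q), i + 1):
            MA[i, j] = theta[i - j]
    return np.linalg.solve(MA, AR)

theta = np.array([1.0, 0.3])                # toy MA(1) coefficients
phi = np.array([1.0, -0.8])                 # toy AR(1) coefficients
v = np.random.default_rng(0).standard_normal(12)
assert np.allclose(whiten(theta, phi, v), dense_W(theta, phi, len(v)) @ v)
```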

Source code in mass2/core/optimal_filtering.py
@dataclass(frozen=True)
class ToeplitzWhitener:
    """An object that can perform approximate noise whitening.

    For an ARMA(p,q) noise model, multiply by (or solve) the matrix W (or its
    transpose), where W is the Toeplitz approximation to the whitening matrix V.
    A whitening matrix V means that if R is the ARMA noise covariance matrix,
    then VRV' = I. While W only approximately satisfies this, it has some handy
    properties that make it a useful replacement. (In particular, it has the
    time-transpose property that if you zero-pad the beginning of vector v and
    shift the remaining elements, then the same is done to Wv.)

    The callable function object returns Wv or WM if called with
    vector v or matrix M. Other methods:

    * `tw.whiten(v)` returns Wv; it is equivalent to `tw(v)`
    * `tw.solveWT(v)` returns inv(W')*v
    * `tw.applyWT(v)` returns W'v
    * `tw.solveW(v)` returns inv(W)*v

    Arguments
    ---------
    theta : np.ndarray
        The moving-average (MA) process coefficients
    phi : np.ndarray
        The autoregressive (AR) process coefficients

    Returns
    -------
    ToeplitzWhitener
        Object that can perform approximate, time-invariant noise whitening.

    Raises
    ------
    ValueError
        If the operative methods are passed an array of dimension higher than 2.
    """

    theta: np.ndarray
    phi: np.ndarray

    @property
    def p(self) -> int:
        """Return the autoregressive order"""
        return len(self.phi) - 1

    @property
    def q(self) -> int:
        """Return the moving-average order"""
        return len(self.theta) - 1

    def whiten(self, v: ArrayLike) -> np.ndarray:
        "Return whitened vector (or matrix of column vectors) Wv"
        return self(v)

    def __call__(self, v: ArrayLike) -> np.ndarray:
        "Return whitened vector (or matrix of column vectors) Wv"
        v = np.asarray(v)
        if v.ndim > 2:
            raise ValueError("v must be an array of dimension 1 or 2")
        elif v.ndim == 2:
            w = np.zeros_like(v)
            for i in range(v.shape[1]):
                w[:, i] = self(v[:, i])
            return w

        # Multiply by the Toeplitz AR matrix to make the MA*w vector.
        N = len(v)
        y = self.phi[0] * v
        for i in range(1, 1 + self.p):
            y[i:] += self.phi[i] * v[:-i]

        # Second, solve the MA matrix (a banded, lower-triangular Toeplitz matrix with
        # q non-zero subdiagonals.)
        y[0] /= self.theta[0]
        if N == 1:
            return y
        for i in range(1, min(self.q, N)):
            for j in range(i):
                y[i] -= y[j] * self.theta[i - j]
            y[i] /= self.theta[0]
        for i in range(self.q, N):
            for j in range(i - self.q, i):
                y[i] -= y[j] * self.theta[i - j]
            y[i] /= self.theta[0]
        return y

    def solveW(self, v: ArrayLike) -> np.ndarray:
        "Return unwhitened vector (or matrix of column vectors) inv(W)*v"
        v = np.asarray(v)
        if v.ndim > 2:
            raise ValueError("v must be dimension 1 or 2")
        elif v.ndim == 2:
            r = np.zeros_like(v)
            for i in range(v.shape[1]):
                r[:, i] = self.solveW(v[:, i])
            return r

        # Multiply by the Toeplitz MA matrix to make the AR*w vector.
        N = len(v)
        y = self.theta[0] * v
        for i in range(1, 1 + self.q):
            y[i:] += self.theta[i] * v[:-i]

        # Second, solve the AR matrix (a banded, lower-triangular Toeplitz matrix with
        # p non-zero subdiagonals.)
        y[0] /= self.phi[0]
        if N == 1:
            return y
        for i in range(1, min(self.p, N)):
            for j in range(i):
                y[i] -= y[j] * self.phi[i - j]
            y[i] /= self.phi[0]
        for i in range(self.p, N):
            for j in range(i - self.p, i):
                y[i] -= y[j] * self.phi[i - j]
            y[i] /= self.phi[0]
        return y

    def solveWT(self, v: ArrayLike) -> np.ndarray:
        "Return vector (or matrix of column vectors) inv(W')*v"
        v = np.asarray(v)
        if v.ndim > 2:
            raise ValueError("v must be dimension 1 or 2")
        elif v.ndim == 2:
            r = np.zeros_like(v)
            for i in range(v.shape[1]):
                r[:, i] = self.solveWT(v[:, i])
            return r

        N = len(v)
        y = np.array(v)
        y[N - 1] /= self.phi[0]
        for i in range(N - 2, -1, -1):
            f = min(self.p + 1, N - i)
            y[i] -= np.dot(y[i + 1 : i + f], self.phi[1:f])
            y[i] /= self.phi[0]
        return np.correlate(y, self.theta, "full")[self.q :]

    def applyWT(self, v: ArrayLike) -> np.ndarray:
        """Return vector (or matrix of column vectors) W'v"""
        v = np.asarray(v)
        if v.ndim > 2:
            raise ValueError("v must be dimension 1 or 2")
        elif v.ndim == 2:
            r = np.zeros_like(v)
            for i in range(v.shape[1]):
                r[:, i] = self.applyWT(v[:, i])
            return r

        N = len(v)
        y = np.array(v)
        y[N - 1] /= self.theta[0]
        for i in range(N - 2, -1, -1):
            f = min(self.q + 1, N - i)
            y[i] -= np.dot(y[i + 1 : i + f], self.theta[1:f])
            y[i] /= self.theta[0]
        return np.correlate(y, self.phi, "full")[self.p :]

    def W(self, N: int) -> np.ndarray:
        """Return the full, approximate whitening matrix.

        Normally the full W is large and slow to use. But it's here so you can
        easily test that W(len(v))*v == whiten(v), and similar.
        """
        AR = np.zeros((N, N), dtype=float)
        MA = np.zeros((N, N), dtype=float)
        for i in range(N):
            for j in range(max(0, i - self.p), i + 1):
                AR[i, j] = self.phi[i - j]
            for j in range(max(0, i - self.q), i + 1):
                MA[i, j] = self.theta[i - j]
        return np.linalg.solve(MA, AR)

p property

Return the autoregressive order

q property

Return the moving-average order

W(N)

Return the full, approximate whitening matrix.

Normally the full W is large and slow to use. But it's here so you can easily test that W(len(v))*v == whiten(v), and similar.

Source code in mass2/core/optimal_filtering.py
def W(self, N: int) -> np.ndarray:
    """Return the full, approximate whitening matrix.

    Normally the full W is large and slow to use. But it's here so you can
    easily test that W(len(v))*v == whiten(v), and similar.
    """
    AR = np.zeros((N, N), dtype=float)
    MA = np.zeros((N, N), dtype=float)
    for i in range(N):
        for j in range(max(0, i - self.p), i + 1):
            AR[i, j] = self.phi[i - j]
        for j in range(max(0, i - self.q), i + 1):
            MA[i, j] = self.theta[i - j]
    return np.linalg.solve(MA, AR)

__call__(v)

Return whitened vector (or matrix of column vectors) Wv

Source code in mass2/core/optimal_filtering.py
def __call__(self, v: ArrayLike) -> np.ndarray:
    "Return whitened vector (or matrix of column vectors) Wv"
    v = np.asarray(v)
    if v.ndim > 2:
        raise ValueError("v must be an array of dimension 1 or 2")
    elif v.ndim == 2:
        w = np.zeros_like(v)
        for i in range(v.shape[1]):
            w[:, i] = self(v[:, i])
        return w

    # Multiply by the Toeplitz AR matrix to make the MA*w vector.
    N = len(v)
    y = self.phi[0] * v
    for i in range(1, 1 + self.p):
        y[i:] += self.phi[i] * v[:-i]

    # Second, solve the MA matrix (a banded, lower-triangular Toeplitz matrix with
    # q non-zero subdiagonals.)
    y[0] /= self.theta[0]
    if N == 1:
        return y
    for i in range(1, min(self.q, N)):
        for j in range(i):
            y[i] -= y[j] * self.theta[i - j]
        y[i] /= self.theta[0]
    for i in range(self.q, N):
        for j in range(i - self.q, i):
            y[i] -= y[j] * self.theta[i - j]
        y[i] /= self.theta[0]
    return y

applyWT(v)

Return vector (or matrix of column vectors) W'v

Source code in mass2/core/optimal_filtering.py
def applyWT(self, v: ArrayLike) -> np.ndarray:
    """Return vector (or matrix of column vectors) W'v"""
    v = np.asarray(v)
    if v.ndim > 2:
        raise ValueError("v must be dimension 1 or 2")
    elif v.ndim == 2:
        r = np.zeros_like(v)
        for i in range(v.shape[1]):
            r[:, i] = self.applyWT(v[:, i])
        return r

    N = len(v)
    y = np.array(v)
    y[N - 1] /= self.theta[0]
    for i in range(N - 2, -1, -1):
        f = min(self.q + 1, N - i)
        y[i] -= np.dot(y[i + 1 : i + f], self.theta[1:f])
        y[i] /= self.theta[0]
    return np.correlate(y, self.phi, "full")[self.p :]

solveW(v)

Return unwhitened vector (or matrix of column vectors) inv(W)*v

Source code in mass2/core/optimal_filtering.py
def solveW(self, v: ArrayLike) -> np.ndarray:
    "Return unwhitened vector (or matrix of column vectors) inv(W)*v"
    v = np.asarray(v)
    if v.ndim > 2:
        raise ValueError("v must be dimension 1 or 2")
    elif v.ndim == 2:
        r = np.zeros_like(v)
        for i in range(v.shape[1]):
            r[:, i] = self.solveW(v[:, i])
        return r

    # Multiply by the Toeplitz MA matrix to make the AR*w vector.
    N = len(v)
    y = self.theta[0] * v
    for i in range(1, 1 + self.q):
        y[i:] += self.theta[i] * v[:-i]

    # Second, solve the AR matrix (a banded, lower-triangular Toeplitz matrix with
    # p non-zero subdiagonals.)
    y[0] /= self.phi[0]
    if N == 1:
        return y
    for i in range(1, min(self.p, N)):
        for j in range(i):
            y[i] -= y[j] * self.phi[i - j]
        y[i] /= self.phi[0]
    for i in range(self.p, N):
        for j in range(i - self.p, i):
            y[i] -= y[j] * self.phi[i - j]
        y[i] /= self.phi[0]
    return y

solveWT(v)

Return vector (or matrix of column vectors) inv(W')*v

Source code in mass2/core/optimal_filtering.py
def solveWT(self, v: ArrayLike) -> np.ndarray:
    "Return vector (or matrix of column vectors) inv(W')*v"
    v = np.asarray(v)
    if v.ndim > 2:
        raise ValueError("v must be dimension 1 or 2")
    elif v.ndim == 2:
        r = np.zeros_like(v)
        for i in range(v.shape[1]):
            r[:, i] = self.solveWT(v[:, i])
        return r

    N = len(v)
    y = np.array(v)
    y[N - 1] /= self.phi[0]
    for i in range(N - 2, -1, -1):
        f = min(self.p + 1, N - i)
        y[i] -= np.dot(y[i + 1 : i + f], self.phi[1:f])
        y[i] /= self.phi[0]
    return np.correlate(y, self.theta, "full")[self.q :]

whiten(v)

Return whitened vector (or matrix of column vectors) Wv

Source code in mass2/core/optimal_filtering.py
def whiten(self, v: ArrayLike) -> np.ndarray:
    "Return whitened vector (or matrix of column vectors) Wv"
    return self(v)

band_limit(modelmatrix, sample_time_sec, fmax, f_3db)

Band-limit the column-vectors in a model matrix with a hard and/or 1-pole low-pass filter. Change the input modelmatrix in-place.

No effect if both fmax and f_3db are None.

Parameters:
  • modelmatrix (ndarray) –

    The 1D or 2D array to band-limit. (If a 2D array, columns are independently band-limited.)

  • sample_time_sec (float) –

    The sampling period, normally in seconds.

  • fmax (Optional[float]) –

    The hard maximum frequency (units are inverse of sample_time_sec units, or Hz)

  • f_3db (Optional[float]) –

    The 1-pole low-pass filter's 3 dB point (same units as fmax)

Source code in mass2/core/optimal_filtering.py
def band_limit(modelmatrix: np.ndarray, sample_time_sec: float, fmax: float | None, f_3db: float | None) -> None:
    """Band-limit the column-vectors in a model matrix with a hard and/or
    1-pole low-pass filter. Change the input `modelmatrix` in-place.

    No effect if both `fmax` and `f_3db` are `None`.

    Parameters
    ----------
    modelmatrix : np.ndarray
        The 1D or 2D array to band-limit. (If a 2D array, columns are independently band-limited.)
    sample_time_sec : float
        The sampling period, normally in seconds.
    fmax : Optional[float]
        The hard maximum frequency (units are inverse of `sample_time_sec` units, or Hz)
    f_3db : Optional[float]
        The 1-pole low-pass filter's 3 dB point (same units as `fmax`)
    """
    if fmax is None and f_3db is None:
        return

    # Handle the 2D case by calling this function once per column.
    assert len(modelmatrix.shape) <= 2
    if len(modelmatrix.shape) == 2:
        for i in range(modelmatrix.shape[1]):
            band_limit(modelmatrix[:, i], sample_time_sec, fmax, f_3db)
        return

    vector = modelmatrix
    filt_length = len(vector)
    sig_ft = np.fft.rfft(vector)
    freq = np.fft.fftfreq(filt_length, d=sample_time_sec)
    freq = np.abs(freq[: len(sig_ft)])
    if fmax is not None:
        sig_ft[freq > fmax] = 0.0
    if f_3db is not None:
        sig_ft /= 1.0 + (1.0 * freq / f_3db) ** 2

    # n=filt_length is needed when filt_length is ODD
    vector[:] = np.fft.irfft(sig_ft, n=filt_length)
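To see the effect, here is a copy-returning sketch of the same FFT logic (the mass2 function modifies its argument in place): a 30 kHz component above a 10 kHz hard cutoff is removed entirely, while a 1 kHz component passes unchanged.

```python
import numpy as np

# Copy-returning sketch of band_limit's FFT logic. A hard cut at fmax zeroes
# all frequency bins above it; f_3db applies the 1-pole roll-off.
def band_limit_1d(vector, sample_time_sec, fmax=None, f_3db=None):
    n = len(vector)
    sig_ft = np.fft.rfft(vector)
    freq = np.abs(np.fft.fftfreq(n, d=sample_time_sec)[: len(sig_ft)])
    if fmax is not None:
        sig_ft[freq > fmax] = 0.0
    if f_3db is not None:
        sig_ft /= 1.0 + (freq / f_3db) ** 2
    return np.fft.irfft(sig_ft, n=n)

t = np.arange(1000) * 1e-5                     # 10 us sampling; Nyquist 50 kHz
sig = np.sin(2 * np.pi * 1e3 * t) + np.sin(2 * np.pi * 3e4 * t)
out = band_limit_1d(sig, 1e-5, fmax=1e4)       # 30 kHz component removed
assert np.allclose(out, np.sin(2 * np.pi * 1e3 * t), atol=1e-8)
```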

bracketR(q, noise)

Return the dot product (q^T R q) for the vector q and the matrix R constructed from the vector noise by R_ij = noise_|i-j|. We don't want to construct the full matrix R because for records as long as 10,000 samples, the matrix would consist of 10^8 floats (800 MB of memory).

Source code in mass2/core/optimal_filtering.py
def bracketR(q: NDArray, noise: NDArray) -> float:
    """Return the dot product (q^T R q) for vector <q> and matrix R constructed from
    the vector <noise> by R_ij = noise_|i-j|.  We don't want to construct the full matrix
    R because for records as long as 10,000 samples, the matrix will consist of 10^8 floats
    (800 MB of memory).
    """

    if len(noise) < len(q):
        raise ValueError(f"Vector q (length {len(q)}) cannot be longer than the noise (length {len(noise)})")
    n = len(q)
    r = np.zeros(2 * n - 1, dtype=float)
    r[n - 1 :] = noise[:n]
    r[n - 1 :: -1] = noise[:n]
    dot = 0.0
    for i in range(n):
        dot += q[i] * r[n - i - 1 : 2 * n - i - 1].dot(q)
    return dot
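The loop evaluates q^T R q one row at a time, with the slice of r supplying the needed piece of each implicit Toeplitz row. A small cross-check against the explicitly constructed R, restated self-contained:

```python
import numpy as np

# Cross-check of bracketR (restated self-contained): the implicit matrix is
# the symmetric Toeplitz R_ij = noise[|i - j|], never built at full size.
def bracketR(q, noise):
    n = len(q)
    r = np.zeros(2 * n - 1)
    r[n - 1:] = noise[:n]
    r[n - 1::-1] = noise[:n]
    return sum(q[i] * r[n - i - 1: 2 * n - i - 1].dot(q) for i in range(n))

q = np.array([1.0, -2.0, 0.5])
noise = np.array([4.0, 1.0, 0.25])
# Explicit Toeplitz construction, feasible at this tiny size:
R = np.array([[noise[abs(i - j)] for j in range(3)] for i in range(3)])
val = bracketR(q, noise)
assert np.isclose(val, q @ R @ q)   # both give 15.25 for this example
```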

Classes and functions to correct for arrival-time bias in optimal filtering.

PhaseCorrector

A class to correct for arrival-time bias in optimal filtering.

Source code in mass2/core/phase_correct.py
class PhaseCorrector:
    """A class to correct for arrival-time bias in optimal filtering."""

    version = 1

    def __init__(
        self,
        phase_uniformifier_x: ArrayLike,
        phase_uniformifier_y: ArrayLike,
        corrections: list[CubicSpline],
        indicatorName: str,
        uncorrectedName: str,
    ):
        self.corrections = corrections
        self.phase_uniformifier_x = np.array(phase_uniformifier_x)
        self.phase_uniformifier_y = np.array(phase_uniformifier_y)
        self.indicatorName = tostr(indicatorName)
        self.uncorrectedName = tostr(uncorrectedName)
        self.phase_uniformifier = CubicSpline(self.phase_uniformifier_x, self.phase_uniformifier_y)

    def toHDF5(self, hdf5_group: h5py.Group, name: str = "phase_correction", overwrite: bool = False) -> None:
        """Write to the given HDF5 group for later recovery from disk (by fromHDF5 class method)."""
        group = hdf5_group.require_group(name)

        def h5group_update(name: str, vector: ArrayLike) -> None:
            "Overwrite or create a dataset in the given group."
            if name in group:
                if overwrite:
                    del group[name]
                else:
                    raise AttributeError(f"Cannot overwrite phase correction dataset '{name}'")
            group[name] = vector

        h5group_update("phase_uniformifier_x", self.phase_uniformifier_x)
        h5group_update("phase_uniformifier_y", self.phase_uniformifier_y)
        h5group_update("uncorrected_name", self.uncorrectedName)
        h5group_update("indicator_name", self.indicatorName)
        h5group_update("version", self.version)
        for i, correction in enumerate(self.corrections):
            h5group_update(f"correction_{i}_x", correction._x)
            h5group_update(f"correction_{i}_y", correction._y)

    def correct(self, phase: ArrayLike, ph: ArrayLike) -> NDArray:
        """Apply the phase correction to the given data `ph`."""
        ph = np.asarray(ph)
        # attempt to force phases to fall between X and X
        phase_uniformified = np.asarray(phase) - self.phase_uniformifier(ph)
        # Compute a correction for each pulse for each correction-line energy
        # For the actual correction, don't let |phase| > 0.6 sample
        phase_clipped = np.clip(phase_uniformified, -0.6, 0.6)
        pheight_corrected = _phase_corrected_filtvals(phase_clipped, ph, self.corrections)
        return pheight_corrected

    def __call__(self, phase_indicator: ArrayLike, ph: ArrayLike) -> NDArray:
        "Equivalent to self.correct()"
        return self.correct(phase_indicator, ph)

    @classmethod
    def fromHDF5(cls, hdf5_group: h5py.Group, name: str = "phase_correction") -> "PhaseCorrector":
        """Recover a PhaseCorrector object from the given HDF5 group."""
        x = hdf5_group[f"{name}/phase_uniformifier_x"][()]
        y = hdf5_group[f"{name}/phase_uniformifier_y"][()]
        uncorrectedName = tostr(hdf5_group[f"{name}/uncorrected_name"][()])
        indicatorName = tostr(hdf5_group[f"{name}/indicator_name"][()])
        version = hdf5_group[f"{name}/version"][()]
        i = 0
        corrections = []
        while f"{name}/correction_{i}_x" in hdf5_group:
            _x = hdf5_group[f"{name}/correction_{i}_x"][()]
            _y = hdf5_group[f"{name}/correction_{i}_y"][()]
            corrections.append(CubicSpline(_x, _y))
            i += 1
        assert version == cls.version
        return cls(x, y, corrections, indicatorName, uncorrectedName)

    def __repr__(self) -> str:
        """String representation of this object."""
        s = f"""PhaseCorrector with
        splines at this many levels: {len(self.corrections)}
        phase_uniformifier_x: {self.phase_uniformifier_x}
        phase_uniformifier_y: {self.phase_uniformifier_y}
        uncorrectedName: {self.uncorrectedName}
        """
        return s

__call__(phase_indicator, ph)

Equivalent to self.correct()

Source code in mass2/core/phase_correct.py
def __call__(self, phase_indicator: ArrayLike, ph: ArrayLike) -> NDArray:
    "Equivalent to self.correct()"
    return self.correct(phase_indicator, ph)

__repr__()

String representation of this object.

Source code in mass2/core/phase_correct.py
def __repr__(self) -> str:
    """String representation of this object."""
    s = f"""PhaseCorrector with
    splines at this many levels: {len(self.corrections)}
    phase_uniformifier_x: {self.phase_uniformifier_x}
    phase_uniformifier_y: {self.phase_uniformifier_y}
    uncorrectedName: {self.uncorrectedName}
    """
    return s

correct(phase, ph)

Apply the phase correction to the given data ph.

Source code in mass2/core/phase_correct.py
def correct(self, phase: ArrayLike, ph: ArrayLike) -> NDArray:
    """Apply the phase correction to the given data `ph`."""
    ph = np.asarray(ph)
    # attempt to force phases to fall between X and X
    phase_uniformified = np.asarray(phase) - self.phase_uniformifier(ph)
    # Compute a correction for each pulse for each correction-line energy
    # For the actual correction, don't let |phase| > 0.6 sample
    phase_clipped = np.clip(phase_uniformified, -0.6, 0.6)
    pheight_corrected = _phase_corrected_filtvals(phase_clipped, ph, self.corrections)
    return pheight_corrected

fromHDF5(hdf5_group, name='phase_correction') classmethod

Recover a PhaseCorrector object from the given HDF5 group.

Source code in mass2/core/phase_correct.py
@classmethod
def fromHDF5(cls, hdf5_group: h5py.Group, name: str = "phase_correction") -> "PhaseCorrector":
    """Recover a PhaseCorrector object from the given HDF5 group."""
    x = hdf5_group[f"{name}/phase_uniformifier_x"][()]
    y = hdf5_group[f"{name}/phase_uniformifier_y"][()]
    uncorrectedName = tostr(hdf5_group[f"{name}/uncorrected_name"][()])
    indicatorName = tostr(hdf5_group[f"{name}/indicator_name"][()])
    version = hdf5_group[f"{name}/version"][()]
    i = 0
    corrections = []
    while f"{name}/correction_{i}_x" in hdf5_group:
        _x = hdf5_group[f"{name}/correction_{i}_x"][()]
        _y = hdf5_group[f"{name}/correction_{i}_y"][()]
        corrections.append(CubicSpline(_x, _y))
        i += 1
    assert version == cls.version
    return cls(x, y, corrections, indicatorName, uncorrectedName)

toHDF5(hdf5_group, name='phase_correction', overwrite=False)

Write to the given HDF5 group for later recovery from disk (by fromHDF5 class method).

Source code in mass2/core/phase_correct.py
def toHDF5(self, hdf5_group: h5py.Group, name: str = "phase_correction", overwrite: bool = False) -> None:
    """Write to the given HDF5 group for later recovery from disk (by fromHDF5 class method)."""
    group = hdf5_group.require_group(name)

    def h5group_update(name: str, vector: ArrayLike) -> None:
        "Overwrite or create a dataset in the given group."
        if name in group:
            if overwrite:
                del group[name]
            else:
                raise AttributeError(f"Cannot overwrite phase correction dataset '{name}'")
        group[name] = vector

    h5group_update("phase_uniformifier_x", self.phase_uniformifier_x)
    h5group_update("phase_uniformifier_y", self.phase_uniformifier_y)
    h5group_update("uncorrected_name", self.uncorrectedName)
    h5group_update("indicator_name", self.indicatorName)
    h5group_update("version", self.version)
    for i, correction in enumerate(self.corrections):
        h5group_update(f"correction_{i}_x", correction._x)
        h5group_update(f"correction_{i}_y", correction._y)

phase_correct(phase, pheight, ph_peaks=None, method2017=True, kernel_width=None, indicatorName='', uncorrectedName='')

Create a PhaseCorrector object to correct for arrival-time bias in optimal filtering.

Source code in mass2/core/phase_correct.py
def phase_correct(
    phase: ArrayLike,
    pheight: ArrayLike,
    ph_peaks: ArrayLike | None = None,
    method2017: bool = True,
    kernel_width: float | None = None,
    indicatorName: str = "",
    uncorrectedName: str = "",
) -> PhaseCorrector:
    """Create a PhaseCorrector object to correct for arrival-time bias in optimal filtering."""
    phase = np.asarray(phase)
    pheight = np.asarray(pheight)
    if ph_peaks is None:
        ph_peaks = _find_peaks_heuristic(pheight)
    ph_peaks = np.asarray(ph_peaks)
    if len(ph_peaks) <= 0:
        raise ValueError("Could not phase_correct because no peaks found")
    ph_peaks.sort()

    # Compute a correction function at each line in ph_peaks
    corrections = []
    median_phase = []
    if kernel_width is None:
        kernel_width = np.max(ph_peaks) / 1000.0
    for pk in ph_peaks:
        nextcorr, mphase = _phasecorr_find_alignment(
            phase, pheight, pk, 0.012 * np.mean(ph_peaks), method2017=method2017, kernel_width=kernel_width
        )
        corrections.append(nextcorr)
        median_phase.append(mphase)

    NC = len(corrections)
    if NC > 3:
        phase_uniformifier_x = ph_peaks
        phase_uniformifier_y = np.array(median_phase)
    else:
        # Too few peaks to spline, so just bin and take the median per bin, then
        # fit an interpolating (approximating) spline through/near these points.
        NBINS = 10
        top = min(pheight.max(), 1.2 * np.percentile(pheight, 98))
        bin = np.digitize(pheight, np.linspace(0, top, 1 + NBINS)) - 1
        x = np.zeros(NBINS, dtype=float)
        y = np.zeros(NBINS, dtype=float)
        w = np.zeros(NBINS, dtype=float)
        for i in range(NBINS):
            w[i] = (bin == i).sum()
            if w[i] == 0:
                continue
            x[i] = np.median(pheight[bin == i])
            y[i] = np.median(phase[bin == i])

        nonempty = w > 0
        # Use sp.interpolate.UnivariateSpline because it can make an approximating
        # spline. But then use its x/y data and knots to create a Mass CubicSpline,
        # because that one can have natural boundary conditions instead of insane
        # cubic functions in the extrapolation.
        if nonempty.sum() > 1:
            spline_order = min(3, nonempty.sum() - 1)
            crazy_spline = sp.interpolate.UnivariateSpline(x[nonempty], y[nonempty], w=w[nonempty] * (12**-0.5), k=spline_order)
            phase_uniformifier_x = crazy_spline._data[0]
            phase_uniformifier_y = crazy_spline._data[1]
        else:
            phase_uniformifier_x = np.array([0, 0, 0, 0])
            phase_uniformifier_y = np.array([0, 0, 0, 0])

    return PhaseCorrector(phase_uniformifier_x, phase_uniformifier_y, corrections, indicatorName, uncorrectedName)
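The few-peaks branch above bins pulse heights and takes per-bin medians before splining. A self-contained sketch of that digitize-and-median step (the synthetic `pheight`/`phase` arrays here are illustrative, not real data):

```python
import numpy as np

# Sketch of the few-peaks fallback in phase_correct: digitize pulse heights
# into NBINS bins, then take per-bin medians of height and phase.
rng = np.random.default_rng(2)
pheight = rng.uniform(100.0, 1000.0, size=2000)
phase = 0.1 * np.sin(pheight / 200.0) + rng.normal(0.0, 0.01, size=2000)

NBINS = 10
top = min(pheight.max(), 1.2 * np.percentile(pheight, 98))
bin_idx = np.digitize(pheight, np.linspace(0, top, 1 + NBINS)) - 1
nonempty = [i for i in range(NBINS) if np.any(bin_idx == i)]
x = np.array([np.median(pheight[bin_idx == i]) for i in nonempty])
y = np.array([np.median(phase[bin_idx == i]) for i in nonempty])
assert x.shape == y.shape and np.all(np.diff(x) > 0)  # medians ordered by bin
```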

Phase correction step, Mass-style.

PhaseCorrectMassStep dataclass

Bases: RecipeStep

Perform a Mass-style phase correction step.

Source code in mass2/core/phase_correct_steps.py
@dataclass(frozen=True)
class PhaseCorrectMassStep(RecipeStep):
    """Perform a Mass-style phase correction step."""

    line_names: list[str]
    line_energies: list[float]
    previous_step_index: int
    phase_corrector: PhaseCorrector

    def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
        """Calculate the phase-corrected pulse height and return a new DataFrame."""
        # Since we only need to load two columns, assume they fit in memory and
        # load them whole. If that becomes an issue, use iter_slices or
        # add a user-defined function in Rust.
        indicator_col, uncorrected_col = self.inputs
        corrected_col = self.output[0]
        indicator = df[indicator_col].to_numpy()
        uncorrected = df[uncorrected_col].to_numpy()
        corrected = self.phase_corrector(indicator, uncorrected)
        series = pl.Series(corrected_col, corrected)
        df2 = df.with_columns(series)
        return df2

    def dbg_plot(self, df_after: pl.DataFrame, **kwargs: Any) -> plt.Axes:
        """Make a diagnostic plot of the phase correction."""
        indicator_col, uncorrected_col = self.inputs
        df_small = df_after.lazy().filter(self.good_expr).filter(self.use_expr).select(self.inputs + self.output).collect()
        mass2.misc.plot_a_vs_b_series(df_small[indicator_col], df_small[uncorrected_col])
        mass2.misc.plot_a_vs_b_series(
            df_small[indicator_col],
            df_small[self.output[0]],
            plt.gca(),
        )
        plt.legend()
        plt.tight_layout()
        return plt.gca()

calc_from_df(df)

Calculate the phase-corrected pulse height and return a new DataFrame.

Source code in mass2/core/phase_correct_steps.py
def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
    """Calculate the phase-corrected pulse height and return a new DataFrame."""
    # Since we only need to load two columns, assume they fit in memory and
    # load them whole. If that becomes an issue, use iter_slices or
    # add a user-defined function in Rust.
    indicator_col, uncorrected_col = self.inputs
    corrected_col = self.output[0]
    indicator = df[indicator_col].to_numpy()
    uncorrected = df[uncorrected_col].to_numpy()
    corrected = self.phase_corrector(indicator, uncorrected)
    series = pl.Series(corrected_col, corrected)
    df2 = df.with_columns(series)
    return df2

dbg_plot(df_after, **kwargs)

Make a diagnostic plot of the phase correction.

Source code in mass2/core/phase_correct_steps.py
def dbg_plot(self, df_after: pl.DataFrame, **kwargs: Any) -> plt.Axes:
    """Make a diagnostic plot of the phase correction."""
    indicator_col, uncorrected_col = self.inputs
    df_small = df_after.lazy().filter(self.good_expr).filter(self.use_expr).select(self.inputs + self.output).collect()
    mass2.misc.plot_a_vs_b_series(df_small[indicator_col], df_small[uncorrected_col])
    mass2.misc.plot_a_vs_b_series(
        df_small[indicator_col],
        df_small[self.output[0]],
        plt.gca(),
    )
    plt.legend()
    plt.tight_layout()
    return plt.gca()

phase_correct_mass_specific_lines(ch, indicator_col, uncorrected_col, corrected_col, previous_step_index, line_names, use_expr)

Perform a Mass-style phase correction step using specific lines.

Source code in mass2/core/phase_correct_steps.py
def phase_correct_mass_specific_lines(
    ch: Channel,
    indicator_col: str,
    uncorrected_col: str,
    corrected_col: str,
    previous_step_index: int,
    line_names: Iterable[str | float],
    use_expr: pl.Expr,
) -> PhaseCorrectMassStep:
    """Perform a Mass-style phase correction step using specific lines."""
    previous_step, previous_step_index = ch.get_step(previous_step_index)
    assert hasattr(previous_step, "energy2ph")
    (line_names, line_energies) = mass2.calibration.algorithms.line_names_and_energies(line_names)
    line_positions = [previous_step.energy2ph(line_energy) for line_energy in line_energies]
    [indicator, uncorrected] = ch.good_serieses([indicator_col, uncorrected_col], use_expr=use_expr)
    phase_corrector = mass2.core.phase_correct.phase_correct(
        indicator.to_numpy(),
        uncorrected.to_numpy(),
        line_positions,
        indicatorName=indicator_col,
        uncorrectedName=uncorrected_col,
    )
    return PhaseCorrectMassStep(
        inputs=[indicator_col, uncorrected_col],
        output=[corrected_col],
        good_expr=ch.good_expr,
        use_expr=use_expr,
        line_names=line_names,
        line_energies=line_energies,
        previous_step_index=previous_step_index,
        phase_corrector=phase_corrector,
    )

Pulse summarizing algorithms.

fit_pulse_2exp_with_tail(data, npre, dt=1, guess_tau=None)

Fit a pulse shape to data using two exponentials plus an exponential tail.

Source code in mass2/core/pulse_algorithms.py
def fit_pulse_2exp_with_tail(data: ArrayLike, npre: int, dt: float = 1, guess_tau: float | None = None) -> LineModelResult:
    """Fit a pulse shape to data using two exponentials plus an exponential tail."""
    data = np.asarray(data)
    if guess_tau is None:
        guess_tau = dt * len(data) / 5
    model = lmfit.Model(pulse_2exp_with_tail)
    baseline = np.amin(data)
    params = model.make_params(
        t0=npre * dt,
        a_tail=data[0] - baseline,
        baseline=baseline,
        a=np.amax(data) - baseline,
        tau_tail=guess_tau,
        tau_rise=guess_tau,
        tau_fall_factor=2.0,
    )
    params["a_tail"].set(min=0)
    params["a"].set(min=0)
    params["tau_tail"].set(min=dt / 5)
    params["tau_rise"].set(min=dt / 5)
    params["tau_fall_factor"].set(min=1)
    params.add("tau_fall", expr="tau_rise*tau_fall_factor")

    result = model.fit(data, params, t=np.arange(len(data)) * dt)

    return result

pulse_2exp_with_tail(t, t0, a_tail, tau_tail, a, tau_rise, tau_fall_factor, baseline)

Create a pulse shape from two exponentials plus an exponential tail.

Source code in mass2/core/pulse_algorithms.py
def pulse_2exp_with_tail(
    t: ArrayLike, t0: float, a_tail: float, tau_tail: float, a: float, tau_rise: float, tau_fall_factor: float, baseline: float
) -> NDArray:
    """Create a pulse shape from two exponentials plus an exponential tail."""
    tt = np.asarray(t) - t0
    tau_fall = tau_rise * tau_fall_factor
    assert tau_fall_factor >= 1

    if tau_fall_factor > 1:
        # location of peak
        t_peak = (tau_rise * tau_fall) / (tau_fall - tau_rise) * np.log(tau_fall / tau_rise)
        # value at peak
        max_val = np.exp(-t_peak / tau_fall) - np.exp(-t_peak / tau_rise)
    else:  # tau_fall == tau_rise
        max_val = 1 / np.e

    return (
        a_tail * np.exp(-tt / tau_tail) / np.exp(-tt[0] / tau_tail)  # normalized tail
        + a * (np.exp(-tt / tau_fall) - np.exp(-tt / tau_rise)) * np.greater(tt, 0) / max_val
        + baseline
    )
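The normalization can be checked directly: the difference of exponentials exp(-t/tau_fall) - exp(-t/tau_rise) peaks at t_peak = tau_rise*tau_fall/(tau_fall - tau_rise) * ln(tau_fall/tau_rise), so dividing by its value there makes the pulse amplitude exactly `a`. A quick numerical check (the tau values are arbitrary illustrations):

```python
import numpy as np

# Numerical check of the peak normalization used in pulse_2exp_with_tail.
tau_rise, tau_fall = 5.0, 15.0
t_peak = tau_rise * tau_fall / (tau_fall - tau_rise) * np.log(tau_fall / tau_rise)
max_val = np.exp(-t_peak / tau_fall) - np.exp(-t_peak / tau_rise)
t = np.linspace(0.0, 200.0, 200001)        # dense grid, spacing 1e-3
y = (np.exp(-t / tau_fall) - np.exp(-t / tau_rise)) / max_val
assert abs(y.max() - 1.0) < 1e-6           # normalized peak height is 1
assert abs(t[np.argmax(y)] - t_peak) < 1e-2  # peak occurs at t_peak
```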

summarize_data_numba(rawdata, timebase, peak_samplenumber, pretrigger_ignore_samples, nPresamples)

Summarize one segment of the data file, loading it into cache.

Source code in mass2/core/pulse_algorithms.py
@njit
def summarize_data_numba(  # noqa: PLR0914
    rawdata: NDArray[np.uint16],
    timebase: float,
    peak_samplenumber: int,
    pretrigger_ignore_samples: int,
    nPresamples: int,
) -> ResultArrayType:
    """Summarize one segment of the data file, loading it into cache."""
    nPulses = rawdata.shape[0]
    nSamples = rawdata.shape[1]

    e_nPresamples = nPresamples - pretrigger_ignore_samples

    # Create the structured array for results
    results = np.zeros(nPulses, dtype=result_dtype)

    for j in range(nPulses):
        pulse = rawdata[j, :]
        pretrig_sum = 0.0
        pretrig_rms_sum = 0.0
        pulse_sum = 0.0
        pulse_rms_sum = 0.0
        promptness_sum = 0.0
        peak_value = 0
        peak_index = 0
        min_value = np.iinfo(np.uint16).max
        s_prompt = nPresamples + 2
        e_prompt = nPresamples + 8

        for k in range(nSamples):
            signal = pulse[k]

            if signal > peak_value:
                peak_value = signal
                peak_index = k
            min_value = min(min_value, signal)

            if k < e_nPresamples:
                pretrig_sum += signal
                pretrig_rms_sum += signal**2

            if s_prompt <= k < e_prompt:
                promptness_sum += signal

            if k == nPresamples - 1:
                ptm = pretrig_sum / e_nPresamples
                ptrms = np.sqrt(pretrig_rms_sum / e_nPresamples - ptm**2)
                if signal - ptm > 4.3 * ptrms:
                    e_prompt -= 1
                    s_prompt -= 1
                    results["shift1"][j] = 1
                else:
                    results["shift1"][j] = 0

            if k >= nPresamples - 1:
                pulse_sum += signal
                pulse_rms_sum += signal**2

        results["pretrig_mean"][j] = ptm
        results["pretrig_rms"][j] = ptrms
        if ptm < peak_value:
            peak_value -= int(ptm + 0.5)
            results["promptness"][j] = (promptness_sum / 6.0 - ptm) / peak_value
            results["peak_value"][j] = peak_value
            results["peak_index"][j] = peak_index
        else:
            results["promptness"][j] = 0.0
            results["peak_value"][j] = 0
            results["peak_index"][j] = 0
        results["min_value"][j] = min_value
        pulse_avg = pulse_sum / (nSamples - nPresamples + 1) - ptm
        results["pulse_average"][j] = pulse_avg
        results["pulse_rms"][j] = np.sqrt(pulse_rms_sum / (nSamples - nPresamples + 1) - ptm * pulse_avg * 2 - ptm**2)

        low_th = int(0.1 * peak_value + ptm)
        high_th = int(0.9 * peak_value + ptm)

        k = nPresamples
        low_value = high_value = pulse[k]
        while k < nSamples:
            signal = pulse[k]
            if signal > low_th:
                low_idx = k
                low_value = signal
                break
            k += 1

        high_value = low_value
        high_idx = low_idx

        while k < nSamples:
            signal = pulse[k]
            if signal > high_th:
                high_idx = k - 1
                high_value = pulse[high_idx]
                break
            k += 1

        if high_value > low_value:
            results["rise_time"][j] = timebase * (high_idx - low_idx) * peak_value / (high_value - low_value)
        else:
            results["rise_time"][j] = timebase

        # The following is quite confusing, but it appears to be equivalent to
        # slope = -2 * pulse[peak_samplenumber:-4]
        # slope -= pulse[peak_samplenumber+1:-3]
        # slope += pulse[peak_samplenumber+3:-1]
        # slope += 2*pulse[peak_samplenumber+4:]
        # slope = np.minimum(slope[2:], slope[:-2])
        # results["postpeak_deriv"][j] = 0.1 * np.max(slope)
        # TODO: consider replacing, if the above is not slower?

        f0, f1, f3, f4 = 2, 1, -1, -2
        s0, s1, s2, s3 = (
            pulse[peak_samplenumber],
            pulse[peak_samplenumber + 1],
            pulse[peak_samplenumber + 2],
            pulse[peak_samplenumber + 3],
        )
        s4 = pulse[peak_samplenumber + 4]
        t0 = f4 * s0 + f3 * s1 + f1 * s3 + f0 * s4
        s0, s1, s2, s3 = s1, s2, s3, s4
        s4 = pulse[peak_samplenumber + 5]
        t1 = f4 * s0 + f3 * s1 + f1 * s3 + f0 * s4
        t_max_deriv = np.iinfo(np.int32).min

        for k in range(peak_samplenumber + 6, nSamples):
            s0, s1, s2, s3 = s1, s2, s3, s4
            s4 = pulse[k]
            t2 = f4 * s0 + f3 * s1 + f1 * s3 + f0 * s4

            t3 = min(t2, t0)
            t_max_deriv = max(t_max_deriv, t3)

            t0, t1 = t1, t2

        results["postpeak_deriv"][j] = 0.1 * t_max_deriv

    return results
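For reference, the pretrigger statistics accumulated in the loop above correspond to simple per-record moments. A vectorized NumPy sketch on synthetic records (illustrative only; the numba loop fuses these with the other summary quantities in a single pass):

```python
import numpy as np

# Vectorized sketch of the pretrigger statistics: mean and RMS over the first
# (nPresamples - pretrigger_ignore_samples) samples of each record.
rng = np.random.default_rng(1)
nPresamples, pretrig_ignore = 100, 3
rawdata = rng.integers(1000, 1100, size=(5, 500)).astype(np.uint16)
pre = rawdata[:, : nPresamples - pretrig_ignore].astype(float)
pretrig_mean = pre.mean(axis=1)
# Same quantity as sqrt(sum(x^2)/n - mean^2) accumulated in the numba loop:
pretrig_rms = pre.std(axis=1)
assert np.allclose(pretrig_rms, np.sqrt((pre**2).mean(axis=1) - pretrig_mean**2))
```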

Pulse model object, to hold a low-dimensional linear basis able to express all normal pulses.

PulseModel

Object to hold a "pulse model", meaning a low-dimensional linear basis to express "all" pulses, along with a projector such that projector.dot(basis) is the identity matrix.

Also has the capacity to store to and restore from HDF5, and the ability to compute additional basis elements and corresponding projectors with method _additional_projectors_tsvd.
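The defining property (projector.dot(basis) ≈ identity) can be illustrated with a plain SVD basis, for which the transpose is an exact left inverse; mass2 instead computes noise-optimal projectors, but the contract is the same:

```python
import numpy as np

# Sketch of the PulseModel contract: basis (n_samples x n_basis) and
# projectors (n_basis x n_samples) with projectors @ basis = identity.
# Here the basis comes from an SVD of synthetic training records, and the
# naive choice projectors = basis.T is an exact left inverse because the
# SVD basis has orthonormal columns.
rng = np.random.default_rng(0)
pulses = rng.normal(size=(200, 50))        # 200 synthetic records, 50 samples
_, _, Vt = np.linalg.svd(pulses, full_matrices=False)
n_basis = 3
basis = Vt[:n_basis].T                     # (50, 3)
projectors = basis.T                       # (3, 50)
assert np.allclose(projectors @ basis, np.eye(n_basis))
```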

Source code in mass2/core/pulse_model.py
class PulseModel:
    """Object to hold a "pulse model", meaning a low-dimensional linear basis to express "all" pulses,
    along with a projector such that projector.dot(basis) is the identity matrix.

    Also has the capacity to store to and restore from HDF5, and the ability to compute additional
    basis elements and corresponding projectors with method _additional_projectors_tsvd"""

    version = 2

    def __init__(  # noqa: PLR0917
        self,
        projectors_so_far: NDArray,
        basis_so_far: NDArray,
        n_basis: int,
        pulses_for_svd: NDArray,
        v_dv: float,
        pretrig_rms_median: float,
        pretrig_rms_sigma: float,
        file_name: str,
        extra_n_basis_5lag: int,
        f_5lag: NDArray,
        average_pulse_for_5lag: NDArray,
        noise_psd: NDArray,
        noise_psd_delta_f: NDArray,
        noise_autocorr: NDArray,
        _from_hdf5: bool = False,
    ):
        self.pulses_for_svd = pulses_for_svd
        self.n_basis = n_basis
        dn = n_basis - extra_n_basis_5lag
        if projectors_so_far.shape[0] < dn:
            self.projectors, self.basis = self._additional_projectors_tsvd(projectors_so_far, basis_so_far, dn, pulses_for_svd)
        elif (projectors_so_far.shape[0] == dn) or _from_hdf5:
            self.projectors, self.basis = projectors_so_far, basis_so_far
        else:  # don't throw error on
            s = f"n_basis-extra_n_basis_5lag={dn} < projectors_so_far.shape[0] = {projectors_so_far.shape[0]}"
            s += f", extra_n_basis_5lag={extra_n_basis_5lag}"
            raise Exception(s)
        if (not _from_hdf5) and (extra_n_basis_5lag > 0):
            filters_5lag = np.zeros((len(f_5lag) + 4, 5))
            for i in range(5):
                if i < 4:
                    filters_5lag[i : -4 + i, i] = projectors_so_far[2, 2:-2]
                else:
                    filters_5lag[i:, i] = projectors_so_far[2, 2:-2]
            self.projectors, self.basis = self._additional_projectors_tsvd(self.projectors, self.basis, n_basis, filters_5lag)

        self.v_dv = v_dv
        self.pretrig_rms_median = pretrig_rms_median
        self.pretrig_rms_sigma = pretrig_rms_sigma
        self.file_name = str(file_name)
        self.extra_n_basis_5lag = extra_n_basis_5lag
        self.f_5lag = f_5lag
        self.average_pulse_for_5lag = average_pulse_for_5lag
        self.noise_psd = noise_psd
        self.noise_psd_delta_f = noise_psd_delta_f
        self.noise_autocorr = noise_autocorr

    def toHDF5(self, hdf5_group: h5py.Group, save_inverted: bool) -> None:
        """Save the pulse model to an HDF5 group."""
        projectors, basis = self.projectors[()], self.basis[()]
        if save_inverted:
            # flip every component except the mean component if data is being inverted
            basis[:, 1:] *= -1
            projectors[1:, :] *= -1

        # projectors is MxN, where N is samples/record and M the number of basis elements
        # basis is NxM
        hdf5_group["svdbasis/projectors"] = projectors
        hdf5_group["svdbasis/basis"] = basis
        hdf5_group["svdbasis/v_dv"] = self.v_dv
        hdf5_group["svdbasis/training_pulses_for_plots"] = self.pulses_for_svd
        hdf5_group["svdbasis/was_saved_inverted"] = save_inverted
        hdf5_group["svdbasis/pretrig_rms_median"] = self.pretrig_rms_median
        hdf5_group["svdbasis/pretrig_rms_sigma"] = self.pretrig_rms_sigma
        hdf5_group["svdbasis/version"] = self.version
        hdf5_group["svdbasis/file_name"] = self.file_name
        hdf5_group["svdbasis/extra_n_basis_5lag"] = self.extra_n_basis_5lag
        hdf5_group["svdbasis/5lag_filter"] = self.f_5lag
        hdf5_group["svdbasis/average_pulse_for_5lag"] = self.average_pulse_for_5lag
        hdf5_group["svdbasis/noise_psd"] = self.noise_psd
        hdf5_group["svdbasis/noise_psd_delta_f"] = self.noise_psd_delta_f
        hdf5_group["svdbasis/noise_autocorr"] = self.noise_autocorr

    @classmethod
    def fromHDF5(cls, hdf5_group: h5py.Group) -> "PulseModel":
        """Restore a pulse model from an HDF5 group."""
        projectors = hdf5_group["svdbasis/projectors"][()]
        n_basis = projectors.shape[0]
        basis = hdf5_group["svdbasis/basis"][()]
        v_dv = hdf5_group["svdbasis/v_dv"][()]
        pulses_for_svd = hdf5_group["svdbasis/training_pulses_for_plots"][()]
        pretrig_rms_median = hdf5_group["svdbasis/pretrig_rms_median"][()]
        pretrig_rms_sigma = hdf5_group["svdbasis/pretrig_rms_sigma"][()]
        version = hdf5_group["svdbasis/version"][()]
        file_name = tostr(hdf5_group["svdbasis/file_name"][()])
        extra_n_basis_5lag = hdf5_group["svdbasis/extra_n_basis_5lag"][()]
        f_5lag = hdf5_group["svdbasis/5lag_filter"][()]
        average_pulse_for_5lag = hdf5_group["svdbasis/average_pulse_for_5lag"][()]
        noise_psd = hdf5_group["svdbasis/noise_psd"][()]
        noise_psd_delta_f = hdf5_group["svdbasis/noise_psd_delta_f"][()]
        noise_autocorr = hdf5_group["svdbasis/noise_autocorr"][()]

        if version != cls.version:
            raise Exception(f"loading not implemented for other versions, version={version}")
        return cls(
            projectors,
            basis,
            n_basis,
            pulses_for_svd,
            v_dv,
            pretrig_rms_median,
            pretrig_rms_sigma,
            file_name,
            extra_n_basis_5lag,
            f_5lag,
            average_pulse_for_5lag,
            noise_psd,
            noise_psd_delta_f,
            noise_autocorr,
            _from_hdf5=True,
        )

    @staticmethod
    def _additional_projectors_tsvd(
        projectors: NDArray, basis: NDArray, n_basis: int, pulses_for_svd: NDArray
    ) -> tuple[NDArray, NDArray]:
        """
        Given an existing basis with projectors, compute a basis with n_basis elements
        by randomized SVD of the residual elements of the training data in pulses_for_svd.
        It should be the case that projectors.dot(basis) is approximately the identity matrix.

        It is assumed that the projectors will have been computed from the basis in some
        noise-optimal way, say, from optimal filtering. However, the additional basis elements
        will be computed from a standard (non-noise-weighted) SVD, and the additional projectors
        will be computed without noise optimization.

        The projectors and basis will be ordered as:
        mean, deriv_like, pulse_like, any svd components...
        """
        # Check sanity of inputs
        n_samples, n_existing = basis.shape
        assert (n_existing, n_samples) == projectors.shape
        assert n_basis >= n_existing

        if n_basis == n_existing:
            return projectors, basis

        mpc = np.matmul(projectors, pulses_for_svd)  # modeled pulse coefs
        mp = np.matmul(basis, mpc)  # modeled pulse
        residuals = pulses_for_svd - mp
        Q = mass2.mathstat.utilities.find_range_randomly(residuals, n_basis - n_existing)

        projectors2 = np.linalg.pinv(Q)  # = Q.T, perhaps??
        projectors2 -= projectors2.dot(basis).dot(projectors)

        basis = np.hstack([basis, Q])
        projectors = np.vstack([projectors, projectors2])

        return projectors, basis

    def labels(self) -> list[str]:
        """Return a list of labels for the basis elements."""
        labels = ["const", "deriv", "pulse"]
        for i in range(self.n_basis - 3):
            if i > self.n_basis - 3 - self.extra_n_basis_5lag:
                labels += [f"5lag{i + 2 - self.extra_n_basis_5lag}"]
            else:
                labels += [f"svd{i}"]
        return labels

    def plot(self, fig1: plt.Axes | None = None, fig2: plt.Axes | None = None) -> None:
        """Plot a pulse model"""
        # plots information about a pulse model
        # fig1 and fig2 are optional matplotlib figures, used to embed the plots elsewhere.
        # pass either a figure reference (e.g. fig = plt.figure()) or a figure's number (fig.number)
        #   fig1 has modeled pulse vs true pulse
        #   fig2 has projectors, basis, "from ljh", residuals, and a measure of "wrongness"

        labels = self.labels()
        mpc = np.matmul(self.projectors, self.pulses_for_svd)
        mp = np.matmul(self.basis, mpc)
        residuals = self.pulses_for_svd - mp

        if fig1 is None:
            fig = plt.figure(figsize=(10, 14))
        else:
            fig = plt.figure(fig1)
        plt.subplot(511)
        plt.plot(self.projectors[::-1, :].T)
        plt.title("projectors")
        # projector_scale = np.amax(np.abs(self.projectors[2, :]))
        # plt.ylim(-2*projector_scale, 2*projector_scale)
        plt.legend(labels[::-1])
        plt.grid(True)
        plt.subplot(512)
        plt.plot(self.basis[:, ::-1])
        plt.title("basis")
        plt.legend(labels[::-1])
        plt.grid(True)
        plt.subplot(513)
        plt.plot(self.pulses_for_svd[:, :10])
        plt.title("from ljh")
        plt.legend([f"{i}" for i in range(10)])
        plt.grid(True)
        plt.subplot(514)
        plt.plot(residuals[:, :10])
        plt.title("residuals")
        plt.legend([f"{i}" for i in range(10)])
        plt.grid(True)
        should_be_identity = np.matmul(self.projectors, self.basis)
        identity = np.identity(self.n_basis)
        wrongness = np.abs(should_be_identity - identity)
        wrongness[wrongness < 1e-20] = 1e-20  # avoid warnings
        plt.subplot(515)
        plt.imshow(np.log10(wrongness))
        plt.title("log10(abs(projectors*basis-identity))")
        plt.colorbar()
        fig.suptitle(self.file_name)

        if fig2 is None:
            plt.figure(figsize=(10, 14))
        else:
            plt.figure(fig2)
        plt.plot(self.pulses_for_svd[:, 0], label="from ljh index 0")
        plt.plot(mp[:, 0], label="modeled pulse index 0")
        plt.legend()
        plt.title("modeled pulse vs true pulse")
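The residual-SVD extension performed by `_additional_projectors_tsvd` can be illustrated with a standalone sketch. All shapes and values below are toy assumptions, and a plain truncated SVD stands in for the randomized range finder (`mass2.mathstat.utilities.find_range_randomly`) used in Mass2:

```python
import numpy as np

# Model training pulses with an existing basis, then extend the basis with the
# dominant SVD directions of the residuals (the part the basis cannot model).
rng = np.random.default_rng(1)
pulses = rng.standard_normal((50, 200))   # N samples x n_pulses
basis = rng.standard_normal((50, 3))      # existing N x M basis
projectors = np.linalg.pinv(basis)        # M x N, so projectors @ basis ~ identity
residuals = pulses - basis @ (projectors @ pulses)

# Plain truncated SVD in place of the randomized range finder
Q = np.linalg.svd(residuals, full_matrices=False)[0][:, :2]
projectors2 = np.linalg.pinv(Q)
projectors2 -= projectors2 @ basis @ projectors   # keep projectors @ basis ~ identity

new_basis = np.hstack([basis, Q])                      # (50, 5)
new_projectors = np.vstack([projectors, projectors2])  # (5, 50)
print(np.abs(new_projectors @ new_basis - np.eye(5)).max())
```

Because the residuals lie in the null space of the original projectors, the extended `new_projectors @ new_basis` remains the identity up to floating-point error.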

fromHDF5(hdf5_group) classmethod

Restore a pulse model from an HDF5 group.

Source code in mass2/core/pulse_model.py
@classmethod
def fromHDF5(cls, hdf5_group: h5py.Group) -> "PulseModel":
    """Restore a pulse model from an HDF5 group."""
    projectors = hdf5_group["svdbasis/projectors"][()]
    n_basis = projectors.shape[0]
    basis = hdf5_group["svdbasis/basis"][()]
    v_dv = hdf5_group["svdbasis/v_dv"][()]
    pulses_for_svd = hdf5_group["svdbasis/training_pulses_for_plots"][()]
    pretrig_rms_median = hdf5_group["svdbasis/pretrig_rms_median"][()]
    pretrig_rms_sigma = hdf5_group["svdbasis/pretrig_rms_sigma"][()]
    version = hdf5_group["svdbasis/version"][()]
    file_name = tostr(hdf5_group["svdbasis/file_name"][()])
    extra_n_basis_5lag = hdf5_group["svdbasis/extra_n_basis_5lag"][()]
    f_5lag = hdf5_group["svdbasis/5lag_filter"][()]
    average_pulse_for_5lag = hdf5_group["svdbasis/average_pulse_for_5lag"][()]
    noise_psd = hdf5_group["svdbasis/noise_psd"][()]
    noise_psd_delta_f = hdf5_group["svdbasis/noise_psd_delta_f"][()]
    noise_autocorr = hdf5_group["svdbasis/noise_autocorr"][()]

    if version != cls.version:
        raise Exception(f"loading not implemented for other versions, version={version}")
    return cls(
        projectors,
        basis,
        n_basis,
        pulses_for_svd,
        v_dv,
        pretrig_rms_median,
        pretrig_rms_sigma,
        file_name,
        extra_n_basis_5lag,
        f_5lag,
        average_pulse_for_5lag,
        noise_psd,
        noise_psd_delta_f,
        noise_autocorr,
        _from_hdf5=True,
    )

labels()

Return a list of labels for the basis elements.

Source code in mass2/core/pulse_model.py
def labels(self) -> list[str]:
    """Return a list of labels for the basis elements."""
    labels = ["const", "deriv", "pulse"]
    for i in range(self.n_basis - 3):
        if i > self.n_basis - 3 - self.extra_n_basis_5lag:
            labels += [f"5lag{i + 2 - self.extra_n_basis_5lag}"]
        else:
            labels += [f"svd{i}"]
    return labels

plot(fig1=None, fig2=None)

Plot a pulse model

Source code in mass2/core/pulse_model.py
def plot(self, fig1: plt.Axes | None = None, fig2: plt.Axes | None = None) -> None:
    """Plot a pulse model"""
    # plots information about a pulse model
    # fig1 and fig2 are optional matplotlib figures, used to embed the plots elsewhere.
    # pass either a figure reference (e.g. fig = plt.figure()) or a figure's number (fig.number)
    #   fig1 has modeled pulse vs true pulse
    #   fig2 has projectors, basis, "from ljh", residuals, and a measure of "wrongness"

    labels = self.labels()
    mpc = np.matmul(self.projectors, self.pulses_for_svd)
    mp = np.matmul(self.basis, mpc)
    residuals = self.pulses_for_svd - mp

    if fig1 is None:
        fig = plt.figure(figsize=(10, 14))
    else:
        fig = plt.figure(fig1)
    plt.subplot(511)
    plt.plot(self.projectors[::-1, :].T)
    plt.title("projectors")
    # projector_scale = np.amax(np.abs(self.projectors[2, :]))
    # plt.ylim(-2*projector_scale, 2*projector_scale)
    plt.legend(labels[::-1])
    plt.grid(True)
    plt.subplot(512)
    plt.plot(self.basis[:, ::-1])
    plt.title("basis")
    plt.legend(labels[::-1])
    plt.grid(True)
    plt.subplot(513)
    plt.plot(self.pulses_for_svd[:, :10])
    plt.title("from ljh")
    plt.legend([f"{i}" for i in range(10)])
    plt.grid(True)
    plt.subplot(514)
    plt.plot(residuals[:, :10])
    plt.title("residuals")
    plt.legend([f"{i}" for i in range(10)])
    plt.grid(True)
    should_be_identity = np.matmul(self.projectors, self.basis)
    identity = np.identity(self.n_basis)
    wrongness = np.abs(should_be_identity - identity)
    wrongness[wrongness < 1e-20] = 1e-20  # avoid warnings
    plt.subplot(515)
    plt.imshow(np.log10(wrongness))
    plt.title("log10(abs(projectors*basis-identity))")
    plt.colorbar()
    fig.suptitle(self.file_name)

    if fig2 is None:
        plt.figure(figsize=(10, 14))
    else:
        plt.figure(fig2)
    plt.plot(self.pulses_for_svd[:, 0], label="from ljh index 0")
    plt.plot(mp[:, 0], label="modeled pulse index 0")
    plt.legend()
    plt.title("modeled pulse vs true pulse")
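The "wrongness" panel relies on the fact that an exact set of projectors satisfies `projectors @ basis = identity`. A minimal sketch of that check, using a toy random basis with noise-unweighted projectors from the pseudoinverse:

```python
import numpy as np

# For a full-column-rank basis, the pseudoinverse gives exact projectors,
# so the wrongness map is ~zero everywhere (log10 near -16 in the plot).
rng = np.random.default_rng(0)
basis = rng.standard_normal((50, 3))   # N samples x M components
projectors = np.linalg.pinv(basis)     # M x N
wrongness = np.abs(projectors @ basis - np.identity(3))
print(wrongness.max())
```

Noise-optimal projectors computed by other means need not satisfy this exactly, which is why the plot visualizes the deviation rather than asserting it is zero.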

toHDF5(hdf5_group, save_inverted)

Save the pulse model to an HDF5 group.

Source code in mass2/core/pulse_model.py
def toHDF5(self, hdf5_group: h5py.Group, save_inverted: bool) -> None:
    """Save the pulse model to an HDF5 group."""
    projectors, basis = self.projectors[()], self.basis[()]
    if save_inverted:
        # flip every component except the mean component if data is being inverted
        basis[:, 1:] *= -1
        projectors[1:, :] *= -1

    # projectors is MxN, where N is samples/record and M the number of basis elements
    # basis is NxM
    hdf5_group["svdbasis/projectors"] = projectors
    hdf5_group["svdbasis/basis"] = basis
    hdf5_group["svdbasis/v_dv"] = self.v_dv
    hdf5_group["svdbasis/training_pulses_for_plots"] = self.pulses_for_svd
    hdf5_group["svdbasis/was_saved_inverted"] = save_inverted
    hdf5_group["svdbasis/pretrig_rms_median"] = self.pretrig_rms_median
    hdf5_group["svdbasis/pretrig_rms_sigma"] = self.pretrig_rms_sigma
    hdf5_group["svdbasis/version"] = self.version
    hdf5_group["svdbasis/file_name"] = self.file_name
    hdf5_group["svdbasis/extra_n_basis_5lag"] = self.extra_n_basis_5lag
    hdf5_group["svdbasis/5lag_filter"] = self.f_5lag
    hdf5_group["svdbasis/average_pulse_for_5lag"] = self.average_pulse_for_5lag
    hdf5_group["svdbasis/noise_psd"] = self.noise_psd
    hdf5_group["svdbasis/noise_psd_delta_f"] = self.noise_psd_delta_f
    hdf5_group["svdbasis/noise_autocorr"] = self.noise_autocorr
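The `save_inverted` sign flip can be checked numerically. A toy sketch, assuming an inverted record is simply the negated original: flipping every non-mean component of both basis and projectors leaves the non-mean coefficients of inverted data unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
basis = rng.standard_normal((20, 3))   # N samples x M components
projectors = np.linalg.pinv(basis)     # M x N
pulse = basis @ np.array([5.0, 1.0, -2.0])

flipped_basis = basis.copy()
flipped_projectors = projectors.copy()
flipped_basis[:, 1:] *= -1             # flip all but the mean component
flipped_projectors[1:, :] *= -1

coef = projectors @ pulse                  # coefficients of the original record
coef_inv = flipped_projectors @ (-pulse)   # coefficients of the inverted record
print(np.allclose(coef_inv[1:], coef[1:]))  # → True: non-mean coefficients agree
```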

Define RecipeStep and Recipe classes for processing pulse data in a sequence of steps.

CategorizeStep dataclass

Bases: RecipeStep

A step to categorize pulses into discrete categories based on conditions given in a dictionary mapping category names to polars expressions. The first condition must be True, to be used as a fallback.

Source code in mass2/core/recipe.py
@dataclass(frozen=True)
class CategorizeStep(RecipeStep):
    """A step to categorize pulses into discrete categories based on conditions given in a dictionary mapping
    category names to polars expressions. The first condition must be True, to be used as a fallback."""

    category_condition_dict: dict[str, pl.Expr]

    def __post_init__(self) -> None:
        """Verify that the first condition is always True."""
        err_msg = "The first condition must be True, to be used as a fallback"
        first_condition = next(iter(self.category_condition_dict.values()))
        assert first_condition is True or first_condition.meta.eq(pl.lit(True)), err_msg

    def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
        """Calculate the category for each pulse and return a new DataFrame with a column for the category names."""
        output_col = self.output[0]

        def categorize_df(df: pl.DataFrame, category_condition_dict: dict[str, pl.Expr], output_col: str) -> pl.DataFrame:
            """Return a one-column DataFrame showing which category each pulse is in.
            Pulses are assigned to the last category whose condition evaluates to True."""
            dtype = pl.Enum(category_condition_dict.keys())
            physical = np.zeros(len(df), dtype=int)
            for category_int, (category_str, condition_expr) in enumerate(category_condition_dict.items()):
                if condition_expr is True or condition_expr.meta.eq(pl.lit(True)):
                    in_category = np.ones(len(df), dtype=bool)
                else:
                    in_category = df.select(condition_expr).fill_null(False).to_numpy().flatten()
                assert in_category.dtype == bool
                physical[in_category] = category_int
            series = pl.Series(name=output_col, values=physical).cast(dtype)
            df = pl.DataFrame({output_col: series})
            return df

        df2 = categorize_df(df, self.category_condition_dict, output_col).with_columns(df)
        return df2
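The last-match-wins assignment in `categorize_df` can be sketched with plain numpy. The category names and boolean masks below are made-up examples:

```python
import numpy as np

# Each pulse ends up in the *last* category whose condition holds; the first
# (all-True) category is the fallback for pulses matching nothing else.
conditions = {
    "uncategorized": None,                    # stands in for the all-True fallback
    "clean": np.array([True, False, True]),
    "spike": np.array([False, False, True]),  # overrides "clean" where both hold
}
names = list(conditions)
physical = np.zeros(3, dtype=int)
for i, mask in enumerate(conditions.values()):
    in_category = np.ones(3, dtype=bool) if mask is None else mask
    physical[in_category] = i
print([names[k] for k in physical])  # → ['clean', 'uncategorized', 'spike']
```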

__post_init__()

Verify that the first condition is always True.

Source code in mass2/core/recipe.py
def __post_init__(self) -> None:
    """Verify that the first condition is always True."""
    err_msg = "The first condition must be True, to be used as a fallback"
    first_condition = next(iter(self.category_condition_dict.values()))
    assert first_condition is True or first_condition.meta.eq(pl.lit(True)), err_msg

calc_from_df(df)

Calculate the category for each pulse and return a new DataFrame with a column for the category names.

Source code in mass2/core/recipe.py
def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
    """Calculate the category for each pulse and return a new DataFrame with a column for the category names."""
    output_col = self.output[0]

    def categorize_df(df: pl.DataFrame, category_condition_dict: dict[str, pl.Expr], output_col: str) -> pl.DataFrame:
        """Return a one-column DataFrame showing which category each pulse is in.
        Pulses are assigned to the last category whose condition evaluates to True."""
        dtype = pl.Enum(category_condition_dict.keys())
        physical = np.zeros(len(df), dtype=int)
        for category_int, (category_str, condition_expr) in enumerate(category_condition_dict.items()):
            if condition_expr is True or condition_expr.meta.eq(pl.lit(True)):
                in_category = np.ones(len(df), dtype=bool)
            else:
                in_category = df.select(condition_expr).fill_null(False).to_numpy().flatten()
            assert in_category.dtype == bool
            physical[in_category] = category_int
        series = pl.Series(name=output_col, values=physical).cast(dtype)
        df = pl.DataFrame({output_col: series})
        return df

    df2 = categorize_df(df, self.category_condition_dict, output_col).with_columns(df)
    return df2

ChangeTimeZoneStep dataclass

Bases: RecipeStep

Replace all polars Datetime type series in the dataframe with ones using the given time zone.

Alternatively, replace only the columns named in inputs if not an empty collection.

Usage:

ctzstep = mass2.core.ChangeTimeZoneStep.new("America/Chicago")
ch2 = ch.with_step(ctzstep)

Source code in mass2/core/recipe.py
@dataclass(frozen=True)
class ChangeTimeZoneStep(RecipeStep):
    """Replace all polars `Datetime` type series in the dataframe with ones using the given time zone.

    Alternatively, replace only the columns named in `inputs` if not an empty collection.

    Usage:
    >>> ctzstep = mass2.core.ChangeTimeZoneStep.new("America/Chicago")
    >>> ch2 = ch.with_step(ctzstep)
    """

    new_time_zone: str

    def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
        "Change timezones for all `Datetime`-type series in `df`"

        # When there are no fields listed in self.inputs, convert all `pl.Datetime`-type columns
        if len(self.inputs) == 0:
            return df.with_columns(pl.col(pl.Datetime).dt.convert_time_zone(self.new_time_zone))

        def change_zone(col_name: str) -> pl.Expr:
            return pl.col(col_name).dt.convert_time_zone(self.new_time_zone)

        return df.with_columns([change_zone(col_name) for col_name in self.inputs])

    @classmethod
    def new(cls, new_time_zone: str, inputs: list[str] = []) -> "ChangeTimeZoneStep":
        """Create a ChangeTimeZoneStep

        Parameters
        ----------
        new_time_zone : str
            The time zone to change to
        inputs : list[str], optional
            the dataframe columns to change, by default [], which means all the columns of type Datetime

        Returns
        -------
        ChangeTimeZoneStep
            A RecipeStep for changing time zones.
        """
        return cls(
            inputs=inputs,
            output=[],
            good_expr=pl.lit(True),
            use_expr=pl.lit(True),
            new_time_zone=new_time_zone,
        )
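Like polars' `dt.convert_time_zone`, this step changes only how a timestamp is displayed, not the instant it represents. A standard-library sketch of that distinction:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

t_utc = datetime(2024, 1, 15, 18, 30, tzinfo=timezone.utc)
t_chicago = t_utc.astimezone(ZoneInfo("America/Chicago"))

print(t_chicago.hour)      # → 12 (CST is UTC-6 in January)
print(t_chicago == t_utc)  # → True: the same instant in time
```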

calc_from_df(df)

Change timezones for all Datetime-type series in df

Source code in mass2/core/recipe.py
def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
    "Change timezones for all `Datetime`-type series in `df`"

    # When there are no fields listed in self.inputs, convert all `pl.Datetime`-type columns
    if len(self.inputs) == 0:
        return df.with_columns(pl.col(pl.Datetime).dt.convert_time_zone(self.new_time_zone))

    def change_zone(col_name: str) -> pl.Expr:
        return pl.col(col_name).dt.convert_time_zone(self.new_time_zone)

    return df.with_columns([change_zone(col_name) for col_name in self.inputs])

new(new_time_zone, inputs=[]) classmethod

Create a ChangeTimeZoneStep

Parameters:
  • new_time_zone (str) –

    The time zone to change to

  • inputs (list[str], default: [] ) –

    the dataframe columns to change, by default [], which means all the columns of type Datetime

Returns:
  • ChangeTimeZoneStep –

    A RecipeStep for changing time zones.
Source code in mass2/core/recipe.py
@classmethod
def new(cls, new_time_zone: str, inputs: list[str] = []) -> "ChangeTimeZoneStep":
    """Create a ChangeTimeZoneStep

    Parameters
    ----------
    new_time_zone : str
        The time zone to change to
    inputs : list[str], optional
        the dataframe columns to change, by default [], which means all the columns of type Datetime

    Returns
    -------
    ChangeTimeZoneStep
        A RecipeStep for changing time zones.
    """
    return cls(
        inputs=inputs,
        output=[],
        good_expr=pl.lit(True),
        use_expr=pl.lit(True),
        new_time_zone=new_time_zone,
    )

ColumnAsNumpyMapStep dataclass

Bases: RecipeStep

This step is meant for interactive exploration: it takes a column, applies a function to it, and makes a new column with the result. It makes it easy to test functions on a column without having to write a whole new step class, while maintaining the benefit of being able to use the step in a Recipe chain, such as replaying steps on another channel.

example usage:

def my_function(x):
    return x * 2
step = ColumnAsNumpyMapStep(inputs=["my_column"], output=["my_new_column"], f=my_function)
ch2 = ch.with_step(step)

Source code in mass2/core/recipe.py
@dataclass(frozen=True)
class ColumnAsNumpyMapStep(RecipeStep):
    """
    This step is meant for interactive exploration: it takes a column, applies a function to it,
    and makes a new column with the result. It makes it easy to test functions on a column without
    having to write a whole new step class,
    while maintaining the benefit of being able to use the step in a Recipe chain, like replaying steps
    on another channel.

    example usage:
    >>> def my_function(x):
    ...     return x * 2
    >>> step = ColumnAsNumpyMapStep(inputs=["my_column"], output=["my_new_column"], f=my_function)
    >>> ch2 = ch.with_step(step)
    """

    f: Callable[[np.ndarray], np.ndarray]

    def __post_init__(self) -> None:
        """Check that inputs and outputs are valid (single column each) and that `f` is a callable object."""
        assert len(self.inputs) == 1, "ColumnMapStep expects exactly one input"
        assert len(self.output) == 1, "ColumnMapStep expects exactly one output"
        if not callable(self.f):
            raise ValueError(f"f must be a callable, got {self.f}")

    def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
        """Calculate the new column by applying `f` to the input column, returning a new DataFrame."""
        output_col = self.output[0]
        output_segments = []
        for df_iter in df.select(self.inputs).iter_slices():
            series1 = df_iter[self.inputs[0]]
            # Have to apply the function differently when series elements are arrays vs scalars
            if series1.dtype.base_type() is pl.Array:
                output_numpy = np.array([self.f(v.to_numpy()) for v in series1])
            else:
                output_numpy = self.f(series1.to_numpy())
            this_output_segment = pl.Series(output_col, output_numpy)
            output_segments.append(this_output_segment)

        combined = pl.concat(output_segments)
        # Put into a DataFrame with one column
        df2 = pl.DataFrame({output_col: combined}).with_columns(df)
        return df2
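The two application modes in `calc_from_df` (vectorized for scalar columns, row-by-row for array-typed columns) reduce to the following numpy sketch, with a made-up doubling function:

```python
import numpy as np

def f(x):
    return x * 2

scalar_col = np.array([1.0, 2.0, 3.0])            # one value per pulse
array_col = [np.array([1, 2]), np.array([3, 4])]  # one small array per pulse

out_scalar = f(scalar_col)                        # scalar dtype: one vectorized call
out_array = np.array([f(v) for v in array_col])   # pl.Array dtype: apply per row

print(out_scalar.tolist())  # → [2.0, 4.0, 6.0]
print(out_array.tolist())   # → [[2, 4], [6, 8]]
```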

__post_init__()

Check that inputs and outputs are valid (single column each) and that f is a callable object.

Source code in mass2/core/recipe.py
def __post_init__(self) -> None:
    """Check that inputs and outputs are valid (single column each) and that `f` is a callable object."""
    assert len(self.inputs) == 1, "ColumnMapStep expects exactly one input"
    assert len(self.output) == 1, "ColumnMapStep expects exactly one output"
    if not callable(self.f):
        raise ValueError(f"f must be a callable, got {self.f}")

calc_from_df(df)

Calculate the new column by applying f to the input column, returning a new DataFrame.

Source code in mass2/core/recipe.py
def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
    """Calculate the new column by applying `f` to the input column, returning a new DataFrame."""
    output_col = self.output[0]
    output_segments = []
    for df_iter in df.select(self.inputs).iter_slices():
        series1 = df_iter[self.inputs[0]]
        # Have to apply the function differently when series elements are arrays vs scalars
        if series1.dtype.base_type() is pl.Array:
            output_numpy = np.array([self.f(v.to_numpy()) for v in series1])
        else:
            output_numpy = self.f(series1.to_numpy())
        this_output_segment = pl.Series(output_col, output_numpy)
        output_segments.append(this_output_segment)

    combined = pl.concat(output_segments)
    # Put into a DataFrame with one column
    df2 = pl.DataFrame({output_col: combined}).with_columns(df)
    return df2

PretrigMeanJumpFixStep dataclass

Bases: RecipeStep

A step to fix jumps in the pretrigger mean by unwrapping the phase angle, a periodic quantity.

Source code in mass2/core/recipe.py
@dataclass(frozen=True)
class PretrigMeanJumpFixStep(RecipeStep):
    """A step to fix jumps in the pretrigger mean by unwrapping the phase angle, a periodic quantity."""

    period: float

    def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
        """Calculate the jump-corrected pretrigger mean and return a new DataFrame."""
        ptm1 = df[self.inputs[0]].to_numpy()
        ptm2 = np.unwrap(ptm1 % self.period, period=self.period)
        df2 = pl.DataFrame({self.output[0]: ptm2}).with_columns(df)
        return df2

    def dbg_plot(self, df_after: pl.DataFrame, **kwargs: Any) -> plt.Axes:
        """Make a diagnostic plot of the pretrigger mean before and after the jump fix."""
        plt.figure()
        plt.plot(df_after["timestamp"], df_after[self.inputs[0]], ".", label=self.inputs[0], **kwargs)
        plt.plot(df_after["timestamp"], df_after[self.output[0]], ".", label=self.output[0], **kwargs)
        plt.legend()
        plt.xlabel("timestamp")
        plt.ylabel("pretrig mean")
        plt.tight_layout()
        return plt.gca()
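The jump fix is `np.unwrap` applied to the pretrigger mean modulo its period. A toy sketch with an assumed period of 100:

```python
import numpy as np

period = 100.0
ptm = np.array([90.0, 95.0, 2.0, 7.0])  # wrapped past the period boundary
fixed = np.unwrap(ptm % period, period=period)
print(fixed.tolist())  # → [90.0, 95.0, 102.0, 107.0]
```

The apparent jump from 95 down to 2 is reinterpreted as a continuous drift up through 102.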

calc_from_df(df)

Calculate the jump-corrected pretrigger mean and return a new DataFrame.

Source code in mass2/core/recipe.py
def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
    """Calculate the jump-corrected pretrigger mean and return a new DataFrame."""
    ptm1 = df[self.inputs[0]].to_numpy()
    ptm2 = np.unwrap(ptm1 % self.period, period=self.period)
    df2 = pl.DataFrame({self.output[0]: ptm2}).with_columns(df)
    return df2

dbg_plot(df_after, **kwargs)

Make a diagnostic plot of the pretrigger mean before and after the jump fix.

Source code in mass2/core/recipe.py
def dbg_plot(self, df_after: pl.DataFrame, **kwargs: Any) -> plt.Axes:
    """Make a diagnostic plot of the pretrigger mean before and after the jump fix."""
    plt.figure()
    plt.plot(df_after["timestamp"], df_after[self.inputs[0]], ".", label=self.inputs[0], **kwargs)
    plt.plot(df_after["timestamp"], df_after[self.output[0]], ".", label=self.output[0], **kwargs)
    plt.legend()
    plt.xlabel("timestamp")
    plt.ylabel("pretrig mean")
    plt.tight_layout()
    return plt.gca()

Recipe dataclass

Bases: Sequence[RecipeStep]

A sequence of RecipeStep objects to be applied in order to a DataFrame.

Source code in mass2/core/recipe.py
@dataclass(frozen=True)
class Recipe(Sequence[RecipeStep]):
    """A sequence of RecipeStep objects to be applied in order to a DataFrame."""

    steps: list[RecipeStep]

    # TODO: leaves many optimizations on the table, but is very simple
    # 1. we could calculate filt_value_5lag and filt_phase_5lag at the same time
    # 2. we could calculate intermediate quantities optionally and not materialize all of them

    def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
        "return a dataframe with all the newly calculated info"
        for step in self.steps:
            df = step.calc_from_df(df).with_columns(df)
        return df

    @classmethod
    def new_empty(cls) -> "Recipe":
        """Create a new empty Recipe."""
        return cls([])

    @overload
    def __getitem__(self, key: int) -> RecipeStep:
        """Return the step at a given index."""
        ...

    @overload
    def __getitem__(self, key: slice) -> Sequence[RecipeStep]:
        """Return the steps at a given slice of indices."""
        ...

    def __getitem__(self, key: int | slice) -> RecipeStep | Sequence[RecipeStep]:
        """Return the step at the given index, or the steps at a slice of steps."""
        return self.steps[key]

    def __len__(self) -> int:
        """Return the number of steps in the recipe."""
        return len(self.steps)

    def with_step(self, step: RecipeStep) -> "Recipe":
        """Create a new Recipe with the given step added to the end."""
        # return a new Recipe with the step added, no mutation!
        return Recipe(self.steps + [step])

    def trim_dead_ends(self, required_fields: Iterable[str] | str | None, drop_debug: bool = True) -> "Recipe":
        """Create a new Recipe object with all dead-end steps (and optionally also debug info) removed.

        The purpose is to replace the fully useful interactive Recipe with a trimmed-down object that can
        repeat the current steps as a "recipe" without having the extra information from which the recipe
        was first created. In one test, this method reduced the pickle file's size from 3.4 MB per channel
        to 30 kB per channel, or a 112x size reduction (with `drop_debug=True`).

        Dead-end steps are defined as any step that can be omitted without affecting the ability to
        compute any of the fields given in `required_fields`. The result of this method is to return
        a Recipe where any step is removed if it does not contribute to computing any of the `required_fields`
        (i.e., if it is a dead end).

        Examples of a dead end are typically steps used to prepare a tentative, intermediate calibration function.

        Parameters
        ----------
        required_fields : Iterable[str] | str | None
            Steps will be preserved if any of their outputs are among `required_fields`, or if their outputs are
            found recursively among the inputs to any such steps. If a string, treat as a list of that one string.
            If None, preserve all steps.

        drop_debug : bool
            Whether to run `step.drop_debug()` to remove debugging information from the preserved steps.

        Returns
        -------
        Recipe
            A copy of `self`, except that any steps not required to compute any of `required_fields` are omitted.
        """
        if isinstance(required_fields, str):
            required_fields = [required_fields]

        nsteps = len(self)
        required = np.zeros(nsteps, dtype=bool)

        # The easiest approach is to traverse the steps from last to first to build our list of required
        # fields, because necessarily no later step can produce the inputs needed by an earlier step.
        if required_fields is None:
            required[:] = True
        else:
            all_fields_out: set[str] = set(required_fields)
            for istep in range(nsteps - 1, -1, -1):
                step = self[istep]
                for field in step.output:
                    if field in all_fields_out:
                        required[istep] = True
                        all_fields_out.update(step.inputs)
                        break

        if not np.any(required):
            # If this error ever becomes a problem, where a user _actually_ wants an empty series of steps
            # to be a non-error, then add argument `error_on_empty_output=True` to this method.
            raise ValueError("trim_dead_ends found no steps to be preserved")

        steps = []
        for i in range(nsteps):
            if required[i]:
                if drop_debug:
                    steps.append(self[i].drop_debug())
                else:
                    steps.append(self[i])
        return Recipe(steps)

    def trim_debug_info(self) -> "Recipe":
        """Create a new Recipe object with all debug info removed from each step, but no steps removed.

        The following are exactly equivalent:
        >>> recipe2 = recipe.trim_debug_info()
        >>> recipe2 = recipe.trim_dead_ends(required_fields=None, drop_debug=True)

        Returns
        -------
        Recipe
            A copy of `self`, except that `drop_debug()` is called to replace each step with a
            lighter-weight version of that step.
        """
        slim_steps = []
        for step in self:
            slim_steps.append(step.drop_debug())
        return Recipe(slim_steps)

__getitem__(key)

__getitem__(key: int) -> RecipeStep
__getitem__(key: slice) -> Sequence[RecipeStep]

Return the step at the given index, or a sequence of steps for a slice.

Source code in mass2/core/recipe.py, lines 295–297
def __getitem__(self, key: int | slice) -> RecipeStep | Sequence[RecipeStep]:
    """Return the step at the given index, or the steps at a slice of steps."""
    return self.steps[key]

__len__()

Return the number of steps in the recipe.

Source code in mass2/core/recipe.py, lines 299–301
def __len__(self) -> int:
    """Return the number of steps in the recipe."""
    return len(self.steps)

calc_from_df(df)

Return a DataFrame with all the newly calculated info.

Source code in mass2/core/recipe.py, lines 274–278
def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
    "return a dataframe with all the newly calculated info"
    for step in self.steps:
        df = step.calc_from_df(df).with_columns(df)
    return df
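Conceptually, this method folds each step's output columns back into the running DataFrame. A minimal pure-Python sketch of the same fold, with a dict of lists standing in for a polars DataFrame and plain functions standing in for steps (all names here are illustrative, not part of Mass2):

```python
# Minimal pure-Python sketch of the fold performed by calc_from_df:
# each step reads existing columns and contributes new ones.
def run_recipe(columns, steps):
    for step in steps:
        # merge the step's new columns into the running table
        columns = {**columns, **step(columns)}
    return columns

double = lambda cols: {"doubled": [2 * v for v in cols["raw"]]}
shift = lambda cols: {"shifted": [v + 1 for v in cols["doubled"]]}
out = run_recipe({"raw": [1, 2, 3]}, [double, shift])
# out contains "raw", "doubled", and "shifted" columns
```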

new_empty() classmethod

Create a new empty Recipe.

Source code in mass2/core/recipe.py, lines 280–283
@classmethod
def new_empty(cls) -> "Recipe":
    """Create a new empty Recipe."""
    return cls([])

trim_dead_ends(required_fields, drop_debug=True)

Create a new Recipe object with all dead-end steps (and optionally also debug info) removed.

The purpose is to replace the fully useful interactive Recipe with a trimmed-down object that can repeat the current steps as a "recipe" without having the extra information from which the recipe was first created. In one test, this method reduced the pickle file's size from 3.4 MB per channel to 30 kB per channel, or a 112x size reduction (with drop_debug=True).

Dead-end steps are defined as any step that can be omitted without affecting the ability to compute any of the fields given in required_fields. This method returns a Recipe in which any step is removed if it does not contribute to computing any of the required_fields (i.e., if it is a dead end).

Examples of a dead end are typically steps used to prepare a tentative, intermediate calibration function.

Parameters:
  • required_fields (Iterable[str] | str | None) –

    Steps will be preserved if any of their outputs are among required_fields, or if their outputs are found recursively among the inputs to any such steps. If a string, treat as a list of that one string. If None, preserve all steps.

  • drop_debug (bool, default: True ) –

    Whether to run step.drop_debug() to remove debugging information from the preserved steps.

Returns:
  • Recipe –

    A copy of self, except that any steps not required to compute any of required_fields are omitted.
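The backward traversal described above can be sketched in a few lines of plain Python. This is a hypothetical standalone illustration (bare tuples of field names, not real RecipeStep objects), not the Mass2 implementation:

```python
# Hypothetical sketch of the dead-end trimming pass: each step is an
# (inputs, outputs) tuple of field names.
def trim_dead_ends(steps, required_fields):
    """Return the indices of steps that survive dead-end trimming."""
    needed = set(required_fields)
    keep = []
    # Walk last-to-first: no later step can produce an earlier step's inputs.
    for i in range(len(steps) - 1, -1, -1):
        inputs, outputs = steps[i]
        if any(f in needed for f in outputs):
            keep.append(i)
            needed.update(inputs)
    return sorted(keep)

steps = [
    (["pulse"], ["ph"]),           # needed downstream
    (["ph"], ["tentative_cal"]),   # dead end
    (["ph"], ["energy"]),          # produces the required field
]
surviving = trim_dead_ends(steps, ["energy"])  # [0, 2]
```

Step 1 is dropped because nothing downstream of it consumes `tentative_cal`.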

Source code in mass2/core/recipe.py, lines 308–370
def trim_dead_ends(self, required_fields: Iterable[str] | str | None, drop_debug: bool = True) -> "Recipe":
    """Create a new Recipe object with all dead-end steps (and optionally also debug info) removed.

    The purpose is to replace the fully useful interactive Recipe with a trimmed-down object that can
    repeat the current steps as a "recipe" without having the extra information from which the recipe
    was first created. In one test, this method reduced the pickle file's size from 3.4 MB per channel
    to 30 kB per channel, or a 112x size reduction (with `drop_debug=True`).

    Dead-end steps are defined as any step that can be omitted without affecting the ability to
    compute any of the fields given in `required_fields`. This method returns a Recipe in which
    any step is removed if it does not contribute to computing any of the `required_fields`
    (i.e., if it is a dead end).

    Examples of a dead end are typically steps used to prepare a tentative, intermediate calibration function.

    Parameters
    ----------
    required_fields : Iterable[str] | str | None
        Steps will be preserved if any of their outputs are among `required_fields`, or if their outputs are
        found recursively among the inputs to any such steps. If a string, treat as a list of that one string.
        If None, preserve all steps.

    drop_debug : bool
        Whether to run `step.drop_debug()` to remove debugging information from the preserved steps.

    Returns
    -------
    Recipe
        A copy of `self`, except that any steps not required to compute any of `required_fields` are omitted.
    """
    if isinstance(required_fields, str):
        required_fields = [required_fields]

    nsteps = len(self)
    required = np.zeros(nsteps, dtype=bool)

    # The easiest approach is to traverse the steps from last to first to build our list of required
    # fields, because necessarily no later step can produce the inputs needed by an earlier step.
    if required_fields is None:
        required[:] = True
    else:
        all_fields_out: set[str] = set(required_fields)
        for istep in range(nsteps - 1, -1, -1):
            step = self[istep]
            for field in step.output:
                if field in all_fields_out:
                    required[istep] = True
                    all_fields_out.update(step.inputs)
                    break

    if not np.any(required):
        # If this error ever becomes a problem, where a user _actually_ wants an empty series of steps
        # to be a non-error, then add argument `error_on_empty_output=True` to this method.
        raise ValueError("trim_dead_ends found no steps to be preserved")

    steps = []
    for i in range(nsteps):
        if required[i]:
            if drop_debug:
                steps.append(self[i].drop_debug())
            else:
                steps.append(self[i])
    return Recipe(steps)

trim_debug_info()

Create a new Recipe object with all debug info removed from each step, but no steps removed.

The following are exactly equivalent:

recipe2 = recipe.trim_debug_info()
recipe2 = recipe.trim_dead_ends(required_fields=None, drop_debug=True)

Returns:
  • Recipe –

    A copy of self, except that drop_debug() is called to replace each step with a lighter-weight version of that step.

Source code in mass2/core/recipe.py, lines 372–388
def trim_debug_info(self) -> "Recipe":
    """Create a new Recipe object with all debug info removed from each step, but no steps removed.

    The following are exactly equivalent:
    >>> recipe2 = recipe.trim_debug_info()
    >>> recipe2 = recipe.trim_dead_ends(required_fields=None, drop_debug=True)

    Returns
    -------
    Recipe
        A copy of `self`, except that `drop_debug()` is called to replace each step with a
        lighter-weight version of that step.
    """
    slim_steps = []
    for step in self:
        slim_steps.append(step.drop_debug())
    return Recipe(slim_steps)

with_step(step)

Create a new Recipe with the given step added to the end.

Source code in mass2/core/recipe.py, lines 303–306
def with_step(self, step: RecipeStep) -> "Recipe":
    """Create a new Recipe with the given step added to the end."""
    # return a new Recipe with the step added, no mutation!
    return Recipe(self.steps + [step])

RecipeStep dataclass

Bases: ABC

Represent one step in a data processing recipe.

A step has inputs, outputs, and a calculation method. It also has a good_expr and use_expr that can be used to filter the data before processing.

This is an abstract base class; subclasses should implement calc_from_df and dbg_plot.
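As an illustration of the pattern, a subclass only needs to declare its inputs/outputs and implement the calculation. This is a simplified, hypothetical sketch with plain dicts standing in for polars DataFrames and the good_expr/use_expr machinery omitted; class and field names are stand-ins, not the real Mass2 classes:

```python
# Hypothetical minimal version of the RecipeStep subclass pattern.
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass(frozen=True)
class Step(ABC):
    inputs: list   # column names the step reads
    output: list   # column names the step produces

    @abstractmethod
    def calc_from_df(self, df: dict) -> dict: ...

@dataclass(frozen=True)
class ScaleStep(Step):
    factor: float = 2.0

    def calc_from_df(self, df: dict) -> dict:
        src, dst = self.inputs[0], self.output[0]
        # add the output column without mutating the input table
        return {**df, dst: [self.factor * v for v in df[src]]}

step = ScaleStep(inputs=["raw"], output=["scaled"])
result = step.calc_from_df({"raw": [1.0, 2.0]})
# result has both the original "raw" and the new "scaled" column
```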

Source code in mass2/core/recipe.py, lines 15–56
@dataclass(frozen=True)
class RecipeStep(ABC):
    """Represent one step in a data processing recipe.

    A step has inputs, outputs, and a calculation method. It also has a good_expr and use_expr that
    can be used to filter the data before processing.

    This is an abstract base class; subclasses should implement calc_from_df and dbg_plot.
    """

    inputs: list[str]
    output: list[str]
    good_expr: pl.Expr
    use_expr: pl.Expr

    @property
    def name(self) -> str:
        """The name of this step, usually the class name."""
        return str(type(self))

    @property
    def description(self) -> str:
        """A short description of this step, including its inputs and outputs."""
        return f"{type(self).__name__} inputs={self.inputs} outputs={self.output}"

    @abstractmethod
    def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
        """Calculate the outputs from the inputs in the given DataFrame, returning a new DataFrame."""
        # A simplest possible implementation would be something like:
        # return df.filter(self.good_expr)
        pass

    def dbg_plot(self, df_after: pl.DataFrame, **kwargs: Any) -> plt.Axes:
        """Generate a diagnostic plot of the results after this step."""
        # this is a no-op, subclasses can override this to plot something
        plt.figure()
        plt.text(0.0, 0.5, f"No plot defined for: {self.description}")
        return plt.gca()

    def drop_debug(self) -> "RecipeStep":
        "Return self, or a copy of it with debug information removed"
        return self

description property

A short description of this step, including its inputs and outputs.

name property

The name of this step, usually the class name.

calc_from_df(df) abstractmethod

Calculate the outputs from the inputs in the given DataFrame, returning a new DataFrame.

Source code in mass2/core/recipe.py, lines 40–45
@abstractmethod
def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
    """Calculate the outputs from the inputs in the given DataFrame, returning a new DataFrame."""
    # A simplest possible implementation would be something like:
    # return df.filter(self.good_expr)
    pass

dbg_plot(df_after, **kwargs)

Generate a diagnostic plot of the results after this step.

Source code in mass2/core/recipe.py, lines 47–52
def dbg_plot(self, df_after: pl.DataFrame, **kwargs: Any) -> plt.Axes:
    """Generate a diagnostic plot of the results after this step."""
    # this is a no-op, subclasses can override this to plot something
    plt.figure()
    plt.text(0.0, 0.5, f"No plot defined for: {self.description}")
    return plt.gca()

drop_debug()

Return self, or a copy of it with debug information removed

Source code in mass2/core/recipe.py, lines 54–56
def drop_debug(self) -> "RecipeStep":
    "Return self, or a copy of it with debug information removed"
    return self

SelectStep dataclass

Bases: RecipeStep

This step is meant for interactive exploration; it is essentially the df.select() method, saved as a recipe step.

Source code in mass2/core/recipe.py, lines 250–261
@dataclass(frozen=True)
class SelectStep(RecipeStep):
    """
    This step is meant for interactive exploration; it is essentially the df.select() method, saved as a recipe step.
    """

    col_expr_dict: dict[str, pl.Expr]

    def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
        """Select the given columns and return a new DataFrame."""
        df2 = df.select(**self.col_expr_dict).with_columns(df)
        return df2

calc_from_df(df)

Select the given columns and return a new DataFrame.

Source code in mass2/core/recipe.py, lines 258–261
def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
    """Select the given columns and return a new DataFrame."""
    df2 = df.select(**self.col_expr_dict).with_columns(df)
    return df2

SummarizeStep dataclass

Bases: RecipeStep

Summarize raw pulse data into summary statistics using numba-accelerated code.

Source code in mass2/core/recipe.py, lines 84–115
@dataclass(frozen=True)
class SummarizeStep(RecipeStep):
    """Summarize raw pulse data into summary statistics using numba-accelerated code."""

    frametime_s: float
    peak_index: int
    pulse_col: str
    pretrigger_ignore_samples: int
    n_presamples: int
    transform_raw: Callable | None = None

    def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
        """Calculate the summary statistics and return a new DataFrame."""
        summaries = []
        for df_iter in df.select(self.inputs).iter_slices():
            raw = df_iter[self.pulse_col].to_numpy()
            if self.transform_raw is not None:
                raw = self.transform_raw(raw)

            s = pl.from_numpy(
                pulse_algorithms.summarize_data_numba(
                    raw,
                    self.frametime_s,
                    peak_samplenumber=self.peak_index,
                    pretrigger_ignore_samples=self.pretrigger_ignore_samples,
                    nPresamples=self.n_presamples,
                )
            )
            summaries.append(s)

        df2 = pl.concat(summaries).with_columns(df)
        return df2

calc_from_df(df)

Calculate the summary statistics and return a new DataFrame.

Source code in mass2/core/recipe.py, lines 95–115
def calc_from_df(self, df: pl.DataFrame) -> pl.DataFrame:
    """Calculate the summary statistics and return a new DataFrame."""
    summaries = []
    for df_iter in df.select(self.inputs).iter_slices():
        raw = df_iter[self.pulse_col].to_numpy()
        if self.transform_raw is not None:
            raw = self.transform_raw(raw)

        s = pl.from_numpy(
            pulse_algorithms.summarize_data_numba(
                raw,
                self.frametime_s,
                peak_samplenumber=self.peak_index,
                pretrigger_ignore_samples=self.pretrigger_ignore_samples,
                nPresamples=self.n_presamples,
            )
        )
        summaries.append(s)

    df2 = pl.concat(summaries).with_columns(df)
    return df2
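The batched pattern above (iterate over slices, summarize each, concatenate) can be illustrated without polars or numba. This is a pure-Python sketch where `summarize()` is a toy stand-in for `pulse_algorithms.summarize_data_numba`, not the real routine:

```python
# Illustrative sketch of the slice-summarize-concatenate pattern.
def iter_slices(records, batch_size):
    """Yield consecutive batches, mimicking DataFrame.iter_slices()."""
    for i in range(0, len(records), batch_size):
        yield records[i : i + batch_size]

def summarize(pulses):
    # toy summary statistic: the peak value of each pulse trace
    return [max(p) for p in pulses]

pulses = [[0, 3, 1], [0, 7, 2], [1, 1, 9], [2, 5, 0]]
peaks = []
for batch in iter_slices(pulses, 2):
    peaks.extend(summarize(batch))   # concatenate per-batch summaries
```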

Tools for rough calibration of pulse heights to energies

BestAssignmentPfitGainResult dataclass

Result of finding the best assignment of pulse heights to energies and fitting a polynomial gain curve.

Source code in mass2/core/rough_cal.py, lines 86–187
@dataclass(frozen=True)
class BestAssignmentPfitGainResult:
    """Result of finding the best assignment of pulse heights to energies and fitting a polynomial gain curve."""

    rms_residual: float
    ph_assigned: np.ndarray
    residual_e: np.ndarray | None
    assignment_inds: np.ndarray | None
    pfit_gain: np.polynomial.Polynomial
    energy_target: np.ndarray
    names_target: list[str]  # list of strings with names for the energies in energy_target
    ph_target: np.ndarray  # longer than energy target by 0-3

    def ph_unassigned(self) -> ndarray:
        """Which pulse heights were not assigned to any energy."""
        return np.array(list(set(self.ph_target) - set(self.ph_assigned)))

    def plot(self, ax: Axes | None = None) -> None:
        """Make a diagnostic plot of the gain fit."""
        if ax is None:
            plt.figure()
            ax = plt.gca()
        gain = self.ph_assigned / self.energy_target
        ax.plot(self.ph_assigned, self.ph_assigned / self.energy_target, "o")
        ph_large_range = np.linspace(0, self.ph_assigned[-1] * 1.1, 51)
        ax.plot(ph_large_range, self.pfit_gain(ph_large_range))
        ax.set_xlabel("pulse_height")
        ax.set_ylabel("gain")
        ax.set_title(f"BestAssignmentPfitGainResult rms_residual={self.rms_residual:.2f} eV")
        assert len(self.names_target) == len(self.ph_assigned)
        for name, x, y in zip(self.names_target, self.ph_assigned, gain):
            ax.annotate(str(name), (x, y))

    def phzerogain(self) -> float:
        """Find the pulse height where the gain goes to zero.
        A quadratic fit has two roots; we want the positive one. If the roots are complex, take the real part."""
        # The pulse height at which the gain is zero.
        # For now we rely on the roots being ordered; we want the positive root where the gain goes
        # to zero, since our function is invalid outside that range.
        if self.pfit_gain.degree() == 2:
            return np.real(self.pfit_gain.roots()[1])
        elif self.pfit_gain.degree() == 1:
            return self.pfit_gain.roots()[0]
        else:
            raise ValueError()

    def ph2energy(self, ph: ndarray | float) -> float | ndarray:
        """Convert pulse height to energy using the fitted gain curve."""
        return ph / self.pfit_gain(ph)

    def energy2ph(self, energy: ArrayLike) -> NDArray:
        """Invert the gain curve to convert energy to pulse height."""
        if self.pfit_gain.degree() == 2:
            return self._energy2ph_deg2(energy)
        elif self.pfit_gain.degree() == 1:
            return self._energy2ph_deg1(energy)
        elif self.pfit_gain.degree() == 0:
            return self._energy2ph_deg0(energy)
        else:
            raise Exception("degree out of range")

    def _energy2ph_deg2(self, energy: ArrayLike) -> NDArray:
        """Invert a 2nd degree polynomial gain curve to convert energy to pulse height."""
        # ph2energy is equivalent to this with y=energy, x=ph
        # y = x/(c + b*x + a*x^2)
        # so
        # y*c + (y*b-1)*x + a*x^2 = 0
        # and given that we've selected for well formed calibrations,
        # we know which root we want
        cba = self.pfit_gain.convert().coef
        c, bb, a = cba * np.asarray(energy)
        b = bb - 1
        ph = (-b - np.sqrt(b**2 - 4 * a * c)) / (2 * a)
        return ph

    def _energy2ph_deg1(self, energy: ArrayLike) -> NDArray:
        """Invert a 1st degree polynomial gain curve to convert energy to pulse height."""
        # ph2energy is equivalent to this with y=energy, x=ph
        # y = x/(b + a*x)
        # so
        # x = y*b/(1-y*a)
        # and given that we've selected for well formed calibrations,
        # we know which root we want
        b, a = self.pfit_gain.convert().coef
        y = np.asarray(energy)
        ph = y * b / (1 - y * a)
        return ph

    def _energy2ph_deg0(self, energy: ArrayLike) -> NDArray:
        """Invert a 0th degree polynomial gain curve to convert energy to pulse height."""
        # ph2energy is equivalent to this with y=energy, x=ph
        # y = x/(a)
        # so
        # x = y*a
        (a,) = self.pfit_gain.convert().coef
        y = np.asarray(energy)
        ph = y * a
        return ph

    def predicted_energies(self) -> NDArray | float:
        """Convert the assigned pulse heights to energies using the fitted gain curve."""
        return self.ph2energy(self.ph_assigned)

energy2ph(energy)

Invert the gain curve to convert energy to pulse height.

Source code in mass2/core/rough_cal.py, lines 136–145
def energy2ph(self, energy: ArrayLike) -> NDArray:
    """Invert the gain curve to convert energy to pulse height."""
    if self.pfit_gain.degree() == 2:
        return self._energy2ph_deg2(energy)
    elif self.pfit_gain.degree() == 1:
        return self._energy2ph_deg1(energy)
    elif self.pfit_gain.degree() == 0:
        return self._energy2ph_deg0(energy)
    else:
        raise Exception("degree out of range")
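The degree-2 branch inverts the gain curve with the quadratic formula: from gain(ph) = c + b·ph + a·ph² and energy = ph / gain(ph), one gets a·e·ph² + (b·e − 1)·ph + c·e = 0. A quick numeric round-trip check, as a standalone pure-Python sketch with made-up coefficients (not real calibration values or Mass2 code):

```python
# Hypothetical round-trip check of the degree-2 gain inversion.
import math

def ph2energy(ph, c, b, a):
    # energy = ph / gain(ph) with gain(ph) = c + b*ph + a*ph**2
    return ph / (c + b * ph + a * ph * ph)

def energy2ph(e, c, b, a):
    # solve a*e*ph**2 + (b*e - 1)*ph + c*e = 0; take the smaller root,
    # mirroring the (-b - sqrt)/2a choice in _energy2ph_deg2
    A, B, C = a * e, b * e - 1.0, c * e
    return (-B - math.sqrt(B * B - 4 * A * C)) / (2 * A)

c, b, a = 1.0, 1e-5, -1e-9   # illustrative gain coefficients only
ph = 5000.0
e = ph2energy(ph, c, b, a)
round_trip = energy2ph(e, c, b, a)
# round_trip recovers ph to floating-point precision
```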

ph2energy(ph)

Convert pulse height to energy using the fitted gain curve.

Source code in mass2/core/rough_cal.py, lines 132–134
def ph2energy(self, ph: ndarray | float) -> float | ndarray:
    """Convert pulse height to energy using the fitted gain curve."""
    return ph / self.pfit_gain(ph)

ph_unassigned()

Which pulse heights were not assigned to any energy.

Source code in mass2/core/rough_cal.py, lines 99–101
def ph_unassigned(self) -> ndarray:
    """Which pulse heights were not assigned to any energy."""
    return np.array(list(set(self.ph_target) - set(self.ph_assigned)))

phzerogain()

Find the pulse height where the gain goes to zero. A quadratic fit has two roots; we want the positive one. If the roots are complex, take the real part.

Source code in mass2/core/rough_cal.py, lines 119–130
def phzerogain(self) -> float:
    """Find the pulse height where the gain goes to zero.
    A quadratic fit has two roots; we want the positive one. If the roots are complex, take the real part."""
    # The pulse height at which the gain is zero.
    # For now we rely on the roots being ordered; we want the positive root where the gain goes
    # to zero, since our function is invalid outside that range.
    if self.pfit_gain.degree() == 2:
        return np.real(self.pfit_gain.roots()[1])
    elif self.pfit_gain.degree() == 1:
        return self.pfit_gain.roots()[0]
    else:
        raise ValueError()

plot(ax=None)

Make a diagnostic plot of the gain fit.

Source code in mass2/core/rough_cal.py, lines 103–117
def plot(self, ax: Axes | None = None) -> None:
    """Make a diagnostic plot of the gain fit."""
    if ax is None:
        plt.figure()
        ax = plt.gca()
    gain = self.ph_assigned / self.energy_target
    ax.plot(self.ph_assigned, self.ph_assigned / self.energy_target, "o")
    ph_large_range = np.linspace(0, self.ph_assigned[-1] * 1.1, 51)
    ax.plot(ph_large_range, self.pfit_gain(ph_large_range))
    ax.set_xlabel("pulse_height")
    ax.set_ylabel("gain")
    ax.set_title(f"BestAssignmentPfitGainResult rms_residual={self.rms_residual:.2f} eV")
    assert len(self.names_target) == len(self.ph_assigned)
    for name, x, y in zip(self.names_target, self.ph_assigned, gain):
        ax.annotate(str(name), (x, y))

predicted_energies()

Convert the assigned pulse heights to energies using the fitted gain curve.

Source code in mass2/core/rough_cal.py, lines 185–187
def predicted_energies(self) -> NDArray | float:
    """Convert the assigned pulse heights to energies using the fitted gain curve."""
    return self.ph2energy(self.ph_assigned)

RoughCalibrationStep dataclass

Bases: RecipeStep

A step to perform a rough calibration of pulse heights to energies.

Source code in mass2/core/rough_cal.py, lines 750–956
@dataclass(frozen=True)
class RoughCalibrationStep(RecipeStep):
    """A step to perform a rough calibration of pulse heights to energies."""

    pfresult: SmoothedLocalMaximaResult | None
    assignment_result: BestAssignmentPfitGainResult | None
    ph2energy: Callable
    success: bool

    def calc_from_df(self, df: DataFrame) -> DataFrame:
        """Apply the rough calibration to a dataframe."""
        # Only works with in-memory data: takes the column as a numpy array and calls the function directly.
        # This is much faster than a map_elements approach, but wouldn't work with out-of-core data
        # without some extra bookkeeping.
        inputs_np = [df[input].to_numpy() for input in self.inputs]
        out = self.ph2energy(inputs_np[0])
        df2 = pl.DataFrame({self.output[0]: out}).with_columns(df)
        return df2

    def drop_debug(self) -> "RoughCalibrationStep":
        """Return a copy of this step with debug information removed."""
        return dataclasses.replace(self, pfresult=None, assignment_result=None)

    def dbg_plot(self, df_after: pl.DataFrame, **kwargs: Any) -> None:
        """Create diagnostic plots of the rough calibration step."""
        if self.success:
            self.dbg_plot_success(df_after, **kwargs)
        else:
            self.dbg_plot_failure(df_after, **kwargs)

    def dbg_plot_success(self, df: DataFrame, **kwargs: Any) -> None:
        """Create diagnostic plots of the rough calibration step, if it succeeded."""
        _, axs = plt.subplots(2, 1, figsize=(11, 6))
        if self.assignment_result:
            self.assignment_result.plot(ax=axs[0])
        if self.pfresult:
            self.pfresult.plot(self.assignment_result, ax=axs[1])
        plt.tight_layout()

    def dbg_plot_failure(self, df: DataFrame, **kwargs: None) -> None:
        """Create diagnostic plots of the rough calibration step, if it failed."""
        _, axs = plt.subplots(2, 1, figsize=(11, 6))
        if self.pfresult:
            self.pfresult.plot(self.assignment_result, ax=axs[1])
        plt.tight_layout()

    def energy2ph(self, energy: ArrayLike) -> NDArray | float:
        """Convert energy to pulse height using the fitted gain curve."""
        if self.assignment_result:
            return self.assignment_result.energy2ph(energy)
        return 0.0

    @classmethod
    def learn_combinatoric(
        cls,
        ch: Channel,
        line_names: list[str],
        uncalibrated_col: str,
        calibrated_col: str,
        ph_smoothing_fwhm: float,
        n_extra: int,
        use_expr: pl.Expr = field(default_factory=alwaysTrue),
    ) -> "RoughCalibrationStep":
        """Train a rough calibration step by finding peaks in a smoothed histogram of pulse heights,
        and assigning them to known energies in a way that minimizes the RMS error in a local linearity test."""
        (names, ee) = line_names_and_energies(line_names)
        uncalibrated = ch.good_series(uncalibrated_col, use_expr=use_expr).to_numpy()
        assert len(uncalibrated) > 10, "not enough pulses"
        pfresult = peakfind_local_maxima_of_smoothed_hist(uncalibrated, fwhm_pulse_height_units=ph_smoothing_fwhm)
        assignment_result = find_optimal_assignment2(pfresult.ph_sorted_by_prominence()[: len(ee) + n_extra], ee, names)
        # phzerogain doesn't exist if there is only one line, and it might make no sense even if it does.
        good_expr_with_new_info = ch.good_expr
        if len(line_names) > 1:
            # Fix issue #95: don't cut pulses exceeding max_ph if that value is negative or cuts most pulses.
            # exclude pulses with values where the gain is negative
            max_ph = assignment_result.phzerogain()
            if max_ph > 0 and max_ph > np.median(uncalibrated):
                good_expr_with_new_info = ch.good_expr.and_(pl.col(uncalibrated_col) < max_ph)

        step = cls(
            [uncalibrated_col],
            [calibrated_col],
            good_expr_with_new_info,
            use_expr=use_expr,
            pfresult=pfresult,
            assignment_result=assignment_result,
            ph2energy=assignment_result.ph2energy,
            success=True,
        )
        return step

    @classmethod
    def learn_combinatoric_height_info(
        cls,
        ch: Channel,
        line_names: list[str],
        line_heights_allowed: list[list[int]],
        uncalibrated_col: str,
        calibrated_col: str,
        ph_smoothing_fwhm: float,
        n_extra: int,
        use_expr: pl.Expr = field(default_factory=alwaysTrue),
    ) -> "RoughCalibrationStep":
        """Train a rough calibration step by finding peaks in a smoothed histogram of pulse heights,
        and assigning them to known energies in a way that minimizes the RMS error in a local linearity test,
        while respecting constraints on which pulse heights can be assigned to which energies."""
        (names, ee) = line_names_and_energies(line_names)
        uncalibrated = ch.good_series(uncalibrated_col, use_expr=use_expr).to_numpy()
        assert len(uncalibrated) > 10, "not enough pulses"
        pfresult = peakfind_local_maxima_of_smoothed_hist(uncalibrated, fwhm_pulse_height_units=ph_smoothing_fwhm)
        assignment_result = find_optimal_assignment2_height_info(
            pfresult.ph_sorted_by_prominence()[: len(ee) + n_extra],
            ee,
            names,
            line_heights_allowed,
        )

        step = cls(
            [uncalibrated_col],
            [calibrated_col],
            ch.good_expr,
            use_expr=use_expr,
            pfresult=pfresult,
            assignment_result=assignment_result,
            ph2energy=assignment_result.ph2energy,
            success=True,
        )
        return step

    @classmethod
    def learn_3peak(  # noqa: PLR0917 PLR0914,
        cls,
        ch: Channel,
        line_names: list[str | float],
        uncalibrated_col: str = "filtValue",
        calibrated_col: str | None = None,
        use_expr: pl.Expr = field(default_factory=alwaysTrue),
        max_fractional_energy_error_3rd_assignment: float = 0.1,
        min_gain_fraction_at_ph_30k: float = 0.25,
        fwhm_pulse_height_units: float = 75,
        n_extra_peaks: int = 10,
        acceptable_rms_residual_e: float = 10,
    ) -> "RoughCalibrationStep":
        """Train a rough calibration step by finding peaks in a smoothed histogram of pulse heights,
        and assigning 3 of them to known energies in a way that minimizes the RMS error in a local linearity test,
        and then evaluating that assignment by fitting a 2nd degree polynomial gain curve to all possible pulse heights
        and returning the RMS error in energy after applying that gain curve to all possible pulse heights.
        If no good assignment is found, the step will be marked as unsuccessful."""
        if calibrated_col is None:
            calibrated_col = f"energy_{uncalibrated_col}"
        (line_names_str, line_energies_list) = line_names_and_energies(line_names)
        line_energies = np.asarray(line_energies_list)
        uncalibrated = ch.good_series(uncalibrated_col, use_expr=use_expr).to_numpy()
        pfresult = peakfind_local_maxima_of_smoothed_hist(uncalibrated, fwhm_pulse_height_units=fwhm_pulse_height_units)
        possible_phs = pfresult.ph_sorted_by_prominence()[: len(line_names_str) + n_extra_peaks]
        df3peak, _dfe = rank_3peak_assignments(
            possible_phs,
            line_energies,
            line_names_str,
            max_fractional_energy_error_3rd_assignment,
            min_gain_fraction_at_ph_30k,
        )
        best_rms_residual = np.inf
        best_assignment_result = None
        for assignment_row in df3peak.select("e0", "ph0", "e1", "ph1", "e2", "ph2", "e_err_at_ph2").iter_rows():
            e0, ph0, e1, ph1, e2, ph2, _e_err_at_ph2 = assignment_row
            pharray = np.array([ph0, ph1, ph2])
            earray = np.array([e0, e1, e2])
            rms_residual, assignment_result = eval_3peak_assignment_pfit_gain(
                pharray, earray, possible_phs, line_energies, line_names_str
            )
            if rms_residual < best_rms_residual:
                best_rms_residual = rms_residual
                best_assignment_result = assignment_result
                if rms_residual < acceptable_rms_residual_e:
                    break
        if (
            best_assignment_result
            and isinstance(best_assignment_result, BestAssignmentPfitGainResult)
            and not np.isinf(best_rms_residual)
        ):
            success = True
            ph2energy = best_assignment_result.ph2energy
            # df3peak_on_failure = None
        else:
            success = False

            def nanenergy(ph: NDArray | float) -> NDArray | float:
                "Return NaN for all pulse heights, indicating failure to calibrate."
                return ph * np.nan

            ph2energy = nanenergy
            # df3peak_on_failure = df3peak
        # df3peak_on_failure = df3peak

        if isinstance(best_assignment_result, str):
            best_assignment_result = None
        step = cls(
            [uncalibrated_col],
            [calibrated_col],
            ch.good_expr,
            use_expr=use_expr,
            pfresult=pfresult,
            assignment_result=best_assignment_result,
            ph2energy=ph2energy,
            success=success,
        )
        return step

calc_from_df(df)

Apply the rough calibration to a dataframe.

Source code in mass2/core/rough_cal.py
def calc_from_df(self, df: DataFrame) -> DataFrame:
    """Apply the rough calibration to a dataframe."""
    # only works with in-memory data: take the column as numpy and call the function directly
    # much faster than a map_elements approach, but wouldn't work with out-of-core data without extra bookkeeping
    inputs_np = [df[input].to_numpy() for input in self.inputs]
    out = self.ph2energy(inputs_np[0])
    df2 = pl.DataFrame({self.output[0]: out}).with_columns(df)
    return df2

dbg_plot(df_after, **kwargs)

Create diagnostic plots of the rough calibration step.

Source code in mass2/core/rough_cal.py
def dbg_plot(self, df_after: pl.DataFrame, **kwargs: Any) -> None:
    """Create diagnostic plots of the rough calibration step."""
    if self.success:
        self.dbg_plot_success(df_after, **kwargs)
    else:
        self.dbg_plot_failure(df_after, **kwargs)

dbg_plot_failure(df, **kwargs)

Create diagnostic plots of the rough calibration step, if it failed.

Source code in mass2/core/rough_cal.py
def dbg_plot_failure(self, df: DataFrame, **kwargs: None) -> None:
    """Create diagnostic plots of the rough calibration step, if it failed."""
    _, axs = plt.subplots(2, 1, figsize=(11, 6))
    if self.pfresult:
        self.pfresult.plot(self.assignment_result, ax=axs[1])
    plt.tight_layout()

dbg_plot_success(df, **kwargs)

Create diagnostic plots of the rough calibration step, if it succeeded.

Source code in mass2/core/rough_cal.py
def dbg_plot_success(self, df: DataFrame, **kwargs: Any) -> None:
    """Create diagnostic plots of the rough calibration step, if it succeeded."""
    _, axs = plt.subplots(2, 1, figsize=(11, 6))
    if self.assignment_result:
        self.assignment_result.plot(ax=axs[0])
    if self.pfresult:
        self.pfresult.plot(self.assignment_result, ax=axs[1])
    plt.tight_layout()

drop_debug()

Return a copy of this step with debug information removed.

Source code in mass2/core/rough_cal.py
def drop_debug(self) -> "RoughCalibrationStep":
    """Return a copy of this step with debug information removed."""
    return dataclasses.replace(self, pfresult=None, assignment_result=None)

energy2ph(energy)

Convert energy to pulse height using the fitted gain curve.

Source code in mass2/core/rough_cal.py
def energy2ph(self, energy: ArrayLike) -> NDArray | float:
    """Convert energy to pulse height using the fitted gain curve."""
    if self.assignment_result:
        return self.assignment_result.energy2ph(energy)
    return 0.0

learn_3peak(ch, line_names, uncalibrated_col='filtValue', calibrated_col=None, use_expr=field(default_factory=alwaysTrue), max_fractional_energy_error_3rd_assignment=0.1, min_gain_fraction_at_ph_30k=0.25, fwhm_pulse_height_units=75, n_extra_peaks=10, acceptable_rms_residual_e=10) classmethod

Train a rough calibration step by finding peaks in a smoothed histogram of pulse heights, assigning 3 of them to known energies so as to minimize the RMS error of a local linearity test, then evaluating that assignment by fitting a 2nd-degree polynomial gain curve to all candidate pulse heights and reporting the RMS energy error after applying that gain curve. If no good assignment is found, the step is marked as unsuccessful.
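The first stage of this recipe, finding candidate peaks as local maxima of a gaussian-smoothed pulse-height histogram, can be sketched in plain numpy. This is a toy illustration with synthetic lines, not the Mass2 implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
# three synthetic spectral lines in pulse-height units
ph = np.concatenate([
    rng.normal(8000, 25, 3000),
    rng.normal(12000, 25, 2000),
    rng.normal(15000, 25, 1000),
])

counts, edges = np.histogram(ph, bins=np.arange(5000.0, 20000.0, 10.0))
centers = 0.5 * (edges[:-1] + edges[1:])

# smooth with a gaussian kernel of the requested FWHM (here 75 pulse-height units)
fwhm = 75.0
sigma_bins = fwhm / 2.3548 / 10.0  # 10.0 = bin width
k = np.arange(-50, 51)
kernel = np.exp(-0.5 * (k / sigma_bins) ** 2)
kernel /= kernel.sum()
smoothed = np.convolve(counts, kernel, mode="same")

# local maxima of the smoothed histogram (strict rise on the left avoids flat zero runs)
is_max = (smoothed[1:-1] > smoothed[:-2]) & (smoothed[1:-1] >= smoothed[2:])
peaks = centers[1:-1][is_max]
```

Each synthetic line produces a local maximum near its true location; ranking those maxima by prominence is what `ph_sorted_by_prominence` then does in the real code.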

Source code in mass2/core/rough_cal.py
@classmethod
def learn_3peak(  # noqa: PLR0917 PLR0914,
    cls,
    ch: Channel,
    line_names: list[str | float],
    uncalibrated_col: str = "filtValue",
    calibrated_col: str | None = None,
    use_expr: pl.Expr = field(default_factory=alwaysTrue),
    max_fractional_energy_error_3rd_assignment: float = 0.1,
    min_gain_fraction_at_ph_30k: float = 0.25,
    fwhm_pulse_height_units: float = 75,
    n_extra_peaks: int = 10,
    acceptable_rms_residual_e: float = 10,
) -> "RoughCalibrationStep":
    """Train a rough calibration step by finding peaks in a smoothed histogram of pulse heights,
    and assigning 3 of them to known energies in a way that minimizes the RMS error in a local linearity test,
    and then evaluating that assignment by fitting a 2nd degree polynomial gain curve to all possible pulse heights
    and returning the RMS error in energy after applying that gain curve to all possible pulse heights.
    If no good assignment is found, the step will be marked as unsuccessful."""
    if calibrated_col is None:
        calibrated_col = f"energy_{uncalibrated_col}"
    (line_names_str, line_energies_list) = line_names_and_energies(line_names)
    line_energies = np.asarray(line_energies_list)
    uncalibrated = ch.good_series(uncalibrated_col, use_expr=use_expr).to_numpy()
    pfresult = peakfind_local_maxima_of_smoothed_hist(uncalibrated, fwhm_pulse_height_units=fwhm_pulse_height_units)
    possible_phs = pfresult.ph_sorted_by_prominence()[: len(line_names_str) + n_extra_peaks]
    df3peak, _dfe = rank_3peak_assignments(
        possible_phs,
        line_energies,
        line_names_str,
        max_fractional_energy_error_3rd_assignment,
        min_gain_fraction_at_ph_30k,
    )
    best_rms_residual = np.inf
    best_assignment_result = None
    for assignment_row in df3peak.select("e0", "ph0", "e1", "ph1", "e2", "ph2", "e_err_at_ph2").iter_rows():
        e0, ph0, e1, ph1, e2, ph2, _e_err_at_ph2 = assignment_row
        pharray = np.array([ph0, ph1, ph2])
        earray = np.array([e0, e1, e2])
        rms_residual, assignment_result = eval_3peak_assignment_pfit_gain(
            pharray, earray, possible_phs, line_energies, line_names_str
        )
        if rms_residual < best_rms_residual:
            best_rms_residual = rms_residual
            best_assignment_result = assignment_result
            if rms_residual < acceptable_rms_residual_e:
                break
    if (
        best_assignment_result
        and isinstance(best_assignment_result, BestAssignmentPfitGainResult)
        and not np.isinf(best_rms_residual)
    ):
        success = True
        ph2energy = best_assignment_result.ph2energy
        # df3peak_on_failure = None
    else:
        success = False

        def nanenergy(ph: NDArray | float) -> NDArray | float:
            "Return NaN for all pulse heights, indicating failure to calibrate."
            return ph * np.nan

        ph2energy = nanenergy
        # df3peak_on_failure = df3peak
    # df3peak_on_failure = df3peak

    if isinstance(best_assignment_result, str):
        best_assignment_result = None
    step = cls(
        [uncalibrated_col],
        [calibrated_col],
        ch.good_expr,
        use_expr=use_expr,
        pfresult=pfresult,
        assignment_result=best_assignment_result,
        ph2energy=ph2energy,
        success=success,
    )
    return step

learn_combinatoric(ch, line_names, uncalibrated_col, calibrated_col, ph_smoothing_fwhm, n_extra, use_expr=field(default_factory=alwaysTrue)) classmethod

Train a rough calibration step by finding peaks in a smoothed histogram of pulse heights, and assigning them to known energies in a way that minimizes the RMS error in a local linearity test.

Source code in mass2/core/rough_cal.py
@classmethod
def learn_combinatoric(
    cls,
    ch: Channel,
    line_names: list[str],
    uncalibrated_col: str,
    calibrated_col: str,
    ph_smoothing_fwhm: float,
    n_extra: int,
    use_expr: pl.Expr = field(default_factory=alwaysTrue),
) -> "RoughCalibrationStep":
    """Train a rough calibration step by finding peaks in a smoothed histogram of pulse heights,
    and assigning them to known energies in a way that minimizes the RMS error in a local linearity test."""
    (names, ee) = line_names_and_energies(line_names)
    uncalibrated = ch.good_series(uncalibrated_col, use_expr=use_expr).to_numpy()
    assert len(uncalibrated) > 10, "not enough pulses"
    pfresult = peakfind_local_maxima_of_smoothed_hist(uncalibrated, fwhm_pulse_height_units=ph_smoothing_fwhm)
    assignment_result = find_optimal_assignment2(pfresult.ph_sorted_by_prominence()[: len(ee) + n_extra], ee, names)
    # phzerogain doesn't exist if there is only one line, and it might make no sense even if it does.
    good_expr_with_new_info = ch.good_expr
    if len(line_names) > 1:
        # Fix issue #95: don't cut pulses exceeding max_ph if that value is negative or cuts most pulses.
        # exclude pulses with values where the gain is negative
        max_ph = assignment_result.phzerogain()
        if max_ph > 0 and max_ph > np.median(uncalibrated):
            good_expr_with_new_info = ch.good_expr.and_(pl.col(uncalibrated_col) < max_ph)

    step = cls(
        [uncalibrated_col],
        [calibrated_col],
        good_expr_with_new_info,
        use_expr=use_expr,
        pfresult=pfresult,
        assignment_result=assignment_result,
        ph2energy=assignment_result.ph2energy,
        success=True,
    )
    return step

learn_combinatoric_height_info(ch, line_names, line_heights_allowed, uncalibrated_col, calibrated_col, ph_smoothing_fwhm, n_extra, use_expr=field(default_factory=alwaysTrue)) classmethod

Train a rough calibration step by finding peaks in a smoothed histogram of pulse heights, and assigning them to known energies in a way that minimizes the RMS error in a local linearity test, while respecting constraints on which pulse heights can be assigned to which energies.

Source code in mass2/core/rough_cal.py
@classmethod
def learn_combinatoric_height_info(
    cls,
    ch: Channel,
    line_names: list[str],
    line_heights_allowed: list[list[int]],
    uncalibrated_col: str,
    calibrated_col: str,
    ph_smoothing_fwhm: float,
    n_extra: int,
    use_expr: pl.Expr = field(default_factory=alwaysTrue),
) -> "RoughCalibrationStep":
    """Train a rough calibration step by finding peaks in a smoothed histogram of pulse heights,
    and assigning them to known energies in a way that minimizes the RMS error in a local linearity test,
    while respecting constraints on which pulse heights can be assigned to which energies."""
    (names, ee) = line_names_and_energies(line_names)
    uncalibrated = ch.good_series(uncalibrated_col, use_expr=use_expr).to_numpy()
    assert len(uncalibrated) > 10, "not enough pulses"
    pfresult = peakfind_local_maxima_of_smoothed_hist(uncalibrated, fwhm_pulse_height_units=ph_smoothing_fwhm)
    assignment_result = find_optimal_assignment2_height_info(
        pfresult.ph_sorted_by_prominence()[: len(ee) + n_extra],
        ee,
        names,
        line_heights_allowed,
    )

    step = cls(
        [uncalibrated_col],
        [calibrated_col],
        ch.good_expr,
        use_expr=use_expr,
        pfresult=pfresult,
        assignment_result=assignment_result,
        ph2energy=assignment_result.ph2energy,
        success=True,
    )
    return step

SmoothedLocalMaximaResult dataclass

A set of local maxima found in a smoothed histogram of pulse heights.

Source code in mass2/core/rough_cal.py
@dataclass(frozen=True)
class SmoothedLocalMaximaResult:
    """A set of local maxima found in a smoothed histogram of pulse heights."""

    fwhm_pulse_height_units: float
    bin_centers: np.ndarray
    counts: np.ndarray
    smoothed_counts: np.ndarray
    local_maxima_inds: np.ndarray  # inds into bin_centers
    local_minima_inds: np.ndarray  # inds into bin_centers

    def inds_sorted_by_peak_height(self) -> NDArray:
        """Indices of local maxima sorted by peak height, highest first."""
        return self.local_maxima_inds[np.argsort(-self.peak_height())]

    def inds_sorted_by_prominence(self) -> NDArray:
        """Indices of local maxima sorted by prominence, most prominent first."""
        return self.local_maxima_inds[np.argsort(-self.prominence())]

    def ph_sorted_by_prominence(self) -> NDArray:
        """Pulse heights of local maxima sorted by prominence, most prominent first."""
        return self.bin_centers[self.inds_sorted_by_prominence()]

    def ph_sorted_by_peak_height(self) -> NDArray:
        """Pulse heights of local maxima sorted by peak height, highest first."""
        return self.bin_centers[self.inds_sorted_by_peak_height()]

    def peak_height(self) -> NDArray:
        """Peak heights of local maxima."""
        return self.smoothed_counts[self.local_maxima_inds]

    def prominence(self) -> NDArray:
        """Prominence of local maxima, in aems order as `local_maxima_inds`."""
        assert len(self.local_minima_inds) == len(self.local_maxima_inds) + 1, (
            "peakfind_local_maxima_of_smoothed_hist must ensure this "
        )
        prominence = np.zeros_like(self.local_maxima_inds, dtype=float)
        for i in range(len(self.local_maxima_inds)):
            sc_max = self.smoothed_counts[self.local_maxima_inds[i]]
            sc_min_before = self.smoothed_counts[self.local_minima_inds[i]]
            sc_min_after = self.smoothed_counts[self.local_minima_inds[i + 1]]
            prominence[i] = (2 * sc_max - sc_min_before - sc_min_after) / 2
        assert np.all(prominence >= 0), "prominence should be non-negative"
        return prominence

    def plot(
        self,
        assignment_result: BestAssignmentPfitGainResult | None = None,
        n_highlight: int = 10,
        plot_counts: bool = False,
        ax: Axes | None = None,
    ) -> Axes:
        """Make a diagnostic plot of the smoothed histogram and local maxima."""
        if ax is None:
            plt.figure()
            ax = plt.gca()
        inds_prominence = self.inds_sorted_by_prominence()[:n_highlight]
        inds_peak_height = self.inds_sorted_by_peak_height()[:n_highlight]
        if plot_counts:
            ax.plot(self.bin_centers, self.counts, label="counts")
        ax.plot(self.bin_centers, self.smoothed_counts, label="smoothed_counts")
        ax.plot(
            self.bin_centers[self.local_maxima_inds],
            self.smoothed_counts[self.local_maxima_inds],
            ".",
            label="peaks",
        )
        if assignment_result is not None:
            inds_assigned = np.searchsorted(self.bin_centers, assignment_result.ph_assigned)
            inds_unassigned = np.searchsorted(self.bin_centers, assignment_result.ph_unassigned())
            bin_centers_assigned = self.bin_centers[inds_assigned]
            bin_centers_unassigned = self.bin_centers[inds_unassigned]
            smoothed_counts_assigned = self.smoothed_counts[inds_assigned]
            smoothed_counts_unassigned = self.smoothed_counts[inds_unassigned]
            ax.plot(bin_centers_assigned, smoothed_counts_assigned, "o", label="assigned")
            ax.plot(
                bin_centers_unassigned,
                smoothed_counts_unassigned,
                "o",
                label="unassigned",
            )
            for name, x, y in zip(
                assignment_result.names_target,
                bin_centers_assigned,
                smoothed_counts_assigned,
            ):
                ax.annotate(str(name), (x, y), rotation=30)
            ax.set_title(f"SmoothedLocalMaximaResult rms_residual={assignment_result.rms_residual:.2f} eV")

        else:
            ax.plot(
                self.bin_centers[inds_prominence],
                self.smoothed_counts[inds_prominence],
                "o",
                label=f"{n_highlight} most prominent",
            )
            ax.plot(
                self.bin_centers[inds_peak_height],
                self.smoothed_counts[inds_peak_height],
                "v",
                label=f"{n_highlight} highest",
            )
            ax.set_title("SmoothedLocalMaximaResult")

        ax.legend()
        ax.set_xlabel("pulse height")
        ax.set_ylabel("intensity")
        # print(f"{np.amax(self.smoothed_counts)=} {np.amin(self.smoothed_counts)=} ")
        # ax.set_ylim(1/self.fwhm_pulse_height_units, ax.get_ylim()[1])

        return ax

inds_sorted_by_peak_height()

Indices of local maxima sorted by peak height, highest first.

Source code in mass2/core/rough_cal.py
def inds_sorted_by_peak_height(self) -> NDArray:
    """Indices of local maxima sorted by peak height, highest first."""
    return self.local_maxima_inds[np.argsort(-self.peak_height())]

inds_sorted_by_prominence()

Indices of local maxima sorted by prominence, most prominent first.

Source code in mass2/core/rough_cal.py
def inds_sorted_by_prominence(self) -> NDArray:
    """Indices of local maxima sorted by prominence, most prominent first."""
    return self.local_maxima_inds[np.argsort(-self.prominence())]

peak_height()

Peak heights of local maxima.

Source code in mass2/core/rough_cal.py
def peak_height(self) -> NDArray:
    """Peak heights of local maxima."""
    return self.smoothed_counts[self.local_maxima_inds]

ph_sorted_by_peak_height()

Pulse heights of local maxima sorted by peak height, highest first.

Source code in mass2/core/rough_cal.py
def ph_sorted_by_peak_height(self) -> NDArray:
    """Pulse heights of local maxima sorted by peak height, highest first."""
    return self.bin_centers[self.inds_sorted_by_peak_height()]

ph_sorted_by_prominence()

Pulse heights of local maxima sorted by prominence, most prominent first.

Source code in mass2/core/rough_cal.py
def ph_sorted_by_prominence(self) -> NDArray:
    """Pulse heights of local maxima sorted by prominence, most prominent first."""
    return self.bin_centers[self.inds_sorted_by_prominence()]

plot(assignment_result=None, n_highlight=10, plot_counts=False, ax=None)

Make a diagnostic plot of the smoothed histogram and local maxima.

Source code in mass2/core/rough_cal.py
def plot(
    self,
    assignment_result: BestAssignmentPfitGainResult | None = None,
    n_highlight: int = 10,
    plot_counts: bool = False,
    ax: Axes | None = None,
) -> Axes:
    """Make a diagnostic plot of the smoothed histogram and local maxima."""
    if ax is None:
        plt.figure()
        ax = plt.gca()
    inds_prominence = self.inds_sorted_by_prominence()[:n_highlight]
    inds_peak_height = self.inds_sorted_by_peak_height()[:n_highlight]
    if plot_counts:
        ax.plot(self.bin_centers, self.counts, label="counts")
    ax.plot(self.bin_centers, self.smoothed_counts, label="smoothed_counts")
    ax.plot(
        self.bin_centers[self.local_maxima_inds],
        self.smoothed_counts[self.local_maxima_inds],
        ".",
        label="peaks",
    )
    if assignment_result is not None:
        inds_assigned = np.searchsorted(self.bin_centers, assignment_result.ph_assigned)
        inds_unassigned = np.searchsorted(self.bin_centers, assignment_result.ph_unassigned())
        bin_centers_assigned = self.bin_centers[inds_assigned]
        bin_centers_unassigned = self.bin_centers[inds_unassigned]
        smoothed_counts_assigned = self.smoothed_counts[inds_assigned]
        smoothed_counts_unassigned = self.smoothed_counts[inds_unassigned]
        ax.plot(bin_centers_assigned, smoothed_counts_assigned, "o", label="assigned")
        ax.plot(
            bin_centers_unassigned,
            smoothed_counts_unassigned,
            "o",
            label="unassigned",
        )
        for name, x, y in zip(
            assignment_result.names_target,
            bin_centers_assigned,
            smoothed_counts_assigned,
        ):
            ax.annotate(str(name), (x, y), rotation=30)
        ax.set_title(f"SmoothedLocalMaximaResult rms_residual={assignment_result.rms_residual:.2f} eV")

    else:
        ax.plot(
            self.bin_centers[inds_prominence],
            self.smoothed_counts[inds_prominence],
            "o",
            label=f"{n_highlight} most prominent",
        )
        ax.plot(
            self.bin_centers[inds_peak_height],
            self.smoothed_counts[inds_peak_height],
            "v",
            label=f"{n_highlight} highest",
        )
        ax.set_title("SmoothedLocalMaximaResult")

    ax.legend()
    ax.set_xlabel("pulse height")
    ax.set_ylabel("intensity")
    # print(f"{np.amax(self.smoothed_counts)=} {np.amin(self.smoothed_counts)=} ")
    # ax.set_ylim(1/self.fwhm_pulse_height_units, ax.get_ylim()[1])

    return ax

prominence()

Prominence of local maxima, in the same order as local_maxima_inds.

Source code in mass2/core/rough_cal.py
def prominence(self) -> NDArray:
    """Prominence of local maxima, in aems order as `local_maxima_inds`."""
    assert len(self.local_minima_inds) == len(self.local_maxima_inds) + 1, (
        "peakfind_local_maxima_of_smoothed_hist must ensure this "
    )
    prominence = np.zeros_like(self.local_maxima_inds, dtype=float)
    for i in range(len(self.local_maxima_inds)):
        sc_max = self.smoothed_counts[self.local_maxima_inds[i]]
        sc_min_before = self.smoothed_counts[self.local_minima_inds[i]]
        sc_min_after = self.smoothed_counts[self.local_minima_inds[i + 1]]
        prominence[i] = (2 * sc_max - sc_min_before - sc_min_after) / 2
    assert np.all(prominence >= 0), "prominence should be non-negative"
    return prominence
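The prominence formula above is the average height drop from each maximum to its two bracketing minima. A toy illustration with synthetic arrays (not Mass2 data), relying on the invariant that there is one more minimum than maxima:

```python
import numpy as np

# toy smoothed histogram with two peaks; minima bracket each maximum
smoothed = np.array([0.0, 5.0, 1.0, 8.0, 0.0])
maxima = np.array([1, 3])
minima = np.array([0, 2, 4])

prom = np.array([
    (2 * smoothed[m] - smoothed[lo] - smoothed[hi]) / 2
    for m, lo, hi in zip(maxima, minima[:-1], minima[1:])
])
# peak at index 1: (10 - 0 - 1)/2 = 4.5; peak at index 3: (16 - 1 - 0)/2 = 7.5
```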

drift_correct_entropy(slope, indicator_zero_mean, uncorrected, bin_edges, fwhm_in_bin_number_units)

Calculate the entropy of a histogram of drift-corrected pulse heights.

Source code in mass2/core/rough_cal.py
def drift_correct_entropy(
    slope: float,
    indicator_zero_mean: ndarray,
    uncorrected: ndarray,
    bin_edges: ndarray,
    fwhm_in_bin_number_units: int,
) -> float:
    """Calculate the entropy of a histogram of drift-corrected pulse heights."""
    corrected = uncorrected * (1 + indicator_zero_mean * slope)
    smoothed_counts, bin_edges, _counts = hist_smoothed(corrected, fwhm_in_bin_number_units, bin_edges)
    w = smoothed_counts > 0
    return -(np.log(smoothed_counts[w]) * smoothed_counts[w]).sum()
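The role of this entropy in drift correction: scanning candidate slopes and keeping the one that minimizes the (unnormalized) histogram entropy picks the sharpest corrected spectrum. A self-contained sketch with synthetic drifting pulses, using a raw histogram in place of the smoothed one; the data and slope grid are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
indicator = rng.uniform(-1, 1, 20000)                   # zero-mean drift indicator
ph = rng.choice([5000.0, 6000.0], size=indicator.size)  # two ideal spectral lines
uncorrected = ph * (1 + 0.05 * indicator)               # gain drifts with the indicator

bin_edges = np.linspace(4000.0, 7000.0, 301)

def entropy(slope: float) -> float:
    # corrected = uncorrected * (1 + indicator * slope), as in drift_correct_entropy
    corrected = uncorrected * (1 + indicator * slope)
    counts, _ = np.histogram(corrected, bins=bin_edges)
    w = counts > 0
    return -(np.log(counts[w]) * counts[w]).sum()

slopes = np.linspace(-0.1, 0.1, 41)
best_slope = slopes[np.argmin([entropy(s) for s in slopes])]
```

The minimizing slope lands near -0.05, which undoes the injected drift to first order.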

eval_3peak_assignment_pfit_gain(ph_assigned, e_assigned, possible_phs, line_energies, line_names)

Evaluate a proposed assignment of 3 pulse heights to 3 energies by fitting a 2nd-degree polynomial gain curve, returning the RMS residual in energy after applying that gain curve to all candidate pulse heights. If the proposed assignment does not yield a well-formed gain curve, return infinity and a string describing the problem.

Source code in mass2/core/rough_cal.py
def eval_3peak_assignment_pfit_gain(
    ph_assigned: NDArray, e_assigned: NDArray, possible_phs: NDArray, line_energies: NDArray, line_names: list[str]
) -> tuple[float, BestAssignmentPfitGainResult | str]:
    """Evaluate a proposed assignment of 3 pulse heights to 3 energies by fitting a 2nd degree polynomial gain curve,
    and returning the RMS residual in energy after applying that gain curve to all possible pulse heights.
    If the proposed assignment does not lead to a well formed gain curve, return infinity and a string describing the problem."""
    assert len(np.unique(ph_assigned)) == len(ph_assigned), "assignments must be unique"
    assert len(np.unique(e_assigned)) == len(e_assigned), "assignments must be unique"
    assert all(np.diff(ph_assigned) > 0), "assignments must be sorted"
    assert all(np.diff(e_assigned) > 0), "assignments must be sorted"
    gain_assigned = np.array(ph_assigned) / np.array(e_assigned)
    pfit_gain_3peak = np.polynomial.Polynomial.fit(ph_assigned, gain_assigned, deg=2)
    if pfit_gain_3peak.deriv(1)(0) > 0:
        # well-formed calibrations have a negative gain derivative at zero pulse height
        return np.inf, "pfit_gain_3peak deriv at 0 should be <0"
    if pfit_gain_3peak(1e5) < 0:
        # well-formed calibrations have positive gain at 1e5
        return np.inf, "pfit_gain_3peak should be above zero at 100k ph"
    if any(np.iscomplex(pfit_gain_3peak.roots())):
        # well-formed calibrations have real roots
        return np.inf, "pfit_gain_3peak must have real roots"

    def ph2energy(ph: NDArray) -> NDArray:
        "Convert pulse height to energy using the fitted gain curve."
        gain = pfit_gain_3peak(ph)
        return ph / gain

    cba = pfit_gain_3peak.convert().coef

    def energy2ph(energy: NDArray) -> NDArray:
        "Invert the gain curve to convert energy to pulse height."
        # ph2energy is equivalent to this with y=energy, x=ph
        # y = x/(c + b*x + a*x^2)
        # so
        # y*c + (y*b-1)*x + a*x^2 = 0
        # and given that we've selected for well formed calibrations,
        # we know which root we want
        c, bb, a = cba * energy
        b = bb - 1
        ph = (-b - np.sqrt(b**2 - 4 * a * c)) / (2 * a)
        return ph

    predicted_ph = [energy2ph(_e) for _e in line_energies]
    df = pl.DataFrame({
        "line_energy": line_energies,
        "line_name": line_names,
        "predicted_ph": predicted_ph,
    }).sort(by="predicted_ph")
    dfph = pl.DataFrame({"possible_ph": possible_phs, "ph_ind": np.arange(len(possible_phs))}).sort(by="possible_ph")
    # for each energy, find the closest possible_ph to the calculated predicted_ph
    # we started with assignments for 3 energies
    # now we have assignments for all energies
    df = df.join_asof(dfph, left_on="predicted_ph", right_on="possible_ph", strategy="nearest")
    n_unique = len(df["possible_ph"].unique())
    if n_unique < len(df):
        # assigned multiple energies to same pulseheight, not a good cal
        return np.inf, "assignments should be unique"

    # now we evaluate the assignment and create a result object
    residual_e, pfit_gain = find_pfit_gain_residual(df["possible_ph"].to_numpy(), df["line_energy"].to_numpy())
    if pfit_gain(1e5) < 0:
        # well-formed calibrations have positive gain at 1e5
        return np.inf, "pfit_gain should be above zero at 100k ph"
    if any(np.iscomplex(pfit_gain.roots())):
        # well-formed calibrations have real roots
        return np.inf, "pfit_gain should not have complex roots"
    rms_residual_e = mass2.misc.root_mean_squared(residual_e)
    result = BestAssignmentPfitGainResult(
        rms_residual_e,
        ph_assigned=df["possible_ph"].to_numpy(),
        residual_e=residual_e,
        assignment_inds=df["ph_ind"].to_numpy(),
        pfit_gain=pfit_gain,
        energy_target=df["line_energy"].to_numpy(),
        names_target=df["line_name"].to_list(),
        ph_target=possible_phs,
    )
    return rms_residual_e, result
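The closed-form inversion used in `energy2ph` above: since E = ph / g(ph) with a quadratic gain g, solving for ph is a quadratic whose physical branch is the smaller root. A quick round-trip check with an invented gain curve that satisfies the well-formedness tests (negative slope at ph=0, positive gain at 1e5, real roots):

```python
import numpy as np
from numpy.polynomial import Polynomial

# hypothetical well-formed gain curve g(ph) = c + b*ph + a*ph**2
gain = Polynomial([2.0, -1e-5, 1e-11])

def ph2energy(ph):
    return ph / gain(ph)

def energy2ph(energy):
    # E = ph / g(ph)  =>  (E*a)*ph**2 + (E*b - 1)*ph + E*c = 0;
    # the physical branch is the smaller root of the quadratic
    c, b, a = gain.coef * energy
    b = b - 1.0
    return (-b - np.sqrt(b**2 - 4 * a * c)) / (2 * a)

ph_back = energy2ph(ph2energy(30000.0))
```

The round trip recovers the input pulse height; the larger root of the quadratic sits far above the physical range and is discarded.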

find_best_residual_among_all_possible_assignments(ph, e)

Try all possible assignments of pulse heights to energies, and return the one with the lowest RMS residual in energy after fitting a 2nd degree polynomial gain curve.

Source code in mass2/core/rough_cal.py
def find_best_residual_among_all_possible_assignments(ph: ndarray, e: ndarray) -> tuple[float, ndarray, ndarray, ndarray, Polynomial]:
    """Try all possible assignments of pulse heights to energies,
    and return the one with the lowest RMS residual in energy after fitting a 2nd degree polynomial gain curve.
    """
    assert len(ph) >= len(e)
    ph = np.sort(ph)
    assignments_inds = itertools.combinations(np.arange(len(ph)), len(e))
    best_rms_residual = np.inf
    best_ph_assigned = np.array([])
    best_residual_e = np.array([])
    best_assignment_inds = np.array([])
    best_pfit = Polynomial([0])
    for i, indices in enumerate(assignments_inds):
        assignment_inds = np.array(indices)
        ph_assigned = np.array(ph[assignment_inds])
        residual_e, pfit_gain = find_pfit_gain_residual(ph_assigned, e)
        rms_residual = mass2.misc.root_mean_squared(residual_e)
        if rms_residual < best_rms_residual:
            best_rms_residual = rms_residual
            best_ph_assigned = ph_assigned
            best_residual_e = residual_e
            best_assignment_inds = assignment_inds
            best_pfit = pfit_gain
    return (
        best_rms_residual,
        best_ph_assigned,
        best_residual_e,
        best_assignment_inds,
        best_pfit,
    )
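The exhaustive search above scores one candidate per combination: choose len(e) of the len(ph) detected peaks, keeping order. A quick, self-contained look (stdlib only) at how fast that candidate count grows:

```python
import itertools
from math import comb

# One candidate assignment per ordered choice of 4 peaks out of 10.
n_candidates = len(list(itertools.combinations(range(10), 4)))
assert n_candidates == comb(10, 4) == 210
# comb(20, 8) == 125970, so down-selection (as in rank_3peak_assignments)
# matters once many peaks are detected.
```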

find_best_residual_among_all_possible_assignments2(ph, e, names)

Try all possible assignments of pulse heights to energies, and return the one with the lowest RMS residual in energy after fitting a 2nd degree polynomial gain curve. Return as a BestAssignmentPfitGainResult object.

Source code in mass2/core/rough_cal.py
def find_best_residual_among_all_possible_assignments2(ph: ndarray, e: ndarray, names: list[str]) -> BestAssignmentPfitGainResult:
    """Try all possible assignments of pulse heights to energies,
    and return the one with the lowest RMS residual in energy after fitting a 2nd degree polynomial gain curve.
    Return as a BestAssignmentPfitGainResult object.
    """
    (
        best_rms_residual,
        best_ph_assigned,
        best_residual_e,
        best_assignment_inds,
        best_pfit,
    ) = find_best_residual_among_all_possible_assignments(ph, e)
    return BestAssignmentPfitGainResult(
        float(best_rms_residual),
        best_ph_assigned,
        best_residual_e,
        best_assignment_inds,
        best_pfit,
        e,
        names,
        ph,
    )

find_local_maxima(pulse_heights, gaussian_fwhm)

Smears each pulse by a Gaussian of width gaussian_fwhm and finds local maxima. Returns their locations in pulse_height units (sorted by number of pulses in each peak) and their peak values, as (peak_locations, peak_intensities).

Args: pulse_heights (np.array(dtype=float)): a list of pulse heights (e.g. p_filt_value); gaussian_fwhm: FWHM of the Gaussian each pulse is smeared with, in the same units as the pulse heights.

Source code in mass2/core/rough_cal.py
def find_local_maxima(pulse_heights: ArrayLike, gaussian_fwhm: float) -> Any:
    """Smears each pulse by a gaussian of gaussian_fhwm and finds local maxima,
    returns a list of their locations in pulse_height units (sorted by number of
    pulses in peak) AND their peak values as: (peak_locations, peak_intensities)

    Args:
        pulse_heights (np.array(dtype=float)): a list of pulse heights (eg p_filt_value)
        gaussian_fwhm = fwhm of a gaussian that each pulse is smeared with, in same units as pulse heights
    """
    # kernel density estimation (with a gaussian kernel)
    n = 128 * 1024
    gaussian_fwhm = float(gaussian_fwhm)
    # The above ensures that lo & hi are floats, so that (lo-hi)/n is always a float in python2
    sigma = gaussian_fwhm / (np.sqrt(np.log(2) * 2) * 2)
    tbw = 1.0 / sigma / (np.pi * 2)
    lo = np.min(pulse_heights) - 3 * gaussian_fwhm
    hi = np.max(pulse_heights) + 3 * gaussian_fwhm
    hist, bins = np.histogram(pulse_heights, np.linspace(lo, hi, n + 1))
    tx = np.fft.rfftfreq(n, (lo - hi) / n)
    ty = np.exp(-(tx**2) / 2 / tbw**2)
    x = (bins[1:] + bins[:-1]) / 2
    y = np.fft.irfft(np.fft.rfft(hist) * ty)

    flag = (y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])
    lm = np.arange(1, n - 1)[flag]
    lm = lm[np.argsort(-y[lm])]
    bin_centers, _step_size = mass2.misc.midpoints_and_step_size(bins)
    return np.array(x[lm]), np.array(y[lm]), (hist, bin_centers, y)

find_optimal_assignment(ph, e)

Find the optimal assignment of pulse heights to energies by minimizing the RMS error in a local linearity test.

Source code in mass2/core/rough_cal.py
def find_optimal_assignment(ph: ArrayLike, e: ArrayLike) -> tuple[float, NDArray]:
    """Find the optimal assignment of pulse heights to energies by minimizing the RMS error in a local linearity test."""
    # ph is a list of peak heights longer than e
    # e is a list of known peak energies
    # we want to find the set of peak heights from ph that are closest to being locally linear with the energies in e

    # when given 3 or less energies to match, use the largest peaks in peak order
    ph = np.asarray(ph)
    e = np.asarray(e)
    assert len(e) >= 1
    if len(e) <= 2:
        return 0, np.array(sorted(ph[: len(e)]))

    rms_e_residual, pha, _pha_inds = rank_assignments(ph, e)
    ind = np.argmin(rms_e_residual)
    return rms_e_residual[ind], pha[ind]

find_optimal_assignment2(ph, e, line_names)

Find the optimal assignment of pulse heights to energies by minimizing the RMS error in a local linearity test, and fit a polynomial gain curve to the result.

Source code in mass2/core/rough_cal.py
def find_optimal_assignment2(ph: ArrayLike, e: ArrayLike, line_names: list[str]) -> BestAssignmentPfitGainResult:
    """Find the optimal assignment of pulse heights to energies by minimizing the RMS error in a local linearity test,
    and fit a polynomial gain curve to the result."""
    ph = np.asarray(ph)
    e = np.asarray(e)
    rms_e_residual, pha = find_optimal_assignment(ph, e)
    if e[0] == 0:
        raise ValueError("cannot use energy=0 points to learn gain based calibration")
    gain = pha / e
    deg = min(len(e) - 1, 2)
    if deg == 0:
        pfit_gain = np.polynomial.Polynomial(gain)
    else:
        pfit_gain = np.polynomial.Polynomial.fit(pha, gain, deg=min(len(e) - 1, 2))
    result = BestAssignmentPfitGainResult(
        rms_e_residual,
        ph_assigned=pha,
        residual_e=None,
        assignment_inds=None,
        pfit_gain=pfit_gain,
        energy_target=e,
        names_target=line_names,
        ph_target=ph,
    )
    return result

find_optimal_assignment2_height_info(ph, e, line_names, line_heights_allowed)

Find the optimal assignment of pulse heights to energies by minimizing the RMS error in a local linearity test, while respecting constraints on which pulse heights can be assigned to which energies, and fit a polynomial gain curve to the result.

Source code in mass2/core/rough_cal.py
def find_optimal_assignment2_height_info(
    ph: ArrayLike, e: ArrayLike, line_names: list[str], line_heights_allowed: ArrayLike
) -> BestAssignmentPfitGainResult:
    """Find the optimal assignment of pulse heights to energies by minimizing the RMS error in a local linearity test,
    while respecting constraints on which pulse heights can be assigned to which energies,
    and fit a polynomial gain curve to the result."""
    rms_e_residual, pha = find_optimal_assignment_height_info(ph, e, line_heights_allowed)
    ph = np.asarray(ph)
    e = np.asarray(e)
    gain = pha / e
    deg = min(len(e) - 1, 2)
    if deg == 0:
        pfit_gain = np.polynomial.Polynomial(gain)
    else:
        pfit_gain = np.polynomial.Polynomial.fit(pha, gain, deg=min(len(e) - 1, 2))
    result = BestAssignmentPfitGainResult(
        rms_e_residual,
        ph_assigned=pha,
        residual_e=None,
        assignment_inds=None,
        pfit_gain=pfit_gain,
        energy_target=e,
        names_target=line_names,
        ph_target=ph,
    )
    return result

find_optimal_assignment_height_info(ph, e, line_heights_allowed)

Find the optimal assignment of pulse heights to energies by minimizing the RMS error in a local linearity test, while respecting constraints on which pulse heights can be assigned to which energies.

Source code in mass2/core/rough_cal.py
def find_optimal_assignment_height_info(ph: ArrayLike, e: ArrayLike, line_heights_allowed: ArrayLike) -> tuple[float, NDArray]:
    """Find the optimal assignment of pulse heights to energies by minimizing the RMS error in a local linearity test,
    while respecting constraints on which pulse heights can be assigned to which energies."""
    # ph is a list of peak heights longer than e
    # e is a list of known peak energies
    # we want to find the set of peak heights from ph that are closest to being locally linear with the energies in e

    # when given 3 or less energies to match, use the largest peaks in peak order
    ph = np.asarray(ph)
    e = np.asarray(e)
    line_heights_allowed = np.asarray(line_heights_allowed)
    assert len(e) >= 1
    if len(e) <= 2:
        return 0, np.array(sorted(ph[: len(e)]))

    rms_e_residual, pha, pha_inds = rank_assignments(ph, e)
    best_ind = None
    best_rms_residual = np.inf
    print(f"{e=}")
    for i_assign in range(len(rms_e_residual)):
        rms_e_candidate = rms_e_residual[i_assign]
        # pha[i,:] is one choice of len(e) values from ph to assign to e
        pha_inds_candidate = pha_inds[i_assign, :]
        if rms_e_candidate > best_rms_residual:
            continue
        # check if peaks match height info
        print(f"{pha_inds_candidate=}")
        print(f"{line_heights_allowed=}")
        failed_line_height_check = False
        for j in range(len(e)):
            if pha_inds_candidate[j] not in line_heights_allowed[j]:
                failed_line_height_check = True
                print("not allowed")
                a = pha_inds_candidate[j]
                b = line_heights_allowed[j]
                print(f"{a=} {b=}")
                break
        if failed_line_height_check:
            continue
        print("is new best!")
        best_rms_residual = rms_e_candidate
        best_ind = i_assign
    if best_ind is None:
        raise Exception("no assignment found satisfying peak height info")
    return rms_e_residual[best_ind], pha[best_ind, :]

find_pfit_gain_residual(ph, e)

Find a 2nd degree polynomial fit to the gain curve defined by ph/e, and return the residuals in energy when using that gain curve to convert ph to energy.

Source code in mass2/core/rough_cal.py
def find_pfit_gain_residual(ph: ndarray, e: ndarray) -> tuple[ndarray, Polynomial]:
    """Find a 2nd degree polynomial fit to the gain curve defined by ph/e,
    and return the residuals in energy when using that gain curve to convert ph to energy."""
    assert len(ph) == len(e)
    gain = ph / e
    pfit_gain = np.polynomial.Polynomial.fit(ph, gain, deg=2)

    def ph2energy(ph: NDArray) -> NDArray:
        """Convert pulse height to energy using the fitted gain curve."""
        return ph / pfit_gain(ph)

    predicted_e = ph2energy(ph)
    residual_e = e - predicted_e
    return residual_e, pfit_gain
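The gain-curve idea behind find_pfit_gain_residual can be sketched with NumPy alone: fit gain = ph/e as a 2nd-degree polynomial in ph, then recover energies via energy = ph / gain(ph). All numbers below are made up for illustration.

```python
import numpy as np

ph = np.array([5000.0, 8000.0, 11000.0, 14000.0, 17000.0])
true_gain = 2.0 - 3e-5 * ph + 4e-10 * ph**2   # a plausible drooping gain curve
e = ph / true_gain                            # the "known" line energies

gain = ph / e
pfit_gain = np.polynomial.Polynomial.fit(ph, gain, deg=2)
predicted_e = ph / pfit_gain(ph)
residual_e = e - predicted_e                  # ~0 here: the gain really is quadratic
```

Because the synthetic gain is exactly quadratic in ph, the fit recovers it to machine precision; on real data residual_e measures how far the calibration departs from this model.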

hist_smoothed(pulse_heights, fwhm_pulse_height_units, bin_edges=None)

Compute a histogram of pulse heights and smooth it with a Gaussian of given FWHM.

Source code in mass2/core/rough_cal.py
def hist_smoothed(
    pulse_heights: ndarray,
    fwhm_pulse_height_units: float,
    bin_edges: ndarray | None = None,
) -> tuple[ndarray, ndarray, ndarray]:
    """Compute a histogram of pulse heights and smooth it with a Gaussian of given FWHM."""
    pulse_heights = pulse_heights.astype(np.float64)
    # convert to float64 to avoid wrapping subtraction and platform specific behavior regarding uint16s
    # linux CI will throw errors, while windows does not, but maybe is just silently wrong?
    assert len(pulse_heights) > 10, "not enough pulses"
    if bin_edges is None:
        n = 128 * 1024
        lo = (np.min(pulse_heights) - 3 * fwhm_pulse_height_units).astype(np.float64)
        hi = (np.max(pulse_heights) + 3 * fwhm_pulse_height_units).astype(np.float64)
        bin_edges = np.linspace(lo, hi, n + 1)

    _, step_size = mass2.misc.midpoints_and_step_size(bin_edges)
    counts, _ = np.histogram(pulse_heights, bin_edges)
    fwhm_in_bin_number_units = fwhm_pulse_height_units / step_size
    smoothed_counts = smooth_hist_with_gauassian_by_fft(counts, fwhm_in_bin_number_units)
    return smoothed_counts, bin_edges, counts

local_maxima(y)

Find local maxima and minima, as 1D arrays.

Source code in mass2/core/rough_cal.py
def local_maxima(y: ndarray) -> tuple[ndarray, ndarray]:
    "Find local maxima and minima, as 1D arrays."
    local_maxima_inds = []
    local_minima_inds = []
    increasing = False
    for i in range(len(y) - 1):
        if increasing and (y[i + 1] < y[i]):
            local_maxima_inds.append(i)
            increasing = False
        if not increasing and (y[i + 1] > y[i]):
            local_minima_inds.append(i)
            increasing = True
    # increasing starts false, so we always start with a minimum
    return np.array(local_maxima_inds), np.array(local_minima_inds)
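A minimal check of the alternating scan in local_maxima, re-implemented standalone for illustration: because `increasing` starts false, the first extremum recorded is always a minimum, and maxima and minima then alternate.

```python
import numpy as np

def local_extrema(y):
    """Standalone copy of the local_maxima scan, for demonstration only."""
    maxima, minima = [], []
    increasing = False
    for i in range(len(y) - 1):
        if increasing and y[i + 1] < y[i]:
            maxima.append(i)
            increasing = False
        if not increasing and y[i + 1] > y[i]:
            minima.append(i)
            increasing = True
    return np.array(maxima), np.array(minima)

y = np.array([3.0, 1.0, 4.0, 2.0, 5.0, 0.0])
mx, mn = local_extrema(y)   # mx -> [2, 4], mn -> [1, 3]
```

Note the leading descent from y[0]=3 to y[1]=1 records a minimum at index 1 but no maximum at index 0, matching the comment in the source.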

minimize_entropy_linear(indicator, uncorrected, bin_edges, fwhm_in_bin_number_units)

Minimize the entropy of a histogram of drift-corrected pulse heights by varying the slope of a linear correction based on the given indicator.

Source code in mass2/core/rough_cal.py
def minimize_entropy_linear(
    indicator: ndarray,
    uncorrected: ndarray,
    bin_edges: ndarray,
    fwhm_in_bin_number_units: int,
) -> tuple[OptimizeResult, float32]:
    """Minimize the entropy of a histogram of drift-corrected pulse heights
    by varying the slope of a linear correction based on the given indicator."""
    indicator_mean = np.mean(indicator)
    indicator_zero_mean = indicator - indicator_mean

    def entropy_fun(slope: float) -> float:
        """Return the entropy of a histogram of drift-corrected pulse heights, the optimization target."""
        return drift_correct_entropy(slope, indicator_zero_mean, uncorrected, bin_edges, fwhm_in_bin_number_units)

    result = sp.optimize.minimize_scalar(entropy_fun, bracket=[0, 0.1])
    return result, indicator_mean
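The linear drift-correction idea can be sketched end to end: synthesize pulse heights that drift linearly with an indicator, then recover the slope with scipy's minimize_scalar. Variance stands in here for the histogram entropy computed by drift_correct_entropy, and all names and numbers are illustrative.

```python
import numpy as np
import scipy.optimize

rng = np.random.default_rng(0)
indicator = rng.uniform(0, 1, 5000)
true_slope = 0.02
# pulse heights droop by 2% across the indicator range, plus noise
uncorrected = 1000.0 * (1 - true_slope * indicator) + rng.normal(0, 0.5, 5000)
indicator_zero_mean = indicator - np.mean(indicator)

def spread(slope):
    """Optimization target: spread of the corrected pulse heights."""
    corrected = uncorrected * (1 + slope * indicator_zero_mean)
    return np.var(corrected)

result = scipy.optimize.minimize_scalar(spread, bracket=[0, 0.1])
# result.x recovers a slope close to the injected 0.02
```

Entropy of a smoothed histogram (as in the real code) rewards sharp spectral lines rather than just small variance, which matters when the spectrum has several peaks; the optimization structure is the same.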

peakfind_local_maxima_of_smoothed_hist(pulse_heights, fwhm_pulse_height_units, bin_edges=None)

Find local maxima in a smoothed histogram of pulse heights.

Source code in mass2/core/rough_cal.py
def peakfind_local_maxima_of_smoothed_hist(
    pulse_heights: ndarray,
    fwhm_pulse_height_units: float,
    bin_edges: ndarray | None = None,
) -> SmoothedLocalMaximaResult:
    """Find local maxima in a smoothed histogram of pulse heights."""
    pulse_heights = pulse_heights.astype(np.float64)
    assert len(pulse_heights) > 10, "not enough pulses"
    smoothed_counts, bin_edges, counts = hist_smoothed(pulse_heights, fwhm_pulse_height_units, bin_edges)
    bin_centers, _step_size = mass2.misc.midpoints_and_step_size(bin_edges)
    local_maxima_inds, local_minima_inds = local_maxima(smoothed_counts)
    # require a minimum before and after a maximum (the first check is redundant with behavior of local_maxima)
    if local_maxima_inds[0] < local_minima_inds[0]:
        local_maxima_inds = local_maxima_inds[1:]
    if local_maxima_inds[-1] > local_minima_inds[-1]:
        local_maxima_inds = local_maxima_inds[:-1]
    return SmoothedLocalMaximaResult(
        fwhm_pulse_height_units,
        bin_centers,
        counts,
        smoothed_counts,
        local_maxima_inds,
        local_minima_inds,
    )

rank_3peak_assignments(ph, e, line_names, max_fractional_energy_error_3rd_assignment=0.1, min_gain_fraction_at_ph_30k=0.25)

Explore and rank possible assignments of pulse heights to energies when there are 3 or more lines.

Source code in mass2/core/rough_cal.py
def rank_3peak_assignments(
    ph: NDArray,
    e: NDArray,
    line_names: Iterable[str],
    max_fractional_energy_error_3rd_assignment: float = 0.1,
    min_gain_fraction_at_ph_30k: float = 0.25,
) -> tuple[pl.DataFrame, pl.DataFrame]:
    """Explore and rank possible assignments of pulse heights to energies when there are 3 or more lines."""
    # we explore possible line assignments, and down select based on knowledge of gain curve shape
    # gain = ph/e, and we assume gain starts at zero, decreases with pulse height, and
    # that a 2nd order polynomial is a reasonably good approximation
    # with one assignment we model the gain as constant, and use that to find the most likely
    # 2nd assignments, then we model the gain as linear, and use that to rank 3rd assignments
    dfe = pl.DataFrame({"e0_ind": np.arange(len(e)), "e0": e, "name": line_names})
    dfph = pl.DataFrame({"ph0_ind": np.arange(len(ph)), "ph0": ph})
    # dfph should know about peak_area and use it to weight choices somehow

    # 1st assignments ####
    # e0 and ph0 are the first assignment
    df0 = dfe.join(dfph, how="cross").with_columns(gain0=pl.col("ph0") / pl.col("e0"))
    # 2nd assignments ####
    # e1 and ph1 are the 2nd assignment
    df1 = (
        df0.join(df0, how="cross").rename({"e0_right": "e1", "ph0_right": "ph1"}).drop("e0_ind_right", "ph0_ind_right", "gain0_right")
    )
    # 1) keep only assignments with e0<e1 and ph0<ph1 to avoid looking at the same pair in reverse
    df1 = df1.filter((pl.col("e0") < pl.col("e1")).and_(pl.col("ph0") < pl.col("ph1")))
    # 2) the gain slope must be negative
    df1 = (
        df1
        .with_columns(gain1=pl.col("ph1") / pl.col("e1"))
        .with_columns(gain_slope=(pl.col("gain1") - pl.col("gain0")) / (pl.col("ph1") - pl.col("ph0")))
        .filter(pl.col("gain_slope") < 0)
    )
    # 3) the gain slope should not have too large a magnitude
    df1 = df1.with_columns(gain_at_0=pl.col("gain0") - pl.col("ph0") * pl.col("gain_slope"))
    df1 = df1.with_columns(gain_frac_at_ph30k=(1 + 30000 * pl.col("gain_slope") / pl.col("gain_at_0")))
    df1 = df1.filter(pl.col("gain_frac_at_ph30k") > min_gain_fraction_at_ph_30k)

    # 3rd assignments ####
    # e2 and ph2 are the 3rd assignment
    df2 = df1.join(df0.select(e2="e0", ph2="ph0"), how="cross")
    df2 = df2.with_columns(gain_at_ph2=pl.col("gain_at_0") + pl.col("gain_slope") * pl.col("ph2"))
    df2 = df2.with_columns(e_at_ph2=pl.col("ph2") / pl.col("gain_at_ph2"))
    df2 = df2.filter((pl.col("e1") < pl.col("e2")).and_(pl.col("ph1") < pl.col("ph2")))
    # 1) rank 3rd assignments by energy error at ph2 assuming gain = gain_slope*ph+gain_at_0
    # where gain_slope and gain are calculated from assignments 1 and 2
    df2 = df2.with_columns(e_err_at_ph2=pl.col("e_at_ph2") - pl.col("e2")).sort(by=np.abs(pl.col("e_err_at_ph2")))
    # 2) return a dataframe downselected to the assignments and the ranking criteria
    # 3) throw away assignments with large (default 10%) energy errors
    df3peak = df2.select("e0", "ph0", "e1", "ph1", "e2", "ph2", "e_err_at_ph2").filter(
        np.abs(pl.col("e_err_at_ph2") / pl.col("e2")) < max_fractional_energy_error_3rd_assignment
    )
    return df3peak, dfe
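The pair-filtering step above can be written out in plain NumPy terms (this is an illustration of the physics cuts, not the polars code): a candidate pair (ph0→e0, ph1→e1) survives only if the implied linear gain model g(ph) = gain_at_0 + gain_slope·ph has negative slope and retains at least min_gain_fraction of its zero-ph gain at ph = 30000. The peak/energy values are hypothetical.

```python
e0, ph0 = 5000.0, 9500.0     # first assignment (made-up values)
e1, ph1 = 8000.0, 14800.0    # second assignment

gain0, gain1 = ph0 / e0, ph1 / e1            # 1.9 and 1.85: gain droops
gain_slope = (gain1 - gain0) / (ph1 - ph0)   # negative, as required
gain_at_0 = gain0 - ph0 * gain_slope         # extrapolate gain to ph = 0
gain_frac_at_ph30k = 1 + 30000 * gain_slope / gain_at_0
keep = (gain_slope < 0) and (gain_frac_at_ph30k > 0.25)
```

Here gain_frac_at_ph30k is about 0.86, comfortably above the default 0.25 floor, so this pair would survive the down-selection.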

rank_assignments(ph, e)

Rank possible assignments of pulse heights to energies by how locally linear their implied gain curves are.

Source code in mass2/core/rough_cal.py
def rank_assignments(ph: ArrayLike, e: ArrayLike) -> tuple[NDArray, NDArray, NDArray]:
    """Rank possible assignments of pulse heights to energies by how locally linear their implied gain curves are."""
    # ph is a list of peak heights longer than e
    # e is a list of known peak energies
    # we want to find the set of peak heights from ph that are closest to being locally linear with the energies in e
    e = np.array(e)
    ph = np.array(ph)
    e.sort()
    ph.sort()
    pha = np.array(list(itertools.combinations(ph, len(e))))
    pha_inds = np.array(list(itertools.combinations(np.arange(len(ph)), len(e))))
    # pha[i,:] is one choice of len(e) values from ph to assign to e
    # we use linear interpolation of the form y = y0 + (y1-y0)*(x-x0)/(x1-x0)
    # on each set of 3 values
    # with y = e and x = ph
    # x is pha[:,1:-1], x0 is pha[:,:-2], x1 is pha[:,2:]
    x = pha[:, 1:-1]
    x0 = pha[:, :-2]
    x1 = pha[:, 2:]
    y0 = e[:-2]
    y1 = e[2:]
    x_m_x0_over_x1_m_x0 = (x - x0) / (x1 - x0)
    y = y0 + (y1 - y0) * x_m_x0_over_x1_m_x0
    y_expected = e[1:-1]
    rms_e_residual = np.asarray(mass2.misc.root_mean_squared(y - y_expected, axis=1))
    # prefer negative slopes for gain
    # gain_first = ph[0]/e[0]
    # gain_last = ph[-1]/e[-1]
    return rms_e_residual, pha, pha_inds
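The local-linearity score can be demonstrated on synthetic data: for each candidate assignment, linearly interpolate the interior energies from their neighbors in pulse-height space and measure the error. A perfectly proportional ph↔e assignment scores zero, so it ranks first even with spurious extra peaks mixed in. (Standalone sketch; values are invented.)

```python
import itertools
import numpy as np

e = np.array([3000.0, 5000.0, 6000.0, 9000.0])          # known line energies
ph_true = 2.0 * e                                       # ideal peaks, gain exactly 2
ph = np.sort(np.concatenate([ph_true, [4000.0, 15500.0]]))  # plus 2 spurious peaks

pha = np.array(list(itertools.combinations(ph, len(e))))    # all 15 assignments
# interpolate interior energies: y = y0 + (y1-y0)*(x-x0)/(x1-x0)
x, x0, x1 = pha[:, 1:-1], pha[:, :-2], pha[:, 2:]
y = e[:-2] + (e[2:] - e[:-2]) * (x - x0) / (x1 - x0)
rms = np.sqrt(np.mean((y - e[1:-1]) ** 2, axis=1))
best = pha[np.argmin(rms)]                              # recovers ph_true exactly
```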

smooth_hist_with_gauassian_by_fft(hist, fwhm_in_bin_number_units)

Smooth a histogram by convolution with a Gaussian, using FFTs.

Source code in mass2/core/rough_cal.py
def smooth_hist_with_gauassian_by_fft(hist: ndarray, fwhm_in_bin_number_units: float) -> ndarray:
    """Smooth a histogram by convolution with a Gaussian, using FFTs."""
    kernel = smooth_hist_with_gauassian_by_fft_compute_kernel(len(hist), fwhm_in_bin_number_units)
    y = np.fft.irfft(np.fft.rfft(hist) * kernel)
    return y

smooth_hist_with_gauassian_by_fft_compute_kernel(nbins, fwhm_in_bin_number_units)

Compute the DFT of a Gaussian kernel for smoothing a histogram.

Source code in mass2/core/rough_cal.py
def smooth_hist_with_gauassian_by_fft_compute_kernel(nbins: int, fwhm_in_bin_number_units: float) -> ndarray:
    """Compute the DFT of a Gaussian kernel for smoothing a histogram."""
    sigma = fwhm_in_bin_number_units / (np.sqrt(np.log(2) * 2) * 2)
    tbw = 1.0 / sigma / (np.pi * 2)
    tx = np.fft.rfftfreq(nbins)
    kernel = np.exp(-(tx**2) / 2 / tbw**2)
    return kernel
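The rfft-domain kernel really does act as a Gaussian smoother. Convolving a single-spike "histogram" shows two properties worth knowing: total counts are preserved (the kernel is exactly 1 at zero frequency) and the peak stays where it was. This sketch just mirrors the two functions above.

```python
import numpy as np

nbins, fwhm = 1024, 8.0                       # fwhm in bin-number units
sigma = fwhm / (np.sqrt(np.log(2) * 2) * 2)
tbw = 1.0 / sigma / (np.pi * 2)
kernel = np.exp(-np.fft.rfftfreq(nbins) ** 2 / 2 / tbw**2)

hist = np.zeros(nbins)
hist[500] = 100.0                             # one spike of 100 counts
smoothed = np.fft.irfft(np.fft.rfft(hist) * kernel)
# smoothed.sum() == 100 (counts conserved); argmax stays at bin 500
```

Because the convolution is circular, a peak within a few FWHM of either histogram edge would wrap around; hist_smoothed avoids that by padding the bin range 3 FWHM beyond the data.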

Tools for working with continuous data, as taken by the True Becquerel project.

TriggerResult dataclass

A trigger result from applying a triggering filter and threshold to a TrueBqBin data source.

Source code in mass2/core/truebq_bin.py
@dataclass(frozen=True)
class TriggerResult:
    """A trigger result from applying a triggering filter and threshold to a TrueBqBin data source."""

    data_source: "TrueBqBin"
    filter_in: np.ndarray
    threshold: float
    trig_inds: np.ndarray
    limit_samples: int

    def plot(
        self, decimate: int = 10, n_limit: int = 100000, offset_raw: int = 0, x_axis_time_s: bool = False, ax: plt.Axes | None = None
    ) -> None:
        """Make a diagnostic plot of the trigger result."""
        if ax is None:
            plt.figure()
            ax = plt.gca()
        plt.sca(ax)

        # raw (full-resolution) index ranges
        raw_start = offset_raw
        raw_stop = raw_start + n_limit * decimate

        data = self.data_source.data

        # scaling for x-axis (applied after decimation)
        x_scale = self.data_source.frametime_s * decimate if x_axis_time_s else 1

        # raw filter output
        filt_raw = fast_apply_filter(data[raw_start:raw_stop], self.filter_in)

        # decimated data and filter
        data_dec = data[raw_start:raw_stop:decimate]
        filt_dec = filt_raw[::decimate]

        # truncate to the same length
        n = min(len(data_dec), len(filt_dec))
        data_dec = data_dec[:n]
        filt_dec = filt_dec[:n]

        # shared x-axis
        x_dec = np.arange(n) * x_scale

        # plot data + filter
        plt.plot(x_dec, data_dec, ".", label="data")
        plt.plot(x_dec, filt_dec, label="filter_out")

        plt.axhline(self.threshold, label="threshold")

        # trigger indices (raw) → restrict to plotted window → convert to decimated indices
        trig_inds_raw = (
            pl
            .DataFrame({"trig_inds": self.trig_inds})
            .filter(pl.col("trig_inds").is_between(raw_start, raw_stop))
            .to_series()
            .to_numpy()
        )
        trig_inds_dec = (trig_inds_raw - raw_start) // decimate
        # clip to avoid indexing past n
        trig_inds_dec = trig_inds_dec[trig_inds_dec < n]
        plt.plot(x_dec[trig_inds_dec], filt_dec[trig_inds_dec], "o", label="trig_inds filt")
        plt.plot(x_dec[trig_inds_dec], data_dec[trig_inds_dec], "o", label="trig_inds data")

        # labels
        plt.title(f"{self.data_source.description}, trigger result debug plot")
        plt.legend()
        plt.xlabel("time with arb offset / s" if x_axis_time_s else "sample number (decimated)")
        plt.ylabel("signal (arb)")

    def get_noise(
        self,
        n_dead_samples_after_pulse_trigger: int,
        n_record_samples: int,
        max_noise_triggers: int = 200,
    ) -> NoiseChannel:
        """Synthesize a NoiseChannel from the data source by finding time periods without pulse triggers."""
        noise_trigger_inds = get_noise_trigger_inds(
            self.trig_inds,
            n_dead_samples_after_pulse_trigger,
            n_record_samples,
            max_noise_triggers,
        )
        inds = noise_trigger_inds[noise_trigger_inds > 0]  # ensure all inds are greater than 0
        inds = inds[inds < (len(self.data_source.data) - n_record_samples)]  # ensure all inds inbounds
        pulses = gather_pulses_from_inds_numpy_contiguous(
            self.data_source.data,
            npre=0,
            nsamples=n_record_samples,
            inds=inds,
        )
        df = pl.DataFrame({"pulse": pulses, "framecount": inds})
        noise = NoiseChannel(
            df,
            header_df=self.data_source.header_df,
            frametime_s=self.data_source.frametime_s,
        )
        return noise

    def to_channel_copy_to_memory(
        self, noise_n_dead_samples_after_pulse_trigger: int, npre: int, npost: int, invert: bool = False
    ) -> Channel:
        """Create a Channel object by copying pulse data into memory."""
        noise = self.get_noise(
            noise_n_dead_samples_after_pulse_trigger,
            npre + npost,
            max_noise_triggers=1000,
        )
        inds = self.trig_inds[self.trig_inds > npre]
        inds = inds[inds < (len(self.data_source.data) - npre - npost)]  # ensure all inds inbounds
        pulses = gather_pulses_from_inds_numpy_contiguous(self.data_source.data, npre=npre, nsamples=npre + npost, inds=inds)
        assert pulses.shape[0] == len(inds), "pulses and trig_inds must have the same length"
        if invert:
            df = pl.DataFrame({"pulse": pulses * -1, "framecount": inds})
        else:
            df = pl.DataFrame({"pulse": pulses, "framecount": inds})
        ch_header = ChannelHeader(
            self.data_source.description,
            None,
            self.data_source.channel_number,
            self.data_source.frametime_s,
            npre,
            npre + npost,
            self.data_source.header_df,
        )
        ch = Channel(df, ch_header, npulses=len(pulses), noise=noise)
        return ch

    def to_channel_mmap(
        self,
        noise_n_dead_samples_after_pulse_trigger: int,
        npre: int,
        npost: int,
        invert: bool = False,
        verbose: bool = True,
    ) -> Channel:
        """Create a Channel object by memory-mapping pulse data from disk."""
        noise = self.get_noise(
            noise_n_dead_samples_after_pulse_trigger,
            npre + npost,
            max_noise_triggers=1000,
        )
        inds = self.trig_inds[self.trig_inds > npre]  # ensure all inds inbounds
        inds = inds[inds < (len(self.data_source.data) - npre - npost)]  # ensure all inds inbounds
        pulses = gather_pulses_from_inds_numpy_contiguous_mmap_with_cache(
            self.data_source.data,
            npre=npre,
            nsamples=npre + npost,
            inds=inds,
            bin_path=self.data_source.bin_path,
            verbose=verbose,
        )
        if invert:
            df = pl.DataFrame({"pulse": pulses * -1, "framecount": inds})
        else:
            df = pl.DataFrame({"pulse": pulses, "framecount": inds})
        ch_header = ChannelHeader(
            self.data_source.description,
            None,
            self.data_source.channel_number,
            self.data_source.frametime_s,
            npre,
            npre + npost,
            self.data_source.header_df,
        )
        ch = Channel(df, ch_header, npulses=len(pulses), noise=noise)
        return ch

get_noise(n_dead_samples_after_pulse_trigger, n_record_samples, max_noise_triggers=200)

Synthesize a NoiseChannel from the data source by finding time periods without pulse triggers.

Source code in mass2/core/truebq_bin.py
def get_noise(
    self,
    n_dead_samples_after_pulse_trigger: int,
    n_record_samples: int,
    max_noise_triggers: int = 200,
) -> NoiseChannel:
    """Synthesize a NoiseChannel from the data source by finding time periods without pulse triggers."""
    noise_trigger_inds = get_noise_trigger_inds(
        self.trig_inds,
        n_dead_samples_after_pulse_trigger,
        n_record_samples,
        max_noise_triggers,
    )
    inds = noise_trigger_inds[noise_trigger_inds > 0]  # ensure all inds are greater than 0
    inds = inds[inds < (len(self.data_source.data) - n_record_samples)]  # ensure all inds inbounds
    pulses = gather_pulses_from_inds_numpy_contiguous(
        self.data_source.data,
        npre=0,
        nsamples=n_record_samples,
        inds=inds,
    )
    df = pl.DataFrame({"pulse": pulses, "framecount": inds})
    noise = NoiseChannel(
        df,
        header_df=self.data_source.header_df,
        frametime_s=self.data_source.frametime_s,
    )
    return noise

plot(decimate=10, n_limit=100000, offset_raw=0, x_axis_time_s=False, ax=None)

Make a diagnostic plot of the trigger result.

Source code in mass2/core/truebq_bin.py
def plot(
    self, decimate: int = 10, n_limit: int = 100000, offset_raw: int = 0, x_axis_time_s: bool = False, ax: plt.Axes | None = None
) -> None:
    """Make a diagnostic plot of the trigger result."""
    if ax is None:
        plt.figure()
        ax = plt.gca()
    plt.sca(ax)

    # raw (full-resolution) index ranges
    raw_start = offset_raw
    raw_stop = raw_start + n_limit * decimate

    data = self.data_source.data

    # scaling for x-axis (applied after decimation)
    x_scale = self.data_source.frametime_s * decimate if x_axis_time_s else 1

    # raw filter output
    filt_raw = fast_apply_filter(data[raw_start:raw_stop], self.filter_in)

    # decimated data and filter
    data_dec = data[raw_start:raw_stop:decimate]
    filt_dec = filt_raw[::decimate]

    # truncate to the same length
    n = min(len(data_dec), len(filt_dec))
    data_dec = data_dec[:n]
    filt_dec = filt_dec[:n]

    # shared x-axis
    x_dec = np.arange(n) * x_scale

    # plot data + filter
    plt.plot(x_dec, data_dec, ".", label="data")
    plt.plot(x_dec, filt_dec, label="filter_out")

    plt.axhline(self.threshold, label="threshold")

    # trigger indices (raw) → restrict to plotted window → convert to decimated indices
    trig_inds_raw = (
        pl
        .DataFrame({"trig_inds": self.trig_inds})
        .filter(pl.col("trig_inds").is_between(raw_start, raw_stop))
        .to_series()
        .to_numpy()
    )
    trig_inds_dec = (trig_inds_raw - raw_start) // decimate
    # clip to avoid indexing past n
    trig_inds_dec = trig_inds_dec[trig_inds_dec < n]
    plt.plot(x_dec[trig_inds_dec], filt_dec[trig_inds_dec], "o", label="trig_inds filt")
    plt.plot(x_dec[trig_inds_dec], data_dec[trig_inds_dec], "o", label="trig_inds data")

    # labels
    plt.title(f"{self.data_source.description}, trigger result debug plot")
    plt.legend()
    plt.xlabel("time with arb offset / s" if x_axis_time_s else "sample number (decimated)")
    plt.ylabel("signal (arb)")

to_channel_copy_to_memory(noise_n_dead_samples_after_pulse_trigger, npre, npost, invert=False)

Create a Channel object by copying pulse data into memory.

Source code in mass2/core/truebq_bin.py
def to_channel_copy_to_memory(
    self, noise_n_dead_samples_after_pulse_trigger: int, npre: int, npost: int, invert: bool = False
) -> Channel:
    """Create a Channel object by copying pulse data into memory."""
    noise = self.get_noise(
        noise_n_dead_samples_after_pulse_trigger,
        npre + npost,
        max_noise_triggers=1000,
    )
    inds = self.trig_inds[self.trig_inds > npre]
    inds = inds[inds < (len(self.data_source.data) - npre - npost)]  # ensure all inds inbounds
    pulses = gather_pulses_from_inds_numpy_contiguous(self.data_source.data, npre=npre, nsamples=npre + npost, inds=inds)
    assert pulses.shape[0] == len(inds), "pulses and trig_inds must have the same length"
    if invert:
        df = pl.DataFrame({"pulse": pulses * -1, "framecount": inds})
    else:
        df = pl.DataFrame({"pulse": pulses, "framecount": inds})
    ch_header = ChannelHeader(
        self.data_source.description,
        None,
        self.data_source.channel_number,
        self.data_source.frametime_s,
        npre,
        npre + npost,
        self.data_source.header_df,
    )
    ch = Channel(df, ch_header, npulses=len(pulses), noise=noise)
    return ch

to_channel_mmap(noise_n_dead_samples_after_pulse_trigger, npre, npost, invert=False, verbose=True)

Create a Channel object by memory-mapping pulse data from disk.

Source code in mass2/core/truebq_bin.py
def to_channel_mmap(
    self,
    noise_n_dead_samples_after_pulse_trigger: int,
    npre: int,
    npost: int,
    invert: bool = False,
    verbose: bool = True,
) -> Channel:
    """Create a Channel object by memory-mapping pulse data from disk."""
    noise = self.get_noise(
        noise_n_dead_samples_after_pulse_trigger,
        npre + npost,
        max_noise_triggers=1000,
    )
    inds = self.trig_inds[self.trig_inds > npre]  # ensure all inds are greater than npre
    inds = inds[inds < (len(self.data_source.data) - npre - npost)]  # ensure all inds inbounds
    pulses = gather_pulses_from_inds_numpy_contiguous_mmap_with_cache(
        self.data_source.data,
        npre=npre,
        nsamples=npre + npost,
        inds=inds,
        bin_path=self.data_source.bin_path,
        verbose=verbose,
    )
    if invert:
        df = pl.DataFrame({"pulse": pulses * -1, "framecount": inds})
    else:
        df = pl.DataFrame({"pulse": pulses, "framecount": inds})
    ch_header = ChannelHeader(
        self.data_source.description,
        None,
        self.data_source.channel_number,
        self.data_source.frametime_s,
        npre,
        npre + npost,
        self.data_source.header_df,
    )
    ch = Channel(df, ch_header, npulses=len(pulses), noise=noise)
    return ch

TrueBqBin dataclass

Represents a binary data file from the True Becquerel project.

Source code in mass2/core/truebq_bin.py
@dataclass(frozen=True)
class TrueBqBin:
    """Represents a binary data file from the True Bequerel project."""

    bin_path: Path
    description: str
    channel_number: int
    header_df: pl.DataFrame
    frametime_s: float
    voltage_scale: float
    data: np.ndarray
    # the bin file is a continuous, untriggered data acquisition

    @classmethod
    def load(cls, bin_path: str | Path) -> "TrueBqBin":
        """Create a TrueBqBin object by memory-mapping the given binary file."""
        bin_path = Path(bin_path)
        try:
            # for when it's named like dev2_ai6
            channel_number = int(str(bin_path.parent)[-1])
        except ValueError:
            # for when it's named like 2A
            def bay2int(bay: str) -> int:
                """Convert a bay name like '2A' to a channel number like 4."""
                return (int(bay[0]) - 1) * 4 + "ABCD".index(bay[1].upper())

            channel_number = bay2int(str(bin_path.parent.stem))
        desc = str(bin_path.parent.parent.stem) + "_" + str(bin_path.parent.stem)
        header_np = np.memmap(bin_path, dtype=header_dtype, mode="r", offset=0, shape=1)
        sample_rate_hz = header_np["sample_rate_hz"][0]
        header_df = pl.from_numpy(header_np)
        data = np.memmap(bin_path, dtype=np.int16, mode="r", offset=68)
        return cls(
            bin_path,
            desc,
            channel_number,
            header_df,
            1 / sample_rate_hz,
            header_np["voltage_scale"][0],
            data,
        )

    def trigger(self, filter_in: NDArray, threshold: float, limit_hours: float | None = None, verbose: bool = True) -> TriggerResult:
        """Compute trigger indices by applying the given filter and threshold to the data."""
        if limit_hours is None:
            limit_samples = len(self.data)
        else:
            limit_samples = int(limit_hours * 3600 / self.frametime_s)
        trig_inds = _fasttrig_filter_trigger_with_cache(self.data, filter_in, threshold, limit_samples, self.bin_path, verbose=verbose)
        return TriggerResult(self, filter_in, threshold, trig_inds, limit_samples)

load(bin_path) classmethod

Create a TrueBqBin object by memory-mapping the given binary file.

Source code in mass2/core/truebq_bin.py
@classmethod
def load(cls, bin_path: str | Path) -> "TrueBqBin":
    """Create a TrueBqBin object by memory-mapping the given binary file."""
    bin_path = Path(bin_path)
    try:
        # for when it's named like dev2_ai6
        channel_number = int(str(bin_path.parent)[-1])
    except ValueError:
        # for when it's named like 2A
        def bay2int(bay: str) -> int:
            """Convert a bay name like '2A' to a channel number like 4."""
            return (int(bay[0]) - 1) * 4 + "ABCD".index(bay[1].upper())

        channel_number = bay2int(str(bin_path.parent.stem))
    desc = str(bin_path.parent.parent.stem) + "_" + str(bin_path.parent.stem)
    header_np = np.memmap(bin_path, dtype=header_dtype, mode="r", offset=0, shape=1)
    sample_rate_hz = header_np["sample_rate_hz"][0]
    header_df = pl.from_numpy(header_np)
    data = np.memmap(bin_path, dtype=np.int16, mode="r", offset=68)
    return cls(
        bin_path,
        desc,
        channel_number,
        header_df,
        1 / sample_rate_hz,
        header_np["voltage_scale"][0],
        data,
    )
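
The bay-naming branch in load maps names like "1A" through "2D" onto channel numbers 0-7. A standalone sketch of that mapping, reproduced here for illustration:

```python
def bay2int(bay: str) -> int:
    """Convert a bay name like '2A' to a channel number like 4.

    Rows are 1-based digits, columns are the letters A-D, so the
    channel number is (row - 1) * 4 + column index.
    """
    return (int(bay[0]) - 1) * 4 + "ABCD".index(bay[1].upper())
```

For example, `bay2int("1A")` gives 0 and `bay2int("2A")` gives 4, matching the docstring in the source above.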

trigger(filter_in, threshold, limit_hours=None, verbose=True)

Compute trigger indices by applying the given filter and threshold to the data.

Source code in mass2/core/truebq_bin.py
def trigger(self, filter_in: NDArray, threshold: float, limit_hours: float | None = None, verbose: bool = True) -> TriggerResult:
    """Compute trigger indices by applying the given filter and threshold to the data."""
    if limit_hours is None:
        limit_samples = len(self.data)
    else:
        limit_samples = int(limit_hours * 3600 / self.frametime_s)
    trig_inds = _fasttrig_filter_trigger_with_cache(self.data, filter_in, threshold, limit_samples, self.bin_path, verbose=verbose)
    return TriggerResult(self, filter_in, threshold, trig_inds, limit_samples)

fast_apply_filter(data, filter_in)

Apply a filter to the data, returning the filter output.

Source code in mass2/core/truebq_bin.py
@njit
def fast_apply_filter(data: NDArray, filter_in: NDArray) -> NDArray:
    """Apply a filter to the data, returning the filter output."""
    cache = np.zeros(len(filter_in))
    filter = np.zeros(len(filter_in))
    filter[:] = filter_in
    filter_len = len(filter)
    filter_out = np.zeros(len(data) - len(filter))
    j = 0
    jmax = len(data) - filter_len - 1
    while j <= jmax:
        cache[:] = data[j : (j + filter_len)]
        filter_out[j] = np.dot(cache, filter)
        j += 1
    return filter_out
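
The numba loop above is a sliding dot product: output sample j is the dot product of the filter with the data window starting at j, and the output has len(data) - len(filter_in) samples. A plain-numpy reference version (useful for checking results on small arrays, not a replacement for the njit-compiled function):

```python
import numpy as np

def apply_filter_reference(data, filter_in):
    # Same output as fast_apply_filter: one dot product per window,
    # for len(data) - len(filter_in) windows.
    m = len(filter_in)
    n = len(data) - m
    return np.array([np.dot(data[j:j + m], filter_in) for j in range(n)], dtype=float)
```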

fasttrig_filter_trigger(data, filter_in, threshold, verbose)

Apply a filter to the data and return trigger indices where the filter output crosses the threshold.

Source code in mass2/core/truebq_bin.py
@njit
def fasttrig_filter_trigger(data: NDArray, filter_in: NDArray, threshold: float, verbose: bool) -> NDArray:
    """Apply a filter to the data and return trigger indices where the filter output crosses the threshold."""
    assert threshold > 0, "algorithm assumes we trigger with positive threshold, change sign of filter_in to accommodate"
    filter_len = len(filter_in)
    inds = []
    jmax = len(data) - filter_len - 1
    # njit only likes float64s, so I'm trying to force float64 use without allocating a ton of memory
    cache = np.zeros(len(filter_in))
    filter = np.zeros(len(filter_in))
    filter[:] = filter_in
    # initialize a, b, c
    j = 0
    cache[:] = data[j : (j + filter_len)]
    b = np.dot(cache, filter)
    a = b  # won't be used, just need same type
    j = 1
    cache[:] = data[j : (j + filter_len)]
    c = np.dot(cache, filter)
    j = 2
    ready = False
    prog_step = max(1, jmax // 100)  # avoid modulo-by-zero for short data
    prog_ticks = 0
    while j <= jmax:
        if j % prog_step == 0:
            prog_ticks += 1
            if verbose:
                print(f"fasttrig_filter_trigger {prog_ticks}/{100}")
        a, b = b, c
        cache[:] = data[j : (j + filter_len)]
        c = np.dot(cache, filter)
        if b > threshold and b >= c and b > a and ready:
            inds.append(j)
            ready = False
        if b < 0:  # hold off on retriggering until we see opposite sign slope
            ready = True
        j += 1
    return np.array(inds)
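
The trigger condition above fires at a local maximum of the filter output that exceeds the threshold, and it re-arms only after the output dips below zero, which suppresses retriggering on the falling edge of the same pulse. A minimal pure-Python sketch of that state machine (index conventions may differ by one sample from the numba version, which tracks the windows a step ahead):

```python
import numpy as np

def local_max_trigger(filter_out, threshold):
    """Indices where filter_out has a local maximum above threshold,
    with retriggering held off until the output goes negative."""
    inds = []
    ready = False
    for j in range(1, len(filter_out) - 1):
        a, b, c = filter_out[j - 1], filter_out[j], filter_out[j + 1]
        if ready and b > threshold and b > a and b >= c:
            inds.append(j)
            ready = False  # hold off until we see a negative excursion
        if b < 0:
            ready = True
    return np.array(inds)
```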

filter_and_residual_rms(data, chosen_filter, avg_pulse, trig_inds, npre, nsamples, polarity)

Apply a filter to pulses extracted from data at the given trigger indices, returning filter values and residual RMS.

Source code in mass2/core/truebq_bin.py
def filter_and_residual_rms(
    data: NDArray, chosen_filter: NDArray, avg_pulse: NDArray, trig_inds: NDArray, npre: int, nsamples: int, polarity: int
) -> tuple[NDArray, NDArray, NDArray]:
    """Apply a filter to pulses extracted from data at the given trigger indices, returning filter values and residual RMS."""
    filt_value = np.zeros(len(trig_inds))
    residual_rms = np.zeros(len(trig_inds))
    filt_value_template = np.zeros(len(trig_inds))
    template = avg_pulse - np.mean(avg_pulse)
    template /= np.sqrt(np.dot(template, template))
    for i in range(len(trig_inds)):
        j = trig_inds[i]
        pulse = data[j - npre : j + nsamples - npre] * polarity
        pulse -= pulse.mean()
        filt_value[i] = np.dot(chosen_filter, pulse)
        filt_value_template[i] = np.dot(template, pulse)
        residual = pulse - template * filt_value_template[i]
        residual_rms_val = misc.root_mean_squared(residual)
        residual_rms[i] = residual_rms_val
    return filt_value, residual_rms, filt_value_template
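
The template branch above projects each mean-subtracted pulse onto a unit-norm template and measures the RMS of what is left over; a pulse that is an exact multiple of the template leaves zero residual. A small synthetic sketch of that projection (assuming misc.root_mean_squared is the usual sqrt-of-mean-of-squares):

```python
import numpy as np

# Unit-norm, mean-subtracted template, built as in filter_and_residual_rms
avg_pulse = np.array([0.0, 1.0, 4.0, 2.0, 1.0])
template = avg_pulse - avg_pulse.mean()
template /= np.sqrt(np.dot(template, template))

# A pulse that is exactly 3x the template projects to amplitude 3
# and leaves zero residual
pulse = 3.0 * template
amp = np.dot(template, pulse)       # projection coefficient
residual = pulse - template * amp
rms = np.sqrt(np.mean(residual ** 2))
```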

gather_pulses_from_inds_numpy_contiguous(data, npre, nsamples, inds)

Gather pulses from data at the given trigger indices, returning a contiguous numpy array.

Source code in mass2/core/truebq_bin.py
def gather_pulses_from_inds_numpy_contiguous(data: NDArray, npre: int, nsamples: int, inds: NDArray) -> NDArray:
    """Gather pulses from data at the given trigger indices, returning a contiguous numpy array."""
    assert all(inds > npre), "all inds must be greater than npre"
    assert all(inds < (len(data) - nsamples)), "all inds must be less than len(data) - nsamples"
    offsets = inds - npre  # shift by npre to start at correct offset
    pulses = np.zeros((len(offsets), nsamples), dtype=np.int16)
    for i, offset in enumerate(offsets):
        pulses[i, :] = data[offset : offset + nsamples]
    return pulses
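
Row i of the gathered array holds data[inds[i] - npre : inds[i] - npre + nsamples]. The same gather can be written with one numpy fancy-indexing operation instead of a Python loop; a reference sketch for illustration:

```python
import numpy as np

def gather_pulses_reference(data, npre, nsamples, inds):
    # Broadcasting offsets[:, None] + window yields a (len(inds), nsamples)
    # index matrix, so a single fancy-indexing call gathers every record.
    offsets = np.asarray(inds) - npre
    window = np.arange(nsamples)
    return data[offsets[:, None] + window]
```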

gather_pulses_from_inds_numpy_contiguous_mmap(data, npre, nsamples, inds, filename='.mmapped_pulses.npy')

Gather pulses from data at the given trigger indices, returning a memory-mapped numpy array.

Source code in mass2/core/truebq_bin.py
def gather_pulses_from_inds_numpy_contiguous_mmap(
    data: NDArray, npre: int, nsamples: int, inds: NDArray, filename: str | Path = ".mmapped_pulses.npy"
) -> NDArray:
    """Gather pulses from data at the given trigger indices, returning a memory-mapped numpy array."""
    assert all(inds > npre), "all inds must be greater than npre"
    assert all(inds < (len(data) - nsamples)), "all inds must be less than len(data) - nsamples"
    offsets = inds - npre  # shift by npre to start at correct offset
    pulses = np.memmap(filename, dtype=np.int16, mode="w+", shape=(len(offsets), nsamples))
    for i, offset in enumerate(offsets):
        pulses[i, :] = data[offset : offset + nsamples]
    pulses.flush()
    # re-open the mmap to ensure it is read-only
    del pulses
    pulses = np.memmap(filename, dtype=np.int16, mode="r", shape=(len(offsets), nsamples))
    return pulses

gather_pulses_from_inds_numpy_contiguous_mmap_with_cache(data, npre, nsamples, inds, bin_path, verbose=True)

Gather pulses from data at the given trigger indices, returning a memory-mapped numpy array, using a cache.

Source code in mass2/core/truebq_bin.py
def gather_pulses_from_inds_numpy_contiguous_mmap_with_cache(
    data: NDArray, npre: int, nsamples: int, inds: NDArray, bin_path: Path | str, verbose: bool = True
) -> NDArray | np.memmap:
    """Gather pulses from data at the given trigger indices, returning a memory-mapped numpy array, using a cache."""
    bin_full_path = Path(bin_path).absolute()
    inds = inds[inds > npre]  # ensure all inds inbounds
    inds = inds[inds < (len(data) - nsamples)]  # ensure all inds inbounds
    inds_hash = hashlib.sha256(inds.tobytes()).hexdigest()
    to_hash_str = str(npre) + str(nsamples) + str(bin_full_path) + inds_hash
    key = hashlib.sha256(to_hash_str.encode()).hexdigest()
    fname = f".{key}.truebq_pulse_cache.npy"
    cache_dir_path = bin_full_path.parent / "_truebq_bin_cache"
    cache_dir_path.mkdir(exist_ok=True)
    file_path = cache_dir_path / fname
    inds = np.array(inds)
    if file_path.is_file():
        # check if the file is the right size
        Nbytes = len(inds) * nsamples * 2  # 2 bytes per int16
        if file_path.stat().st_size != Nbytes:
            # On Windows, a wrong-sized cache file raises an error that reads like an
            # out-of-memory condition and is hard to catch, so we verify the size here
            if verbose:
                print(f"pulse cache is corrupted, re-gathering pulses for {file_path}")
            file_path.unlink()
            cache_hit = False
        else:
            cache_hit = True
    else:
        cache_hit = False

    if cache_hit:
        if verbose:
            print(f"pulse cache hit for {file_path}")
        return np.memmap(file_path, dtype=np.int16, mode="r", shape=(len(inds), nsamples))
    if verbose:
        print(f"pulse cache miss for {file_path}")
    return gather_pulses_from_inds_numpy_contiguous_mmap(data, npre, nsamples, inds, filename=file_path)

get_noise_trigger_inds(pulse_trigger_inds, n_dead_samples_after_previous_pulse, n_record_samples, max_noise_triggers)

Get trigger indices for noise periods, avoiding pulses.

Source code in mass2/core/truebq_bin.py
def get_noise_trigger_inds(
    pulse_trigger_inds: ArrayLike,
    n_dead_samples_after_previous_pulse: int,
    n_record_samples: int,
    max_noise_triggers: int,
) -> NDArray:
    """Get trigger indices for noise periods, avoiding pulses."""
    pulse_trigger_inds = np.asarray(pulse_trigger_inds)
    diffs = np.diff(pulse_trigger_inds)
    inds = []
    for i in range(len(diffs)):
        if diffs[i] > n_dead_samples_after_previous_pulse:
            n_make = (diffs[i] - n_dead_samples_after_previous_pulse) // n_record_samples
            ind0 = pulse_trigger_inds[i] + n_dead_samples_after_previous_pulse
            for j in range(n_make):
                inds.append(ind0 + n_record_samples * j)
                if len(inds) == max_noise_triggers:
                    return np.array(inds)
    return np.array(inds)
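
The selection logic above waits n_dead_samples after each pulse trigger, then tiles as many non-overlapping n_record_samples-long noise records as fit before the next pulse, stopping at max_noise_triggers. A standalone reproduction on synthetic trigger indices, for illustration only:

```python
import numpy as np

def noise_inds_reference(pulse_trigger_inds, n_dead, n_rec, max_triggers):
    # After each pulse, skip n_dead samples, then tile non-overlapping
    # n_rec-sample noise records until the next pulse (or the cap) is reached.
    inds = []
    diffs = np.diff(pulse_trigger_inds)
    for i, gap in enumerate(diffs):
        if gap > n_dead:
            n_make = (gap - n_dead) // n_rec
            ind0 = pulse_trigger_inds[i] + n_dead
            for j in range(n_make):
                inds.append(ind0 + n_rec * j)
                if len(inds) == max_triggers:
                    return np.array(inds)
    return np.array(inds)
```

With pulses at samples 0 and 1000, a 100-sample dead period, and 200-sample records, four noise records fit in the gap.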

write_truebq_bin_file(path, data, sample_rate_hz, *, voltage_scale=1.0, format_version=1, schema_version=1, data_reduction_factor=1, acquisition_flags=0, start_time=None, stop_time=None)

Write a binary file that can be opened by TrueBqBin.load().

This function writes data efficiently without copying by using memory mapping and direct file operations.

Args:

  • path: Output file path
  • data: Data array to write (will be converted to int16 if not already)
  • sample_rate_hz: Sample rate in Hz
  • voltage_scale: Voltage scaling factor
  • format_version: File format version (default: 1)
  • schema_version: Schema version (default: 1)
  • data_reduction_factor: Data reduction factor (default: 1)
  • acquisition_flags: Acquisition flags (default: 0)
  • start_time: Start time as uint64 array of length 2 (optional)
  • stop_time: Stop time as uint64 array of length 2 (optional)

Source code in mass2/core/truebq_bin.py
def write_truebq_bin_file(
    path: str | Path,
    data: np.ndarray,
    sample_rate_hz: float,
    *,  # force keyword only
    voltage_scale: float = 1.0,
    format_version: int = 1,
    schema_version: int = 1,
    data_reduction_factor: int = 1,
    acquisition_flags: int = 0,
    start_time: np.ndarray | None = None,
    stop_time: np.ndarray | None = None,
) -> None:
    """
    Write a binary file that can be opened by TrueBqBin.load().

    This function writes data efficiently without copying by using memory mapping
    and direct file operations.

    Args:
        path: Output file path
        data: Data array to write (will be converted to int16 if not already)
        sample_rate_hz: Sample rate in Hz
        voltage_scale: Voltage scaling factor
        format_version: File format version (default: 1)
        schema_version: Schema version (default: 1)
        data_reduction_factor: Data reduction factor (default: 1)
        acquisition_flags: Acquisition flags (default: 0)
        start_time: Start time as uint64 array of length 2 (optional)
        stop_time: Stop time as uint64 array of length 2 (optional)
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)

    # Ensure data is int16 (convert if necessary, but avoid unnecessary copying)
    if data.dtype != np.int16:
        if not np.can_cast(data.dtype, np.int16, casting="safe"):
            print(f"Warning: Converting data from {data.dtype} to int16 may cause data loss")
        data = data.astype(np.int16)

    # Prepare header
    num_samples = len(data)

    # Default time values if not provided
    if start_time is None:
        start_time = np.array([0, 0], dtype=np.uint64)
    if stop_time is None:
        stop_time = np.array([0, 0], dtype=np.uint64)

    # Create header array
    header = np.array(
        [
            (
                format_version,
                schema_version,
                sample_rate_hz,
                data_reduction_factor,
                voltage_scale,
                acquisition_flags,
                start_time,
                stop_time,
                num_samples,
            )
        ],
        dtype=header_dtype,
    )

    # Create the file with the correct size
    with open(path, "wb") as f:
        # Write header
        f.write(header.tobytes())

        # For large data arrays, write in chunks to avoid memory issues
        chunk_size = 1024 * 1024  # 1MB chunks

        if data.nbytes <= chunk_size:
            # Small data, write directly
            f.write(data.tobytes())
        else:
            # Large data, write in chunks
            data_flat = data.ravel()  # Flatten without copying if possible
            for i in range(0, len(data_flat), chunk_size // data.itemsize):
                chunk = data_flat[i : i + chunk_size // data.itemsize]
                f.write(chunk.tobytes())

Various utility functions and classes:

  • MouseClickReader: a class to use as a callback for reading mouse click locations in matplotlib plots.
  • InlineUpdater: a class that loops over a generator and prints a message to the terminal each time it yields.

InlineUpdater

A class to print progress updates to the terminal.

Source code in mass2/core/utilities.py
class InlineUpdater:
    """A class to print progress updates to the terminal."""

    def __init__(self, baseString: str):
        self.fracDone = 0.0
        self.minElapseTimeForCalc = 1.0
        self.startTime = time.time()
        self.baseString = baseString
        self.logger = logging.getLogger("mass")

    def update(self, fracDone: float) -> None:
        """Update the progress to the given fraction done."""
        if self.logger.getEffectiveLevel() >= logging.WARNING:
            return
        self.fracDone = fracDone
        sys.stdout.write(f"\r{self.baseString} {self.fracDone * 100.0:.1f}% done, estimated {self.timeRemainingStr} left")
        sys.stdout.flush()
        if fracDone >= 1:
            sys.stdout.write(f"\n{self.baseString} finished in {self.elapsedTimeStr}\n")

    @property
    def timeRemaining(self) -> float:
        """Estimate of time remaining in seconds, or -1 if not enough information yet."""
        if self.elapsedTimeSec > self.minElapseTimeForCalc and self.fracDone > 0:
            fracRemaining = 1 - self.fracDone
            rate = self.fracDone / self.elapsedTimeSec
            try:
                return fracRemaining / rate
            except ZeroDivisionError:
                return -1
        else:
            return -1

    @property
    def timeRemainingStr(self) -> str:
        """String version of time-remaining estimate."""
        timeRemaining = self.timeRemaining
        if timeRemaining == -1:
            return "?"
        else:
            return "%.1f min" % (timeRemaining / 60.0)

    @property
    def elapsedTimeSec(self) -> float:
        """Elapsed time in seconds since the creation of this object."""
        return time.time() - self.startTime

    @property
    def elapsedTimeStr(self) -> str:
        """String version of elapsed time."""
        return "%.1f min" % (self.elapsedTimeSec / 60.0)

elapsedTimeSec property

Elapsed time in seconds since the creation of this object.

elapsedTimeStr property

String version of elapsed time.

timeRemaining property

Estimate of time remaining in seconds, or -1 if not enough information yet.
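
The estimate is a constant-rate extrapolation: progress so far divided by elapsed time gives the rate, and the remaining fraction divided by that rate gives the time left. A sketch of the same arithmetic as a standalone function:

```python
def time_remaining(frac_done: float, elapsed_s: float) -> float:
    # Constant-rate extrapolation, as in InlineUpdater.timeRemaining:
    # rate = fraction completed per second; remaining = what's left / rate.
    rate = frac_done / elapsed_s
    return (1 - frac_done) / rate
```

For example, 25% done after 10 seconds extrapolates to 30 seconds remaining.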

timeRemainingStr property

String version of time-remaining estimate.

update(fracDone)

Update the progress to the given fraction done.

Source code in mass2/core/utilities.py
def update(self, fracDone: float) -> None:
    """Update the progress to the given fraction done."""
    if self.logger.getEffectiveLevel() >= logging.WARNING:
        return
    self.fracDone = fracDone
    sys.stdout.write(f"\r{self.baseString} {self.fracDone * 100.0:.1f}% done, estimated {self.timeRemainingStr} left")
    sys.stdout.flush()
    if fracDone >= 1:
        sys.stdout.write(f"\n{self.baseString} finished in {self.elapsedTimeStr}\n")

NullUpdater

A do-nothing updater class with the same API as InlineUpdater.

Source code in mass2/core/utilities.py
71
72
73
74
75
76
class NullUpdater:
    """A do-nothing updater class with the same API as InlineUpdater."""

    def update(self, f: float) -> None:
        """Do nothing."""
        pass

update(f)

Do nothing.

Source code in mass2/core/utilities.py
def update(self, f: float) -> None:
    """Do nothing."""
    pass

show_progress(name)

A decorator to show progress updates for another function.

Source code in mass2/core/utilities.py
def show_progress(name: str) -> Callable:
    """A decorator to show progress updates for another function."""

    def decorator(func: Callable) -> Callable:
        """A decorator to show progress updates for another function."""

        @functools.wraps(func)
        def work(self: Any, *args: Any, **kwargs: Any) -> None:
            """Update the progress of the wrapped function."""
            try:
                if "sphinx" in sys.modules:  # supress output during doctests
                    print_updater = NullUpdater()
                else:
                    print_updater = self.updater(name)
            except TypeError:
                print_updater = NullUpdater()

            for d in func(self, *args, **kwargs):
                print_updater.update(d)

        return work

    return decorator
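
The decorator obtains a progress sink from self.updater(name) and feeds it every fraction the wrapped generator yields. A hedged usage sketch with a stand-in updater class (RecordingUpdater and Worker are hypothetical names for illustration):

```python
class RecordingUpdater:
    """Stand-in for InlineUpdater that records the fractions it receives."""
    def __init__(self):
        self.seen = []

    def update(self, f):
        self.seen.append(f)

class Worker:
    def __init__(self):
        self._updater = RecordingUpdater()

    def updater(self, name):
        # show_progress calls self.updater(name) to obtain the progress sink
        return self._updater

    def run(self):
        # A method decorated with @show_progress("run") would yield
        # fractions-done like this as it works
        for k in range(1, 5):
            yield k / 4

# Mimic what the decorator's `work` wrapper does with the generator
w = Worker()
sink = w.updater("demo")
for frac in w.run():
    sink.update(frac)
```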

Other docstrings

See also Other docstrings for modules other than mass2.core.