### **RT2012** Program and Book of Abstracts

#### **Open 1: Welcome and Opening Session**

Monday, June 11 8:30-10:40 Crystal Ballroom

Open 1-1: Welcome <u>S. Zimmermann</u> *LBNL, Berkeley, USA* 

#### Open 1-2: Dark Energy from the Largest Galaxy Maps

D. Schlegel

LBNL, Berkeley, USA

Dark energy is a phenomenon causing the Universe to expand more rapidly than can be explained by Einstein's laws of gravity. Its discovery merited the 2011 Nobel Prize in Physics.

The effects of dark energy imprint on large galaxy maps. The Sloan Digital Sky Survey has mapped 200 million galaxies in 2-D, and 2 million galaxies in 3-D. The most recent maps are consistent with the "simplest" modification of Einstein's laws, the addition of a cosmological constant.

BigBOSS will produce the largest 3-D map of the Universe. Starting in 2017, it will map 20 million galaxies and 4 million quasars at an average distance half-way across the visible Universe. From these maps, it will be possible to measure the expansion history and the effects of dark energy to sub-percent precision.

The data challenges from these surveys have evolved. In 1998, the Sloan Survey presented a difficult challenge to store all the bits from the detectors. Future challenges will be to effectively "re-observe" these raw data streams many times, forward modeling cosmological models directly onto the data.

#### **Open 1-3: National Ignition Facility Integrated Computer Control Systems**

C. Marshall, G. Brunton, A. Casey, R. Demaret, J. Fisher, T. Frazier, L. Lagin, B. Reed, R. Shelton, O. Edwards Lawrence Livermore National Laboratory, Livermore, California, USA

The National Ignition Facility (NIF) at the Lawrence Livermore National Laboratory is a stadium-sized facility that contains a 192-beam, 1.8-Megajoule, 500-Terawatt, ultraviolet laser system together with a 10-meter diameter target chamber with room for multiple experimental diagnostics. NIF is the world's largest and most energetic laser experimental system, providing a scientific center to study inertial confinement fusion (ICF) and matter at extreme energy densities and pressures. NIF's laser beams are designed to compress fusion targets to conditions required for thermonuclear burn. NIF is operated by the Integrated Computer Control System (ICCS), an object-oriented, CORBA-based system distributed among over 1800 front-end processors, embedded controllers and supervisory servers. In the fall of 2010, a set of experiments began with deuterium and tritium filled targets as part of the National Ignition Campaign (NIC). At present, all 192 laser beams routinely fire to target chamber center to conduct fusion and high energy density experiments. During the past 2 years, the control system was expanded to include automation of the cryogenic target system, and over 25 diagnostic systems were deployed in the past year and used to support fusion experiments. Real-time control and precision timing requirements range from 100 ms to 20 ps across the various systems that support these experiments. This talk discusses the current status of the NIF and the plans for controls and information systems to support these experiments on the path to ignition.

\* This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

#### Open 1-4: LHC Trigger & DAQ - an Introductory Overview

N. Neufeld

PH, CERN, Geneva, Switzerland

In this paper I will review the four large LHC Trigger and DAQ systems, focusing on the technological and engineering challenges and trying to bring out commonalities in both technology and approach, without omitting to hint at interesting variations. In the trigger part the emphasis will be on implementation and the specific LHC challenges; the physics will only be touched upon as much as is necessary to motivate design decisions.

As the LHC experiments are preparing the first major round of upgrades, this is a very good moment to look at lessons learned from the operation so far, at successful approaches which will be continued and at new, promising ideas.

#### **Open 2: Opening Session 2**

Monday, June 11 11:05-12:25 Crystal Ballroom

#### Open 2-1: The Development of Large-Area Photodetectors with Sub-Millimeter and Sub-Nanosecond Space and Time Resolutions

H. J. Frisch

University of Chicago, Enrico Fermi Institute and Physics Dept., Chicago, United States

New detector techniques are 'disruptive' in that they enable major changes in science \cite{horn}. The availability of meter-squared photodetectors with the intrinsic ability to resolve each photon in space and time with deep (at least an order of magnitude) sub-millimeter and sub-nanosecond resolutions would be such a technology. The goal of the Large-Area Psec PhotoDetector Collaboration (LAPPD) is to use recent advances in material science, custom ASIC accessibility, inexpensive FPGAs, GPUs, and commodity PCs for data acquisition, and the unique properties of micro-channel plates (MCPs) to develop a family of detectors that have such resolution and can be optimized for a wide range of applications. The table below gives a brief summary of the needs, approach, benefits, and competition (NABC \cite{SRI}) for several of these. The figure shows: (Left) a 0.5-m<sup>2</sup> 'supermodule', composed of (Right) twelve 20-cm-square MCP-PMTs ('tiles').

#### **Open 2-2: Trigger in HEP: Selected Topics for Young Experimentalists**

T. Liu

Particle Physics Division, FNAL, Batavia, United States

The trigger system, often considered the brain of an experiment, has played a crucial role in particle physics experiments over the past decades. As such, the trigger system is intimately connected to the physics goals of the experiment, both at the design stage and the data taking stage. As luminosity or intensity continues to increase, in the ultimate quest for new physics beyond the Standard Model, the role of the trigger will become ever more important in the future. In fact, the ultimate physics reach at the high luminosity LHC will critically depend on the triggering capabilities of its experiments, and the trigger challenges there will be enormous. In this talk, we will take a brief look at the past, the present and the challenges that might lie ahead of us, from the viewpoint of physics motivations, technological challenges, the interplay between particle physics and industry, lessons learned from past experiences, and perhaps even some of the sociological aspects of the interplay between detector builders and trigger designers. This talk is meant for young people, the next generation of trigger experts.

#### NSET: New Standard, Emerging Technologies and TCA

Monday, June 11 13:40-14:40 Crystal Ballroom

#### NSET-1: HTML 5, Websockets and Sproutcore - Using Emerging Technologies to Control the Dark Energy Camera (DECam)

<u>K. Honscheid</u><sup>1</sup>, A. Elliott<sup>1</sup>, K. Patton<sup>1</sup>, E. Suchyta<sup>1</sup>, E. Buckley Geer<sup>2</sup> <sup>1</sup>Physics, Ohio State University, Columbus, OH, United States <sup>2</sup>Fermi National Accelerator Laboratory, Batavia, IL, United States

We report on the use of new technologies and web standards for the development of the readout and control system for the Dark Energy Camera (DECam). DECam is the new instrument for the Dark Energy Survey. It is one of the largest CCD cameras ever constructed and is currently being installed on the Blanco 4-m telescope at the Cerro Tololo Inter-American Observatory (CTIO). The readout and control requirements for this instrument are similar to those found at particle and nuclear physics experiments. The demands on the user interface, however, are quite different. DECam will not only be used by the Dark Energy Survey collaboration but will also be available to the astronomy community for 2/3 of each year. Individual observers will spend only a few nights at the observatory and they must be able to operate the instrument by themselves. Furthermore, the system must provide support for remote observing since not everybody will be able to travel to CTIO. Web browsers are by far the most familiar graphical user interface. They are available on many platforms and remote access is built right into the protocol. Over the past years browser technology has made significant leaps in speed and functionality. WebSockets, included in the emerging HTML5 standard and supported by all major web browsers, can provide Unix-socket-type connections between the browser and the server. This can be used to 'push' data, asynchronously, to the browser. In addition, modern browsers have dramatically increased the speed of interpreting JavaScript, the code behind almost every dynamic web page. Once thought of as static documents, web pages can now support the full range of functionality expected in a graphical user interface. The development of browser-based user interfaces is further aided by several open source frameworks such as SproutCore that strive to implement a fully functional application framework, similar to Microsoft's .Net or Apple's Cocoa, in the browser context. These frameworks further abstract the browser away as simply a drawing and operating context, removing the requirement for direct manipulation of the browser and its underlying DOM.
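
The server side of the WebSocket 'push' mechanism mentioned above can be illustrated with a minimal sketch. It is not taken from the DECam software; it only shows, following RFC 6455, how a server would answer the browser's opening handshake and how it would wrap a text message in an unmasked frame before sending it to the browser. The function names are illustrative.

```python
import base64
import hashlib
import struct

# Fixed GUID that RFC 6455 appends to the client's key during the handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(client_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def server_text_frame(message: str) -> bytes:
    """Wrap a text message in a single unmasked server-to-client frame."""
    payload = message.encode("utf-8")
    header = bytes([0x81])  # FIN bit set, opcode 0x1 (text frame)
    length = len(payload)
    if length < 126:
        header += bytes([length])
    elif length < 65536:
        header += bytes([126]) + struct.pack("!H", length)  # 16-bit length
    else:
        header += bytes([127]) + struct.pack("!Q", length)  # 64-bit length
    return header + payload


if __name__ == "__main__":
    print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))
    print(server_text_frame("Hello").hex())
```

Once the handshake is complete, the server can send such frames at any time without a request from the browser, which is what makes asynchronous status updates possible.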

We will present our design for the DECam user interfaces and discuss our experience with these new technologies.

#### NSET-2: Recent Progress in Next-Generation Platform Standards for Physics Instrumentation and Controls

R. Larsen

Instrumentation & Controls, SLAC National Accelerator Laboratory, Menlo Park CA, United States

There has been no major movement in architecture of modular instrument standards in accelerators and physics for two decades, but the architectures of technology drivers used by the physics communities have changed dramatically with the advent of field programmable logic arrays with imbedded high speed serial drivers and receivers, and in the past decade, the advent of ATCA, the Advanced Telecommunications Computer Architecture, and its daughter product MicroTCA platform. These are the first architectures designed specifically for very high availability of 0.99999 at the crate level, achieved through a combination of redundancy, Intelligent Platform Management Interface (IPMI) and true hot-swappability of failed or failing modules aided by all interconnections made through rear IO. Aside from this feature, the architecture is attractive as a replacement for aging Accelerator Controls infrastructure. The two new hardware standards, one for the ATCA RTM (PICMG 3.8) and one for a complete MTCA AMC-RTM and backplane timing and trigger distribution (MTCA.4), have been approved by the PICMG Committee. In addition a Timing Guideline specifying a methodology for clock and trigger distribution using optional serial lines in the standard ATCA backplane is close to completion. On the software side work continues to converge on Guidelines for a Standard Device Model as well as standard protocols for common physics needs such as FPGA addressing and fast time-stamping of incoming data. Participating laboratories and industry partners are collaborating to develop infrastructure crates, modules, controllers, switchers and timing solutions, as well as generic FPGA and high performance digitizer products that can be adapted to a range of uses through variants in Rear Transition Modules. New modules from industry and companion RTMs from laboratories have now appeared for RF, BPM and Industry Pack modules. 
Among the many challenges facing the introduction of any new standard, the major one today appears to be whether the laboratory engineering physics community has the motivation to adopt a discipline of collaboration among themselves and with industry to advance the standard in a form that benefits all parties economically, as well as achieving the major goal of building controls and data acquisition systems with potentially negligible downtime. A closely related goal is to reduce both engineering and production costs of state-of-the-art products for all partners, the same motivation that drove the highly competitive telecom industry to develop and adopt the ATCA/MicroTCA platform that offers a high degree of interchangeability of competing products. The physics communities have a new opportunity to take full advantage of this potential to counteract increasing machine costs and decreasing budgets. In summary, the lab-industry collaboration has resulted in a very fast hardware development cycle. Industry is now looking to apply the solutions for broader use, which will greatly aid cost effectiveness of solutions for the cost-sensitive laboratories.
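
To put the crate-level availability of 0.99999 quoted in this abstract into perspective, the short sketch below converts an availability figure into expected downtime per year and shows how redundancy raises availability. The redundancy formula assumes independent failures of identical units, which is a textbook simplification and not a claim about how ATCA shelves are actually analyzed; the function names are illustrative.

```python
def downtime_minutes_per_year(availability: float, days_per_year: float = 365.25) -> float:
    """Expected downtime in minutes per year for a given availability."""
    return (1.0 - availability) * days_per_year * 24 * 60


def redundant_availability(unit_availability: float, copies: int) -> float:
    """Availability of `copies` identical units in parallel.

    Assumes failures are independent and that one working unit is enough.
    """
    return 1.0 - (1.0 - unit_availability) ** copies


if __name__ == "__main__":
    # "Five nines" corresponds to a little over five minutes of downtime per year.
    print(round(downtime_minutes_per_year(0.99999), 2))
    # Two redundant units of 0.999 availability reach roughly 0.999999.
    print(redundant_availability(0.999, 2))
```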

#### NSET-3: Scalable SpaceWire Backplane System Using uTCA

T. Yuasa<sup>1</sup>, M. Nomachi<sup>2</sup>, T. Takahashi<sup>1</sup>, M. Ioki

<sup>1</sup>Astrophysics Department, Japan Aerospace Exploration Agency, Institute of Space and Astronautical Science, Kanagawa, Japan <sup>2</sup>Research Center for Nuclear Physics, Osaka University, Osaka, Japan <sup>3</sup>Institute for Unmanned Space Experiment Free Flyer, Tokyo, Japan

In space instrumentation, a new-generation network interface called SpaceWire has been gaining momentum following its standardization by the European Space Agency. SpaceWire, providing a peer-to-peer serial data link and network capability with routers, can be used as an infrastructure for intra-satellite data communication, and can replace a legacy shared-bus type data transfer interface such as MIL-STD-1553B.

The paper presents our recent development of "SpaceWire Backplane", a uTCA-based backplane system which interconnects AMC cards with SpaceWire interfaces. Physically, a SpaceWire link consists of 4 LVDS pairs, and therefore a uTCA backplane can be used with almost no modification from the original design, which assumes PCI Express and Gigabit Ethernet as data link protocols. Our crate can host a crate controller module (or MCH) and 6 AMC cards, each having 4 SpaceWire connections to the crate controller. We also developed a SpaceWire router as a mezzanine for a crate controller, which interconnects 24 SpaceWire links with a maximum operating frequency of 200 MHz. The system is thus scalable, and highly compact thanks to the reduced number of cables.

In the talk, architecture of the SpaceWire backplane, available AMC modules, and typical usages are described followed by an example of deployment in an on-going Japanese scientific satellite program ASTRO-H for simulating a large-scale onboard SpaceWire network on ground.

#### MN1: Mini-oral 1

#### Monday, June 11 14:40-15:30 Crystal Ballroom

#### PS1-2: Automatic System-Level Synthesis for Heterogeneous Platforms

H. A. Andrade, K. Ravindran

National Instruments Corporation, Berkeley, CA, United States

The field of "real-time data acquisition and computing applications in the physical sciences" presents by definition a cyber-physical system design problem that pushes the performance boundaries of the latest I/O, computation, communication, and storage technologies. This trend has forced designers to consider heterogeneous platforms that offer a good balance of performance, power efficiency, and cost. Unfortunately, this comes at a significant increase in programming complexity to match the efficiency gains in hardware, which translates to lower design productivity. The problem is more apparent today given that most system designers are domain experts in the physical sciences and not necessarily 'native' programming experts.

To alleviate this problem, we are prototyping an automated system level synthesis and exploration framework to deploy high level application specifications onto heterogeneous platforms. We view heterogeneous platforms as those platforms that have different types of computing, communication, storage, and I/O elements, but are considered together as one system target for purposes of design and deployment. Computing elements include traditional instruction processors, FPGAs, GPUs, or specialized processors or accelerators. The communication network consists of links that offer different topology, speed, and affinity. Storage could be part of computing nodes, or independent components, such as controllers for streaming to disk, or intelligent memory modules. The I/O elements define the boundaries of the system and are the main observable components at which the responsiveness of the system is measured.

The designer would define by aggregation a system-level target or platform, thereby making all the computing and I/O nodes available to a given set of applications. These applications are specified in suitable models of computation that intuitively capture the concurrency, data flow, timing, and control requirements of these applications. The application language balances expressibility and analyzability to enable automatic synthesis, simulation, and implementation of the system. The language is backed by analysis methods to reason about resource allocation and scheduling decisions across the heterogeneous components of the target. Problems that can be solved by this system-level synthesis framework include:

- Given an application and a system target specification, automatically map the application to the target
- Given an application, determine a suitable organization of the target and a mapping
- Identify mappings to ensure a specified reliability or other non-functional system properties
- Simulate and verify the system
- Manually map a subsystem if specific optimizations are needed

In this paper, we present key research directions in developing a framework for automatic system-level synthesis for heterogeneous platforms, share preliminary results, and discuss its use for applications of interest to the physical sciences community.
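
The first problem in the list above, mapping an application onto a given target, can be sketched with a simple greedy heuristic. This is an illustrative toy and not the authors' framework: the `Element` class, the task names and the per-element cost estimates are all hypothetical, and a real synthesis tool would also model communication, timing and storage constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A computing element of the target (hypothetical model)."""
    name: str
    kind: str  # e.g. "cpu", "fpga", "gpu"


def map_tasks(tasks, elements):
    """Greedily assign tasks to elements, keeping the busiest element's load low.

    `tasks` maps a task name to {element kind: estimated run time}; a kind that
    is missing from the inner dict means the task cannot run on that kind.
    Returns (mapping, makespan).
    """
    load = {element.name: 0.0 for element in elements}
    mapping = {}
    # Place the most expensive tasks first (longest-processing-time heuristic).
    ordered = sorted(tasks.items(), key=lambda item: -min(item[1].values()))
    for task, costs in ordered:
        candidates = [e for e in elements if e.kind in costs]
        if not candidates:
            raise ValueError(f"no element can run task {task!r}")
        best = min(candidates, key=lambda e: load[e.name] + costs[e.kind])
        mapping[task] = best.name
        load[best.name] += costs[best.kind]
    return mapping, max(load.values())


if __name__ == "__main__":
    target = [Element("cpu0", "cpu"), Element("fpga0", "fpga")]
    application = {
        "fft": {"fpga": 2.0, "cpu": 10.0},
        "filter": {"fpga": 1.0, "cpu": 4.0},
        "log": {"cpu": 1.0},
    }
    print(map_tasks(application, target))
```

A greedy pass like this gives no optimality guarantee; it only illustrates why the application description needs analyzable cost and compatibility information before any automatic mapping can be attempted.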

#### PS1-3: Monitoring and Improving the ALICE Data Taking Efficiency

V. Barroso<sup>1</sup>, F. Carena<sup>1</sup>, W. Carena<sup>1</sup>, S. Chapeland<sup>1</sup>, F. Costa<sup>1</sup>, E. Denes<sup>2</sup>, R. Divia<sup>1</sup>, A. Grigore<sup>3</sup>, G. Simonetti<sup>4</sup>, C. Soos<sup>1</sup>, A. Telesca<sup>1</sup>, P. Vande Vyvre<sup>1</sup>, B. von Haller<sup>1</sup>

<sup>1</sup>CERN, Geneva, Switzerland

<sup>2</sup>Hungarian Academy of Sciences, Budapest, Hungary

<sup>3</sup>Polytechnic University of Bucharest, Bucharest, Romania

<sup>4</sup>Universita Bari, Bari, Italy

ALICE (A Large Ion Collider Experiment) is the heavy-ion experiment designed to study the physics of strongly interacting matter and the quark-gluon plasma at the CERN Large Hadron Collider (LHC). Since its successful start-up in 2010, the LHC has been performing outstandingly, providing to the experiments long periods of stable collisions and an integrated luminosity that greatly exceeds the planned targets.

To fully exploit these privileged conditions, it is paramount that the experiment's data taking efficiency during stable collisions is as high as possible. In ALICE, some of the greatest lessons learned in 2011 were how important it is to clearly identify the reasons for inefficiency, closely monitor the efficiency and make the information available to the whole collaboration.

This paper will describe how the ALICE Electronic Logbook (eLogbook) is used to recognize the main causes of inefficiency, helping decision making by providing quantitative information and allowing the Run Coordination team to identify, prioritize, address and follow them. It will also explain how the eLogbook is used to monitor the data taking efficiency, providing reports that allow the collaboration to portray its evolution and evaluate the measures taken to increase it. Finally, it will present the ALICE efficiency since the start-up of the LHC and the future plans to further support the Run Coordination activities.
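
The kind of bookkeeping described above can be illustrated with a minimal sketch that computes a data taking efficiency and ranks the causes of lost time. It is not based on the eLogbook schema: the record format (cause, seconds lost) and the cause labels are hypothetical, and efficiency is simply taken as the fraction of stable-collision time during which data were recorded.

```python
from collections import defaultdict


def efficiency_report(stable_beam_seconds, downtimes):
    """Return (efficiency, causes ranked by time lost).

    `downtimes` is an iterable of (cause, seconds_lost) records that all fall
    within the stable-collision period (hypothetical record format).
    """
    lost = defaultdict(float)
    for cause, seconds in downtimes:
        lost[cause] += seconds
    total_lost = sum(lost.values())
    efficiency = (stable_beam_seconds - total_lost) / stable_beam_seconds
    ranked = sorted(lost.items(), key=lambda item: item[1], reverse=True)
    return efficiency, ranked


if __name__ == "__main__":
    records = [("detector", 1800), ("daq", 900), ("detector", 900)]
    print(efficiency_report(36000, records))
```

Ranking causes by accumulated time lost is one simple way to prioritize which sources of inefficiency to address first.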

#### **PS1-4: Open-Standard Blade Systems Enable High Performance Applications**

<u>S. McClellan<sup>1</sup></u>, K. Austin<sup>2</sup>, A. Deikman<sup>2</sup>

<sup>1</sup>Ingram School of Engineering, Texas State University, San Marcos, TX, United States <sup>2</sup>ZNYX Networks, Fremont, CA, United States

This paper describes a new category of mid-sized blade-systems which are fully compliant with ATCA specifications, but provide enhanced vendor-neutral configurability, low cost-of-entry, and extremely high processing density for network-intensive applications. Further, we propose a straightforward method for normalizing blade systems and describing the overall value of candidate system architectures. Our hope is that this approach to comparison will reveal characteristics of blade systems in terms of the primary driving factors for scientific applications: efficiency of system design-time, cost-effective density of components, and breadth of multi-vendor options.

### **PS1-5:** An xTCA Compliant and FPGA Based Data Processing Unit for Trigger and Data Acquisition and Trigger Applications

<u>J. Zhao</u><sup>1</sup>, Z. Liu<sup>1</sup>, H. Xu<sup>1</sup>, W. Kuehn<sup>2</sup> <sup>1</sup>Institute of High Energy Physics, Beijing, China <sup>2</sup>II.Physikalisches Institut, Justus-Liebig-Universitaet, Giessen, Germany

This talk presents an xTCA-compliant, FPGA-based Data Processing Unit for trigger and data acquisition applications such as the PANDA, PXD/Belle II and Lumi/BESIII experiments. The unit consists of 4 Advanced Mezzanine Cards (AMC, called xFP cards), 1 AMC carrier ATCA board (ACAB) and optionally 1 Rear Transition I/O Board (RTM). The ACAB board features 1 Xilinx Virtex-4 FX60 FPGA chip and 2 GBytes of DDR2 memory for data buffering and switching, and the xFP board features 1 Xilinx Virtex-5 FX70T FPGA chip and 4 GBytes of DDR2 memory for data processing. The ACAB board is connected to the four xFP boards by RocketIO ports and other LVDS I/O pairs. 8 optical links on 4 xFP2 cards (each with two 6 Gbps optical I/O ports) provide an input bandwidth of 64 Gbps. Optical links can come either from the panel of the AMC card or from the RTM card. 5 Gbit Ethernet links are provided for output to a higher level trigger or for storage. A single ATCA shelf can host up to 14 boards interconnected via a full mesh backplane. A prototype system has been set up, and functional tests have been carried out and will be reported and discussed.

### PS1-7: Performance Evaluation of 8-Channel ADC ATCA Card for Direct Sampling of 1.3 GHz Signals

S. Bou Habib

Dept. ZUiAM, ISE-WUT/DESY, Warsaw, Poland

Nowadays LLRF control systems for linear accelerators incorporate complex and expensive high-precision field detection receivers with multichannel downconverters and low noise LO generation systems. Increasing requirements for field detection precision at the most advanced machines reveal limitations of classical LLRF system receivers. Recently developed technology has made it possible to design data acquisition cards that sample the cavity field directly, without the need for downconverters. This paper describes the design and measurements of an eight-channel ATCA card developed for the evaluation of direct sampling techniques for 1.3 GHz signals at the FLASH and European XFEL accelerators. Two versions of the board were tested, each holding a different set of analog-to-digital converters. One was equipped with 400 MSPS, 14-bit ADCs with an analog bandwidth of 1.4 GHz while the other held 500 MSPS, 12-bit ADCs with a bandwidth of 2.3 GHz. The boards were tested in the laboratory and with "accelerator-like" signals and showed very good results. The paper shows results of the measured sampling parameters, as well as results of different non-IQ sampling schemes with various bandwidths and reaction times for acquiring the amplitude and phase of the analyzed signals and determining the precision of the analysis. Drift measurements for determining the long-term stability are also presented. The achieved results satisfy precision requirements for machines like the European XFEL main LINAC and ILC accelerators.
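
As a rough illustration of direct (under)sampling, the sketch below computes where a 1.3 GHz carrier aliases for the two sampling rates named in the abstract and recovers amplitude and phase from a block of samples by correlating with a complex reference. This is a generic single-frequency estimator written for illustration, not the algorithm used on the card; it assumes an ideal, noise-free sine wave and a block length chosen so that the unwanted mixing term averages to zero.

```python
import cmath
import math


def alias_frequency(f_signal: float, f_sample: float) -> float:
    """Frequency at which f_signal appears after sampling at f_sample."""
    k = round(f_signal / f_sample)
    return abs(f_signal - k * f_sample)


def estimate_amplitude_phase(samples, f_signal: float, f_sample: float):
    """Estimate amplitude and phase of A*cos(2*pi*f*t + phi) from its samples.

    The block must span a whole number of periods of the aliased signal so
    that the double-frequency term cancels (assumption of this sketch).
    """
    step = 2.0 * math.pi * f_signal / f_sample  # phase advance per sample
    acc = sum(x * cmath.exp(-1j * step * n) for n, x in enumerate(samples))
    z = 2.0 * acc / len(samples)
    return abs(z), cmath.phase(z)


if __name__ == "__main__":
    f_rf = 1.3e9
    for f_s in (400e6, 500e6):
        block = [0.8 * math.cos(2 * math.pi * f_rf * n / f_s + 0.3) for n in range(20)]
        print(f_s, alias_frequency(f_rf, f_s), estimate_amplitude_phase(block, f_rf, f_s))
```

With these numbers the carrier aliases to 100 MHz at 400 MSPS (four samples per period, the classical IQ case) and to 200 MHz at 500 MSPS (five samples per two periods, a non-IQ ratio); a 20-sample block covers a whole number of periods in both cases.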

### PS1-9: RF Backplane for MTCA.4 Based LLRF Control System

<u>K. Czuba<sup>1</sup></u>, M. Hoffmann<sup>2</sup>, T. Jezynski<sup>2</sup>, F. Ludwig<sup>2</sup>, H. Schlarb<sup>2</sup> <sup>1</sup>Institute of Electronic Systems, Warsaw University of Technology, Warsaw, Poland <sup>2</sup>MSK, DESY, Hamburg, Germany

The Low Level RF (LLRF) control system developed for linear accelerator based Free Electron Lasers (FEL) requires real-time processing of thousands of RF signals with very challenging RF field detection precision. To provide a reliable, maintainable and scalable system, a new development of the LLRF control based on the MTCA.4 architecture was started at DESY for FLASH and the European XFEL. In contrast to standard RF control systems realized in 19" modules, we could demonstrate a setup with field detection, RF generation, RF distribution, DAQ system and the high-speed real-time processing entirely embedded in the MTCA.4 crate system. This unique scheme embeds ultra-high precision analog electronics for detection on the Rear Transition Module (RTM) with powerful digital processing units on the Advanced Mezzanine Card (AMC). To increase system reliability and maintainability and to reduce performance limitations caused by the RF cabling network, we developed and embedded in the MTCA.4 crate a unique RF Backplane (uRFB) for RTM cards. This backplane is used for distribution of high-performance Local Oscillator (LO), RF and low-jitter clock signals together with low-noise analog power supply to analog RTM cards in the system. In this paper we present the architecture of the MTCA.4 crate with the uRFB, the backplane design and successful laboratory test results of the LLRF control system demonstrating the performance of our development.

#### PS1-15: Development and Calibration of a Real-Time Airborne Radioactivity Monitor Using Gamma-Ray Spectrometry on a Particulate Filter

R. Casanovas<sup>1</sup>, J. J. Morant<sup>2</sup>, M. Salvado<sup>1</sup>

<sup>1</sup>Unitat de Fisica Medica, Universitat Rovira i Virgili, Reus, Spain <sup>2</sup>Servei de Proteccio Radiologica, Universitat Rovira i Virgili, Reus, Spain

The main objective of an automatic real-time surveillance network is to detect anomalous levels of radioactivity in the environment as quickly as possible. If gamma-ray spectrometry is used, rather than gross counting, it is also possible to identify the isotopes involved in an increase in radiation level. This makes it possible to discriminate the naturally occurring radionuclides from the artificial ones. In addition, using gamma-ray spectrometry, the activity concentration of the detected isotopes can be determined, making it possible to establish automatic alerts based on the limit levels provided by the legislation.

For this reason, the use of real-time gamma-ray spectrometry systems in environmental radiation surveillance networks has become common. In this work, we present the general development and calibration aspects of a real-time airborne radioactivity monitor (patent pending). The monitor is based on gamma-ray spectrometry with NaI(Tl) or LaBr<sub>3</sub>(Ce) scintillators and makes it possible to identify and quantify airborne radioactive isotopes in real time.

The system comprises a suction pump that circulates a constant flow of air through a particulate filter, which is used to concentrate the airborne isotopes. The active part of the filter faces a 2"x2" NaI(Tl) or LaBr<sub>3</sub>(Ce) detector connected to a multichannel analyzer, and the energy spectrum is measured. Both the detector and the filter are inside a Pb shielding, which is used to reduce the surrounding radiation background. After the selected integration time, the filter is displaced to obtain the next set of measurements on a clean section of filter. The monitor has multiple control sensors (detector and air temperature, temperature in the rack, air flow, amount of filter available, positioning and traveling distance of the filter, filter break sensor, etc.). Software was specifically designed for local or remote control to manage data collection and storage, information transmission, sensor management, information on operating parameters, graphical representations of spectra, calculations, etc. The system also comprises a meteorological station for observing the atmospheric conditions. In addition, the monitor can easily integrate several devices, such as Geiger detectors, to complement the radiological measurements.

The calibration of the monitor was performed experimentally, except for the efficiency calibration, which was set using Monte Carlo simulations. For the simulations, a user code and a model of the system geometry for the EGS5 system were prepared and validated with experimental measurements. Although the calibration methodology is independent of the scintillation crystal used, the capabilities and performance of the monitor are not. Thus, we finally discuss some characteristics of the monitor when using the different crystals.

#### **PS1-14: The XFEL RF Interlock System**

<u>M. Penno<sup>1</sup></u>, H. Leich<sup>1</sup>, T. Grevsmuehl<sup>2</sup>, C. Rueger<sup>1</sup>, K. Machau<sup>2</sup> <sup>1</sup>EL/Z, DESY Zeuthen, Zeuthen, Germany <sup>2</sup>MHF-p, DESY Hamburg, Hamburg, Germany

A technical interlock system has to prevent any damage to the costly components of the RF stations. The system monitors the behaviour of various system components, collects and processes status information in real time and reports the actual status to the control system. The system is based on self-diagnostic and repair strategies to obtain maximum reliability and maximum time of operation. It incorporates a controller and slave modules that perform the I/O operation. The interlock logic is implemented in hardware and operates independently of the software running on the controller. A dedicated backplane with a custom bus protocol has been developed to optimize the data transfer between the interlock controller and the interlock slave modules. The controller utilizes a single board computer that runs a Linux based embedded operating system. The software performs a self-test after power up, which includes testing all hardware components, checking all firmware revisions and validating the system configuration. Furthermore, it provides a TINE server that connects to the control system and provides status signals, analogue values and plots in real time. The XFEL RF Interlock System will be used in the XFEL facility (DESY, Hamburg Site) to protect 27 RF Stations. Since the system will be installed in the XFEL tunnel near the accelerating equipment, provisions are taken to detect and react to Single Event Upsets (SEU). The presentation will give an overview of the XFEL RF Interlock System, its concept, interfaces and components.

## PS1-17: Asynchronous and Synchronous Implementations of the Autocorrelation Function for the FPGA X-Ray Pixel Array Detector

<u>M. S. Hromalik<sup>1,2</sup></u>, K. S. Green<sup>2</sup>, H. T. Philipp<sup>2</sup>, M. T. W. Tate<sup>2</sup>, S. M. Grunet<sup>2,3</sup> <sup>1</sup>Computer Science, State University of New York at Oswego, Oswego, NY, United States <sup>2</sup>Laboratory of Atomic and Solid State Physics (LASSP), Cornell University, Ithaca, NY, United States <sup>3</sup>Cornell High Energy Synchrotron Source (CHESS), Ithaca, NY, United States

The design of the Field Programmable Gate Array Pixel Array Detector (FPGA PAD) prototype and initial experimental results of real-time implementations of its autocorrelation function are presented. This is a pixelated 2D silicon device for detecting X-rays in X-ray diffraction experiments and is comprised of three layers: the diode detection and ASIC analog electronics layers, connected by a massively parallel interface to a third FPGA layer consisting of a Xilinx XC6VLX550T device. A high-speed, labor-intensive asynchronous interface as well as a more traditional synchronous interface will be presented. Traditionally, X-ray PADs have been application-specific, as their functionality is built into the ASIC layer. In the FPGA PAD, however, the ASIC layer consists of a simple photon-counting front end with a single-bit digitized output to the FPGA layer. As most of the functionality is migrated to the FPGA layer, the reconfigurability of the FPGA allows for great flexibility in terms of detector applications. The massively parallel connection between the ASIC layer and the FPGA layer also allows for data-flow implementations of detector algorithms on the parallel input bit array. Real-time data processing yields lower data transfer rates to offline storage and higher time resolution during experiments. An example application of a real-time autocorrelation function (ACF) for X-ray Photon Correlation Spectroscopy (XPCS) experiments is also described for a prototype of the FPGA PAD. Both a synchronous implementation and a very high-speed region-of-interest asynchronous implementation were designed. A time resolution range of 100 ns to 1 s was achieved for the synchronous implementation, and a resolution down to 36 ns was realized for the asynchronous implementation. The required data transfer rate was also reduced from 2.56 Gb/s to 4.4 Mb/s over the entire array.
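As an illustration of the quantity being computed, the sketch below evaluates the normalized intensity autocorrelation g2(lag) = <I(t) I(t+lag)> / <I>^2 for the single-bit photon stream of one pixel. It is a plain software model, not the FPGA data-flow implementation described in the abstract; the function name `g2` and the test stream are assumptions made for this example.

```python
def g2(bits, max_lag):
    """Normalized autocorrelation g2(lag) of a 0/1 photon-count stream."""
    n = len(bits)
    mean = sum(bits) / n
    if mean == 0:
        raise ValueError("stream contains no photons")
    result = []
    for lag in range(1, max_lag + 1):
        pairs = n - lag  # number of (t, t + lag) pairs available
        corr = sum(bits[t] * bits[t + lag] for t in range(pairs)) / pairs
        result.append(corr / (mean * mean))
    return result

# Hypothetical test stream: one photon on every 4th clock tick.
stream = [1, 0, 0, 0] * 250
acf = g2(stream, 8)  # acf[k] holds g2 at lag k + 1
```

For this periodic stream the correlation peaks at lags that are multiples of four ticks and vanishes in between, which is the kind of structure an XPCS analysis looks for on real data.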

#### **PS1-18: Real-Time Fast Controller Prototype for J-TEXT Tokamak**

W. Zheng, M. Zhang, G. Zhuang, C. Weng, R. Liu, Y. He, T. Ding, X. Zhang College of Electrical & Electronic Engineering, Huazhong University of Science & Technology, Wuhan, China

The operation of a tokamak device is highly sophisticated and usually requires high-performance real-time controllers. The ITER Control, Data Access and Communication (CODAC) team has defined standards for the Fast Controller (FC). Following the ITER CODAC standards, we have designed the real-time FC (RTFC) for J-TEXT. The RTFC is an FC with a dedicated software structure for real-time control. It serves as a design template for all real-time controllers on J-TEXT, and with minor modification it can be used for various applications such as plasma control and real-time diagnosis.

The RTFC mainly features the PXIe bus with multi-core processor hardware, Reflective Memory technology and the IEEE-1588 Precision Time Protocol (PTP). It is capable of performing closed-loop control with a cycle time below 1 ms. The RTFC supports the Experimental Physics and Industrial Control System (EPICS): it can be monitored and configured using EPICS, and will work autonomously when integrated into the J-TEXT CODAC system. Preliminary test results based on a prototype used as the J-TEXT Vertical Field Power Supply Controller will also be presented.

#### **PS1-19: A Dedicated Processor for Monte Carlo Computation in Radiotherapy**

<u>C. Pili<sup>1,2</sup></u>, V. Fanti<sup>1,2</sup>, G. R. Fois<sup>1,2</sup>, R. Marzeddu<sup>1,2</sup>, P. Randaccio<sup>1,2</sup>, S. Siddhanta<sup>1,2</sup>, J. Spiga<sup>1,2</sup>, A. Szostak

<sup>1</sup>Department of Physics, University of Cagliari, Cagliari, Italy

<sup>2</sup>INFN Sez. Cagliari, Cagliari, Italy

A high-speed Monte Carlo simulator for radiotherapy is being developed at INFN, Cagliari. In Monte Carlo simulations of the radiation dose delivered to the human body during radiotherapy treatment planning, the Compton interaction of a photon with an electron plays an important part. Such simulations give more precise results than empirical methods, but at the cost of computing time. A fast, fully pipelined, cost-effective design for real-time simulation of the Compton interaction and dose calculation has been implemented on FPGA-based hardware running at more than 100 MHz, making it feasible to perform high-speed Monte Carlo simulations for practical purposes and permitting the real-time building of dose distribution maps. A performance comparison with an implementation on graphics processors is also being made.
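To make the Compton step concrete, here is a minimal software sketch that samples one scattering event by rejection sampling of the Klein-Nishina angular distribution. It is not the pipelined FPGA design of the abstract, whose sampling scheme is not described; the function name, the free-electron approximation and the 1 MeV test energy are assumptions of this example.

```python
import random

ELECTRON_REST_ENERGY_MEV = 0.511

def sample_compton(energy_mev, rng):
    """Return (cos_theta, scattered_energy, deposited_energy) for one photon."""
    alpha = energy_mev / ELECTRON_REST_ENERGY_MEV
    while True:
        cos_theta = rng.uniform(-1.0, 1.0)
        # Ratio E'/E from Compton kinematics.
        ratio = 1.0 / (1.0 + alpha * (1.0 - cos_theta))
        sin2 = 1.0 - cos_theta * cos_theta
        # Unnormalized Klein-Nishina weight; it never exceeds 2 (reached at cos_theta = 1).
        weight = ratio * ratio * (ratio + 1.0 / ratio - sin2)
        if rng.uniform(0.0, 2.0) <= weight:
            scattered = energy_mev * ratio
            return cos_theta, scattered, energy_mev - scattered

rng = random.Random(1)  # fixed seed so the run is repeatable
events = [sample_compton(1.0, rng) for _ in range(1000)]
mean_deposit = sum(e[2] for e in events) / len(events)
```

The deposited energy of each event is what a dose map would accumulate per voxel; a hardware version would replace the loop with a pipeline that accepts or rejects one candidate per clock cycle.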

#### PS1-20: New RFX-Mod Feedback Control System Based on MARTe Real-Time Framework

<u>G. Manduchi</u>, A. Luchetta, C. Taliercio, A. Soppelsa *Consorzio RFX, Padova, Italy* 

A real-time system has been used since 2004 in the RFX-mod nuclear fusion experiment to control the plasma equilibrium configuration and the magnetohydrodynamic (MHD) modes. The system is implemented as a network of eight VME racks, each hosting a PowerPC computer and I/O boards, communicating via Gbit/s Ethernet. The system handles about 700 input signals and produces about 250 reference waveforms driving the power supplies that feed the coils used for plasma position and MHD control. The system operates at a rate of 2.5 kHz with an overall latency of 1.5 ms, which is longer than the cycle period because of its pipelined organization. The system has been working successfully for seven years, but its latency and limited computing power prevent its use with new, more computation-intensive control algorithms. To overcome these limitations, a new hardware and software architecture has been developed, and the new system now provides a shorter latency and much greater computing power. Despite its radically different hardware organization, using one multi-core server in place of multiple VME CPUs, the conceptual distributed organization has been retained and a one-to-one mapping between the former computers and server cores has been defined, with the possibility of integrating additional cores for future use. Shared memory is now used for communication in place of Ethernet, thus removing one of the major bottlenecks of the old system. Generation of the reference waveforms is now achieved using PXI technology but, due to budget constraints, VME-based data acquisition has been retained in this first stage, using UDP communication to send acquired raw data to the control server. Replacement of the VME ADC modules with ATCA-based ones is foreseen as a further step.
Two major changes in software have been carried out in the new system: the replacement of VxWorks with real-time Linux and the use of MARTe, a framework for real-time applications that is increasingly used in the fusion community. MARTe provides all the functionality required to handle supervision and real-time data communication for a configurable set of real-time threads, which are then mapped onto the cores of a multi-core server. Every real-time thread cyclically executes a sequence of Generic Application Modules (GAMs) that provide the required interaction with the underlying hardware as well as the implementation of control algorithms. Developers can therefore concentrate on the specific components, whose configuration (such as the number of threads and the components for each thread) is defined in a property file. The modular approach provided by MARTe has allowed not only rapid development of the new system, but also its rapid prototyping. By replacing the data acquisition components with others that read stored raw input data from the experiment database, it has in fact been possible to fully test the control algorithms before system commissioning.
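The idea of a thread that cyclically runs a chain of interchangeable modules, and of swapping the acquisition module for one that replays stored data, can be sketched as follows. This is not the MARTe API: the class names, the `execute` method and the shared `signals` dictionary are hypothetical stand-ins chosen for this example.

```python
class ReplaySource:
    """Stands in for data acquisition by replaying stored raw samples."""
    def __init__(self, samples):
        self._samples = iter(samples)

    def execute(self, signals):
        signals["raw"] = next(self._samples)

class ProportionalControl:
    """Toy control algorithm: reference = gain * (setpoint - raw)."""
    def __init__(self, gain, setpoint):
        self.gain, self.setpoint = gain, setpoint

    def execute(self, signals):
        signals["reference"] = self.gain * (self.setpoint - signals["raw"])

class Recorder:
    """Collects the reference value produced in each cycle."""
    def __init__(self):
        self.log = []

    def execute(self, signals):
        self.log.append(signals["reference"])

def run_cycles(modules, n_cycles):
    """Run the module chain in order, once per control cycle."""
    signals = {}
    for _ in range(n_cycles):
        for module in modules:
            module.execute(signals)

recorder = Recorder()
chain = [ReplaySource([0.0, 0.5, 1.0]), ProportionalControl(2.0, 1.0), recorder]
run_cycles(chain, 3)
```

Because the control module only sees the `signals` dictionary, it runs unchanged whether the first module reads hardware or replays an archive, which is what makes offline testing before commissioning possible.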

#### PS1-25: A Single-FPGA Full-Time Beam Former

H. Deschamps

DSM/IRFU/SEDI, Commissariat a l'Energie Atomique, Gif-sur-Yvette, France

A full-time beam former for two independent antenna groups, with visibility computation capabilities at a slower rate, has previously been designed on a single FPGA for the BAO-radio instrument, a radio telescope demonstrator for the study of dark energy by the HI probing technique. On the same FPGA, firmware dedicated to the FAN project at the Nancay radio telescope has been designed that provides full-time dual beam forming on a single antenna group. It can process an incoming data flow of twelve channels, each organized as a complex spectrum (2x8 bits) of 4096 frequencies, at an effective rate of 4 Gb/s. The dual-beam capability of the system has been successfully tested with transits of radio sources (Cas A, 3C123); further observations with source tracking and building of the visibility matrix will follow.
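Frequency-domain beam forming reduces to a weighted sum of the channel spectra, bin by bin. The sketch below shows that operation for twelve channels; the function names, the phase values and the four-bin test spectrum are assumptions for illustration and do not reflect the firmware's fixed-point arithmetic.

```python
import cmath

def form_beam(channel_spectra, weights):
    """Weighted sum over antenna channels for every frequency bin."""
    n_bins = len(channel_spectra[0])
    return [
        sum(w * spectrum[k] for w, spectrum in zip(weights, channel_spectra))
        for k in range(n_bins)
    ]

def steering_weights(phases_rad):
    """Unit-magnitude weights that undo the given per-channel phases."""
    return [cmath.exp(-1j * p) for p in phases_rad]

# Hypothetical data: 12 channels, each a phase-shifted copy of one source spectrum.
phases = [0.3 * ch for ch in range(12)]
source = [1 + 0j, 0.5j, -1 + 0j, 2 + 1j]
channels = [[s * cmath.exp(1j * p) for s in source] for p in phases]
beam = form_beam(channels, steering_weights(phases))
```

A dual-beam system would simply call `form_beam` twice on the same channel data with two different weight sets, one per pointing direction.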

### PS1-26: A Two-Stage Distributed Architecture Designed for DAQ of Thousands-Channel Physical Experiment

K. Song<sup>1,2</sup>, P. Cao<sup>1,2</sup>, J. Yang<sup>1,2</sup>

<sup>1</sup>Modern Physics Dept., University of Science & Technology of China, HeFei, AnHui, China <sup>2</sup>State Key Laboratory of Particle Detection and Electronics, Hefei, Anhui, China

Today, some physics experiments have data acquisition (DAQ) systems with thousands of channels. A centralized architecture cannot handle DAQ for such a large number of channels. This manuscript presents a two-stage distributed architecture for the DAQ of thousands-channel physical experiments. The architecture is divided into two units: the upper unit handles data collection and storage, including main control, quality monitoring and data recording; the lower unit handles data sampling and transmission.

One block has been built for four cables (fiber or twisted-pair electric cable). Each cable cascades hundreds of modules for data sampling and transmission. The 4-cable block is composed of: a) a CPCI chassis, in which 4 FCI (Fiber Channel Interface) boards gather data from the 4 cables, one FCI per cable, and a main board in slot zero receives data from the 4 FCIs; b) a VPR (Vision, Plotting and Recording) workstation, which receives data uploaded from the main board through Gigabit Ethernet and manages data plotting, printing and saving; c) a CCM (Center Controlling and Monitoring) workstation, which receives decimated data uploaded from the FCIs through Megabit Ethernet, manages parameter configuration and control, and displays decimated data in real time to monitor the current working status; d) other auxiliary components such as Ethernet switches, plotter, printer and disk array.

The architecture is designed to be expandable: the 4-cable block can easily be expanded to a 16-cable system by using it as a building block, so as to meet today's demand for DAQ with many more channels. The key problems in a multi-cable DAQ architecture are synchronous sampling across all channels, pipelined data transmission, and real-time manipulation and recording of large volumes of data. The 16-cable system has three layers of synchronization: among the four CPCI chassis, among the four cables in a CPCI chassis, and among the hundreds of channels in a cable. We use both hardware triggers and software commands to synchronize the different CPCI chassis and the four FCI cables connected to each CPCI chassis. We use clock recovery and a PLL to adjust the phase delay, and a command delay counter to compensate for the command transmission delay in a cable.

We have built a prototype architecture for 16 cables with the associated hardware modules, and tested its sampling synchronization, data transmission and storage capability using a cable simulator that we developed. The cable simulator can generate data for 16 cables according to commands and configuration, with each cable supporting 1920 channels. Test results show that the sampling synchronization error between two channels 100 m apart can reach 1 ns. The maximum tested data rate from one cable is 11.52 MB/s, so the total data rate for 16 cables is 1.47456 Gb/s.
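The quoted aggregate rate follows directly from the per-cable figure, as the short calculation below shows. The per-channel value is an inference that assumes decimal megabytes and that the 1920 channels of a cable share the maximum rate evenly; neither assumption is stated in the abstract.

```python
CABLES = 16
CHANNELS_PER_CABLE = 1920
CABLE_RATE_MB_S = 11.52  # maximum tested rate per cable, megabytes per second

total_mb_s = CABLES * CABLE_RATE_MB_S   # aggregate rate in MB/s
total_gbps = total_mb_s * 8 / 1000      # converted to gigabits per second

# Assumption: 1 MB = 10**6 bytes and an even split across channels.
per_channel_bytes_s = CABLE_RATE_MB_S * 1e6 / CHANNELS_PER_CABLE
```

Under these assumptions the aggregate comes to 184.32 MB/s, matching the 1.47456 Gb/s in the abstract, and each channel contributes about 6000 bytes per second.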

#### PS1-27: An Application Using MicroTCA for Real-Time Event Assembly

<u>R. A. Rivera</u> Fermilab, Batavia, IL, United States

The Electronic Systems Engineering Department of the Computing Sector at the Fermi National Accelerator Laboratory has undertaken the effort of designing an AMC that meets the specifications within the MicroTCA framework. The application chosen to demonstrate the hardware is the real-time event assembly of data taken by a particle tracking pixel telescope. In the past, the telescope would push all of its data to a PC, where the data was stored to disk. Event assembly, geometry inference and particle tracking were then all done at a later time. This approach made it difficult to efficiently assess the quality of the data as it was being taken, at times resulting in wasted test beam time. Now we can insert in the data path, between the telescope and the PC, a commercial MicroTCA crate housing our AMC. The AMC receives, buffers and processes the data from the tracking telescope and transmits complete, assembled events to the PC in real time. In this paper, we report on the design approach and the results achieved when the MicroTCA hardware was employed for the first time during a test beam run at the Fermi Test Beam Facility in 2012.
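Event assembly amounts to buffering fragments from each telescope plane and emitting an event once every plane has reported for the same trigger. The sketch below shows that logic in software; the fragment layout `(trigger, plane, hits)`, the function name and the two-plane test data are assumptions, since the abstract does not describe the AMC's data format.

```python
from collections import defaultdict

def assemble_events(fragments, n_planes):
    """Group (trigger, plane, hits) fragments; yield an event once all planes reported."""
    pending = defaultdict(dict)  # trigger number -> {plane: hits}
    for trigger, plane, hits in fragments:
        pending[trigger][plane] = hits
        if len(pending[trigger]) == n_planes:
            yield trigger, pending.pop(trigger)

# Hypothetical interleaved fragment stream from a two-plane telescope.
fragments = [
    (1, 0, [(10, 12)]),
    (2, 0, [(40, 7)]),
    (1, 1, [(11, 12)]),
    (2, 1, [(41, 8)]),
]
events = list(assemble_events(fragments, n_planes=2))
```

Emitting only complete events lets the receiving PC check data quality while the beam is still on, instead of reconstructing events from raw files afterwards.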

#### PS1-28: Digital Programmable Emulator and Analyzer of Radiation Detection Setups

A. Geraci, <u>A. Abba</u>, F. Caponio

Dept. of Electronics, Politecnico di Milano, Milan, Italy

We present a fully configurable digital architecture that performs two functions: signal generation for the emulation of radiation detectors and front-end electronics, and processing of signals from radiation detectors. Many aspects justify developing a system of this type. First of all, experimental conditions improve in the absence of a radioactive source and detecting apparatus, which means health safety for experimenters and the possibility of performing remote experiments independently of the presence of radioactive sources and detectors. The quality of the experiment is also positively affected. In fact, the availability of a configurable virtual signal source simplifies the testing of processors, allows absolute and fair comparison among different processing techniques, and makes it possible to directly evaluate algorithms and adjust the processing flow. The proposed architecture has been conceived to serve as a general-purpose investigation instrument in digital spectroscopy applications, at both hardware and firmware level. It allows the emulation of all parts of an acquisition and processing setup and consequently implements a real and complete hardware and firmware co-design platform. The paper focuses on theoretical and practical topics involved in the generation of signals equivalent to those produced by radiation detection systems. Operationally, the signal synthesis process is based on a reference shape, the statistics of occurrence times, and the statistical distribution of shape amplitudes. A pair of consecutive events can be generated and summed together to simulate the pile-up phenomenon. The resulting signal is corrupted by noise and baseline deviation, and is shaped in order to take into account the transfer function of the electronic conditioning stage. The generated signal can be made available as output in digital or analog form. The first choice also offers the possibility of introducing non-linearity effects and quantization noise, whereas the second requires a non-trivial digital-to-analog conversion process. In particular, the proposed architecture implements an algorithm that retrieves the statistical properties of a random variable from its histogram. Moreover, the system can also sample an external analog signal in order, for instance, to obtain shapes and spectra for initialization of the emulation process. From this point of view, and using the synergy of emulation and acquisition functions, the system also plays the role of a network analyzer for the characterization of preamplifier topologies. A fully configurable processing section makes it possible to test and compare a great variety of algorithms for energy and time estimation, baseline correction and so on. In other words, the proposed solution can also be used as a pure configurable digital processor. The system has been prototyped and tested.
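The synthesis steps named in the abstract (reference shape, random occurrence times, amplitudes drawn from a histogram, pile-up by summation, added noise) can be modelled in a few lines. This is a software sketch only: Poisson arrival times, inverse-CDF sampling of the histogram, the exponential reference shape and all numeric values are assumptions of this example, not details taken from the paper.

```python
import bisect
import math
import random

def sample_from_histogram(bin_edges, counts, rng):
    """Draw a value distributed like the histogram (inverse-CDF method)."""
    cumulative, total = [], 0
    for c in counts:
        total += c
        cumulative.append(total)
    u = rng.random() * total
    i = min(bisect.bisect_right(cumulative, u), len(counts) - 1)
    return rng.uniform(bin_edges[i], bin_edges[i + 1])

def synthesize(n_samples, rate, shape, bin_edges, counts, noise_rms, rng):
    """Sum reference shapes at Poisson-distributed times; overlaps give pile-up."""
    signal = [rng.gauss(0.0, noise_rms) for _ in range(n_samples)]
    t = rng.expovariate(rate)
    while t < n_samples:
        amplitude = sample_from_histogram(bin_edges, counts, rng)
        start = int(t)
        for k, s in enumerate(shape):
            if start + k < n_samples:
                signal[start + k] += amplitude * s
        t += rng.expovariate(rate)
    return signal

rng = random.Random(7)
shape = [math.exp(-k / 5.0) for k in range(30)]  # assumed exponential-decay pulse
edges, counts = [0.0, 1.0, 2.0, 3.0], [1, 5, 2]  # assumed amplitude histogram
trace = synthesize(1000, 0.02, shape, edges, counts, 0.01, rng)
draws = [sample_from_histogram(edges, counts, rng) for _ in range(200)]
```

Pile-up needs no special handling here: whenever two arrival times fall closer together than the shape length, their pulses simply add in the output trace.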

#### PS1-12: Firmware Upgrade in xTCA Systems

D. Makowski<sup>1</sup>, A. Mielczarek<sup>1</sup>, G. Jablonski<sup>1</sup>, P. Predki<sup>1</sup>, T. Jezynski<sup>2</sup>, H. Schlarb<sup>2</sup>, A. Napieralski<sup>1</sup>

<sup>1</sup>Department of Microelectronics and Computer Science, Technical University of Lodz, Lodz, Poland

<sup>2</sup>Deutsches Elektronen-Synchrotron, Hamburg, Germany

The Advanced Telecommunications Computing Architecture (ATCA) and Micro-Telecommunications Computing Architecture (uTCA) standards, collectively known as xTCA, provide a flexible and scalable infrastructure for designing complex control and data acquisition systems. The xTCA standards are becoming more and more popular in physics applications. Programmable devices, such as Field Programmable Gate Arrays (FPGAs), conventional processors and Digital Signal Processors (DSPs), are present on the Advanced Mezzanine Card (AMC) modules and ATCA blades used in xTCA crates. These devices typically boot from non-volatile memories available on the modules. In the case of FPGAs, the firmware is usually stored in serial or parallel memories. Other components, e.g. processors, can be booted using external interfaces (Ethernet, USB, etc.). During firmware development, dedicated programmers are used. The tools provided by manufacturers make it possible to download firmware to non-volatile PROMs (Programmable Read-Only Memories) and provide debugging functionality. The programmers can be connected using a standard JTAG connector or a proprietary debug connector. Since xTCA systems are composed of various AMC or ATCA modules with many on-board programmable devices, a large number of different tools can be required. A uTCA or ATCA chassis may house more than 10 distinct cards. In laboratory development, a firmware upgrade can be performed using only a few dedicated programmers, upgrading each programmable device one by one. The situation is more difficult when the xTCA hardware is used to control a complex machine. In this case, the devices cannot be easily accessed. For example, the Low Level Radio Frequency (LLRF) control system of the European X-ray Free Electron Laser (XFEL) will be installed inside an accelerator tunnel. The system will be composed of hundreds of programmable devices. Therefore, the firmware upgrade cannot be performed using programmers alone.
With the deployment of complex control and data acquisition systems, the need for a remote and automated firmware upgrade solution becomes urgent. This paper deals with a universal framework and set of tools for upgrading firmware in xTCA systems. The proposed framework uses the fat pipe region interface of the uTCA backplane for firmware data transmission and the Intelligent Platform Management Interface (IPMI) standard for PROM memory management and control of the upgrade procedure. The proposed firmware update framework has been tested with the uTCA-based LLRF control system of the Free Electron Laser at Hamburg (FLASH). The prototype LLRF control system of the accelerator is composed of a few AMC modules using FPGA and DSP devices. These modules include a digitizer, vector modulator, timing and data processing cards.

### **PS2-30:** Advanced Light Source Control System Upgrade Intelligent Local Controller Redesign

<u>E. Norum</u>

Lawrence Berkeley National Laboratory, Berkeley, USA

As part of the control system upgrade at the Advanced Light Source (ALS), the existing intelligent local controller (ILC) modules are being replaced. These modules provide real-time updates of control setpoints and monitored values. This paper describes the architecture and performance of the 'ILC Replacement Modules' (IRMs), which have been developed to take on the duties of the existing modules. The new modules use a 100BaseT network connection to communicate with the ALS Experimental Physics and Industrial Control System (EPICS) and are based on a commercial FPGA evaluation board running a microcontroller-like application.

The IRM application software is compiled to run directly on the MicroBlaze processor embedded in the FPGA, with no intervening operating system code. This allows for rapid response to timer and network events. Performance testing shows that over 90% of timer interrupts are acknowledged within 20 microseconds and that the maximum response time is less than 180 microseconds. Setpoint update requests from the ALS EPICS control system are thus handled well within the required response time of 1 millisecond. The effect of network load on response times is minimized by placing the IRMs and their controlling EPICS Input/Output Controller (IOC) on a private network segment. To further reduce the effects of network stack response on the transfer of data between the IRMs and the IOC, a simple UDP-based publish/subscribe protocol is used. The application software in the IRMs and IOC provides error detection and command retransmission rather than relying on the network stacks to provide this function. The result is a system that has been shown to meet the real-time response requirements of the instruments controlled by the IRMs.
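Application-level retransmission over an unreliable datagram link can be sketched as a sequence number plus a bounded retry loop. The example below models the link as a callable instead of opening real UDP sockets, so it runs anywhere; the function and class names, the acknowledgement scheme and the retry limit are assumptions, because the abstract does not specify the actual IRM protocol.

```python
def send_with_retry(transmit, sequence, payload, max_attempts=3):
    """Send a datagram and retry until the matching acknowledgement arrives.

    `transmit` models one round trip: it returns the acknowledged sequence
    number, or None if the datagram or its acknowledgement was lost.
    """
    for attempt in range(1, max_attempts + 1):
        if transmit(sequence, payload) == sequence:
            return attempt
    raise TimeoutError(f"no acknowledgement for sequence {sequence}")

class LossyLink:
    """Test double that drops the first `drops` datagrams."""
    def __init__(self, drops):
        self.drops = drops
        self.delivered = []

    def __call__(self, sequence, payload):
        if self.drops > 0:
            self.drops -= 1
            return None
        self.delivered.append((sequence, payload))
        return sequence

link = LossyLink(drops=1)
attempts = send_with_retry(link, sequence=42, payload=b"setpoint=1.25")
```

Keeping the retry policy in the application means the worst-case delay is bounded by the chosen attempt limit and timeout, rather than by the back-off behaviour of a general-purpose network stack.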

Each IRM provides four analog inputs and four analog outputs. All have a range of 10V and a resolution of 16 bits. The analog gain and offset calibration factors for each channel are stored in on-board flash memory. This allows modules to be swapped at any time without the need for recalibration or operator intervention. A front panel OLED display provides local indication of analog and digital I/O values.

Approximately 125 IRMs will be installed once the upgrade is complete. This will require the addition of three network switches to connect the IRMs to the EPICS IOC. The IRMs will communicate with the switches over copper (100BaseT) network connections. The link between the switches and the IOC will be made with fiber or copper gigabit network links.

This paper presents results of timing and throughput tests of a prototype module as well as a detailed description of the hardware and software design.

### PS1-6: Overview of the Data Acquisition Electronics and Concepts for Photon Experiments and Beamlines at the European XFEL

P. Gessler, C. Youngman, M. Kuster, B. Fernandes, O. Batindek

European X-Ray Free Electron Laser Facility GmbH, Hamburg, Germany

The European X-Ray Free Electron Laser, currently under construction in northern Germany, will deliver up to 2700 short (less than 100 fs) X-ray pulses with wavelengths between 0.05 and 6 nm at a repetition rate of 4.5 MHz to several beamlines. The facility will provide X-rays of unique quality for studies in physics, chemistry, life sciences, materials research and other disciplines.

In order to set up the beam, position samples and capture imaging data, information from the accelerator, diagnostic devices and detectors has to be digitized, converted, processed, transferred, aggregated, distributed, reorganized, controlled and stored. Boundary conditions such as the high data rate and volume, frequently changing processing algorithms in FPGAs, low-latency FPGA-to-FPGA control loops and limited access to hardware reduce the choice of available products and standards. The detector and data acquisition electronics group coordinates and implements electronic hardware for photon beamlines and experiments and is developing a modular firmware programming environment that fulfils the described requirements and provides an easy-to-use and flexible framework.

This paper gives an overview of the data acquisition electronic hardware and developments, streaming concepts and the FPGA firmware programming framework under development.

#### PS1-8: Vector Modulator Card for MTCA-Based LLRF Control System for Linear Accelerators

I. Rutkowski<sup>1</sup>, K. Czuba<sup>1</sup>, D. Makowski<sup>2</sup>, A. Mielczarek<sup>2</sup>, H. Schlarb<sup>3</sup>, F. Ludwig<sup>3</sup>

<sup>1</sup>Institute of Electronic Systems, Warsaw University of Technology, Warsaw, Poland

<sup>2</sup>Department of Microelectronics and Computer Science, Technical University of Lodz, Lodz, Poland

<sup>3</sup>MSK, Deutsches Elektronen Synchrotron, Hamburg, Germany

Modern Low Level Radio Frequency (LLRF) control systems of linear accelerators are designed to achieve precise field amplitude and phase regulation inside accelerating cavities. One of the crucial components of the feedback loop is the vector modulator used to drive the high-power RF chain supplying the accelerating cavities. The LLRF control systems for the Free Electron Laser in Hamburg (FLASH) and the European X-ray Free Electron Laser (XFEL) are based on the emerging MTCA platform, which offers numerous advantages for high-performance control systems. This paper describes the concept, design and performance evaluation of the world's first Vector Modulator (uVM) module dedicated to LLRF systems and compatible with the MTCA.4 specification. The module was designed as a double-width, mid-size AMC form factor Rear Transition Module. The uVM board incorporates digital, analog and diagnostic subsystems. The digital part is based on a Xilinx Spartan 6 family FPGA, with several fast gigabit links to the control module. The uVM module is equipped with an Intelligent Platform Management Interface (IPMI) circuit required by the MTCA.4 standard. The FPGA controls the analog part, which includes fast, high-precision DACs, I/Q modulator chips, programmable attenuators, a power amplifier and fast RF gates for external interlock access. The RF chain can be adapted to different carrier frequencies covering the range from 50 MHz to 6 GHz. The design has been carefully optimized for high linearity and low output signal phase noise. The diagnostic system makes the uVM a universal device for applications beyond the LLRF control system. Extensive tests of the board were performed, and measurement results are presented and discussed in this paper.
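A vector modulator sets RF amplitude and phase through two baseband values, I and Q, that drive the DACs in front of the I/Q modulator. The sketch below shows the conversion from a requested amplitude and phase to two unsigned DAC codes. The 16-bit resolution, the offset-binary coding and the function name are assumptions for illustration; the abstract does not state the DAC parameters of the uVM board.

```python
import math

def iq_to_dac_codes(amplitude, phase_deg, bits=16):
    """Map a normalized amplitude (0..1) and a phase to two unsigned DAC codes."""
    if not 0.0 <= amplitude <= 1.0:
        raise ValueError("amplitude must be within 0..1")
    phase = math.radians(phase_deg)
    i = amplitude * math.cos(phase)  # in-phase component, -1..1
    q = amplitude * math.sin(phase)  # quadrature component, -1..1
    full_scale = (1 << bits) - 1

    def to_code(x):
        # Offset-binary: -1 maps to 0, +1 maps to full scale.
        return round((x + 1.0) / 2.0 * full_scale)

    return to_code(i), to_code(q)

codes = iq_to_dac_codes(1.0, 0.0)      # full amplitude, zero phase
quarter = iq_to_dac_codes(1.0, 90.0)   # same amplitude, phase advanced by 90 degrees
```

In a feedback loop the controller would update these two codes every cycle, so the linearity of the DAC and modulator chain directly limits the achievable field stability.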

### PS1-21: Real Time FPGA-Based Crosstalk Elimination for Multichannel Interferometry Systems in Fusion Diagnostics

S. Hernandez-Montero<sup>1</sup>, <u>J. A. Lopez-Martin</u><sup>1</sup>, M. Sanchez<sup>2</sup>, L. Esteban<sup>2</sup>

<sup>1</sup>Departamento de Ingenieria Electronica, Universidad Politecnica de Madrid, Madrid, Spain <sup>2</sup>Laboratorio Nacional de Fusion, CIEMAT, Madrid, Spain

Infrared (IR) interferometry is a well-known method for measuring the Line-Integrated electron Density (LID) of fusion plasmas. In the TJ-II stellarator, an FPGA-based IR interferometer has recently been installed to accurately measure the LID of the plasma in real time.

To guarantee the correct functionality of the interferometer and achieve high precision, it is essential to maximize the Signal-to-Noise Ratio (SNR) of the output density signal. In the measurement process, one of the most important distortion sources is crosstalk, or interchannel interference. Thus, in order to increase the SNR of the system, a crosstalk reduction stage has been designed and implemented in an FPGA.

This paper presents a novel crosstalk elimination algorithm that has been optimized for high-performance hardware implementation. Since the algorithm operates on the complex spectrum of the signals, the N-point Fast Fourier Transform (FFT) is performed first. Afterwards, the inner product between the spectra is used to reconstruct an estimate of the transfer function of the interfering system, and this reconstruction is used to eliminate the interference in the frequency domain. Finally, the N-point inverse FFT is carried out to obtain an improved version of the time signal required in the phase detection block.

In addition, an M-factor pre-downsampling stage has been included to increase the frequency resolution of the algorithm. This stage downconverts the input signals into low-frequency aliases, which decreases the sample frequency while significantly increasing the overall system resolution.

This procedure, in combination with the phase detection algorithm currently applied in the TJ-II, performs the required operations in a few microseconds, which allows an accurate measurement of the LID to be extracted in real time and makes it possible to control the heating systems of the fusion reactor using a feedback loop. This is possible because the algorithm is implemented as an initial block of the processing stage, in contrast with existing algorithms, which are usually applied in post-processing. In our approach, improved results are obtained by eliminating the interference in the detected raw signal.
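The frequency-domain steps (forward transform, inner product of spectra, subtraction, inverse transform) can be illustrated with a deliberately simplified model in which the interfering path is a single complex coupling coefficient. That flat-coupling assumption, the naive DFT used in place of an FFT, and the test tones are choices made for this sketch; the paper's algorithm reconstructs a fuller transfer function.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def remove_crosstalk(victim, aggressor):
    """Subtract the part of `victim` that is correlated with `aggressor`."""
    v, a = dft(victim), dft(aggressor)
    inner = sum(vk * ak.conjugate() for vk, ak in zip(v, a))
    power = sum(abs(ak) ** 2 for ak in a)
    coupling = inner / power  # estimated (flat) transfer coefficient
    cleaned = [vk - coupling * ak for vk, ak in zip(v, a)]
    return [s.real for s in idft(cleaned)], coupling

n = 64
wanted = [math.cos(2 * math.pi * 5 * t / n) for t in range(n)]
aggressor = [math.cos(2 * math.pi * 9 * t / n) for t in range(n)]
measured = [w + 0.2 * a for w, a in zip(wanted, aggressor)]
cleaned, coupling = remove_crosstalk(measured, aggressor)
```

The estimate is exact here because the wanted and interfering tones occupy different frequency bins; when the two overlap spectrally, the projection also removes part of the wanted signal, which is one reason frequency resolution matters.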

#### **PS1-23:** Parallel Task Management Library for MARTe

<u>D. F. Valcarcel</u><sup>1</sup>, D. Alves<sup>1</sup>, A. Neto<sup>1</sup>, C. Reux<sup>2</sup>, B. B. Carvalho<sup>1</sup>, R. Felton<sup>3</sup>, P. J. Lomas<sup>3</sup>, J. Sousa<sup>1</sup>, L. Zabeo<sup>4</sup>, JET EFDA Contributors\*<sup>5</sup>

<sup>1</sup>Associacao EURATOM/IST, Instituto de Plasmas e Fusao Nuclear, Instituto Superior Tecnico, UTL, P1049-001 Lisboa, Portugal

<sup>2</sup>Ecole Polytechnique, LPP, CNRS UMR 7648, 91128 Palaiseau, France

<sup>3</sup>Euratom/CCFE Fusion Association, Culham Science Centre, Abingdon, Oxon, OX14 3DB, UK

<sup>4</sup>ITER Organisation, Cadarache, France

<sup>5</sup>JET-EFDA, Culham Science Centre, OX14 3DB, Abingdon, UK

The Multithreaded Application Real-Time executor (MARTe) is a real-time framework with increasing popularity and support in the thermonuclear fusion community. It allows modular code to run in a multi-threaded environment, taking advantage of current multi-core processor (CPU) technology. One application that relies on the MARTe framework is the JET tokamak WAll Load Limiter System (WALLS). It calculates and monitors the temperature of metal tiles, plasma-facing components (PFCs) that can melt or flake if their temperature gets too high when exposed to power loads. One of the most time-consuming tasks in WALLS is the real-time calculation of thermal diffusion models. These models tend to be described by very large state-space models, which makes them good candidates for parallelisation. MARTe's traditional approach to task parallelisation is to split the problem into several Real-Time Threads, each responsible for a self-contained sequential execution of an input-to-output chain. This is usually possible, but it might not always be practical for algorithmic or technical reasons. It also might not scale easily as the number of available CPU cores increases. The WorkLibrary introduces a GPU-like way of splitting work among the available cores of modern CPUs that is straightforward to use in an application and scales without code rewriting or recompilation.

The first part of this article explains the motivation behind the library, its architecture and implementation. The second part presents a real application for WALLS, a parallel version of a large state-space model describing the 2D thermal diffusion on a JET tile.
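Splitting one state-space update x_next = A x + B u into independent row blocks is the kind of data-parallel decomposition described here. The sketch below shows only the partitioning; it is not the WorkLibrary API, the function names and the 3x3 test model are hypothetical, and Python threads do not give a real speed-up for this arithmetic, so the example should be read as an illustration of the work split rather than of its performance.

```python
from concurrent.futures import ThreadPoolExecutor

def step_rows(a, x, b, u, rows):
    """Compute x_next[i] = sum_j a[i][j] * x[j] + b[i] * u for a block of rows."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) + b[i] * u for i in rows]

def parallel_step(a, x, b, u, pool, n_workers):
    """Split the rows into contiguous blocks and evaluate each block as one task."""
    n = len(x)
    chunk = (n + n_workers - 1) // n_workers
    blocks = [range(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    futures = [pool.submit(step_rows, a, x, b, u, rows) for rows in blocks]
    # Collect results in submission order so the state vector stays ordered.
    return [value for f in futures for value in f.result()]

a = [[0.5, 0.1, 0.0], [0.1, 0.5, 0.1], [0.0, 0.1, 0.5]]
b = [1.0, 0.0, 0.0]
with ThreadPoolExecutor(max_workers=2) as pool:
    x_next = parallel_step(a, [1.0, 2.0, 3.0], b, 2.0, pool, n_workers=2)
```

Because the block boundaries are computed from `n_workers` at run time, the same code adapts to a different core count without being rewritten, which mirrors the scalability argument made in the abstract.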

\*See the Appendix of F. Romanelli et al., Proceedings of the 23rd IAEA Fusion Energy Conference 2010, Daejeon, Korea

#### **PS1-30: Ultra-Fast Streaming Camera Platform for Scientific Applications**

M. Caselle, M. Balzer, S. Chilingaryan, A. Herth, A. Kopmann, U. Stevanovic, M. Vogelgesang

IPE, Karlsruhe Institute of Technology, Karlsruhe, Germany

We have developed a novel camera platform for ultra-fast data acquisition, real-time signal processing and compression. It is intended for high-speed X-ray tomography within the project "Ultra-fast X-ray imaging with Online data assessment" (UFO). The UFO project demands high spatial and temporal resolution, down to 1 µm at several tens of thousands of frames/s in full streaming mode, and aims to employ image-based feedback loops. The key features of the camera platform are: 1. Continuous data taking at maximum resolution and frame rate with observation times of up to several hours. 2. Intelligent signal processing providing features such as an image-based self-trigger, on-line data reduction and region-of-interest (ROI) readout. 3. An open firmware architecture that makes the camera fully programmable to the needs of the application; it is foreseen for the implementation of the fastest feedback loops. The hardware setup and the modular FPGA firmware of the camera platform will be presented. The image sensor is integrated on a mezzanine daughter card and connected to the readout board by a high-bandwidth FMC connector. The readout board provides programmable logic (FPGA) and a large DDR memory for both temporary data storage and on-line data processing. Finally, the camera platform is completed by a fast PCI Express cable interface to dedicated GPU compute servers. The firmware is based on a bus-master multichannel DMA architecture to ensure high data throughput. Cyclic-redundancy-check (CRC) logic is used to detect possible errors during data transfer. Real-time data processing algorithms for on-line tasks such as filtering and data compression are applied before the data are sent. One of the most important tasks of the signal-processing unit is the novel intelligent image-based self-trigger architecture, intended for otherwise unpredictable events.
The trigger information is used by the readout logic for an efficient ROI readout strategy. This reduces the required bandwidth per frame and can be used to maximize the effective frame rate. A 64-bit Linux driver seamlessly integrates the imaging platform into any GPU server infrastructure. Together, the ultra-fast streaming camera platform and the custom GPU infrastructure ensure outstanding performance for scientific experiments. The first camera demonstrator achieves the maximum frame rate of the image sensor, 340 fps at 2 MPixel and 10 bits, with a data rate of up to 1 GB/s. We expect the current readout architecture to be able to reach more than 5 GB/s. The prototype has been tested at the ANKA synchrotron. The camera platform will be continuously enhanced, e.g. by a new, faster image sensor with 50 Gb/s and a high-speed data link (InfiniBand) for tight integration in GPU clusters. Preliminary results and future perspectives are presented.
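The quoted demonstrator figures can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes that "2 MPixel" means 2 × 10^6 pixels, that pixels are packed at exactly 10 bits, and that an ROI readout scales the data rate linearly with the fraction of the frame that is read. The function names and the 25% ROI are hypothetical and do not come from the abstract.

```python
def raw_data_rate_bytes_per_s(frames_per_s, pixels_per_frame, bits_per_pixel):
    """Uncompressed sensor data rate in bytes per second."""
    return frames_per_s * pixels_per_frame * bits_per_pixel / 8


def roi_rate_bytes_per_s(full_rate, roi_fraction):
    """Data rate when only a fraction of each frame (the ROI) is read out."""
    if not 0.0 < roi_fraction <= 1.0:
        raise ValueError("roi_fraction must be in (0, 1]")
    return full_rate * roi_fraction


# Demonstrator figures quoted in the abstract (2 MPixel assumed to be 2e6 pixels).
full_rate = raw_data_rate_bytes_per_s(340, 2_000_000, 10)
print(f"full frame: {full_rate / 1e9:.2f} GB/s")

# Hypothetical ROI covering a quarter of the sensor area.
roi_rate = roi_rate_bytes_per_s(full_rate, 0.25)
print(f"25% ROI:    {roi_rate / 1e9:.4f} GB/s")
```

Under these assumptions the full-frame rate comes out slightly below the quoted 1 GB/s, and the same link budget spent on a smaller ROI would allow a proportionally higher frame rate.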

#### **PS1-11: Intelligent Platform Management Controller Software Architecture in ATCA Modules for Fast Control Systems**

<u>A. P. Rodrigues</u><sup>1</sup>, M. Correia<sup>1</sup>, A. J. N. Batista<sup>1</sup>, P. R. Carvalho<sup>1</sup>, B. Santos<sup>1</sup>, B. B. Carvalho<sup>1</sup>, J. Sousa<sup>1</sup>, B. Goncalves<sup>1</sup>, C. C. M. B. Correia<sup>2</sup>, C. A. F. Varandas<sup>1</sup>

<sup>1</sup>Associacao EURATOM/IST, Instituto de Plasmas e Fusao Nuclear, Instituto Superior Tecnico - Universidade Tecnica de Lisboa, Lisbon, Portugal

<sup>2</sup>Departamento de Fisica/Centro de Instrumentacao, Universidade de Coimbra, Coimbra, Portugal

In complex control systems, such as those required for plasma control in nuclear fusion experiments, e.g. the International Thermonuclear Experimental Reactor (ITER), hardware failure, redundancy, power management, maintenance and firmware version control are very important issues. To address these issues in fast control and data acquisition systems of large-scale physics experiments, where high availability is crucial, an Intelligent Platform Management Controller (IPMC) was developed by IPFN/IST. This controller was integrated in the Advanced Telecommunications Computing Architecture (ATCA) modules that are being developed with the aim of inclusion in the instrumentation catalogue of the ITER tokamak. Together with the shelf manager module of the ATCA crate, this controller is responsible for hardware management: hardware failure and redundancy procedures, hot insertion/removal of modules, compatibility between modules that share ATCA resources, power management of each module, firmware management of the ATCA modules (programming, application-specific program selection and version control), temperature monitoring and other module-specific configurations, granting high availability to the control and data acquisition system. In this paper the software architecture of the implemented IPMC module is described.

#### **RTSA: Real Time System Architectures**

Monday, June 11 17:00-18:40 Crystal Ballroom

#### **RTSA-1: Auger ACCESS - Remote Monitoring and Controlling the Auger Experiment**

T. Jejkal<sup>1</sup>, H.-J. Mathes<sup>2</sup>, J. Rautenberg<sup>3</sup>, M. Kleifges<sup>1</sup>, H. Gemmeke<sup>1</sup>

<sup>1</sup>Institute for Data Processing and Electronics, Karlsruhe Institute of Technology, Karlsruhe, Germany

<sup>2</sup>Institute for Nuclear Physics, Karlsruhe Institute of Technology, Karlsruhe, Germany

<sup>3</sup>Astroparticlephysics, University of Wuppertal, Wuppertal, Germany

Ultra-high energy cosmic rays are the most energetic and rarest particles in the universe. They are expected to have energies of up to 10<sup>20</sup> eV and occur once per square kilometre and year. To increase the probability of detecting one of these events, a huge detector covering a large area is needed. During the last decades, an international cooperation, the Pierre Auger Project, reached this goal by building an observatory covering 3000 square kilometres of the Pampa Amarilla in western Argentina. The observatory consists of more than 1600 surface detectors and several fluorescence telescope buildings. All these instruments are controlled within the local campus network; any access from the outside is blocked by a firewall. However, for long-term operation it is essential to provide secure, reliable and easy remote access to monitor and control the system without periodically sending scientists and other experts to Argentina. Currently, there are two shifts per year, which leads to considerable travel costs. Remote access would make it possible to obtain status information about running components of the experiment and to give the best possible support to the operators at the experiment site in case of severe problems with the detectors. For these reasons, the Auger ACCESS project was started in 2005 to define requirements and to find a solution that provides remote monitoring and control of the running experiment to the worldwide community. In addition, a new networking infrastructure had to be built to allow remote access with good performance. As the Auger experiment was already running in 2005, one of the first tasks was to design and implement a test environment for software development and testing. For this purpose a virtual infrastructure reflecting the campus network in Argentina was built using enterprise virtualization technologies such as VMware ESX.
Using this infrastructure, all components required for remote access could be implemented. For this, well-known technologies from the field of Grid computing were adopted to build a service layer secured by X.509 certificates. Finally, a virtual control room running the implemented control and monitoring software was set up at the University of Wuppertal. Compared to previous travel costs, all expenses for such a control room are amortized by running one shift remotely. At the end of this contribution, measurement results comparing local and remote access to the Auger observatory are presented.

#### **RTSA-2: Belle2Link: an Unified High Speed Data Collection with Slow Control in Belle II Experiment**

Z.-A. Liu<sup>1</sup>, D. Sun<sup>1</sup>, J. Zhao<sup>1</sup>, H. Lin<sup>1</sup>, F. Guo<sup>1</sup>, C. Wang<sup>1</sup>, M. Nakao<sup>2</sup>, R. Itoh<sup>2</sup>, T. Higuchi<sup>2</sup>, S. Y. Suzuki<sup>3</sup>

<sup>1</sup>EPC, Inst. of High Energy Physics, Chinese Academy of Sciences, Beijing, China

<sup>2</sup>IPNS, KEK, Tsukuba, Japan

<sup>3</sup>CRC, KEK, Tsukuba, Japan

This paper describes a unified data collection and transmission system, with a slow control function over the same high-speed link, designed for use in the Belle II experiment at KEK. Its features are: 1. Unification in hardware design, 2. Unification in firmware design, 3. High-speed data transmission at over 3 Gbps, 4. Partial slow control function (parameter setting of the FEE), 5. Versatility for different data input rates, 6. A home-brewed transmission protocol. From the hardware point of view, Belle2Link shares unified parts of the hardware on the different detector front-end electronics (FEE) boards, including partial FPGA resources, timing and trigger input, a GTP port and an SFP transceiver, together with a specially designed mezzanine card (HSLB) for the COPPER board in the back-end readout. From the firmware point of view, Belle2Link implements its main functions in a Xilinx FPGA and a CPLD. The back-end firmware is the same for all subsystems and works with the HSLB as a general readout device. The front-end firmware is unified, can be configured to suit the different systems, and is integrated in the FEE FPGA. A special transmission protocol has been designed so that not only the normal data but also the slow control commands and data can be transmitted over the same link. A model system has been set up based on the hardware and firmware development. Joint tests with a drift chamber prototype were done in July 2010 and December 2011 at KEK, and a cosmic ray test was done in February 2012. Some results of these tests will be given, followed by some discussion.

#### **RTSA-3: A System for Monitoring and Tracking the LHC Beam Spot Within the ATLAS High Level Trigger**

R. Bartoldus<sup>1</sup>, J. Cogan<sup>1</sup>, A. Salnikov<sup>1</sup>, E. Strauss<sup>1</sup>, <u>F. Winklmeier</u><sup>2</sup>

<sup>1</sup>ATLAS Dept., SLAC, Menlo Park, CA, United States

<sup>2</sup>PH Dept., CERN, Geneva, Switzerland

The parameters of the beam spot produced by the LHC in the ATLAS interaction region are computed online using the ATLAS High Level Trigger (HLT) system. The high rate of triggered events is exploited to make precise measurements of the position, size and orientation of the luminous region in near real-time, as these parameters change significantly even during a single data-taking run. We present the challenges, solutions and results for the online determination, monitoring and beam spot feedback system in ATLAS. A specially designed algorithm, which uses tracks registered in the silicon detectors to reconstruct event vertices, is executed on the HLT processor farm of several thousand CPU cores. Monitoring histograms from all the cores are sampled and aggregated across the farm every 60 seconds. The reconstructed beam values are corrected for detector resolution effects, measured in situ from the separation of vertices whose tracks have been split into two collections. Furthermore, measurements for individual bunch crossings have allowed for studies of single-bunch distributions as well as the behavior of bunch trains, calibrated to the beam average. Run control invokes a comparison of the nominal and measured beam spot values, and when threshold conditions are satisfied the farm configuration is updated. To achieve sharp time boundaries across the event stream, which is trigger nodes. Thousands of clients then fetch the same set of values from the conditions database in a fraction of a second via an efficient near-simultaneous access made possible through a dedicated CORAL Server and Proxy tree.
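The aggregation and correction steps described above can be illustrated with a small sketch. It is not the ATLAS implementation: the function names, the binning and the numbers are hypothetical, and subtracting the resolution in quadrature is used here as a common simplifying assumption for removing a Gaussian resolution from an observed width.

```python
import math


def merge_histograms(per_node_counts):
    """Sum bin contents of identically binned histograms from all nodes."""
    n_bins = len(per_node_counts[0])
    merged = [0] * n_bins
    for counts in per_node_counts:
        if len(counts) != n_bins:
            raise ValueError("histograms must share the same binning")
        for i, c in enumerate(counts):
            merged[i] += c
    return merged


def mean_and_width(counts, bin_centers):
    """Mean and RMS width of a binned distribution."""
    total = sum(counts)
    mean = sum(c * x for c, x in zip(counts, bin_centers)) / total
    var = sum(c * (x - mean) ** 2 for c, x in zip(counts, bin_centers)) / total
    return mean, math.sqrt(var)


def corrected_width(observed, resolution):
    """Remove the vertex resolution from the observed width in quadrature."""
    if resolution >= observed:
        raise ValueError("resolution must be smaller than the observed width")
    return math.sqrt(observed ** 2 - resolution ** 2)


# Made-up vertex-position histograms from two trigger nodes (arbitrary units).
centers = [-2.0, -1.0, 0.0, 1.0, 2.0]
merged = merge_histograms([[1, 2, 4, 2, 1], [0, 3, 4, 3, 0]])
mean, width = mean_and_width(merged, centers)
print(merged, mean, round(width, 4), round(corrected_width(width, 0.5), 4))
```

Because the histograms are simply summed bin by bin, the merge can be performed in any order and at any level of an aggregation tree, which is what makes periodic farm-wide sampling cheap.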

#### **RTSA-4: Data Flow and High Level Trigger of Belle II DAQ System**

<u>R. Itoh</u><sup>1</sup>, T. Higuchi<sup>1</sup>, M. Nakao<sup>1</sup>, S. Y. Suzuki<sup>1</sup>, S. Lee<sup>2</sup>

<sup>1</sup>KEK, Tsukuba, Japan

<sup>2</sup>Korea University, Seoul, Korea

The Belle II experiment is a new-generation B-factory experiment at KEK in Japan, upgraded from Belle, with a new accelerator, SuperKEKB, upgraded from KEKB. Since the accelerator is expected to deliver a luminosity more than 40 times higher, the average Level 1 trigger rate in Belle II is estimated to reach up to 30 kHz, with a total raw data size of 1 Mbyte. To manage this huge data flow, the Belle II DAQ system is designed to perform a multi-step data reduction using a large number of CPUs implemented in various DAQ components, from the detector readout modules to the high level trigger farm.

The readout module called COPPER is equipped with an add-on PrPMC CPU card. About 300 COPPER modules are utilized to receive detector hits through ~700 unified optical links called Belle2link and then to perform the data reduction by the on-board processing. The output data are sent to the readout PCs via GbE-T links and the 1st level event building is performed with an optional data reduction. The events are finally built at a switch-based event builder via 10GbE links and fed into the high level trigger farm (HLT). The HLT consists of O(10) HLT units each of which is equipped with a number of processing nodes together with input and output nodes to distribute and collect events for the parallel processing. A full event reconstruction using the same offline software is performed in real-time and the physics event selections, such as the hadronic event selection and the tau-pair event selection, are used as the software trigger.

The track information obtained for the HLT-selected events is sent to the readout system of the pixel detector (PXD), and the PXD hits are associated with the tracks to discard noise hits. The output of the HLT is merged with the associated PXD hits at the second event builder and finally recorded in the fast RAID system.

All CPUs in the DAQ components, from the COPPERs to the HLT units, run the same Linux, and a unified processing framework based on the offline framework (basf2) is used for the ROOT-based object-oriented data flow.

The design of the data flow in Belle II DAQ system is reported at the conference. In particular, 1) the performance study of the object oriented data flow in a realistic test bench, 2) the architecture of the HLT and the preliminary performance of the prototype, and 3) the design of the PXD data migration, are discussed in detail.
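As a rough orientation, the figures quoted in this abstract can be combined into average loads. The sketch below rests on assumptions that the abstract does not state: the 1 Mbyte raw data size is taken to be per event and equal to 10^6 bytes, the load is taken to be spread evenly over the links, and "O(10)" HLT units is taken to be exactly 10. All names are hypothetical.

```python
def aggregate_bandwidth(trigger_rate_hz, event_size_bytes):
    """Raw data flow before any reduction, in bytes per second."""
    return trigger_rate_hz * event_size_bytes


def average_share(total, n_units):
    """Average load per unit when the total is spread evenly."""
    return total / n_units


L1_RATE_HZ = 30_000        # average Level 1 trigger rate quoted above
EVENT_SIZE_B = 1_000_000   # assumption: 1 Mbyte per event, 1e6 bytes
N_LINKS = 700              # "~700" Belle2link optical links
N_HLT_UNITS = 10           # assumption: "O(10)" taken as exactly 10

total = aggregate_bandwidth(L1_RATE_HZ, EVENT_SIZE_B)
per_link = average_share(total, N_LINKS)
events_per_hlt_unit = average_share(L1_RATE_HZ, N_HLT_UNITS)

print(f"raw data flow:      {total / 1e9:.0f} GB/s")
print(f"average per link:   {per_link / 1e6:.1f} MB/s")
print(f"events per HLT unit: {events_per_hlt_unit:.0f} Hz")
```

These are averages only; real links carry very different loads depending on the sub-detector, which is one reason the data reduction is distributed over several stages.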

#### **RTSA-5: A Prototype Clock System for LHAASO WCDA**

L. Shang<sup>1,2</sup>, K. Song<sup>1,2</sup>, <u>P. Cao<sup>1,2</sup></u>, C. Li<sup>1,2</sup>, S. Liu<sup>1,2</sup>, Q. An<sup>1,2</sup>

<sup>1</sup>Department of modern physics, University of Science and Technology of China, Hefei, Anhui, China <sup>2</sup>State Key Laboratory of Particle Detection and Electronics, Hefei, Anhui, China

The LHAASO project is a large Extensive Air Shower (EAS) particle detector array of 1 km<sup>2</sup> at 4300 m above sea level at Yangbajing, Tibet, targeting gamma astronomy at energies between 100 GeV and 30 TeV. The WCDAs, with an active area of 90,000 m<sup>2</sup>, are one of the major components for cosmic ray physics in the LHAASO project. In the WCDA, a distributed architecture in which digitization is implemented close to the detectors at the front end is imperative, because the detectors are 150 m away from the counting room.

However, in this distributed architecture, providing an accurate synchronization clock of stable phase to all the front-end electronics (FEE) is not an easy task. This paper therefore presents a prototype clock system consisting of a clock source module, clock transmitting modules and clock receiving modules, with the aim of keeping both the skew and the jitter below 100 ps.

The clock source module provides the precise system clock, which directly affects the time measurement precision of all the readout electronics. In order to correlate the measurement results with those acquired by astronomical observatories around the globe, this system clock is locked to the global synchronous clock, GPS. The clock transmitting modules are the bridge between the clock source module and the clock receiving modules. They obtain the high-precision system clock from the clock source module and distribute it to the clock receiving modules over 150 m of fiber, based on Serializer/Deserializer (SerDes) and fiber transmission. The clock receiving modules, the basis of the LHAASO WCDA clock system, recover the system clock using SerDes clock data recovery (CDR) technology and provide a precise clock for the analog part of the FEEs.

However, there is a transmission delay between the recovered clock of the clock receiving module and the transmitter clock of the clock transmitting module, and this delay is not fixed: it is affected by factors such as temperature variation. In order to achieve a clock skew of less than 100 ps, a reduced and improved scheme based on the White Rabbit project is presented that automatically adjusts the propagation delay and keeps the phases aligned. In the scheme, an FPGA Time-to-Digital Converter (TDC) with a resolution of 1.25 ns measures the phase difference after it has been magnified by the digital Dual Mixer Time Difference (DMTD) technique. Using the direct phase-shifting mode of the FPGA's DCM, the phase can be dynamically and repeatedly moved forwards and backwards in steps of one DCM\_TAP, which is less than 40 ps. The experimental results show that both the clock jitter and the clock skew of all the FEEs are less than 100 ps, meeting the requirement of the LHAASO WCDA. This demonstrates that the prototype clock system can be used in the whole WCDA electronics. Furthermore, this work provides a general approach to developing high-quality synchronous clock systems for front-end electronics in large physics experiments.
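The reason a coarse 1.25 ns TDC can support a sub-100 ps skew follows from how DMTD magnifies time differences. In the sketch below, the helper-clock ratio N/(N+1), the value N = 1023 and the starting skew are illustrative assumptions and are not parameters from the abstract; the 40 ps step is the upper bound quoted for one DCM\_TAP. When both clocks are sampled with a helper clock running at f_clk · N/(N+1), the beat signals run at f_clk/(N+1), so a time offset between the inputs appears stretched by a factor N+1.

```python
def dmtd_magnification(n):
    """Time stretch factor for a helper clock at f_clk * n / (n + 1)."""
    return n + 1


def effective_resolution_ps(tdc_resolution_ps, n):
    """Input-referred resolution of a TDC that measures the beat signals."""
    return tdc_resolution_ps / dmtd_magnification(n)


def align_phase(skew_ps, tap_ps, tolerance_ps, max_steps=10_000):
    """Shift the phase one tap at a time until |skew| <= tolerance.

    Returns the residual skew and the number of taps used. The tolerance
    should be at least tap_ps / 2, otherwise the loop can oscillate and
    stops only when max_steps is reached.
    """
    steps = 0
    while abs(skew_ps) > tolerance_ps and steps < max_steps:
        skew_ps += -tap_ps if skew_ps > 0 else tap_ps
        steps += 1
    return skew_ps, steps


# 1.25 ns TDC from the abstract, hypothetical N = 1023.
resolution = effective_resolution_ps(1250.0, 1023)
print(f"effective resolution: {resolution:.3f} ps")

# Hypothetical 730 ps initial skew, 40 ps taps, 20 ps tolerance.
residual, taps = align_phase(730.0, 40.0, 20.0)
print(f"residual skew: {residual} ps after {taps} taps")
```

With these numbers the measurement granularity is far finer than the tap size, so in this toy model the achievable alignment is limited by the phase-shift step and not by the TDC.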

#### **PS1: Poster Session 1**

Monday, June 11 15:50-17:00 Boiler room

#### **PS1-1: Implementation of an ATCA/AXIe Board for Fast Control and Data Acquisition Systems of Nuclear Fusion Devices**

<u>A. J. N. Batista</u><sup>1</sup>, C. Leong<sup>2</sup>, V. Bexiga<sup>2</sup>, A. P. Rodrigues<sup>1</sup>, A. Combo<sup>1</sup>, B. B. Carvalho<sup>1</sup>, P. Ricardo<sup>1</sup>, J. Fortunato<sup>1</sup>, B. Santos<sup>1</sup>, P. Carvalho<sup>1</sup>, M. Correia<sup>1</sup>, J. P. Teixeira<sup>2</sup>, I. C. Teixeira<sup>2</sup>, J. Sousa<sup>1</sup>, B. Goncalves<sup>1</sup>, C. A. F. Varandas<sup>1</sup>

<sup>1</sup>Associacao EURATOM/IST, Instituto de Plasmas e Fusao Nuclear - Laboratorio Associado, Instituto Superior Tecnico - Universidade Tecnica de Lisboa, Lisboa, Portugal

<sup>2</sup>INESC-ID, Lisboa, Portugal

A recent implementation of an ATCA/AXIe board, developed for fast control and data acquisition systems of nuclear fusion devices, is presented. The implemented blade was designed for systems requiring high levels of reliability and availability, such as those of long-duration discharges or steady-state operation nuclear fusion experiments. Aiming to be included in the instrumentation catalogue of ITER, the board comprises a passive rear transition module for analogue IO cabling connectivity and easy front board maintenance. The board's main specifications include 48 analogue IO channels (galvanically isolated), a Xilinx Virtex 6 FPGA, 2 GB of DDR3 DRAM, PCI Express on the ATCA Fabric, an Intelligent Platform Management Controller, Inter-Range Instrumentation Group time code timing/synchronism and full ATCA redundancy. Digitized (2 MSPS @ 18 bits) analogue inputs are filtered/decimated by the board's FPGA firmware and sent to the ATCA/AXIe host (multi-core processor) through PCI Express DMA channels. Real-time control and continuous data acquisition bandwidths are programmable up to 200 kSPS. Event acquisition is fixed at 200 kSPS, with programmable pre-event and post-event data available. All blocks of acquired data are time-stamped. Analogue output DACs are updated by the host through PCI Express (refresh rate up to 1 MSPS @ 18 bits). Board tests have been performed using both ATCA and AXIe shelves, covering the specificities of each standard.
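Going from 2 MSPS at the ADC to 200 kSPS at the host implies a decimation by a factor of 10. The abstract does not describe the filter used in the firmware, so the sketch below uses a plain block average (boxcar filter followed by downsampling) purely as a stand-in to show the data flow; the function names are hypothetical.

```python
def decimation_factor(input_rate_sps, output_rate_sps):
    """Integer ratio between the ADC rate and the delivered rate."""
    if input_rate_sps % output_rate_sps:
        raise ValueError("only integer decimation factors are handled")
    return input_rate_sps // output_rate_sps


def boxcar_decimate(samples, factor):
    """Average consecutive blocks of `factor` samples.

    Trailing samples that do not fill a complete block are dropped.
    """
    n_blocks = len(samples) // factor
    return [
        sum(samples[i * factor:(i + 1) * factor]) / factor
        for i in range(n_blocks)
    ]


factor = decimation_factor(2_000_000, 200_000)
raw = list(range(20))            # stand-in for 20 ADC samples
print(factor, boxcar_decimate(raw, factor))
```

A real design would normally use a filter with better stop-band rejection than a boxcar, since anything left above the new Nyquist frequency folds back into the decimated signal.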

#### **PS1-2: Automatic System-Level Synthesis for Heterogeneous Platforms**

H. A. Andrade, K. Ravindran

National Instruments Corporation, Berkeley, CA, United States

The field of "real-time data acquisition and computing applications in the physical sciences" presents by definition a cyber-physical system design problem that pushes the performance boundaries of the latest I/O, computation, communication, and storage technologies. This trend has forced designers to consider heterogeneous platforms that offer a good balance of performance, power efficiency, and cost. Unfortunately, this comes at a significant increase in programming complexity to match the efficiency gains in hardware, which translates to lower design productivity. The problem is more apparent today given that most system designers are domain experts in the physical sciences and not necessarily 'native' programming experts.

To alleviate this problem, we are prototyping an automated system level synthesis and exploration framework to deploy high level application specifications onto heterogeneous platforms. We view heterogeneous platforms as those platforms that have different types of computing, communication, storage, and I/O elements, but are considered together as one system target for purposes of design and deployment. Computing elements include traditional instruction processors, FPGAs, GPUs, or specialized processors or accelerators. The communication network consists of links that offer different topology, speed, and affinity. Storage could be part of computing nodes, or independent components, such as controllers for streaming to disk, or intelligent memory modules. The I/O elements define the boundaries of the system and are the main observable components at which the responsiveness of the system is measured.

The designer would define by aggregation a system-level target or platform, thereby making all the computing and I/O nodes available to a given set of applications. These applications are specified in suitable models of computation that intuitively capture the concurrency, data flow, timing, and control requirements of these applications. The application language balances expressibility and analyzability to enable automatic synthesis, simulation, and implementation of the system. The language is backed by analysis methods to reason about resource allocation and scheduling decisions across the heterogeneous components of the target. Problems that can be solved by this system-level synthesis framework include:

- Given an application and a system target specification, automatically map the application to the target
- Given an application, determine a suitable organization of the target and a mapping
- Identify mappings to ensure a specified reliability or other non-functional system properties
- Simulate and verify the system
- Manually map a subsystem if specific optimizations are needed
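The first problem in the list, mapping an application onto a given target, can be illustrated with a toy heuristic. This is not the framework described in the abstract: the task names, the cost table and the greedy earliest-finish rule are all hypothetical, and communication costs, timing constraints and resource limits are ignored.

```python
def greedy_map(task_costs):
    """Assign each task to the element on which it would finish earliest.

    task_costs maps task -> {element: execution_time}; an element missing
    from a task's table cannot run that task.
    Returns (mapping, accumulated load per element).
    """
    load = {}
    for costs in task_costs.values():
        for element in costs:
            load.setdefault(element, 0.0)

    mapping = {}
    # Place tasks with the largest best-case cost first.
    order = sorted(
        task_costs, key=lambda t: min(task_costs[t].values()), reverse=True
    )
    for task in order:
        costs = task_costs[task]
        best = min(costs, key=lambda e: load[e] + costs[e])
        mapping[task] = best
        load[best] += costs[best]
    return mapping, load


# Hypothetical per-element execution times in arbitrary units.
costs = {
    "fir_filter":    {"cpu": 8.0, "fpga": 1.0, "gpu": 2.0},
    "fft":           {"cpu": 10.0, "fpga": 3.0, "gpu": 1.5},
    "trigger_logic": {"cpu": 4.0, "fpga": 0.5},
    "logging":       {"cpu": 1.0},
}
mapping, load = greedy_map(costs)
print(mapping)
print(load)
```

A greedy pass like this gives no optimality guarantee. It only shows the shape of the input (application tasks plus a per-element cost model) and of the output (a mapping) that an automatic synthesis flow has to reason about.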

In this paper, we present key research directions in developing a framework for automatic system-level synthesis for heterogeneous platforms, share preliminary results, and discuss its use for applications of interest to the physical sciences community.

#### **PS1-3: Monitoring and Improving the ALICE Data Taking Efficiency**

V. Barroso<sup>1</sup>, F. Carena<sup>1</sup>, W. Carena<sup>1</sup>, S. Chapeland<sup>1</sup>, F. Costa<sup>1</sup>, E. Denes<sup>2</sup>, R. Divia<sup>1</sup>, A. Grigore<sup>3</sup>, G. Simonetti<sup>4</sup>, C. Soos<sup>1</sup>, A. Telesca<sup>1</sup>, P. Vande Vyvre<sup>1</sup>, B. von Haller<sup>1</sup>

<sup>1</sup>CERN, Geneva, Switzerland

<sup>2</sup>*Hungarian Academy of Sciences, Budapest, Hungary* 

<sup>3</sup>Polytechnic University of Bucharest, Bucharest, Romania

<sup>4</sup>Universita Bari, Bari, Italy

ALICE (A Large Ion Collider Experiment) is the heavy-ion experiment designed to study the physics of strongly interacting matter and the quark-gluon plasma at the CERN Large Hadron Collider (LHC). Since its successful start-up in 2010, the LHC has been performing outstandingly, providing to the experiments long periods of stable collisions and an integrated luminosity that greatly exceeds the planned targets.

To fully exploit these privileged conditions, it is paramount that the experiment's data taking efficiency during stable collisions is as high as possible. In ALICE, some of the greatest lessons learned in 2011 were how important it is to clearly identify the reasons for inefficiency, closely monitor the efficiency and make the information available to the whole collaboration.

This paper will describe how the ALICE Electronic Logbook (eLogbook) is used to recognize the main causes of inefficiency, helping decision making by providing quantitative information and allowing the Run Coordination team to identify, prioritize, address and follow them. It will also explain how the eLogbook is used to monitor the data taking efficiency, providing reports that allow the collaboration to portray its evolution and evaluate the measures taken to increase it. Finally, it will present the ALICE efficiency since the start-up of the LHC and the future plans to further support the Run Coordination activities.

#### **PS1-4: Open-Standard Blade Systems Enable High Performance Applications**

S. McClellan<sup>1</sup>, K. Austin<sup>2</sup>, A. Deikman<sup>2</sup>

<sup>1</sup>Ingram School of Engineering, Texas State University, San Marcos, TX, United States <sup>2</sup>ZNYX Networks, Fremont, CA, United States

This paper describes a new category of mid-sized blade-systems which are fully compliant with ATCA specifications, but provide enhanced vendor-neutral configurability, low cost-of-entry, and extremely high processing density for network-intensive applications. Further, we propose a straightforward method for normalizing blade systems and describing the overall value of candidate system architectures. Our hope is that this approach to comparison will reveal characteristics of blade systems in terms of the primary driving factors for scientific applications: efficiency of system design-time, cost-effective density of components, and breadth of multi-vendor options.

#### **PS1-5: An xTCA Compliant and FPGA Based Data Processing Unit for Trigger and Data Acquisition and Trigger Applications**

<u>J. Zhao<sup>1</sup></u>, Z. Liu<sup>1</sup>, H. Xu<sup>1</sup>, W. Kuehn<sup>2</sup>

<sup>1</sup>Institute of High Energy Physics, Beijing, China

<sup>2</sup>II. Physikalisches Institut, Justus-Liebig-Universitaet, Giessen, Germany

This talk presents an xTCA-compliant, FPGA-based Data Processing Unit for trigger and data acquisition applications such as the PANDA, PXD/Belle II and Lumi/BESIII experiments. The unit consists of 4 Advanced Mezzanine Cards (AMC, called xFP cards), 1 AMC carrier ATCA board (ACAB) and optionally 1 Rear Transition I/O board (RTM). The ACAB features 1 Xilinx Virtex-4 FX60 FPGA and 2 GBytes of DDR2 memory for data buffering and switching, and each xFP board features 1 Xilinx Virtex-5 FX70T FPGA and 4 GBytes of DDR2 memory for data processing. The ACAB and the four xFP boards are connected by RocketIO ports and other LVDS I/O pairs. 8 optical links on 4 xFP2 cards (each with two 6 Gbps optical I/Os) provide an input bandwidth of 64 Gbps. The optical links can come either from the front panel of the AMC cards or from the RTM card. 5 Gbit Ethernet links are provided for output to a higher-level trigger or to storage. A single ATCA shelf can host up to 14 boards interconnected via a full-mesh backplane. A prototype system has been set up; some functional tests have been done and will be reported and discussed.

#### **PS1-6: Overview of the Data Acquisition Electronics and Concepts for Photon Experiments and Beamlines at the European XFEL**

P. Gessler, C. Youngman, M. Kuster, B. Fernandes, O. Batindek

European X-Ray Free Electron Laser Facility GmbH, Hamburg, Germany

The European X-Ray Free Electron Laser, currently under construction in northern Germany, will deliver up to 2700 short (less than 100 fs) X-ray pulses with wavelengths between 0.05 and 6 nm at a repetition rate of 4.5 MHz to several beamlines. The facility will provide X-rays of unique quality for studies in physics, chemistry, life sciences, material research and other disciplines.

In order to set up the beam, position samples and capture imaging data, information from the accelerator, diagnostic devices and detectors has to be digitized, converted, processed, transferred, aggregated, distributed, reorganized, controlled and stored. Boundary conditions such as the high data rate and volume, frequently changing processing algorithms in FPGAs, low-latency FPGA-to-FPGA control loops and limited access to hardware reduce the choice of products and standards available. The detector and data acquisition electronics group coordinates and implements electronic hardware for photon beamlines and experiments and is developing a modular firmware programming environment that fulfils the described requirements and provides an easy-to-use and flexible framework.

This paper gives an overview of the data acquisition electronic hardware and developments, streaming concepts and the FPGA firmware programming framework under development.

#### **PS1-7: Performance Evaluation of 8-Channel ADC ATCA Card for Direct Sampling of 1.3 GHz Signals**

S. Bou Habib

Dept. ZUiAM, ISE-WUT/DESY, Warsaw, Poland

Nowadays LLRF control systems for linear accelerators incorporate complex and expensive high-precision field detection receivers with multichannel downconverters and low-noise LO generation systems. Increasing requirements for field detection precision at the most advanced machines reveal the limitations of classical LLRF system receivers. Recently developed technology has made it possible to design data acquisition cards that allow direct sampling of the cavity field without the need for downconverters. This paper describes the design and measurements of an eight-channel ATCA card developed for the evaluation of direct sampling techniques for 1.3 GHz signals at the FLASH and European XFEL accelerators. Two versions of the board were tested, each holding a different set of analog-to-digital converters. One was equipped with 400 MSPS, 14-bit ADCs with an analog bandwidth of 1.4 GHz, while the other held 500 MSPS, 12-bit ADCs with a bandwidth of 2.3 GHz. The boards were tested in the laboratory and with "accelerator-like" signals and gave very good results. The paper shows results of the measured sampling parameters, as well as results of different non-IQ sampling schemes with various bandwidths and reaction times for acquiring the amplitude and phase of the analyzed signals and determining the precision of the analysis. Drift measurements for determining the long-term stability are also presented. The results achieved satisfy the precision requirements for machines such as the European XFEL main LINAC and the ILC.
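Direct sampling of a 1.3 GHz signal with a 400 or 500 MSPS converter is undersampling: the signal lies in a high Nyquist zone and appears at an alias frequency in the sampled data. The sketch below computes where it lands, assuming the ADCs are clocked at exactly their rated sample rates. The abstract does not state the actual clock frequencies used, so the resulting numbers are illustrative only, and the function names are hypothetical.

```python
def alias_frequency(f_signal_hz, f_sample_hz):
    """Apparent frequency of a tone after sampling, folded into [0, fs/2]."""
    f = f_signal_hz % f_sample_hz
    return min(f, f_sample_hz - f)


def nyquist_zone(f_signal_hz, f_sample_hz):
    """1-based index of the Nyquist zone (width fs/2) holding the signal."""
    return (2 * f_signal_hz) // f_sample_hz + 1


F_RF = 1_300_000_000  # 1.3 GHz signal from the abstract

for fs in (400_000_000, 500_000_000):  # assumed clocks: the rated ADC rates
    alias = alias_frequency(F_RF, fs)
    zone = nyquist_zone(F_RF, fs)
    print(f"fs = {fs / 1e6:.0f} MSPS: alias at {alias / 1e6:.0f} MHz, zone {zone}")
```

This also shows why the analog input bandwidth matters more than the sample rate for this application: the converter front end must pass 1.3 GHz, which both quoted bandwidths (1.4 GHz and 2.3 GHz) do.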

#### **PS1-8: Vector Modulator Card for MTCA-Based LLRF Control System for Linear Accelerators**

I. Rutkowski<sup>1</sup>, K. Czuba<sup>1</sup>, D. Makowski<sup>2</sup>, A. Mielczarek<sup>2</sup>, H. Schlarb<sup>3</sup>, F. Ludwig<sup>3</sup>

<sup>1</sup>Institute of Electronic Systems, Warsaw University of Technology, Warsaw, Poland

<sup>2</sup>Department of Microelectronics and Computer Science, Technical University of Lodz, Lodz, Poland

<sup>3</sup>MSK, Deutsches Elektronen Synchrotron, Hamburg, Germany

Modern Low Level Radio Frequency (LLRF) control systems of linear accelerators are designed to achieve precise field amplitude and phase regulation inside accelerating cavities. One of the crucial components of the feedback loop is the vector modulator used to drive the high-power RF chain supplying the accelerating cavities. The LLRF control systems for the Free Electron Laser in Hamburg (FLASH) and the European X-ray Free Electron Laser (XFEL) are based on the emerging MTCA platform, which offers numerous advantages for high-performance control systems. This paper describes the concept, design and performance evaluation of the world's first Vector Modulator (uVM) module dedicated to LLRF systems and compatible with the MTCA.4 specification. The module was designed as a double-width, mid-size AMC form factor Rear Transition Module. The uVM board incorporates digital, analog and diagnostic subsystems. The digital part is based on a Xilinx Spartan-6 family FPGA, with several fast gigabit links to the control module. The uVM module is equipped with an Intelligent Platform Management Interface (IPMI) circuit required by the MTCA.4 standard. The FPGA controls the analog part, which includes fast, high-precision DACs, I/Q modulator chips, programmable attenuators, a power amplifier and fast RF gates for external interlock access. The RF chain can be adapted to different carrier frequencies in the range from 50 MHz to 6 GHz. The design has been carefully optimized for high linearity and low output signal phase noise. The diagnostic system of the RF chain makes it possible to monitor input and output power levels and to detect failures in the RF part. The low-noise, high-performance clocking system makes the uVM a universal device for applications beyond LLRF control. Extensive tests of the board were performed, and the measurement results are presented and discussed in this paper.

#### PS1-9: RF Backplane for MTCA.4 Based LLRF Control System

<u>K. Czuba<sup>1</sup></u>, M. Hoffmann<sup>2</sup>, T. Jezynski<sup>2</sup>, F. Ludwig<sup>2</sup>, H. Schlarb<sup>2</sup> <sup>1</sup>Institute of Electronic Systems, Warsaw University of Technology, Warsaw, Poland <sup>2</sup>MSK, DESY, Hamburg, Germany

The Low Level RF (LLRF) control systems developed for linear accelerator based Free Electron Lasers (FEL) require real-time processing of thousands of RF signals with very challenging RF field detection precision. To provide a reliable, maintainable and scalable system, a new development of the LLRF control based on the MTCA.4 architecture was started at DESY for FLASH and the European XFEL. In contrast to standard RF control systems realized in 19" modules, we demonstrated a setup with field detection, RF generation, RF distribution, DAQ system and high-speed real-time processing entirely embedded in the MTCA.4 crate system. This unique scheme combines ultra-high precision analog detection electronics on the Rear Transition Module (RTM) with powerful digital processing units on the Advanced Mezzanine Card (AMC). To increase system reliability and maintainability and to reduce the performance limitations imposed by the RF cabling network, we developed and embedded in the MTCA.4 crate a unique RF Backplane (uRFB) for RTM cards. This backplane distributes high-performance Local Oscillator (LO), RF and low-jitter clock signals, together with a low-noise analog power supply, to the analog RTM cards in the system. In this paper we present the architecture of the MTCA.4 crate with the uRFB, the backplane design and successful laboratory test results of the LLRF control system demonstrating the performance of our development.

#### PS1-10: Timing Distribution and Synchronization of an ATCA Fast Controller for Fusion Devices

<u>M. Correia<sup>1</sup></u>, J. Sousa<sup>1</sup>, B. B. Carvalho<sup>1</sup>, A. Combo<sup>1</sup>, A. P. Rodrigues<sup>1</sup>, A. J. N. Batista<sup>1</sup>, B. Santos<sup>1</sup>, P. R. F. Carvalho<sup>1</sup>, B. Goncalves<sup>1</sup>, C. M. B. A. Correia<sup>2</sup>, C. A. F. Varandas<sup>1</sup>

<sup>1</sup>Associacao EURATOM/IST, Instituto de Plasmas e Fusao Nuclear, Instituto Superior Tecnico - Universidade Tecnica de Lisboa, Lisboa, Portugal

<sup>2</sup>Centro de Instrumentacao, Departamento de Fisica, Universidade de Coimbra, Coimbra, Portugal

An ATCA-based fast plant system controller was developed by IPFN to meet ITER's requirements. Timing synchronization and distribution for all hardware units in the ATCA shelf is implemented on a PCIe-based, AMC quad-carrier module that includes a timing switch. This hardware architecture allows AMC COTS products to be rapidly integrated into the system, resulting in numerous possibilities for PCIe host and ATCA/AMC clock source locations. For this purpose, timing distribution and synchronization is fully programmable. Each ATCA/AMC clock may originate from any physical location and be distributed elsewhere within the ATCA shelf. This is achieved via firmware implemented in a Virtex-6 Field Programmable Gate Array (FPGA). The firmware also includes IEEE-1588-2008 and White Rabbit over Ethernet support. This paper describes the developed firmware and demonstrates timing and synchronization distribution for several of the configurations enabled by this architecture, as well as its implementation and role in the ITER prototype fast plant system controller.

#### PS1-11: Intelligent Platform Management Controller Software Architecture in ATCA Modules for Fast Control Systems

<u>A. P. Rodrigues<sup>1</sup></u>, M. Correia<sup>1</sup>, A. J. N. Batista<sup>1</sup>, P. R. Carvalho<sup>1</sup>, B. Santos<sup>1</sup>, B. B. Carvalho<sup>1</sup>, J. Sousa<sup>1</sup>, B. Goncalves<sup>1</sup>, C. C. M. B. Correia<sup>2</sup>, C. A. F. Varandas<sup>1</sup>

<sup>1</sup>Associacao EURATOM/IST, Instituto de Plasmas e Fusao Nuclear, Instituto Superior Tecnico - Universidade Tecnica de Lisboa, Lisbon, Portugal

<sup>2</sup>Departamento de Fisica/Centro de Instrumentacao, Universidade de Coimbra, Coimbra, Portugal

In complex control systems, such as those required for plasma control in nuclear fusion experiments, e.g. the International Thermonuclear Experimental Reactor (ITER), hardware failure, redundancy, power management, maintenance and firmware version control are very important issues. To address these issues in fast control and data acquisition systems of large-scale physics experiments, where high availability is crucial, an Intelligent Platform Management Controller (IPMC) was developed by IPFN/IST. This controller was integrated in the Advanced Telecommunications Computing Architecture (ATCA) modules that are being developed for inclusion in the instrumentation catalogue of the ITER tokamak. Together with the shelf manager module of the ATCA crate, the controller is responsible for hardware management: hardware failure and redundancy procedures, hot insertion and removal of modules, compatibility between modules that share ATCA resources, power management of each module, firmware management of the ATCA modules (programming, application-specific program selection and version control), temperature monitoring and other module-specific configurations. This grants high availability to the control and data acquisition system. In this paper the software architecture of the implemented IPMC module is described.

#### **PS1-12: Firmware Upgrade in xTCA Systems**

<u>D. Makowski</u><sup>1</sup>, A. Mielczarek<sup>1</sup>, G. Jablonski<sup>1</sup>, P. Predki<sup>1</sup>, T. Jezynski<sup>2</sup>, H. Schlarb<sup>2</sup>, A. Napieralski<sup>1</sup> <sup>1</sup>Department of Microelectronics and Computer Science, Technical University of Lodz, Lodz, Poland <sup>2</sup>Deutsches Elektronen-Synchrotron, Hamburg, Germany

The Advanced Telecommunications Computing Architecture (ATCA) and Micro-Telecommunications Computing Architecture (uTCA) standards, collectively known as xTCA, provide a flexible and scalable infrastructure for designing complex control and data acquisition systems. The xTCA standards are becoming increasingly popular in physics applications. Programmable devices, such as Field Programmable Gate Arrays (FPGAs), conventional processors and Digital Signal Processors (DSPs), are present on the Advanced Mezzanine Card (AMC) modules and ATCA blades used in xTCA crates. These devices typically boot from non-volatile memories available on the modules. In the case of FPGAs, the firmware is usually stored in serial or parallel memories. Other components, e.g. processors, can be booted using external interfaces (Ethernet, USB, etc.). During firmware development, dedicated programmers are used. The tools provided by manufacturers allow firmware to be downloaded to non-volatile PROMs (Programmable Read-Only Memories) and provide debugging functionality. The programmers can be connected using a standard JTAG connector or a proprietary debug connector. Since xTCA systems are composed of various AMC or ATCA modules with many on-board programmable devices, a large number of different tools can be required. A uTCA or ATCA chassis may house more than 10 distinct cards. In laboratory development, a firmware upgrade can be performed using only a few dedicated programmers, upgrading each programmable device one by one. The situation is more difficult when the xTCA hardware is used to control a complex machine, because the devices cannot be easily accessed. For example, the Low Level Radio Frequency (LLRF) control system of the European X-ray Free Electron Laser (XFEL) will be installed inside an accelerator tunnel and will be composed of hundreds of programmable devices. Firmware upgrades therefore cannot be performed using programmers alone.
With the deployment of complex control and data acquisition systems, the need for a remote and automated firmware upgrade solution becomes urgent. This paper presents a universal framework and set of tools for upgrading firmware in xTCA systems. The proposed framework uses the fat pipe region interface of the uTCA backplane for firmware data transmission and the Intelligent Platform Management Interface (IPMI) standard for PROM memory management and control of the upgrade procedure. The framework has been tested with the uTCA-based LLRF control system of the Free Electron Laser at Hamburg (FLASH). The prototype LLRF control system of the accelerator is composed of a few AMC modules using FPGA and DSP devices. These modules include digitizer, vector modulator, timing and data processing cards.

#### PS1-13: Standalone First Level Event Selection Package for the CBM Experiment

I. Kisel<sup>1,2,3</sup>, I. Kulakov<sup>1,3</sup>, M. Zyzak<sup>1</sup>

<sup>1</sup>Goethe University Frankfurt, Frankfurt am Main, Germany
<sup>2</sup>FIAS Frankfurt Institute for Advanced Studies, Frankfurt am Main, Germany
<sup>3</sup>GSI Helmholtzzentrum fuer Schwerionenforschung, Darmstadt, Germany

The CBM (Compressed Baryonic Matter) experiment is being prepared to operate at the future Facility for Anti-Proton and Ion Research (FAIR, Darmstadt, Germany). Its main focus is the measurement of very rare probes, which requires interaction rates of up to 10 MHz. Together with the high multiplicity of charged tracks produced in heavy-ion collisions, this leads to huge data rates of up to 1 TB/s. Most trigger signatures are complex (short-lived particles, e.g. open charm decays) and require information from several detector sub-systems. First Level Event Selection (FLES) in the CBM experiment will be performed on-line on a dedicated processor farm. This requires the development of fast and precise reconstruction algorithms suitable for on-line data processing. The algorithms have to be intrinsically local and parallel and thus require a fundamental redesign of traditional approaches to event data processing in order to use the full potential of modern many-core CPU/GPU architectures. Massive hardware parallelization has to be reflected in the mathematical and computational optimization of the algorithms.

The Cellular Automaton (CA) algorithm is used for track reconstruction. The CA algorithm creates short track segments (triplets) in each set of three neighboring stations, then links them into track candidates and selects among them according to maximum length and minimum $\chi^2$ criteria. The algorithm is optimized with respect to time, vectorized, fully implemented in single precision and robust with respect to the detector geometry and inefficiency. Reconstruction of minimum-bias heavy-ion collisions shows 98% efficiency for most signal particles and a speed of 11 ms per event per core. The Kalman filter (KF) based track fit is used for precise estimation of track parameters.

The KFParticle package for the reconstruction of short-lived particles, based on the Kalman filter, has rich functionality: complete particle reconstruction with momentum and covariance matrix calculation; reconstruction of decay chains; addition of daughter particles one by one; simple access to particle parameters such as mass, lifetime, decay length and rapidity, together with their errors; transport of the particle; estimation of the distance between particles, etc. The KFParticle package has also been vectorized using the SIMD instruction set.

We present and discuss an overview of the on-line FLES processor farm concept; the different levels of parallel data processing in the farm, from the supervisor down to multi-threading and SIMD vectorization; the implementation of the algorithms in single precision; memory optimization; scalability on up to 80 CPU cores; and the efficiency, precision and speed of the FLES algorithms.

#### **PS1-14: The XFEL RF Interlock System**

<u>M. Penno<sup>1</sup></u>, H. Leich<sup>1</sup>, T. Grevsmuehl<sup>2</sup>, C. Rueger<sup>1</sup>, K. Machau<sup>2</sup> <sup>1</sup>EL/Z, DESY Zeuthen, Zeuthen, Germany <sup>2</sup>MHF-p, DESY Hamburg, Hamburg, Germany

A technical interlock system has to prevent damage to the expensive components of the RF stations. The system monitors the behaviour of various system components, collects and processes status information in real time and reports the current status to the control system. The system is based on self-diagnostic and repair strategies to obtain maximum reliability and maximum time of operation. It incorporates a controller and slave modules that perform the I/O operations. The interlock logic is implemented in hardware and operates independently of the software running on the controller. A dedicated backplane with a custom bus protocol has been developed to optimize the data transfer between the interlock controller and the interlock slave modules. The controller uses a single-board computer that runs a Linux-based embedded operating system. After power-up the software performs a self-test, which includes testing all hardware components, checking all firmware revisions and validating the system configuration. Furthermore, it provides a TINE server that connects to the control system and provides status signals, analogue values and plots in real time. The XFEL RF Interlock System will be used in the XFEL facility (DESY, Hamburg site) to protect 27 RF stations. Since the system will be installed in the XFEL tunnel near the accelerating equipment, provisions are taken to detect and react to Single Event Upsets (SEU). The presentation will give an overview of the XFEL RF Interlock System, its concept, interfaces and components.

#### PS1-15: Development and Calibration of a Real-Time Airborne Radioactivity Monitor Using Gamma-Ray Spectrometry on a Particulate Filter

R. Casanovas<sup>1</sup>, J. J. Morant<sup>2</sup>, M. Salvado<sup>1</sup>

<sup>1</sup>Unitat de Fisica Medica, Universitat Rovira i Virgili, Reus, Spain <sup>2</sup>Servei de Proteccio Radiologica, Universitat Rovira i Virgili, Reus, Spain

The main objective of an automatic real-time surveillance network is to detect anomalous levels of radioactivity in the environment as quickly as possible. If gamma-ray spectrometry is used rather than gross counting, it is also possible to identify the isotopes involved in an increase in radiation level. This makes it possible to distinguish naturally occurring radionuclides from artificial ones. In addition, gamma-ray spectrometry allows the activity concentration of the detected isotopes to be determined, making it possible to establish automatic alerts based on the limit levels set by legislation.

For this reason, the use of real-time gamma-ray spectrometry systems in environmental radiation surveillance networks has become common. In this work, we present the general development and calibration aspects of a real-time airborne radioactivity monitor (patent pending). The monitor is based on gamma-ray spectrometry with NaI(Tl) or LaBr<sub>3</sub>(Ce) scintillators and identifies and quantifies airborne radioactive isotopes in real time.

The system comprises a suction pump that circulates a constant flow of air through a particulate filter, which is used to concentrate the airborne isotopes. The active part of the filter faces a 2"x2" NaI(Tl) or LaBr<sub>3</sub>(Ce) detector connected to a multichannel analyzer, and the energy spectrum is measured. Both the detector and the filter are inside a Pb shielding, which reduces the surrounding radiation background. After the selected integration time, the filter is advanced so that the next set of measurements is taken on a clean section of filter. The monitor has multiple control sensors (detector and air temperature, temperature in the rack, air flow, amount of filter available, positioning and traveling distance of the filter, filter break sensor, etc.). Software was specifically designed for local or remote control to manage data collection and storage, information transmission, sensor management, information on operating parameters, graphical representations of spectra, calculations, etc. The system also comprises a meteorological station for observing the atmospheric conditions. In addition, the monitor can easily integrate other devices, such as Geiger detectors, to complement the radiological measurements.

The calibration of the monitor was performed experimentally, except for the efficiency calibration, which was set using Monte Carlo simulations. For the simulations, a user code and a model of the system geometry for the EGS5 system were prepared and validated with experimental measurements. Although the calibration methodology is independent of the scintillation crystal used, the capabilities and performance of the monitor are not. Thus, we finally discuss some characteristics of the monitor when using the different crystals.

#### PS1-16: Using Data-Oriented Storage Method to Build a High-Parallel and High-Efficiency Disk Cluster

<u>J. Wu</u><sup>1,2</sup>, L. F. Liu<sup>1,2</sup>, Z. Han<sup>1,2</sup>, S. Chen<sup>1,2</sup>, J. Shan<sup>1,2</sup>, K. Y. Tian<sup>1,2</sup>, J. Dong<sup>1,2</sup> <sup>1</sup>Department of Modern Physics, University of Sci.&Tech. of China, Hefei, Anhui, China, 230026 <sup>2</sup>State Key Laboratory of Particle Detection & Electronics, University of Sci.&Tech. of China, Hefei, Anhui, China, 230026

High energy physics and seismic physics experiments produce massive amounts of data, and processing such data sets is a huge challenge for modern architectures. Many research papers show that disk IO speed is the usual bottleneck during data processing, because hard disk technology has developed much more slowly than processors. Solid State Disks (SSDs) offer high IO speed, but the flash memories used to build them face significant scaling challenges, owing to their dependence on reductions in lithographic resolution and to fundamental physical limitations beyond the 22 nm process node. For mass data storage, hard disks therefore remain irreplaceable for the immediate future, and a way to speed up their IO performance is needed. Because the mechanical motion inherent in hard disks makes a significant speed-up of a single disk difficult, the only way forward is parallel disk access. The key requirements for parallel access are a high degree of parallelism and high parallel efficiency. Currently, RAID is widely used to obtain high IO performance, but RAID requires all hard disks to be connected to the controller, and the trace lengths permitted by signal integrity limit the maximum number of disks. The Internet Small Computer System Interface (iSCSI) can also be used to build an iRAID with a large number of nodes, but iRAID still needs a computer to act as a central controller. Related research shows that this controller is likely to become the bottleneck as the number of storage nodes increases, which leads to low parallel efficiency.

We noticed that the data generated in these experiments can be treated as independent units, such as the events in high energy physics and the traces in seismic physics. Based on the independence of these data units, a new data-oriented storage method is introduced. Unlike traditional file-based storage, the smallest storage elements are the data units generated during the experiments. A hash function with a uniform, random distribution spreads these data units over several hard disks. The disks then work independently and the coupling between disks is eliminated, which makes it possible for a large number of hard disks to work in parallel without a drop in performance. In this way high IO performance is achieved. An experimental result shows that a system combining 23 hard disks can perform 3680 IO transfers per second (IOPS). The average per-disk rate is 160.02 IOPS, compared with 161.16 IOPS in single-disk mode, so parallelization costs only 0.7% in performance. The data-oriented storage method also makes data-aware prefetching possible, which improves IO performance dramatically. Depending on the ratio of data processing time to data fetch time, an improvement of up to 97.8% is achieved.
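
The placement idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each data unit has an integer identifier and uses SHA-256 as a stand-in for the unspecified hash function. The names `disk_for_unit` and `placement_histogram` are hypothetical.

```python
import hashlib
from collections import Counter


def disk_for_unit(unit_id: int, n_disks: int) -> int:
    """Map a data unit (e.g. an event or trace number) to a disk index.

    Assumption: SHA-256 stands in for the hash used in the paper. Any hash
    with a near-uniform output would serve the same purpose.
    """
    digest = hashlib.sha256(unit_id.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big") % n_disks


def placement_histogram(n_units: int, n_disks: int) -> Counter:
    """Count how many of the first n_units identifiers land on each disk."""
    return Counter(disk_for_unit(u, n_disks) for u in range(n_units))


# 23 disks as in the reported experiment; 23000 units is an arbitrary choice.
hist = placement_histogram(23000, 23)
```

Because the mapping depends only on the unit identifier, any reader can locate a unit without consulting a central controller, and the histogram gives a quick check of how evenly the units are spread.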

#### PS1-17: Asynchronous and Synchronous Implementations of the Autocorrelation Function for the FPGA X-Ray Pixel Array Detector

M. S. Hromalik<sup>1,2</sup>, K. S. Green<sup>2</sup>, H. T. Philipp<sup>2</sup>, M. T. W. Tate<sup>2</sup>, S. M. Gruner<sup>2,3</sup>

<sup>1</sup>Computer Science, State University of New York at Oswego, Oswego, NY, United States

<sup>2</sup>Laboratory of Atomic and Solid State Physics (LASSP), Cornell University, Ithaca, NY, United States

<sup>3</sup>Cornell High Energy Synchrotron Source (CHESS), Ithaca, NY, United States

The design of the Field Programmable Gate Array Pixel Array Detector (FPGA PAD) prototype and initial experimental results of real-time implementations of its autocorrelation function are presented. This is a pixelated 2D silicon device for detecting X-rays in X-ray diffraction experiments and comprises three layers: the diode detection and ASIC analog electronics layers, connected by a massively parallel interface to a third FPGA layer consisting of a Xilinx XC6VLX550T device. A high-speed, labor-intensive asynchronous interface as well as a more traditional synchronous interface will be presented. Traditionally, X-ray PADs have been application-specific, as their functionality is built into the ASIC layer. In the FPGA PAD, however, the ASIC layer consists of a simple photon-counting front end with a single-bit digitized output to the FPGA layer. As most of the functionality is migrated to the FPGA layer, the reconfigurability of the FPGA allows for great flexibility in terms of detector applications. The massively parallel connection between the ASIC layer and the FPGA layer also allows for data-flow implementations of detector algorithms on the parallel input bit array. Real-time data processing enables lower data transfer rates to offline storage and higher time resolution during experiments. An example application of a real-time autocorrelation function (ACF) for X-ray Photon Correlation Spectroscopy (XPCS) experiments is also described for a prototype of the FPGA PAD. Both a synchronous implementation and a very high-speed Region of Interest asynchronous implementation were designed. A time resolution range of 100 ns to 1 s was achieved for the synchronous implementation, and a maximum resolution down to 36 ns was realized for the asynchronous implementation. The required data transfer rate was also reduced from 2.56 Gb/s to 4.4 Mb/s over the entire array.
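
For readers unfamiliar with the quantity being computed, the sketch below evaluates the textbook normalized intensity autocorrelation, g2(lag) = <I(t) I(t+lag)> / <I>^2, for a single pixel's photon-count stream. It is a plain software illustration under that assumed definition; the abstract does not state which estimator or normalization the FPGA implementations use, and the function name `g2` is hypothetical.

```python
def g2(counts, max_lag):
    """Normalized autocorrelation of a photon-count sequence.

    counts  : per-time-bin counts for one pixel (0/1 for a single-bit
              front end)
    max_lag : largest lag, in time bins, to evaluate
    Returns a list whose element `lag` is <I(t) I(t+lag)> / <I>^2.
    """
    n = len(counts)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(counts)")
    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("autocorrelation is undefined for an all-zero stream")
    result = []
    for lag in range(max_lag + 1):
        pairs = n - lag  # number of (t, t + lag) pairs available
        corr = sum(counts[t] * counts[t + lag] for t in range(pairs)) / pairs
        result.append(corr / (mean * mean))
    return result


# A constant stream is perfectly correlated at every lag.
flat = g2([1] * 100, 3)
# A strictly alternating stream correlates only at even lags.
alternating = g2([1, 0] * 50, 2)
```

Computing this per pixel inside the detector means that only the correlation values, rather than every raw frame, have to leave the device, which is the kind of data reduction the abstract reports.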

#### **PS1-18: Real-Time Fast Controller Prototype for J-TEXT Tokamak**

W. Zheng, M. Zhang, G. Zhuang, C. Weng, R. Liu, Y. He, T. Ding, X. Zhang

College of Electrical & Electronic Engineering, Huazhong University of Science & Technology, Wuhan, China

The operation of a tokamak device is highly sophisticated and usually requires high-performance real-time controllers. The ITER Control, Data Access and Communication (CODAC) team has defined standards for the Fast Controller (FC). Following the ITER CODAC standards, we have designed the real-time FC (RTFC) for J-TEXT. The RTFC is an FC with a dedicated software structure for real-time control. It serves as a design template for all real-time controllers on J-TEXT, and with minor modifications it can be used for various applications such as plasma control and real-time diagnostics.

The RTFC is built around the PXIe bus with multi-core processor hardware, Reflective Memory technology and the IEEE-1588 Precision Time Protocol (PTP). It can perform closed-loop control with a cycle time below 1 ms. The RTFC supports the Experimental Physics and Industrial Control System (EPICS): it can be monitored and configured using EPICS, and it works autonomously when integrated into the J-TEXT CODAC system. Preliminary test results based on a prototype used as the J-TEXT Vertical Field Power Supply Controller will also be presented.

#### **PS1-19: A Dedicated Processor for Monte Carlo Computation in Radiotherapy**

<u>C. Pili</u><sup>1,2</sup>, V. Fanti<sup>1,2</sup>, G. R. Fois<sup>1,2</sup>, R. Marzeddu<sup>1,2</sup>, P. Randaccio<sup>1,2</sup>, S. Siddhanta<sup>1,2</sup>, J. Spiga<sup>1,2</sup>, A. Szostak<sup>1,2</sup> <sup>1</sup>Department of Physics, University of Cagliari, Cagliari, Italy <sup>2</sup>INFN Sez. Cagliari, Cagliari, Italy

A high-speed Monte Carlo simulator for radiotherapy is being developed at INFN, Cagliari. In radiotherapy treatment planning, the Compton interaction of a photon with an electron forms an important part of Monte Carlo simulations of the radiation dose delivered to the human body. Such simulations give more precise results than empirical methods, but at the cost of computing time. A fast, fully pipelined, cost-effective design for real-time simulation of the Compton interaction and dose calculation has been implemented on FPGA-based hardware running at more than 100 MHz, making it feasible to perform high-speed Monte Carlo simulations for practical purposes and to build dose distribution maps in real time. A performance comparison is also being made with an implementation on graphics processors.

#### PS1-20: New RFX-Mod Feedback Control System Based on MARTe Real-Time Framework

G. Manduchi, A. Luchetta, C. Taliercio, A. Soppelsa Consorzio RFX, Padova, Italy

A real-time system has been used since 2004 in the RFX-mod nuclear fusion experiment to control the plasma equilibrium configuration and the Magneto Hydrodynamic (MHD) modes. The system is implemented as a network of eight VME racks, each hosting a PowerPC computer and I/O boards, communicating via GBit/s Ethernet. The system handles about 700 input signals and produces about 250 reference waveforms driving the power supply feeding the coils used for plasma position and MHD control. The system operates at a rate of 2.5 kHz with an overall latency of 1.5 ms, which is longer than the cycle period because of its pipelined organization. The system has been working successfully for seven years, but its latency and limited computing power prevent its use with new, more computation-intensive control algorithms. To overcome these limitations, a new hardware and software architecture has been developed, and the new system now provides a shorter latency and much greater computing power. Despite its radically different hardware organization, using one multi-core server in place of multiple VME CPUs, the conceptual distributed organization has been retained and a one-to-one mapping between former computers and server cores has been defined, with the possibility of integrating additional cores for future use. Shared memory is now used for communication in place of Ethernet, thus removing one of the major bottlenecks of the old system. Generation of the reference waveforms is now achieved using PXI technology but, due to budget constraints, VME-based data acquisition has been retained in this first stage, using UDP communication to send acquired raw data to the control server. Replacement of the VME ADC modules with ATCA-based ones is foreseen as a further step. Two major changes in software have been carried out in the new system: the replacement of VxWorks with real-time Linux and the use of MARTe, a framework for real-time applications that is increasingly used in the fusion community. MARTe provides all the functionality required to handle supervision and real-time data communication for a configurable set of real-time threads, which are then mapped onto the cores of a multi-core server.
Every real-time thread cyclically executes a sequence of Generic Application Modules (GAMs), which provide the required interaction with the underlying hardware as well as the implementation of the control algorithms. Developers can therefore concentrate on the specific components, whose configuration (such as the number of threads and the components for each thread) is defined in a property file. The modular approach provided by MARTe has allowed not only rapid development of the new system, but also its rapid prototyping. By replacing the data acquisition components with others that read stored raw input data from the experiment database, it has been possible to fully test the control algorithms before system commissioning.

#### PS1-21: Real Time FPGA-Based Crosstalk Elimination for Multichannel Interferometry Systems in Fusion Diagnostics

S. Hernandez-Montero<sup>1</sup>, <u>J. A. Lopez-Martin</u><sup>1</sup>, M. Sanchez<sup>2</sup>, L. Esteban<sup>2</sup> <sup>1</sup>Departamento de Ingenieria Electronica, Universidad Politecnica de Madrid, Madrid, Spain <sup>2</sup>Laboratorio Nacional de Fusion, CIEMAT, Madrid, Spain

Infrared (IR) interferometry is a well-known method for measuring the Line-Integrated electron Density (LID) of fusion plasmas. In the TJ-II stellarator, an FPGA-based IR interferometer has recently been installed to accurately measure the LID of the plasma in real time.

To guarantee the correct functionality of the interferometer and achieve high precision, it is essential to maximize the Signal-to-Noise Ratio (SNR) of the output density signal. In the measurement process, one of the most important distortion sources is crosstalk, or interchannel interference. Thus, in order to increase the SNR of the system, a crosstalk reduction stage has been designed and implemented in an FPGA.

This paper presents a novel crosstalk elimination algorithm optimized for high-performance hardware implementation. Since the algorithm operates on the complex spectrum of the signals, an N-point Fast Fourier Transform (FFT) is performed first. The inner product between the spectra is then used to reconstruct an estimate of the transfer function of the interfering system, and this estimate is used to eliminate the interference in the frequency domain. Finally, an N-point inverse FFT is carried out to obtain an improved version of the time signal required by the phase detection block.
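
The three steps (transform, estimate the coupling from an inner product, subtract and transform back) can be illustrated with a heavily simplified sketch. It assumes that the interference enters through a single complex coupling coefficient and that the wanted signal is uncorrelated with the interfering channel; the abstract does not give the actual estimator, so this is a plain projection and not the authors' algorithm. A naive DFT replaces the FFT to keep the example dependency-free, and all function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (stand-in for an N-point FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT matching dft() above."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def remove_crosstalk(victim, aggressor):
    """Subtract the part of `victim` that is correlated with `aggressor`.

    Assumption: one complex coefficient models the coupling, estimated as
    <V, A> / <A, A> over the two spectra.
    Returns (cleaned time signal, estimated coupling coefficient).
    """
    v_spec, a_spec = dft(victim), dft(aggressor)
    inner = sum(v * a.conjugate() for v, a in zip(v_spec, a_spec))
    energy = sum(abs(a) ** 2 for a in a_spec)
    coupling = inner / energy if energy > 0 else 0j
    cleaned = [v - coupling * a for v, a in zip(v_spec, a_spec)]
    return [s.real for s in idft(cleaned)], coupling


# Synthetic test case: two tones in different frequency bins, 20% leakage.
N = 64
wanted = [math.cos(2 * math.pi * 3 * t / N) for t in range(N)]
aggressor = [math.cos(2 * math.pi * 5 * t / N) for t in range(N)]
victim = [w + 0.2 * a for w, a in zip(wanted, aggressor)]
cleaned, coupling = remove_crosstalk(victim, aggressor)
```

In this synthetic case the two tones occupy different bins, so the projection recovers the leakage coefficient and the wanted signal up to rounding. With real interferometer signals the coupling may be frequency dependent and the estimate is only approximate.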

In addition, an M-factor pre-downsampling stage has been included to increase the frequency resolution of the algorithm. This stage downconverts the input signals into low-frequency aliases, which decreases the sample frequency while significantly increasing the overall system resolution.

This procedure, in combination with the phase detection algorithm currently applied in the TJ-II, performs the required operations in a few microseconds. This allows an accurate measurement of the LID to be extracted in real time and makes it possible to control the heating systems of the fusion reactor using a feedback loop. This is possible because the algorithm is implemented as an initial block of the processing stage, in contrast with existing algorithms, which are usually applied in post-processing. In our approach, improved results are obtained by eliminating the interference in the detected raw signal.
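The frequency-domain step described above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a single, frequency-flat coupling coefficient, estimated by projecting the measured spectrum onto the interfering channel's spectrum, and it uses a naive DFT as a stand-in for the N-point FFT. The names `dft` and `remove_crosstalk` are hypothetical.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform, standing in for an N-point FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def remove_crosstalk(measured, interferer):
    """Subtract the part of `measured` that is correlated with `interferer`.

    Assumption: the coupling is one complex gain h, estimated as the
    inner product <M, I> divided by <I, I> over all frequency bins.
    """
    M = dft(measured)
    I = dft(interferer)
    num = sum(m * i.conjugate() for m, i in zip(M, I))
    den = sum(abs(i) ** 2 for i in I)
    h = num / den if den else 0.0
    cleaned_spectrum = [m - h * i for m, i in zip(M, I)]
    cleaned = [c.real for c in dft(cleaned_spectrum, inverse=True)]
    return cleaned, h

# Synthetic example: a wanted tone plus 20 % leakage from a neighbouring channel.
n = 64
wanted = [math.cos(2 * math.pi * 5 * t / n) for t in range(n)]
interferer = [math.cos(2 * math.pi * 9 * t / n) for t in range(n)]
measured = [w + 0.2 * i for w, i in zip(wanted, interferer)]
cleaned, coupling = remove_crosstalk(measured, interferer)
```

In this toy case the two tones fall on different frequency bins, so the projection recovers the leakage gain and the cleaned signal matches the wanted tone. A real interfering path would generally need a frequency-dependent transfer function rather than one scalar.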

### PS1-22: A Real-Time Architecture for the Identification of Faulty Magnetic Sensors in the JET Tokamak

A. C. Neto<sup>1</sup>, D. Alves<sup>1</sup>, B. B. Carvalho<sup>1</sup>, G. De Tommasi<sup>2</sup>, R. Felton<sup>3</sup>, H. Fernandes<sup>1</sup>, P. J. Lomas<sup>3</sup>, F. Maviglia<sup>2</sup>, F. G. Rimini<sup>3</sup>, F. Sartori<sup>4</sup>, A. V. Stephen<sup>3</sup>, <u>D. F. Valcarcel<sup>1</sup></u>, L. Zabeo<sup>5</sup>

<sup>1</sup>EURATOM-IST, Lisbon, Portugal <sup>2</sup>EURATOM-ENEA/CREATE, Naples, Italy <sup>3</sup>EURATOM-CCFE, Abingdon, United Kingdom <sup>4</sup>Fusion for Energy, Barcelona, Spain <sup>5</sup>ITER Organisation, Cadarache, France

In a tokamak, the accurate estimation of the plasma boundary is not only essential to maximise the fusion performance but is also the first line of defence for the physical integrity of the device. In particular, the first wall components might get severely damaged if over-exposed to a high plasma thermal load.

The most common approach to calculating the plasma geometry and related parameters is based on a large set of different types of magnetic sensors. Using this information, real-time plasma equilibrium codes infer a flux map and calculate the shape and geometry of the plasma boundary and its distance to a known reference (e.g. the first wall). These are inputs to one or more controllers capable of acting on the shape and trajectory based on pre-defined requests.

Depending on the device, the error of the estimated boundary distance must usually be less than 1 centimetre, which translates into very small errors on the magnetic measurement itself. Moreover, asymmetries in the plasma-generated and surrounding magnetic fields can produce local shape deformations, potentially leading to unstable control of the plasma geometry.

The JET tokamak was recently upgraded to a new and less thermally robust all-metal wall, also known as the ITER-like wall. Currently the shape controller system uses the output of a single reconstruction algorithm to drive the plasma geometry and the protection systems have no input from the plasma boundary reconstruction. These choices are historical and were due to architectural, hardware and processing power limitations.

Taking advantage of new multi-core systems and of the already proven robustness of the JET real-time network, this paper proposes a distributed architecture for the real-time identification of faults in the magnetic measurements of the JET tokamak. Besides detecting simple faults, such as short-circuits and open-loops, the system compares the expected measurement at the coil location with the real measurement, producing a confidence value. Several magnetic reconstructions, using sensors from multiple toroidally distributed locations, can run in parallel, allowing the selection of a voting or averaging scheme. Finally, any fault warnings can be fed directly to the real-time protection sequencer system, whose main function is to coordinate the protection of JET's first wall.
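As a rough illustration of how a voting or averaging scheme over parallel reconstructions might look, the sketch below combines several boundary-distance estimates using their confidence values. The function name, the confidence threshold and the use of a median as the "vote" are all assumptions made for this example; the abstract does not specify the scheme used at JET.

```python
from statistics import median

def fuse_boundary_estimates(estimates, confidences, min_confidence=0.5, scheme="average"):
    """Combine distance estimates (metres) from several parallel reconstructions.

    Estimates whose confidence falls below `min_confidence` are discarded.
    scheme="vote" returns the median of the trusted estimates;
    scheme="average" returns their confidence-weighted mean.
    """
    trusted = [(e, c) for e, c in zip(estimates, confidences) if c >= min_confidence]
    if not trusted:
        raise ValueError("no reconstruction passed the confidence threshold")
    if scheme == "vote":
        return median(e for e, _ in trusted)
    total = sum(c for _, c in trusted)
    if total == 0:
        raise ValueError("all trusted confidences are zero")
    return sum(e * c for e, c in trusted) / total

# Three reconstructions; the third relies on a suspect sensor and is rejected.
gap = fuse_boundary_estimates([0.05, 0.06, 0.50], [0.9, 0.9, 0.1])
```

The median variant tolerates a single outlier even when its confidence is wrongly high, while the weighted mean makes smoother use of all trusted inputs.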

#### **PS1-23: Parallel Task Management Library for MARTe**

<u>D. F. Valcarcel<sup>1</sup>, D. Alves<sup>1</sup>, A. Neto<sup>1</sup>, C. Reux<sup>2</sup>, B. B. Carvalho<sup>1</sup>, R. Felton<sup>3</sup>, P. J. Lomas<sup>3</sup>, J. Sousa<sup>1</sup>, L. Zabeo<sup>4</sup>, JET EFDA Contributors<sup>\*5</sup> <sup>1</sup>Associacao EURATOM/IST, Instituto de Plasmas e Fusao Nuclear, Instituto Superior Tecnico, UTL, P1049-001 Lisboa, Portugal <sup>2</sup>Ecole Polytechnique, LPP, CNRS UMR 7648, 91128 Palaiseau, France <sup>3</sup>Euratom/CCFE Fusion Association, Culham Science Centre, Abingdon, Oxon, OX14 3DB, UK <sup>4</sup>ITER Organisation, Cadarache, France</u>

<sup>5</sup>JET-EFDA, Culham Science Centre, OX14 3DB, Abingdon, UK

The Multithreaded Application Real-Time executor (MARTe) is a real-time framework with increasing popularity and support in the thermonuclear fusion community. It allows modular code to run in a multi-threaded environment, leveraging current multi-core processor (CPU) technology. One application that relies on the MARTe framework is the JET tokamak WAll Load Limiter System (WALLS). It calculates and monitors the temperature of metal tiles, plasma-facing components (PFCs) that can melt or flake if their temperature gets too high when exposed to power loads. One of the most time-consuming tasks in WALLS is the real-time calculation of thermal diffusion models. These models tend to be described by very large state-space models, which makes them good candidates for parallelisation. MARTe's traditional approach to task parallelisation is to split the problem into several Real-Time Threads, each responsible for the self-contained sequential execution of an input-to-output chain. This is usually possible, but it might not always be practical for algorithmic or technical reasons. It also might not scale easily as the number of available CPU cores increases. The WorkLibrary introduces a GPU-like way of splitting work among the available cores of modern CPUs that is straightforward to use in an application and scales without code rewriting or recompilation.

The first part of this article explains the motivation behind the library, its architecture and implementation. The second part presents a real application for WALLS, a parallel version of a large state-space model describing the 2D thermal diffusion on a JET tile.
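The idea of splitting one state-space update among cores can be sketched as follows. This is not the WorkLibrary API, which the abstract does not describe; `split_ranges`, `step_rows` and `parallel_step` are hypothetical names, and a Python thread pool is used only to show the row partitioning (Python threads do not give a real speed-up for this arithmetic).

```python
from concurrent.futures import ThreadPoolExecutor

def split_ranges(n_rows, n_workers):
    """Split range(n_rows) into at most n_workers contiguous, near-equal chunks."""
    base, extra = divmod(n_rows, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        if size:
            ranges.append((start, start + size))
        start += size
    return ranges

def step_rows(A, B, x, u, rows):
    """Compute rows [start, stop) of the update x_next = A x + B u."""
    start, stop = rows
    return [
        sum(a * xi for a, xi in zip(A[r], x)) + sum(b * ui for b, ui in zip(B[r], u))
        for r in range(start, stop)
    ]

def parallel_step(A, B, x, u, n_workers=4):
    """One state-space step with the rows of A and B shared among workers."""
    chunks = split_ranges(len(A), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in submission order, so rows stay in order.
        parts = pool.map(lambda rows: step_rows(A, B, x, u, rows), chunks)
        return [value for part in parts for value in part]

A = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
B = [[1], [1], [1]]
x_next = parallel_step(A, B, x=[1, 1, 1], u=[2], n_workers=2)
```

Because each output row depends only on the previous state, the chunks are independent and the number of workers can change at run time without touching the model code.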

\*See the Appendix of F. Romanelli et al., Proceedings of the 23rd IAEA Fusion Energy Conference 2010, Daejeon, Korea

### PS1-24: A Real-Time Data Transmission Method Based on Linux for Physical Experimental Readout Systems

<u>P. Cao</u><sup>1,2</sup>, K. Song<sup>1,2</sup>, J. Yang<sup>1,2</sup>, K. Zhang<sup>1,2</sup>

<sup>1</sup>State Key Laboratory of Particle Detection and Electronics, HeFei/AnHui, China <sup>2</sup>Dept. of Modern Phys., University of Science and Technology of China, HeFei/AnHui, China

In a typical physics experiment instrument, the readout system implements the interface between the data acquisition (DAQ) system and the front-end electronics (FEE). The key task of a readout system is to read, pack and forward data from the FEE to the back-end data concentration center in real time. To guarantee real-time performance, the VxWorks operating system (OS) is widely used in readout systems. However, VxWorks is not open source, which brings a number of disadvantages. With the development of multi-core processors and new scheduling algorithms, Linux now exhibits real-time performance similar to that of VxWorks, and it has been used successfully even in some hard real-time systems. Discussions and evaluations of real-time Linux solutions as a possible replacement for VxWorks have therefore arisen naturally. In this paper, a real-time transmission method based on Linux is introduced. To reduce the number of transfer cycles for a large amount of data, a large block of contiguous memory is allocated as a DMA transfer buffer by slightly modifying the Linux kernel (version 2.6). To increase network throughput, the user software is designed to run in parallel. To guarantee the real-time performance of data transfer from hardware to software, memory mapping is used to avoid unnecessary data copying. A simplified readout system is implemented with 4 readout modules in a PXI crate. This system supports up to 48 Mbytes/s of data throughput from the front-end hardware to the back-end concentration center through a Gigabit Ethernet connection. The method imposes no particular hardware or software restrictions, so it can easily be migrated to other interrupt-related applications.

#### **PS1-25: A Single-FPGA Full-Time Beam Former**

H. Deschamps DSM/IRFU/SEDI, Comissariat a l'Energie Atomique, GIF sur YVETTE, France

A full-time beam former for two independent antenna groups, with visibility computation capabilities at a slower rate, has previously been designed on a single FPGA for the BAO-radio instrument, a radio telescope demonstrator for the study of dark energy by the HI probing technique. On the same FPGA, firmware dedicated to the FAN project at the Nancay radio telescope has been designed; it provides full-time dual beam-forming on a single antenna group. It can process an incoming data flow of twelve channels, each organized as a complex spectrum (2x8 bits) of 4096 frequencies, at an effective rate of 4 Gbps. The dual-beam capability of the system has been successfully tested with transits of radio sources (Cas A, 3C123), and further observations with source tracking and construction of the visibility matrix will follow.

#### PS1-26: A Two-Stage Distributed Architecture Designed for DAQ of Thousands-Channel Physical Experiment

<u>K. Song</u><sup>1,2</sup>, P. Cao<sup>1,2</sup>, J. Yang<sup>1,2</sup>

<sup>1</sup>Modern Physics Dept., University of Science & Technology of China, HeFei, AnHui, China <sup>2</sup>State Key Laboratory of Particle Detection and Electronics, Hefei, Anhui, China

Today, some physics experiment data acquisition systems have thousands of channels. A centralized architecture cannot handle DAQ for such a large number of channels. This manuscript presents a two-stage distributed architecture for the DAQ of thousands-channel physics experiments. The architecture is divided into two units: the upper unit handles data collection and storage, including main control, quality monitoring and data recording, while the lower unit handles data sampling and transmission.

One block has been built for four cables (fiber or twisted-pair electric cable). Each cable cascades hundreds of modules for data sampling and transmission. The 4-cable block is composed of: a) a CPCI chassis, in which 4 FCI (Fiber Channel Interface) boards are inserted to gather data from the 4 cables, one FCI per cable, together with a main board in slot zero of the CPCI chassis that receives data from the 4 FCIs; b) a VPR (Vision, Plotting and Recording) workstation, which receives data uploaded from the main board through Gigabit Ethernet and manages data plotting, printing and saving; c) a CCM (Center Controlling and Monitoring) workstation, which receives decimated data uploaded from the FCIs through Megabit Ethernet, manages parameter configuration and control, and displays decimated data in real time to monitor the current working status; d) other auxiliary components such as Ethernet switches, a plotter, a printer and a disk array.

The architecture is designed to be expandable: the 4-cable block can easily be expanded to a 16-cable system by using it as a building block, to meet today's demand for DAQ with many more channels. The key problems in a multi-cable DAQ architecture are synchronous sampling across all channels, pipelined data transmission, and real-time manipulation and recording of large volumes of data. The 16-cable system has three layers of synchronization: among the four CPCI chassis, among the four cables connected to one CPCI chassis, and among the hundreds of channels in a cable. We use both hardware triggers and software commands to synchronize the different CPCI chassis and the four cables connected to each chassis through its FCIs. Within a cable, we use clock recovery and a PLL to adjust the phase delay, and a command delay counter to compensate for the command transmission delay.

We have built a prototype architecture for 16 cables with the associated hardware modules, and tested its sampling synchronization, data transmission and storage capability using a cable simulator that we developed. The cable simulator can generate data for 16 cables according to commands and configuration, with each cable supporting 1920 channels. Test results show that the sampling synchronization error between two channels 100 m apart can reach 1 ns. The maximum tested data rate from one cable is 11.52 MB/s, so the total data rate for 16 cables is 1.47456 Gbps.
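The quoted total can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 MB = 10^6 bytes, 1 Gbps = 10^9 bits/s); the per-channel figure additionally assumes that the 11.52 MB/s was measured with all 1920 channels of a cable active, which the abstract does not state.

```python
BYTES_PER_S_PER_CABLE = 11_520_000  # 11.52 MB/s, assuming 1 MB = 10**6 bytes
CHANNELS_PER_CABLE = 1920           # from the cable simulator description
CABLES = 16

# Aggregate rate over all cables, converted from bytes to bits.
total_bits_per_s = BYTES_PER_S_PER_CABLE * 8 * CABLES
total_gbps = total_bits_per_s / 1e9

# Implied share of one channel, if every channel contributes equally.
bytes_per_s_per_channel = BYTES_PER_S_PER_CABLE / CHANNELS_PER_CABLE

print(f"total: {total_gbps} Gbps, per channel: {bytes_per_s_per_channel} B/s")
```

Under these assumptions the total reproduces the 1.47456 Gbps figure and each channel contributes 6000 bytes per second, which exceeds what a single Gigabit Ethernet link can carry and is consistent with the distributed design.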

#### PS1-27: An Application Using MicroTCA for Real-Time Event Assembly

R. A. Rivera

Fermilab, Batavia, IL, United States

The Electronic Systems Engineering Department of the Computing Sector at the Fermi National Accelerator Laboratory has undertaken the effort of designing an AMC that meets the specifications within the MicroTCA framework. The application chosen to demonstrate the hardware is the real-time event assembly of data taken by a particle tracking pixel telescope. In the past, the telescope would push all of its data to a PC where the data was stored to disk. Then event assembly, geometry inference, and particle tracking were all done at a later time. This approach made it difficult to efficiently assess the quality of the data as it was being taken -- at times, resulting in wasted test beam time. Now, we can insert in the data path, between the telescope and the PC, a commercial MicroTCA crate housing our AMC. The AMC receives, buffers, and processes the data from the tracking telescope and transmits complete, assembled events to the PC in real-time. In this paper, we report on the design approach and the results achieved when the MicroTCA hardware was employed for the first time during a test beam run at the Fermi Test Beam Facility in 2012.

#### PS1-28: Digital Programmable Emulator and Analyzer of Radiation Detection Setups

A. Geraci, A. Abba, F. Caponio

Dept. of Electronics, Politecnico di Milano, Milan, Italy

We present a fully configurable digital architecture that performs both signal generation, for the emulation of radiation detectors and front-end electronics, and signal processing for radiation detectors. Several aspects justify developing a system of this type. First, experimental conditions improve when no radioactive source or detecting apparatus is needed: experimenters are safer, and remote experiments can be performed independently of the presence of radioactive sources and detectors. The quality of the experiment also benefits. The availability of a configurable virtual signal source simplifies the testing of processors, allows absolute and fair comparison among different processing techniques, and makes it possible to evaluate algorithms directly and adjust the processing flow. The proposed architecture has been conceived to serve as a general-purpose investigation instrument in digital spectroscopy applications, at both the hardware and firmware levels. It allows the emulation of all parts of an acquisition and processing setup and consequently provides a real and complete hardware and firmware co-design platform. The paper focuses on the theoretical and practical topics involved in generating signals equivalent to those produced by radiation detection systems. Operationally, the signal synthesis process is based on a reference shape, the statistics of the occurrence times, and the statistical distribution of the shape amplitudes. A pair of consecutive events can be generated and summed to simulate the pile-up phenomenon. The resulting signal is corrupted by noise and baseline deviation, and is shaped to take into account the transfer function of the electronic conditioning stage. The generated signal can be made available as output in digital or analog form. The first choice also makes it possible to introduce non-linearity effects and quantization noise, whereas the second requires a non-trivial digital-to-analog conversion process. In particular, the proposed architecture implements an algorithm that retrieves the statistical properties of a random variable from its histogram. Moreover, the system can also sample external analog signals in order, for instance, to obtain shapes and spectra for initializing the emulation process. From this point of view, and exploiting the synergy of the emulation and acquisition functions, the system also plays the role of a network analyzer for the characterization of preamplifier topologies. A fully configurable processing section makes it possible to test and compare a great variety of algorithms for energy and time estimation, baseline correction and so on. In other words, the proposed solution can also be used as a pure, configurable digital processor. The system has been prototyped and tested.

### PS1-29: Phase and Amplitude Drift Calibration of the RF Detectors in a MTCA.4 Based LLRF System

<u>J. Piekarski</u><sup>1</sup>, K. Czuba<sup>1</sup>, M. Hoffmann<sup>2</sup>, W. Jalmuzna<sup>3</sup>, F. Ludwig<sup>2</sup>, H. Schlarb<sup>2</sup>, C. Schmidt<sup>2</sup>, B. Yang<sup>2</sup> <sup>1</sup>Institute of Electronic Systems, Warsaw, Poland <sup>2</sup>Deutsches Elektronen-Synchrotron, Hamburg, Germany <sup>3</sup>Department of Microelectronics and Computer Science, Technical University of Lodz, Lodz, Poland

One of the key components of Low-Level RF (LLRF) systems in Free Electron Lasers (FELs) is the RF field detector, which converts the detected cavity field signal to an intermediate frequency (IF) for digital sampling. Amplitude and phase drifts in RF field detectors significantly limit the system precision, and they cannot be corrected automatically by digital control loops based on standard signals. To solve this problem, a drift calibration scheme was developed to measure exact drift values and correct them during LLRF system operation. Because present-day FELs operate in pulsed mode, there is a period of time during which the detection chain can be characterized by injecting a reference signal. To achieve high accuracy, drifts have to be measured just before the normal operation of the RF detector. For that purpose, a special RF Drift Calibration Module (DCM) has been designed that works together with the MTCA.4-based LLRF system. In this paper we present the drift calibration method and the DCM design. Laboratory results and tests at the Cryo-Module Test Bench (CMTB) are presented as well.

#### PS1-30: Ultra-Fast Streaming Camera Platform for Scientific Applications

M. Caselle, M. Balzer, S. Chilingaryan, A. Herth, A. Kopmann, U. Stevanovic, M. Vogelgesang *IPE, Karlsruhe Institute of Technology, Karlsruhe, Germany* 

We have developed a novel camera platform for ultra-fast data acquisition, real-time signal processing and compression. It is intended for high-speed X-ray tomography within the project: Ultra-fast X-ray imaging with Online data assessment experiment (UFO). The UFO project demands high spatial and temporal resolution, down to 1 um at several tens of thousands of frames/s in full streaming mode, and aims to employ image-based feedback loops. The key features of the camera platform are: 1. Continuous data taking at maximum resolution and frame rate with observation times of up to several hours. 2. Intelligent signal processing providing features such as an image-based self-trigger, on-line data reduction and region-of-interest (ROI) readout. 3. An open firmware architecture, so that the camera can be fully programmed to the needs of the application; it is foreseen for the implementation of the fastest feedback loops. The hardware setup and the modular FPGA firmware of the camera platform will be presented. The image sensor is integrated on a mezzanine daughter card and connected to the readout board by an FMC high-bandwidth connector. The readout board provides programmable logic (FPGA) and a large DDR memory for both temporary data storage and on-line data processing. Finally, the camera platform is completed by a fast PCI Express cable interface to dedicated GPU compute servers. The firmware development is based on a bus-master multichannel DMA architecture to ensure high data throughput. Cyclic-redundancy-check (CRC) logic is used to detect possible errors during data transfer. Real-time data processing algorithms for on-line operations such as filtering and data compression are applied before the data are sent. One of the most important tasks of the signal-processing unit is the novel intelligent image-based self-event trigger architecture, intended for otherwise unpredictable events. The trigger information is used by the readout logic for an efficient ROI readout strategy. This helps to reduce the required bandwidth per frame and can be used to maximize the effective frame rate. A 64-bit Linux driver seamlessly integrates the imaging platform into any GPU server infrastructure. Together, the ultra-fast streaming camera platform and the custom GPU infrastructure ensure outstanding performance for scientific experiments. The first camera demonstrator achieves the maximum frame rate of the image sensor, 340 fps at 2 MPixel and 10 bits, with a data rate of up to 1 GB/s. We expect the current readout architecture to be able to reach more than 5 GB/s. The prototype has been tested at the ANKA synchrotron. The camera platform will be continuously enhanced, e.g. by a new, faster image sensor with 50 Gb/s and a high-speed data link (InfiniBand) for tight integration in GPU clusters. Preliminary results and future perspectives are presented.
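The role of CRC logic in detecting transfer errors can be shown with a short software sketch. The abstract does not say which CRC polynomial or frame layout the firmware uses; the example assumes the standard CRC-32 from Python's `zlib` module and a hypothetical layout in which a 4-byte checksum is appended to each payload.

```python
import zlib

def frame_with_crc(payload: bytes) -> bytes:
    """Append a big-endian CRC-32 of the payload (hypothetical frame layout)."""
    crc = zlib.crc32(payload)
    return payload + crc.to_bytes(4, "big")

def check_frame(frame: bytes) -> bool:
    """Recompute the CRC-32 over the payload and compare with the trailer."""
    payload, received = frame[:-4], int.from_bytes(frame[-4:], "big")
    return zlib.crc32(payload) == received

frame = frame_with_crc(bytes(range(32)))

# Simulate a single bit flipped during transfer.
corrupted = bytearray(frame)
corrupted[5] ^= 0x01

intact_ok = check_frame(frame)
corrupted_ok = check_frame(bytes(corrupted))
```

The intact frame passes and the corrupted one fails, since CRC-32 detects every single-bit error. In hardware the same check is typically computed on the fly as the data stream through the link.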

#### PS1-31: The LHCb off-Site HLT Farm Demonstration

G. Liu, N. Neufeld

CERN, Geneva, Switzerland

The LHCb High Level Trigger (HLT) farm consists of about 1300 nodes, which are housed in the underground server room at the experiment point. Due to the constraints of the power supply and cooling system, it is difficult to install more servers in this room in the future. An off-site computing farm is one solution for enlarging the computing capacity.

In this paper, we demonstrate the LHCb off-site HLT farm, which is located in the CERN computing center. Since we use private IP addresses for the HLT farm, a virtual private network (VPN) is needed to bridge the two sites. There are two kinds of traffic in the event builder: control traffic for the control and monitoring of the farm, and Data Acquisition (DAQ) traffic. We adopt an IP tunnel for the control traffic and Network Address Translation (NAT) for the DAQ traffic. The performance of the off-site farm has been tested and compared with that of the on-site farm, and the effect of network latency has been studied. For a large off-site farm, one potential bottleneck is the IP tunnel and NAT gateway. To eliminate this bottleneck, we will deploy a NetFPGA card on the gateway to perform the IP tunneling and NAT.

#### PS1-32: A New Generation of Real-Time Systems in the JET Tokamak

D. M. Alves<sup>1</sup>, A. C. Neto<sup>1</sup>, D. F. Valcarcel<sup>1</sup>, R. Felton<sup>2</sup>, J. M. Lopez<sup>3</sup>, A. Barbalace<sup>4</sup>, L. Boncagni<sup>5</sup>, P. Card<sup>2</sup>, A. Goodyear<sup>2</sup>, S. Jachmich<sup>6,7</sup>, P. J. Lomas<sup>2</sup>, F. Maviglia<sup>8</sup>, P. A. McCullen<sup>2</sup>, A. Murari<sup>4</sup>, M. Rainford<sup>2</sup>, C. Reux<sup>9</sup>, F. Rimini<sup>2</sup>, F. Sartori<sup>10</sup>, A. V. Stephen<sup>2</sup>, J. Vega<sup>11</sup>, R. Vitelli<sup>12</sup>, L. Zabeo<sup>13</sup>, K.-D. Zastrow<sup>2</sup>

<sup>1</sup>Associacao EURATOM/IST, Instituto de Plasmas e Fusao Nuclear - Laboratorio Associado, Lisbon, Portugal

<sup>2</sup>EURATOM/CCFE Fusion Association, Culham Science Centre, Abingdon, Oxon, OX14 3DB, Culham, United Kingdom <sup>3</sup>CAEND.Universidad Politecnica de Madrid, Spain, Madrid, Spain

<sup>4</sup>Associazione EURATOM-ENEA sulla Fusione, Consorzio RFX, Padova, Italy, Padova, Italy

<sup>5</sup>Associazione EURATOM/ENEA, 00040 Frascati, Italy, Frascati, Italy

<sup>6</sup>Laboratory for Plasma Physics, Ecole Royale Militaire/Koninklijke Militaire School, EURATOM-Associat, Brussels, Belgium <sup>7</sup>EFDA-CSU, Culham Science Centre, Abingdon, OX14 3DB, UK, Culham, United Kingdom

<sup>8</sup>Associazione EURATOM-ENEA-CREATE, Univ. di Napoli Federico II, Via Claudio 21, 80125, Napoli, Italy, Napoli, Italy <sup>9</sup>Ecole Polytechnique, LPP, CNRS UMR 7648, 91128 Palaiseau, France, Palaiseau, France

<sup>10</sup>Fusion for Energy, 08019 Barcelona, Spain, Barcelona, Spain

<sup>11</sup>Laboratorio Nacional de Fusion, Asociacion EURATOM-CIEMAT, Madrid, Spain, Madrid, Spain

<sup>12</sup>Dipartimento di Informatica, Sistemi e Produzione, Universita di Roma Tor Vergata 00133 Rome, Italy, Rome, Italy

<sup>13</sup>ITER, St. Paul-Lez-Durance 13108, France, St. Paul-lez-Durance, France

Recently, a new recipe for developing and deploying real-time systems has been increasingly adopted at the JET tokamak. Powered by the advent of x86 multi-core technology and the reliability of JET's well-established Real-Time Data Network (RTDN) to handle all real-time I/O, an official vanilla Linux kernel has been shown to provide real-time performance to user-space applications that must meet stringent timing constraints. In particular, a careful rearrangement of the Interrupt ReQuest (IRQ) affinities, together with the kernel's CPU isolation mechanism, makes it possible to obtain either soft or hard real-time behavior, depending on the synchronization mechanism adopted. Finally, the Multithreaded Application Real-Time executor (MARTe) framework is used to build applications optimised for exploiting multi-core architectures.
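The combination of IRQ affinity and CPU isolation can be sketched as a small planning helper. On Linux, IRQ affinity is set by writing a hexadecimal CPU bitmask to `/proc/irq/<N>/smp_affinity`, and cores are isolated from the general scheduler with the `isolcpus=` boot parameter. The helper names, the 4-core layout and the choice of isolated cores below are assumptions for illustration, not the JET configuration; the code only computes the values and does not touch the system.

```python
def cpu_mask(cpus):
    """Return the hexadecimal bitmask with one bit set per CPU index.

    Assumption: fewer than 32 CPUs, so no comma-separated 32-bit groups are needed.
    """
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

def plan_affinity(n_cpus, isolated):
    """Split CPUs into housekeeping cores (IRQs, OS) and isolated real-time cores."""
    isolated = set(isolated)
    housekeeping = [c for c in range(n_cpus) if c not in isolated]
    return {
        # Value to write to /proc/irq/<N>/smp_affinity for each IRQ.
        "irq_smp_affinity": cpu_mask(housekeeping),
        # Value for the isolcpus= kernel boot parameter.
        "isolcpus": ",".join(str(c) for c in sorted(isolated)),
    }

# Hypothetical 4-core machine: cores 2 and 3 are reserved for real-time threads.
plan = plan_affinity(4, isolated=[2, 3])
```

Steering all interrupts to the housekeeping cores leaves the isolated cores free for real-time threads that are pinned to them explicitly, which is what keeps their timing jitter low.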

In the past year, four new systems based on this philosophy have been installed and are now part of JET's routine operation. Two of those systems, the Vessel Thermal Map (VTM) and the WAll Load Limiting System (WALLS), are indispensable components of the core machine protection ensemble and are responsible for preventing excessive thermal loads on JET's first wall. Two other systems have also been installed recently. BETALI is a system that calculates important plasma parameters (e.g. internal inductance) in real time and makes them available on the RTDN. The Advanced Predictor Of DISruptions (APODIS), currently under commissioning, is a system that aims to anticipate plasma disruptions, potentially allowing for timely protective actions.

The focus of the present work is on the configuration and interconnection of the ingredients that enable the real-time capability of these new systems, and on the impact that JET's distributed real-time architecture has on system engineering requirements such as algorithm testing and plant commissioning. Details are given about the common real-time configuration and development path of these systems, followed by a brief description of each system together with results on its real-time performance. A comparative jitter analysis of a user-space MARTe-based application synchronising over a network, deployed on a vanilla kernel and on the same kernel with the Messaging Realtime Grid (MRG) patch, will also be presented.

#### **TRG: Triggers**

Tuesday, June 12 08:40-10:40 Crystal Ballroom

#### TRG-1: A Hardware Tracker Finder (FTK) for ATLAS Trigger

G. Volpi

INFN, Frascati, Italy

The existing three-level ATLAS trigger system is deployed to reduce the event rate from the bunch crossing rate of 40 MHz to ~400 Hz for permanent storage at the LHC design luminosity of $10^{34}$ cm<sup>-2</sup> s<sup>-1</sup>. When the LHC goes beyond the design luminosity, the load on the Level-2 trigger system will increase significantly due to both the need for more sophisticated algorithms to suppress background and the larger event sizes. The Fast TracKer (FTK) is a custom electronics system that will operate at the full Level-1 accept rate of 100 kHz and provide high-quality tracks at the beginning of processing in the Level-2 trigger, by performing track reconstruction in hardware with massive parallelism of associative memories and FPGAs.

The performance in important areas including b-tagging, tau-tagging and lepton isolation will be demonstrated with the MC simulation at different LHC luminosities. The system design will be overviewed. The latest R&D progress of individual components will be presented and related technologies will be discussed.

#### TRG-2: The ALICE High Level Trigger: the 2011 Run Experience.

<u>T. Kollegger (\*)</u> FIAS/University of Frankfurt, Frankfurt, Germany

(\*) for the ALICE Collaboration

The High Level Trigger (HLT) of the ALICE detector system, one of the four big experiments at the Large Hadron Collider (LHC) at CERN, is a dedicated real-time system for online event reconstruction and selection. Its main task is to reduce the large volume of raw data of up to 25 Gbyte/s read out from the detector systems by an order of magnitude to fit within the available data acquisition bandwidth.

A dedicated computing cluster of ~220 processing nodes, connected by an InfiniBand high-speed network, is in operation to provide the necessary computing resources for this task. The available computing power is supplemented by FPGAs for the first steps of the processing, as well as by 64 GPUs, which are used at later stages of the event reconstruction.

During the 2011 LHC heavy-ion run, the HLT was for the first time actively used to reduce the data volume. For this, the raw data of the Time Projection Chamber, the largest data source in ALICE, were replaced by the results of the online FPGA-based cluster finder. A further reduction of the data volume by roughly a factor of 4 was achieved by optimizing the data format for a subsequent standard Huffman compression. For this, entropy-reducing data transformations have been implemented. In this contribution, we will present the experience gained during the 2011 run, both on the technical and operational levels of the system and from a physics performance point of view. Building on the success of the 2011 run, possibilities for even more advanced uses of online reconstruction results in the future will be discussed as well.
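Why an entropy-reducing transformation helps a subsequent Huffman coder can be shown with a toy example. The abstract does not name the transformations used in the ALICE HLT; simple delta coding of a slowly varying integer sequence is used here purely as a stand-in, and all function names are hypothetical. The Shannon entropy gives a lower bound on the average Huffman code length per symbol.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Shannon entropy in bits per symbol of a non-empty sequence."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def delta_encode(values):
    """Keep the first value, then store differences between neighbours."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

# A steadily increasing sequence: every raw value is distinct, every step is 1.
raw = list(range(100, 164))
deltas = delta_encode(raw)

raw_entropy = entropy_bits(raw)
delta_entropy = entropy_bits(deltas)
```

The transformation is lossless, since `delta_decode` restores the input exactly, yet it concentrates the symbol distribution on a few values, so an entropy coder applied afterwards needs far fewer bits per symbol.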

### **TRG-3:** The evolution and performance of the ATLAS calorimeter-based triggers in 2011 and 2012

M. Wessels<sup>1</sup>, I. Radoslavova Hristova<sup>2</sup>

<sup>1</sup>University of Heidelberg, KIP, Heidelberg, Germany

<sup>2</sup>Humboldt-Universitaet zu Berlin, Berlin, Germany

In 2011 the ATLAS detector at the LHC collected approximately 5 fb<sup>-1</sup> of proton-proton collision data at a centre-of-mass energy of 7 TeV. During the data-taking period the LHC conditions changed significantly, with the instantaneous luminosity increasing by a factor of 15 and the average number of interactions per bunch crossing (pile-up) reaching 20. In early 2012 the LHC running conditions rapidly evolved to produce record instantaneous luminosities and pile-up. The efficient performance of the ATLAS trigger has therefore been vital to maintain the recorded data rate within its limits while minimising the loss of signal events.

The ATLAS trigger is hardware based at Level-1 and uses software algorithms running on a farm of commercial processors in the Level-2 and Event Filter levels of the Higher Level Trigger (HLT). A large fraction of the ATLAS physics programme is covered by the calorimeter-based triggers, which can select events with candidate electrons, photons, jets, taus or those with large missing transverse energy. The ATLAS Level-1 Calorimeter trigger receives input from ATLAS' main calorimeters and determines the calibrated energies sent to the algorithmic trigger processors that identify the high-ET physics objects and global energy sums. The reconstruction at Level-2 is then seeded by the Level-1 result in Regions of Interest (ROIs) and the HLT calorimeter-based software algorithms perform the selection of electrons, photons, jets, taus and also events with missing transverse energy using all available detector data.

We present the performance of the L1 calorimeter trigger hardware and the HLT selection algorithms in 2011 and 2012, highlighting the achievements of the different signatures. For electrons and photons, at Level-1 the thresholds have been raised and configured separately in various rapidity regions to account for energy losses in the upstream material and hadronic isolation requirements have been implemented. At the HLT many variables from the calorimeters and tracking detectors have been tuned to achieve both high efficiency and background rejection. For the jet trigger, the ROI-based strategy has been extended with the possibility of unpacking the full calorimeter at Event Filter level and even at an intermediate level between Level-1 and Level-2. Additionally the use of calibrated energy scale at trigger level and noise cuts to reduce rate spikes have been introduced. For the tau trigger, topological triggers in combination with the other objects have been developed, notably a hadronic tau trigger to select events where a tau lepton decays into one or more hadrons, a challenge due to the high production rate of multi-jet events.

In summary, this contribution gives an overview of the optimisation and performance of the calorimeter-based triggers, demonstrating the robustness of the trigger system in the high luminosity, high pile-up environment of the LHC in 2011 and 2012.

#### TRG-4: Use of Expert System and Data Analysis Technologies in Automation of Error Detection, Diagnosis and Recovery for ATLAS Trigger-DAQ Controls Framework

A. Kazarov<sup>1</sup>, G. Lehmann Miotto<sup>2</sup>, L. Magnoni<sup>2</sup>, A. Corso Radu<sup>3</sup>

<sup>1</sup>Petersburg Nuclear Physics Institute, NRC Kurchatov Institute, Gatchina, Russia

<sup>2</sup>PH, CERN, Geneva, Switzerland

<sup>3</sup>University of California Irvine, California, USA

The Trigger and DAQ (TDAQ) system of the ATLAS experiment is a very complex distributed computing system, composed of O(10000) applications running on more than 2000 many-core CPUs. The TDAQ Controls system has to guarantee the smooth and synchronous operation of all TDAQ components and has to provide the means to minimize the downtime of the system caused by runtime failures, which are inevitable for a system of such scale and complexity.

During data-taking runs, streams of information messages sent or published by TDAQ applications are the main sources of knowledge about the correctness of running operations. The huge flow of operational monitoring data produced (with an average rate of O(1-10 kHz)) is constantly monitored by experts to detect problems or misbehaviour.

Given the scale of the system and the rates of data to be analysed, the automation of the Controls system functionality in the areas of operational monitoring, system verification, error detection, diagnosis and recovery is an essential requirement. Automation reduces the manpower needed for operations and ensures a consistently high quality of problem detection and subsequent recovery.

To accomplish its objective, the Controls system includes components based on advanced knowledge-base technologies, namely a rule-based expert system (ES) and complex event processing (CEP) engines. The chosen approach allows the TDAQ experts' knowledge to be stored and reused in the Controls framework, and thus assists the TDAQ shift crew in accomplishing its task.

DVS (Diagnostics and Verification System) and Online Recovery components are responsible for the automation of system testing and verification, diagnostics of failures and recovery procedures. These components are built on top of a forward-chaining ES framework (based on the CLIPS expert system shell), which allows the behaviour of a system to be programmed in terms of if-then rules and the knowledge base to be easily extended or modified.
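Forward chaining over if-then rules can be sketched in a few lines. This is a Python illustration of the general technique, not CLIPS syntax and not the actual DVS knowledge base; all fact and rule names are hypothetical.

```python
def forward_chain(facts, rules):
    """Fire if-then rules repeatedly until no new fact can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its conditions are known facts.
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# Hypothetical knowledge base: (set of required facts, derived fact).
RULES = [
    ({"app_not_responding", "host_reachable"}, "app_crashed"),
    ({"app_crashed", "restart_allowed"}, "action_restart_app"),
    ({"app_not_responding", "host_unreachable"}, "action_notify_sysadmin"),
]

observed = {"app_not_responding", "host_reachable", "restart_allowed"}
derived = forward_chain(observed, RULES)
print(sorted(derived - observed))
```

Extending the knowledge base amounts to appending another `(conditions, conclusion)` pair, which is the property the abstract highlights for rule-based systems.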

The core of the AAL (Automated monitoring and AnaLysis) component is a CEP engine (implemented using ESPER in Java) used for the correlation and analysis of operational messages and events and for producing operator-friendly alerts, helping TDAQ operators to react promptly in case of problems or to perform important routine tasks. The design foresees a machine learning module to detect anomalies and problems that cannot be defined in advance.
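One typical CEP pattern, correlating messages inside a sliding time window, can be sketched as follows. This is plain Python rather than an ESPER statement; the class name, source names, limit and window length are assumptions made for illustration.

```python
from collections import defaultdict, deque

class BurstDetector:
    """Alert when one source emits `limit` errors within `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.recent = defaultdict(deque)  # source -> timestamps of recent errors

    def on_message(self, timestamp, source, severity):
        """Process one message; return an alert string or None."""
        if severity != "ERROR":
            return None
        times = self.recent[source]
        times.append(timestamp)
        # Drop timestamps that fell out of the sliding window.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        if len(times) >= self.limit:
            times.clear()  # avoid repeating the same alert for every message
            return f"{source}: {self.limit} errors within {self.window} s"
        return None

detector = BurstDetector(limit=3, window=10.0)
stream = [(0.0, "ros-1", "ERROR"), (2.0, "ros-2", "ERROR"), (4.0, "ros-1", "INFO"),
          (5.0, "ros-1", "ERROR"), (8.0, "ros-1", "ERROR"), (30.0, "ros-1", "ERROR")]
alerts = [a for a in (detector.on_message(*m) for m in stream) if a]
print(alerts)
```

The operator sees one aggregated alert instead of the individual messages, which is the kind of reduction a CEP layer is meant to provide.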

The described components are in constant use for ATLAS Trigger-DAQ system operations, and the knowledge base is growing as more expertise is acquired. Deployment of the tools has substantially reduced the load on the TDAQ control shift crew and has made it possible to eliminate one of the shifter desks.

The paper presents the design and present implementation of the control automation components and also the experience of their use in a real operational environment of the ATLAS experiment.

#### **TRG-5: Evolution and Performance of the ATLAS Trigger System with p-p Collisions at 7 TeV**

T. Kono

IFAE, Barcelona, Spain

During the data-taking period from 2009 until 2011, the ATLAS TDAQ system was used very successfully to collect proton-proton data at LHC centre-of-mass energies between 900 GeV and 7 TeV. The TDAQ system is mostly made of off-the-shelf processing units organized in a farm of 2000 elements. The trigger system is designed in three levels, reducing the event rate from the design bunch-crossing rate of 40 MHz to an average recording rate of about 300 Hz. Using custom electronics with input from the calorimeter and muon detectors, the first level rejects most background collisions in less than 2.5 µs. The two following levels are software-based triggers with average decision times of 40 ms and 4 s respectively. The trigger system is designed to select events by identifying muons, electrons, photons, taus, jets, and B hadron candidates, as well as using global event signatures, such as missing transverse energy.

In 2011, the TDAQ system was operated with an overall efficiency of 94%, while meeting evolving and demanding conditions. With the increase in LHC peak luminosity through 2011, the scalability and operational margins in terms of bandwidth and dataflow were stressed.

During the heavy ion runs in 2011 the system was operated at the limit of the installed computing power, enabling the evaluation of the effectiveness of the current installation and the validation of the operation modeling tools.

We give a description of the system together with the operational experience, with an emphasis on the data taking in 2011. We also give an overview of the performance of the different trigger selections. Distributions of selection variables used by the different trigger selections are shown and compared with the offline reconstruction. Examples of trigger efficiencies with respect to offline reconstructed signals are presented and compared to simulation. These results illustrate a very good level of understanding of both the detector and trigger performance. Furthermore, we describe how the trigger selections have evolved with increasing LHC luminosity to cope with the increasing pile-up conditions. In addition, driven by the lessons learned from operation, the current limitations of the ATLAS TDAQ system and the strategies being put in place to address them are described.

#### TRG-6: Recent Experience and Future Evolution of the CMS High Level Trigger System

<u>A. C. Spataru</u><sup>1</sup>, G. Bauer<sup>2</sup>, U. Behrens<sup>3</sup>, J. Branson<sup>4</sup>, S. Bukowiec<sup>1</sup>, O. Chaze<sup>1</sup>, S. Cittolin<sup>5,4</sup>, J. A. Coarasa<sup>1</sup>, C. Deldicque<sup>1</sup>, M. Dobson<sup>1</sup>, A. Dupont<sup>1</sup>, S. Erhan<sup>6</sup>, D. Gigi<sup>1</sup>, F. Glege<sup>1</sup>, R. Gomez-Reino<sup>1</sup>, C. Hartl<sup>1</sup>, A. Holzner<sup>4</sup>, L. Masetti<sup>1</sup>, F. Meijers<sup>1</sup>, E. Meschi<sup>1</sup>, R. K. Mommsen<sup>7</sup>, C. Nunez-Barranco-Fernandez<sup>1</sup>, V. O'Dell<sup>7</sup>, L. Orsini<sup>1</sup>, C. Paus<sup>2</sup>, A. Petrucci<sup>1</sup>, M. Pieri<sup>4</sup>, G. Polese<sup>1</sup>, A. Racz<sup>1</sup>, O. Raginel<sup>2</sup>, H. Sakulin<sup>1</sup>, M. Sani<sup>4</sup>, C. Schwick<sup>1</sup>, F. Stoeckli<sup>2</sup>, K. Sumorok<sup>2</sup>

<sup>1</sup>CERN, Geneva, Switzerland

<sup>2</sup>Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
<sup>3</sup>DESY, Hamburg, Germany
<sup>4</sup>University of California, San Diego, San Diego, California, USA
<sup>5</sup>Eidgenossische Technische Hochschule, Zurich, Switzerland
<sup>6</sup>University of California, Los Angeles, Los Angeles, California, USA

<sup>7</sup>FNAL, Chicago, Illinois, USA

The CMS experiment at the LHC uses a two-stage trigger system, with events flowing from the first level trigger at a rate of 100 kHz. These events are read out by the Data Acquisition system (DAQ), assembled in memory in a farm of computers, and finally fed into the high-level trigger (HLT) software running on the farm. The HLT software selects interesting events for offline storage and analysis at a rate of a few hundred Hz. The HLT algorithms consist of sequences of offline-style reconstruction and filtering modules, executed on a farm of O(10000) CPU cores built from commodity hardware. Experience from the 2010-2011 collider run is detailed, as well as the current architecture of the CMS HLT, and its integration with the CMS reconstruction framework and CMS DAQ. The short- and medium-term evolution of the HLT software infrastructure is discussed, with future improvements aimed at supporting extensions of the HLT computing power, and addressing remaining performance and maintenance issues.

#### **MSP1: Monitoring and Signal Processing 1**

#### Tuesday, June 12 11:05-12:25 Crystal Ballroom

#### MSP1-1: Novel, Highly-Parallel Software for the Online Storage System of the ATLAS Experiment at CERN: Design and Performances

T. Colombo<sup>1,2</sup>, W. Vandelli<sup>1</sup>

<sup>1</sup>*PH/ADT*, CERN, Meyrin, Switzerland

<sup>2</sup>Dipartimento di Fisica, Universita' di Pavia, Pavia, Italy

The ATLAS experiment observes proton-proton collisions delivered by the LHC accelerator at CERN. The ATLAS Trigger and Data Acquisition (TDAQ) system selects interesting events online in a three-level trigger system in order to store them at a budgeted rate of several hundred Hz, for an average event size of $\sim$1.2 MB.

This paper focuses on the TDAQ data-logging system and in particular on the implementation and performance of a novel software design, reporting on the effort of exploiting the full power of recently installed multi-core hardware. In this respect, the main challenge presented by the data-logging workload is the conflict between the largely parallel nature of the event processing, including the recently introduced online event compression, and the constraint of sequential file writing and checksum evaluation. This is further complicated by the necessity of operating in a fully data-driven mode, to cope with continuously evolving trigger and detector configurations.

The novel software design is based on a thread pool, implemented in C++ using modern parallel programming tools and techniques, as provided by libraries such as TBB(1) and Boost(2). Lock-less patterns, atomic instructions and concurrent containers have been employed to provide an efficient implementation able to cope with the above requirements. In this paper we report on the design of the new ATLAS online storage software. In particular, we briefly discuss our development experience using recent concurrency-oriented libraries. We then concentrate on the results of performance measurements performed on the current data-logging hardware. We show that, even in the worst workload, the new parallel design is able to compete with the previous single-threaded one, while outperforming it in more favourable, realistic workloads. We also demonstrate the minimal overhead introduced by the above parallel techniques, comparing the performance of the whole data-logging software with the bare processing speed on the same hardware. Finally, we discuss the effects of simultaneous multi-threading technologies, as found on recent CPUs. The data-logging operation, which mixes data processing and I/O, can efficiently exploit the features provided by these technologies.
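The tension between parallel event processing and sequential file writing can be illustrated with a small sketch. It uses Python's standard thread pool rather than C++ with TBB or Boost, and it does not reproduce the lock-less design of the actual software; the record layout (4-byte length prefix) and the payloads are assumptions.

```python
import io
import zlib
from concurrent.futures import ThreadPoolExecutor

def log_events(events, sink, workers=4):
    """Compress events in parallel, then write them in order with a running checksum."""
    checksum = zlib.adler32(b"")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() runs the compress calls concurrently but yields results in input
        # order, so the single writer below sees a strictly sequential stream.
        for blob in pool.map(zlib.compress, events):
            sink.write(len(blob).to_bytes(4, "little"))
            sink.write(blob)
            checksum = zlib.adler32(blob, checksum)
    return checksum

def read_back(data):
    """Decode the length-prefixed records written by log_events."""
    events, pos = [], 0
    while pos < len(data):
        size = int.from_bytes(data[pos:pos + 4], "little")
        events.append(zlib.decompress(data[pos + 4:pos + 4 + size]))
        pos += 4 + size
    return events

events = [bytes([i]) * 1000 for i in range(16)]  # hypothetical event payloads
sink = io.BytesIO()
checksum = log_events(events, sink)
print(len(sink.getvalue()), hex(checksum))
```

Only the compression step is parallel here; the write and the checksum update stay in one thread, which is the constraint the abstract describes.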

(1) http://threadingbuildingblocks.org/ (2) http://www.boost.org/

#### MSP1-2: Advanced Visualization System for Monitoring the ATLAS TDAQ Network in Real-Time

S. Batraneanu<sup>1</sup>, D. Campora Perez<sup>2</sup>, B. Martin<sup>3</sup>, D. Savu<sup>3</sup>, S. Stancu<sup>3</sup>, L. Leahu<sup>4</sup>

<sup>1</sup>University of California, Irvine, Irvine, United States

<sup>2</sup>University of Seville, Seville, Spain

<sup>3</sup>CERN, Geneva, Switzerland

<sup>4</sup>Politehnica University Bucharest, Bucharest, Romania

The data acquisition system of the ATLAS experiment at CERN comprises 2500 servers (most of them multi-homed) interconnected by three Ethernet networks, totaling 250 switches. Due to its real-time nature, there are additional speed and performance requirements in comparison to conventional networks. Health and performance monitoring tools to gather historical and real-time statistics are needed to understand the system's behavior, to ensure proper operation and to perform post-mortem troubleshooting. Also, in order to maintain a complete system view, the information cannot be restricted to just network statistics but must include parameters such as environmental statistics and data-taking variables.

A comprehensive monitoring framework has been developed for expert use. A 2D web-based interface offers an intuitive display for historical statistics analysis. However, non-experts may experience difficulties in using it and interpreting data. Moreover, specific performance issues, such as component saturation or unbalanced workload, need to be spotted with ease, in real time, and understood in the context of the full system view.

We addressed these issues by developing an innovative visualization system where the users benefit from the advantages of 3D graphics to visualize the large monitoring parameter space associated with our system. This has been done by developing a hierarchical model of the system onto which we overlaid geographical, logical and real-time monitoring information.

In order to easily understand the current state of subsystems and how they interact, the system performs bottom-up statistics aggregation and error propagation. Status information is color-coded based on predetermined thresholds which vary for each subsystem. At every aggregation layer, historical statistics plots are available to track recent component behavior.
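Bottom-up aggregation with error propagation might look like the sketch below. The hierarchy, the use of a mean load as the aggregated statistic, and the threshold values are all assumptions; the abstract only states that thresholds are predetermined and vary per subsystem.

```python
SEVERITY = {"ok": 0, "warning": 1, "error": 2}

class Node:
    """One element of the hierarchy: a leaf holds a load value, a group holds children."""

    def __init__(self, name, load=None, children=(), warn=0.7, error=0.9):
        self.name = name
        self.load = load
        self.children = list(children)
        self.warn = warn    # per-subsystem thresholds (hypothetical values)
        self.error = error

    def aggregate(self):
        """Return (mean load, worst status) computed bottom-up."""
        if not self.children:
            load, worst = self.load, "ok"
        else:
            results = [child.aggregate() for child in self.children]
            load = sum(r[0] for r in results) / len(results)
            worst = max((r[1] for r in results), key=SEVERITY.get)
        own = "error" if load >= self.error else "warning" if load >= self.warn else "ok"
        # Error propagation: a parent is never reported healthier than its worst child.
        return load, max(own, worst, key=SEVERITY.get)

rack_a = Node("rack-a", children=[Node("sw-1", 0.20), Node("sw-2", 0.95)])
rack_b = Node("rack-b", children=[Node("sw-3", 0.30), Node("sw-4", 0.40)])
network = Node("network", children=[rack_a, rack_b])
load, status = network.aggregate()
print(f"{load:.4f} {status}")
```

The saturated switch is invisible in the averaged load of the top node but still surfaces through the propagated status, which is why the two quantities are tracked separately.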

All navigation mechanisms available in advanced 3D systems, such as navigation paradigms and predefined viewpoints, have been adapted to the TDAQ system particularities, hence providing predictable user interaction scenarios. This allows the user to reach any desired information in a few clicks. Visualizing thousands of objects which are frequently updated makes the 3D scene rendering a challenging task. Low-level optimizations were performed in order to reach the optimal frame rate. A full scene update is very disruptive for navigation, and therefore the scene is updated progressively using a mechanism based on object visibility. The Level of Detail technique is used to hide expensive details and speed up the rendering.

This article briefly describes the network monitoring framework and introduces the system's visualization challenges. The functionality, design and implementation of the 3D visualization system are then described in detail, with a focus on the model design, user interaction, navigation mechanisms and methods used to achieve scalability when rendering many objects in real-time.

#### MSP1-3: A High-Throughput Platform for Real-Time X-Ray Imaging

S. A. Chilingaryan<sup>1</sup>, M. Vogelgesang<sup>1</sup>, T. dos Santos Rolo<sup>2</sup>, A. Mirone<sup>3</sup>, A. Kopmann<sup>1</sup>

<sup>1</sup>IPE, Karlsruhe Institute of Technology, Karlsruhe, Germany

<sup>2</sup>ISS, Karlsruhe Institute of Technology, Karlsruhe, Germany

<sup>3</sup>European Synchrotron Radiation Facility, Grenoble, France

X-ray tomography has proven to be a valuable tool for understanding internal, otherwise invisible, mechanisms in biology, materials research and other fields. Detectors employed at modern synchrotrons are able to deliver images with high resolution and at high frame rates, generating up to several gigabytes per second. The ability to process this information in real time and present it to the users without long processing delays is extremely important for synchrotron operation. It will increase experiment throughput and enable image-based control of the dynamical processes under study. We have developed a GPU-based platform for high-speed tomography optimized for continuous operation with streamed data. The core part of our system is a parallel processing framework. As a first application, imaging techniques for high-speed tomographic reconstruction have been implemented. The framework operates with streamed data that comes either directly from the camera or from pre-recorded sequences. Image processing algorithms are implemented as pipelines of filter nodes. These filters are implemented using OpenCL and can be executed on multiple CPUs and/or GPUs. Optimized code for specific hardware platforms is possible if necessary. The framework abstracts the details of the underlying hardware, takes care of memory management, and provides several methods to simplify parallel programming. Access to a variety of high-speed cameras is standardized by a camera abstraction layer (libuca). Linux support for scientific cameras from pco is included. The storage engine uses special features of modern file systems to enhance the performance of data streaming. The framework is implemented in pure C using the GLib object model. Automatic bindings for multiple scripting languages are provided by the introspection interface of GObject. Only a few lines of Python code are required to instantiate a chain of filters, set the parameters, and start continuous data processing.
A first prototype of our system is currently being evaluated at the tomography beamline of KIT's synchrotron ANKA. Using only a single GPU server we are able to handle the full throughput of the CameraLink interface at 850 MB/s. The complete data flow from high-speed cameras to the data storage will be discussed with special focus on I/O, integration with the Tango control bus, hardware platform, and optimal usage of all available computing resources. An efficient GPU implementation of filtered back projection will be presented and the differences between implementations for different GPU architectures will be highlighted. Finally, we will report on our ongoing effort to extend the parallel processing framework to heterogeneous clusters with CPUs and GPUs.
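The notion of a pipeline of filter nodes operating on streamed frames can be sketched with Python generators. This is not the framework's actual API, and it involves neither OpenCL nor GObject bindings; the filter names, parameters and pixel values are hypothetical.

```python
def source(frames):
    """Emit frames one at a time, as a camera or a recorded sequence would."""
    for frame in frames:
        yield frame

def dark_subtract(stream, dark):
    """Subtract a constant dark level from every pixel, clamping at zero."""
    for frame in stream:
        yield [max(p - dark, 0) for p in frame]

def normalize(stream, flat):
    """Divide every pixel by a flat-field value."""
    for frame in stream:
        yield [p / flat for p in frame]

def build_pipeline(frames, filters):
    """Chain filter nodes; each entry is (function, keyword arguments)."""
    stream = source(frames)
    for func, params in filters:
        stream = func(stream, **params)
    return stream

frames = [[10, 20, 30], [5, 50, 110]]  # two tiny hypothetical "images"
pipeline = build_pipeline(frames, [(dark_subtract, {"dark": 10}),
                                   (normalize, {"flat": 100})])
result = list(pipeline)
print(result)
```

Each frame flows through the whole chain before the next one is requested, so the same structure works for an endless camera stream as well as for a recorded sequence.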

#### MSP1-4: Architecture and Operation of the Control System for ALICE Detector at CERN

P. Chochula

PH/AID, CERN, Geneva, Switzerland

The control systems developed for large high energy physics experiments usually do not have strict real-time constraints. On the other hand, the correct functionality of detector electronics cannot be achieved without a reliable control system. The different architectures and operational principles of fast front-end electronic modules and complex SCADA systems create a challenge for overall design and integration. In the presentation we describe the Detector Control System of ALICE, one of the four LHC experiments at CERN. Eighteen sub-detectors based on different technologies, control architectures and operational requirements need to be integrated into a coherent system allowing for centralized operation. We describe the strategy used in ALICE with a focus on the operation of front-end electronics. A standardized software abstraction level allows for unified operation of modules, hiding the complexity of the underlying hardware architecture and representing the different front-ends as devices reacting to the same set of commands. We describe the software implementation of this approach and its integration into the central system. On vertical slices of the system we explain the implementation principles covering the full chain from the configuration database to device registers. The read-back data carries information on the device status and conditions. It provides feedback to the operators as well as to the monitoring and data analysis software. We explain how the data flow is organized in ALICE and how it is used to allow for operation of millions of readout channels by a single operator in a smooth and efficient way.
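A software abstraction level that presents different front-ends as devices reacting to the same set of commands can be sketched as follows. The class names, the two front-end types and the settings are hypothetical and do not describe the ALICE implementation.

```python
from abc import ABC, abstractmethod

class FrontEndDevice(ABC):
    """Common command set that every front-end type must implement."""

    @abstractmethod
    def configure(self, settings): ...

    @abstractmethod
    def read_status(self): ...

class RegisterMappedFrontEnd(FrontEndDevice):
    """Hypothetical front-end configured by writing named registers."""

    def __init__(self):
        self.registers = {}

    def configure(self, settings):
        self.registers.update(settings)

    def read_status(self):
        return {"type": "register", "configured": bool(self.registers)}

class SerialChainFrontEnd(FrontEndDevice):
    """Hypothetical front-end configured by shifting a bit string through a chain."""

    def __init__(self):
        self.bitstream = ""

    def configure(self, settings):
        self.bitstream = "".join(format(v, "08b") for v in settings.values())

    def read_status(self):
        return {"type": "serial", "configured": bool(self.bitstream)}

def configure_all(devices, database):
    """Apply settings from a configuration database and collect read-back status."""
    report = {}
    for name, device in devices.items():
        device.configure(database[name])
        report[name] = device.read_status()
    return report

devices = {"det_a": RegisterMappedFrontEnd(), "det_b": SerialChainFrontEnd()}
database = {"det_a": {"gain": 3}, "det_b": {"threshold": 5, "mask": 255}}
report = configure_all(devices, database)
print(report)
```

The central loop in `configure_all` never needs to know which hardware architecture sits behind a device, which is the point of the common command set.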

#### **MSP2: Monitoring and Signal Processing 2**

#### Tuesday, June 12 13:40-14:40 Crystal Ballroom

#### MSP2-1: artdaq: An Event Filtering Framework for Fermilab Experiments

K. Biery, C. Green, J. Kowalkowski, M. Paterno, R. Rechenmacher

Scientific Computing Division, Fermi National Accelerator Lab, Batavia, Illinois, United States

Several current and proposed experiments at the Fermi National Accelerator Laboratory have novel data acquisition needs. These include (1) continuous digitization, using commercial high-speed digitizers, of signals from the detectors, (2) the transfer of all of the digitized waveform data to commodity processors, (3) the filtering and/or compression of the waveform data, and (4) the writing of the resultant data to disk for later, more complete, analysis.

To address these needs, members of the Accelerator and Detector Simulation and Support Department within the Scientific Computing Division at Fermilab have chosen to use parallel processing technologies in the development of a generic data acquisition toolkit, artdaq. The artdaq toolkit uses MPI (Message Passing Interface) and art, an established common event framework for Intensity Frontier experiments. In an artdaq program, the digitized data are transferred into processor nodes using commodity PCIe cards, event fragments are combined into complete events using MPI, and filtering and compression algorithms are run on the data using art. To test the toolkit, a cluster of five 32-core high-performance computing nodes has been assembled and connected with a QDR InfiniBand network. Initial testing of data throughput shows event building rates in excess of 1.5 GB/s.
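The step of combining event fragments into complete events can be sketched in a single process. artdaq does this with MPI across nodes; the sketch below omits the transport entirely, and the class name and fragment layout are assumptions.

```python
from collections import defaultdict

class EventBuilder:
    """Collect fragments by event number and emit an event once every source has reported."""

    def __init__(self, n_sources):
        self.n_sources = n_sources
        self.pending = defaultdict(dict)  # event number -> {source id: payload}

    def add_fragment(self, event_id, source_id, payload):
        """Store one fragment; return the complete event or None."""
        fragments = self.pending[event_id]
        fragments[source_id] = payload
        if len(fragments) < self.n_sources:
            return None
        del self.pending[event_id]
        # Concatenate payloads in source order so the event layout is deterministic.
        return b"".join(fragments[s] for s in sorted(fragments))

builder = EventBuilder(n_sources=3)
arrivals = [(1, 0, b"a1"), (2, 0, b"a2"), (1, 2, b"c1"), (1, 1, b"b1"), (2, 1, b"b2")]
complete = [e for e in (builder.add_fragment(*f) for f in arrivals) if e is not None]
print(complete, list(builder.pending))
```

Fragments may arrive in any order and interleaved between events; an event is released only when its last fragment arrives, and incomplete events stay in `pending`.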

This paper describes the architecture and implementation of the first phase of the artdaq toolkit and shows early performance results with configurations that match upcoming experiments such as Mu2e, uBooNE, and DarkSide50.

#### MSP2-2: FPGA/NIOS Implementation of an Adaptive FIR Filter Using Linear Prediction to Reduce Narrow-Band RFI for Radio Detection of Cosmic Rays

Z. Szadkowski<sup>1</sup>, D. Fraenkel<sup>2</sup>, A. M. van den Berg<sup>2</sup>

<sup>1</sup>Department of Physics and Applied Informatics, University of Lodz, Lodz, Poland

<sup>2</sup>Kernfysisch Versneller Instituut, University of Groningen, Groningen, Netherlands

We present the FPGA/NIOS implementation of an adaptive FIR filter based on linear prediction to suppress radio frequency interference (RFI). This technique will be used for experiments that observe coherent radio emission from extensive air showers induced by ultra-high-energy cosmic rays. These experiments are designed to make a detailed study of the development of the electromagnetic part of air showers. These radio signals therefore provide information that is complementary to that obtained by water-Cherenkov detectors, which are predominantly sensitive to the particle content of an air shower at ground level. The radio signals from air showers are caused by coherent emission due to geomagnetic and charge-excess processes. They can be observed in the frequency band between 10 and 100 MHz. However, this frequency range is significantly contaminated by narrow-band RFI and other human-made distortions. A FIR filter implemented in the FPGA logic segment of the front-end electronics of a radio sensor significantly improves the signal-to-noise ratio. In this paper we discuss an adaptive filter based on linear prediction. The coefficients for the linear predictor are dynamically refreshed and calculated in the NIOS processor, which is implemented in the same FPGA chip. The Levinson recursion, used to obtain the filter coefficients, is also implemented in the NIOS and is partially supported by direct multiplication in the DSP blocks of the FPGA logic segment. We will show that tests confirm that the linear predictor can be an alternative to other methods involving multiple time-to-frequency domain conversions using an FFT procedure. These multiple conversions add heavily to the power consumption of the FPGA and are avoided by the linear prediction approach.
The FIR filter has been successfully tested in the Altera development kit with the EP4CE115F29C7 from the Altera Cyclone IV family at a 150 MHz sampling rate, a 12-bit I/O resolution, and an internal 39-bit dynamic range. Most of the slow floating-point NIOS calculations have been moved to the FPGA logic segments as extended fixed-point operations, which significantly reduced the refreshing time of the coefficients used in the linear prediction.
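The Levinson recursion and the resulting prediction-error filter can be sketched in floating-point Python. This is not the fixed-point FPGA/NIOS implementation; the autocorrelation below is the idealized one of a single sinusoid, and the frequency and filter order are assumptions chosen so that the result is easy to check.

```python
import math

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for linear-prediction coefficients.

    r is the autocorrelation sequence r[0..order]. Returns (a, err), where the
    prediction is x_hat[n] = sum(a[k] * x[n-1-k]) and err is the residual power.
    """
    a = []
    err = r[0]
    for m in range(order):
        # Reflection coefficient for stage m+1.
        acc = r[m + 1] - sum(a[k] * r[m - k] for k in range(m))
        k_m = acc / err
        # Update the lower-order coefficients, then append the new one.
        a = [a[k] - k_m * a[m - 1 - k] for k in range(m)] + [k_m]
        err *= 1.0 - k_m * k_m
    return a, err

def prediction_error(x, a):
    """Subtract the predicted (narrow-band) part, leaving the residual."""
    p = len(a)
    return [x[n] - sum(a[k] * x[n - 1 - k] for k in range(p)) for n in range(p, len(x))]

omega = math.pi / 3                                  # hypothetical RFI carrier
r = [math.cos(omega * k) for k in range(3)]          # autocorrelation of a sinusoid
coeffs, err = levinson_durbin(r, order=2)
rfi = [math.cos(omega * n + 0.3) for n in range(50)]
residual = prediction_error(rfi, coeffs)
print(coeffs, max(abs(e) for e in residual))
```

A pure carrier is perfectly predictable from its two previous samples, so the residual vanishes up to rounding, whereas a short air-shower pulse is not predictable and would survive the subtraction.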

#### MSP2-3: FPGA-Based Algorithm for Center of Gravity Calculation of Clustered Signals

<u>A. A. Ushakov<sup>1</sup></u>, B. Mindur<sup>2</sup>, T. Fiutowski<sup>2</sup>, C. Schulz<sup>1</sup>, F. Winklmeier<sup>1</sup>

<sup>1</sup>Detector laboratory G-A1, Helmholtz-Zentrum Berlin, Berlin, Germany

<sup>2</sup>Faculty of Physics and Applied Computer Science, AGH University for Science and Technology, Krakow, Poland

The acquisition system for a neutron detector consisting of a $^{157}$Gd-CsI converter and a Micro-Strip Gas Chamber was developed earlier [1]. For the prototype it incorporates 4 ASICs able to process 128 strips of one detector coordinate, delivering analog signals and digitized timing of incoming events. The analog signals are then digitized on an additional board, where they are aligned to an external clock by a programmable delay line. One neutron can create signals on 3-5 detector strips, so the main task is to identify such clusters and determine their center of gravity. The center of gravity calculation takes place in an FPGA, allowing real-time processing and visualization. The data for the calculation are arranged in 4 FIFOs with individual buffers, each assigned to an ASIC. A cluster is identified by the following rules: signals must have neighboring coordinates within a maximum distance of 5 strips and time stamps within a period of at most three clock cycles. Due to the nature of the token-ring buffer the ASIC output is not well ordered in time; however, signals are unambiguously identifiable by their time stamp. The algorithm processes incoming signals on the fly and fills a buffer with event information. On the first clock cycle, the algorithm examines one of the four cluster events associated with one ASIC for time expiration. On the second cycle, completed clusters are removed from the buffers. Finally, the new signal is merged with an already identified cluster or a new one is opened. The algorithm therefore has to run at three times the FPGA input buffer frequency. Moreover, it handles the case in which a new signal fits between two existing clusters and merges them all into a single cluster. The center of gravity calculation is done simultaneously with the sorting algorithm described above.
Right after a cluster is completed it is called an event, and its coordinate data are sent to the host PC via Ethernet for real-time visualization and storage. The arithmetic core involves DSP resources such as adders, accumulators and dividers, which consume significant resources of a Virtex-5 FPGA and limit the performance. The algorithm processes 4 data streams from 4 ASICs and also treats events occurring at the edges of adjacent ASICs. The algorithm delivers a calculated position resolution of 0.08 mm compared to a detector strip pitch of 0.635 mm.

[1] Alimov, S. S. et al., "Development of very high rate and resolution neutron detectors with novel readout and DAQ hard- and software in DETNI," Nuclear Science Symposium Conference Record, 2008. NSS '08. IEEE, pp. 1887-1900, 19-25 Oct. 2008, doi: 10.1109/NSSMIC.2008.4774759
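The clustering rules and the center of gravity calculation from the MSP2-3 abstract can be sketched offline in Python. This is not the three-cycle FPGA pipeline: signals are simply sorted by time stamp first, the strip-distance rule is interpreted as the distance to any signal already in the cluster, and the sample signals are hypothetical.

```python
STRIP_PITCH_MM = 0.635   # strip pitch quoted in the abstract
MAX_STRIP_GAP = 5        # maximum strip distance within a cluster
MAX_TIME_GAP = 3         # maximum time-stamp spread in clock cycles

def find_clusters(signals):
    """Group (strip, time, amplitude) signals into clusters."""
    clusters = []
    for sig in sorted(signals, key=lambda s: (s[1], s[0])):
        strip, time, _ = sig
        matches = [c for c in clusters
                   if time - min(s[1] for s in c) <= MAX_TIME_GAP
                   and any(abs(strip - s[0]) <= MAX_STRIP_GAP for s in c)]
        if matches:
            # A signal bridging several clusters merges them into one.
            merged = [s for c in matches for s in c] + [sig]
            clusters = [c for c in clusters
                        if all(c is not m for m in matches)] + [merged]
        else:
            clusters.append([sig])
    return clusters

def center_of_gravity(cluster):
    """Amplitude-weighted mean position of a cluster in millimetres."""
    total = sum(a for _, _, a in cluster)
    return STRIP_PITCH_MM * sum(strip * a for strip, _, a in cluster) / total

signals = [(10, 100, 2.0), (11, 100, 4.0), (12, 101, 2.0),
           (60, 100, 1.0), (61, 102, 3.0)]
positions = sorted(center_of_gravity(c) for c in find_clusters(signals))
print(positions)
```

Weighting by amplitude is what allows the position estimate to be finer than the strip pitch, since the result is not restricted to integer strip numbers.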

#### MO3: Mini-orals 3

#### Tuesday, June 12 14:40-15:30 Crystal Ballroom

#### PS2-3: A MAC Layer Congestion Control Method to Achieve High Network Performance for EAST Tokamak

J. Luo<sup>1</sup>, <u>K. Shi<sup>2,3</sup></u>, Y. Shu<sup>3</sup>, S. Lin<sup>2</sup>

<sup>1</sup>Institute of Plasma Physics, Academia Sinica, Hefei, China

<sup>2</sup>Tianjin University of Technology, Tianjin, China

<sup>3</sup>Tianjin University, Tianjin, China

Many applications require fast data transfer in Wireless Local Area Networks (WLANs). A representative example is the retrieval of EAST experiment data by physics researchers using the Transmission Control Protocol (TCP). However, due to the high degree of contention and the high error rate in wireless networks, packets may be lost for wireless reasons rather than because of congestion, which greatly degrades TCP performance. On the one hand, a wireless packet loss is not congestion, but traditional TCP assumes that every packet drop signals congestion and therefore decreases its congestion window. On the other hand, due to the MAC layer retransmission policy employed by the IEEE 802.11 DCF mechanism, packets lost at the MAC layer are retransmitted several times, which increases the waiting time of packets in the MAC layer queue. So if we ignore all wireless packet losses, as other improved mechanisms do, network congestion will be aggravated and performance will be degraded. To alleviate the impact of wireless packet loss on TCP in WLANs, this paper proposes a MAC layer congestion control method, implemented at the end wireless nodes and based on the IEEE 802.11b DCF mechanism. First, we propose the concept of a MAC layer congestion window: the MAC layer sends all the packets in the window when it gains access to the wireless channel, rather than sending only one packet as the traditional DCF mechanism does. Our congestion control mechanism then adjusts the MAC layer congestion window based on the degree of contention and the MAC layer packet loss rate. If the MAC layer contention degree or packet error rate is high, we increase the congestion window to improve the successful transmission rate, and we decrease the congestion window when the packet loss rate is lower than the average wireless packet loss rate.
We also use a threshold to limit the increase of the congestion window. The threshold is set according to the number of wireless nodes. By performing wireless congestion control at the MAC layer, our mechanism can mitigate the effect of wireless loss on TCP and therefore improve TCP performance. Simulation and experimental results show that our mechanism performs better than traditional MAC layer mechanisms in WLANs.
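The window adjustment described above can be written down as a small sketch. The abstract gives no step sizes, no definition of "high", and no threshold formula, so the unit steps, the `contention_limit` value and the `window_threshold` rule below are all assumptions.

```python
def adjust_window(window, contention, loss_rate, avg_loss_rate,
                  max_window, contention_limit=0.5):
    """Return the next MAC-layer congestion window (packets per channel access)."""
    if contention > contention_limit or loss_rate > avg_loss_rate:
        # Harsh channel: send more packets per successful channel access,
        # but never beyond the threshold.
        return min(window + 1, max_window)
    if loss_rate < avg_loss_rate:
        # Loss below average: shrink towards the classic one-packet DCF behaviour.
        return max(window - 1, 1)
    return window

def window_threshold(n_nodes, base=8):
    """Hypothetical threshold rule: a smaller cap when more nodes contend."""
    return max(1, base // n_nodes)

max_window = window_threshold(n_nodes=2)
window, history = 1, []
samples = [(0.8, 0.10), (0.8, 0.10), (0.8, 0.10), (0.8, 0.10), (0.1, 0.01), (0.1, 0.05)]
for contention, loss in samples:
    window = adjust_window(window, contention, loss, avg_loss_rate=0.05,
                           max_window=max_window)
    history.append(window)
print(history)
```

The window grows while the channel is contended, saturates at the threshold, and falls back once the measured loss drops below the average.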

#### PS2-4: High Performance Event Building with InfiniBand Network in CBM Experiment

S. Linev

GSI Helmholtzzentrum fuer Schwerionenforschung, Darmstadt, Germany

The main challenge of the CBM (Compressed Baryonic Matter) experiment at FAIR (Facility for Antiproton and Ion Research, Darmstadt, Germany) will be the measurement of rare (1E-6 - 1E-9) probes at high (1E7 1/s) interaction rates. Due to the complex signatures of the events of interest, the front-end electronics will measure all signals in self-triggered mode and push data to the FLES (First Level Event Selection) computing farm, the first place where an event selection decision can be made. The central part of the FLES will be a high-performance network fabric, which should sort and distribute 1 TB/s of original data over the computing nodes in real time.

InfiniBand is high-throughput low-latency interconnect technology with low CPU consumption and, that is also very important, with affordable prices. Since several years InfiniBand is considered as most probable candidate for usage in CBM FLES. A number of tests were performed verifying throughput capabilities of InfiniBand fabric for the traffic patterns, expected in the FLES. Resent tests were performed on LOEWE-CSC cluster (http://csc.uni-frankfurt.de), which consists of about 800 nodes, equipped with QDR InfiniBand host adapters and connected via half fat-tree switch fabric.

Main approach of the performed tests was scheduling of data transfer in the way that no conjunctions are produced in the network. The crucial point on this way - explore and use fabric interconnect topology, where many physical paths between two nodes are exist. To allow exact timing of data transfers time synchronization between computing nodes with sub-microseconds precision was required. Dedicated software was implemented to execute such scheduled data transfer showing very promising results. Designed code will be integrated into the next version of the DABC (http://dabc.gsi.de) - general-purpose framework for DAQ software development.
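
The abstract does not give the schedule itself. One common pattern for all-to-all event building is a barrel-shift (round-robin) schedule, in which every node sends to exactly one peer and receives from exactly one peer in each time slot. The sketch below illustrates only that endpoint-level idea under this assumption; the function name is hypothetical, and path selection inside the switch fabric is not modelled.

```python
def barrel_shift_schedule(n_nodes):
    """Yield, for each time slot, a list of (sender, receiver) pairs.

    In slot `shift`, node i sends to node (i + shift) mod n_nodes, so no two
    senders target the same receiver within a slot.
    """
    for shift in range(1, n_nodes):
        yield [(src, (src + shift) % n_nodes) for src in range(n_nodes)]
```

After `n_nodes - 1` slots every node has sent one block to every other node, which is why such a schedule depends on the tight time synchronization mentioned above.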

#### PS2-5: Modulator-Based, High Bandwidth Optical Links for HEP Experiments

W. S. Fernando<sup>1</sup>, R. W. Stanek<sup>1</sup>, <u>D. G. Underwood<sup>1</sup></u>, D. Lopez<sup>2</sup>

<sup>1</sup>High Energy Physics Division, Argonne National Lab, Argonne, IL, United States <sup>2</sup>Center for Nanoscale Materials, Argonne National Lab, Argonne, IL, United States

Optical links will be an integral part of intelligent tracking systems at various scales, from coupled sensors through intra-module and off-detector communication. These links will be particularly useful if they use light modulators that are very small, low power, high bandwidth, and very radiation hard. Motivated by concerns about the reliability, bandwidth and mass of future optical links in LHC experiments, we are investigating CW lasers and light modulators as an alternative to VCSELs.

We have constructed a test system with 3 such links, each operating at 10 Gb/s. We present the quality of these links (jitter, rise and fall time, BER) and eye mask margins (10GbE) for 3 different types of modulators: LiNbO3-based, InP-based, and Si-based. We present the results of radiation hardness measurements with up to ~10^12 protons/cm^2 and ~65 krad total ionizing dose (TID), confirming no single event effects (SEE) at 10 Gb/s with any of the 3 types of modulators.

We have used a Si-based photonic transceiver to build a complete 40 Gb/s bi-directional link (10 Gb/s in each of four fibers) for a 100 m run and have characterized it for comparison with standard VCSEL-based optical links. Some future developments of optical modulator-based high bandwidth optical readout systems, and applications based on both fiber and free space data links, such as local triggering and data readout and trigger-clock distribution, are also discussed.

#### PS2-9: A High Density Time-to-Digital Converter Prototype Module for BESIII End-Cap TOF Upgrade

<u>H. Fan</u><sup>1,2</sup>, C. Feng<sup>1,2</sup>, W. Sun<sup>1,2</sup>, C. Yin<sup>1,2</sup>, S. Liu<sup>1,2</sup>, Q. An<sup>1,2</sup>

<sup>1</sup>State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, Hefei, Anhui, China <sup>2</sup>Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui, China

A high-precision, high-density time-to-digital converter (TDIG) module is described in this paper. The end-cap time-of-flight (ETOF) detector of the Beijing Spectrometer III (BESIII) will be upgraded to improve its total time resolution. After the upgrade, the ETOF will be built using Multigap Resistive Plate Chambers (MRPC) and will have 1728 readout channels. The readout electronics must achieve high density because of the large number of readout channels. The time resolution of the readout electronics is required to be better than 25 ps, and the time resolution of the TDC is required to be better than 20 ps.

A 9U VME module with TDC function has been designed for the ETOF upgrade. The signals, which are produced by the detector and then amplified and discriminated by the front-end electronics, are sent to the TDIG module to be digitized. The TDIG module uses the CERN HPTDC chip to achieve high-precision time-to-digital conversion. Each module uses nine HPTDC chips, programmed in very high resolution mode, to realize 72 time measurement channels. The primary hit measurements from the HPTDC chips are forwarded to a Cyclone field-programmable gate array (FPGA) for processing. Finally, the data are sent to the DAQ server over Ethernet. The VME interface logic is implemented in an Altera CPLD. The TDIG module can also accept an external trigger and pick out the events correlated with that trigger. A series of experimental tests shows that the time resolution of the TDIG module is better than 20 ps.

#### PS2-10: Development of White Rabbit Interface for Synchronous Data Acquisition and Timing Control

Q. Du, G. Gong, W. Pan, H. Lu Tsinghua University, Beijing, China

In large-scale physics experiments such as the Large High Altitude Air Shower Observatory (LHAASO), timing distribution with sub-nanosecond accuracy is required for thousands of detector DAQ front ends. Recent advances in the White Rabbit (WR) protocol provide a novel solution for such synchronous data acquisition applications. We demonstrate a compact design of a WR slave in FMC (FPGA Mezzanine Card) format that can work as a Gigabit Ethernet interface to each detector readout circuit and provide stabilized frequency distribution and timestamp synchronization over the data link.

#### PS2-18: Data Formatter System for the ATLAS Fast TracKer

J. Olsen<sup>1</sup>, T. Liu<sup>1</sup>, B. Penning<sup>1</sup>, H. L. Li<sup>2</sup> <sup>1</sup>Fermi National Accelerator Laboratory, Batavia, Illinois, United States

<sup>2</sup>University of Chicago, Chicago, Illinois, United States

Collisions in the LHC occur with an instantaneous luminosity of 1E34 cm<sup>-2</sup>s<sup>-1</sup>. The ATLAS detector trigger system must reject a vast majority of these events, and only 200 events per second can be stored for later analysis. Instantaneous luminosity is expected to increase to 3E34 with an average of 75 proton-proton interactions per crossing. Under these conditions the existing ATLAS trigger is strained and the need for a tracking trigger is clear. The Fast Tracker (FTK) upgrade adds a hardware based level-2 track trigger to the ATLAS DAQ system. Complications arise from the fact that the ATLAS inner detector was not designed for track triggering. Inner detector modules are not organized into the symmetric eta-phi towers that the track finder algorithms require.

The FTK system requires a Data Formatter hardware layer to perform data compression, remapping and repackaging of inner detector hits. The Data Formatter hardware accepts data delivered over fiber links from over 200 Readout Drivers (ROD). The first stage involves compressing pixel hits using an FPGA-based 2D clustering algorithm. Pixel clusters and SCT strip hits are then exchanged between Data Formatter boards, repackaged and sent downstream to the FTK core processing crates.

Prior to settling on any particular hardware platform, the Data Formatter system was extensively simulated using high-level tools written in C++ and Python. A balance between board density, physical I/O limitations, and backplane complexity was sought, and achieved, with the baseline design comprising 32 boards with up to eight fiber link inputs per board. While the simulation tools determined the ideal arrangement of crates, boards and input fibers, it quickly became clear that the number of data paths between boards was irregular. A given Data Formatter board must exchange data with between eight and nineteen other boards. Designing specific inter-board connections into a custom backplane was possible, but a hard-wired backplane severely limits the possibility of future expansion. The Advanced Telecommunications Computing Architecture (ATCA) full mesh backplane provides the ideal hardware solution: every board in the shelf can exchange data directly over high speed serial interconnects. The baseline design comprises four 14-slot ATCA shelves; each shelf contains eight Data Formatter boards, leaving six empty slots available for future expansion. Data Formatter boards are hardware-based, incorporating large FPGAs with many multi-gigabit SERDES components. While the Data Formatter is designed for a level-2 trigger, the architecture also lends itself to scalable, high performance level-1 trigger systems.

ATCA hardware is designed for high availability with particular emphasis on redundant power and a robust management interface. The Data Formatter is our first design to target the ATCA platform and this paper chronicles our design process from conception to first prototype.

#### PS2-20: Commissioning and Performance of a Fast Level-2 Trigger System at VERITAS

B. Zitzer<sup>1</sup>, A. Weinstein<sup>2</sup>, M. Schroedter<sup>3</sup>, M. Orr<sup>3</sup>, M. Oberling<sup>1</sup>, A. Kreps<sup>1</sup>, F. Krennrich<sup>2</sup>, G. Drake<sup>1</sup>, K. Byrum<sup>1</sup>, <u>J. T. Anderson<sup>1</sup></u> <sup>1</sup>HEP Division, Argonne National Laboratory, Lemont, IL, United States <sup>2</sup>Iowa State University, Ames, IA, United States <sup>3</sup>Smithsonian Astrophysical Observatory, Amado, AZ, United States

We have built a new three-stage FPGA-based high-speed camera-level pattern trigger for VERITAS, an array of ground-based imaging atmospheric Cherenkov telescopes (IACTs) located in Arizona. This trigger has the ability to recognize patterns of Cherenkov light generated by atmospheric air showers initiated by incident extraterrestrial gamma rays. The new trigger has programmable coincidence recognition timing and programmable delay compensation over 499 pixel channels in an IACT camera. Measurement of and compensation for system timing variations is achieved through the use of an FPGA-based time-to-digital converter (TDC) and FPGA-based programmable delay elements. The trigger pattern is the coincidence of any three adjacent pixels within the camera. Night-sky background is suppressed as the ratio of the squares of the coincidence gate widths. The tighter coincidence width achieved by the new system therefore permits operation at lower discriminator threshold. The new trigger has now been successfully installed on all four of the IACTs of VERITAS, replacing the previous system. We present measurements of the performance of this new trigger in comparison with that of the previous system and of the effect of the new trigger upon overall array performance.
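
The quadratic suppression quoted above is consistent with the usual estimate for accidental coincidences. As a sketch, assume each pixel fires independently on night-sky background at a rate $r$ and that the coincidence gate has width $\tau$ (both symbols are introduced here for illustration and do not appear in the abstract). The accidental three-fold rate then scales as

$$
R_{\mathrm{acc}} \propto r^{3}\,\tau^{2}
\quad\Longrightarrow\quad
\frac{R_{\mathrm{acc}}^{\mathrm{new}}}{R_{\mathrm{acc}}^{\mathrm{old}}}
= \left(\frac{\tau_{\mathrm{new}}}{\tau_{\mathrm{old}}}\right)^{2}
\quad \text{at fixed } r .
$$

Under this assumption, halving the gate width reduces the accidental rate by a factor of four, and that margin can instead be used to lower the discriminator threshold at an unchanged accidental rate.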

#### **PS2-21:** The ATLAS Hadronic Tau Trigger

C. Cuenca Almenar

Department of Physics, Yale University, New Haven, CT, United States

Hadronic tau decays play a crucial role in the search for physics beyond the Standard Model as well as in Standard Model measurements. However, hadronic tau decays are difficult to identify and trigger on due to their resemblance to QCD jets. Given the large production cross section of QCD processes, designing and operating a trigger system with the capability to efficiently select hadronic tau decays, while maintaining the rate within the bandwidth limits, is a difficult challenge.

The ATLAS trigger is a complex system structured in three levels, each accessing more precise information, having more time allocated and running more sophisticated algorithms than the previous one. These algorithms not only have to reconstruct and identify hadronic tau products very fast, but also need to reject backgrounds to keep the output rate of the trigger within the allocated bandwidth.

This contribution will summarize the status and performance of the ATLAS tau trigger system during the 2011 data taking period, and the upgrades put in place for the current 2012 run. Special emphasis will be placed on the key role of identification and rejection capabilities of the different sub-detectors of ATLAS and the algorithms used. Finally, first results and prospects on the performance in 2012 will be presented.

#### PS2-24: Multifunction-Timing Card ITTEV2 for CoDaC Systems of Wendelstein 7-X

J. Schacht<sup>1</sup>, J. Skodzik<sup>2</sup>

<sup>1</sup>CoDaC/Machine Control, Max-Planck-Institute for Plasmaphysics, Greifswald, Germany <sup>2</sup>Institute for Applied Microelectronic, University Rostock, Rostock, Germany

The timing system is a crucial element of the CoDaC (Control, Data Acquisition and Communication) system of the steady-state fusion experiment Wendelstein 7-X (W7-X). Its main task is the synchronization of all clocks with sufficient accuracy. Furthermore, it is able to send, receive, and process event messages and to offer a wide range of time-related functions, e.g., time capturing, pulse generation, realization of time delays, and sending and receiving of trigger signals. The overall timing system consists of a central timing system and a considerable number of local timing systems. Most of the technical systems, such as the heating system, power supplies, gas inlet, and all diagnostic systems, include a local timing system in a so-called control station. So far, there are two different types of local timing systems: the local Trigger Time Event card (ITTEV1) for control stations with real-time requirements and the local Time to Digital Converter card (TDC) for control stations used for data acquisition. Both card types have a standard parallel PCI or cPCI bus interface. A revision of the ITTEV1 and TDC cards is necessary because many components used for their fabrication are no longer available. Furthermore, the state-of-the-art bus interface is the serial PCI bus. The need for a new bus interface with long-term availability has led to the decision to use a GBit Ethernet interface. It will connect the new TTE card (ITTEV2), the successor of the ITTEV1 and TDC, with a host PC. Additionally, DDR3 memory is integrated to allow high-resolution time capture processes. By choosing a more powerful FPGA device (Xilinx Virtex 6), it was possible to increase the time resolution by a factor of two. Starting with a short introduction to the W7-X timing system, this contribution describes the key properties of the ITTEV2 card and all of its extended and new features for meeting new data acquisition requirements. The current state of the development is given.

#### PS2-26: HAWC TeV Gamma Ray Observatory Trigger System

M. DuVernois

University of Wisconsin, Madison, WI, United States

The High Altitude Water Cherenkov (HAWC) experiment is currently under construction at 4100 m above sea level in a valley between Sierra Negra and Orizaba near Puebla, Mexico. The experiment is intended as an all-sky TeV gamma ray observatory with a significant temporal and data overlap with the Fermi gamma ray satellite observatory. The detector array will consist of 300 water tanks instrumented with 1200 photomultiplier tubes sensitive to cosmic ray air showers (background) and electromagnetic showers from primary gammas (signal). We present here the design, implementation, and performance of the FPGA-based digital trigger system for the HAWC experiment. It performs majority logic on 1200 channels of time-over-threshold (ToT) data and compares the number of tubes above threshold as a function of time over scales from 25 ns to 1000 ns. The trigger is implemented in Altera FPGAs along with data simulators for testing the data acquisition readout, which is centered on 1200 channels of multihit time-to-digital conversion.
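
The majority logic described above can be illustrated in software as a sliding-window multiplicity count. This is only a sketch of the general technique, not the HAWC firmware: the function name, the `(time_ns, channel)` hit format and the threshold values are assumptions made for the example.

```python
def multiplicity_trigger(hits, window_ns, min_channels):
    """Return the earliest hit time that opens a window with enough distinct channels.

    hits: iterable of (time_ns, channel) pairs; returns None if no window qualifies.
    """
    hits = sorted(hits)
    for i, (t0, _) in enumerate(hits):
        channels = set()
        for t, ch in hits[i:]:
            if t - t0 >= window_ns:
                break  # hits are time-ordered, so later hits are outside the window
            channels.add(ch)
        if len(channels) >= min_channels:
            return t0
    return None
```

Running the same hit list with several window widths between 25 ns and 1000 ns mimics the comparison over multiple time scales mentioned in the abstract.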

#### **PS2-27:** Development of the Control Card for the Digitizers of the Second Generation Electronics of AGATA

D. Barrientos<sup>1,2,3</sup>, V. Gonzalez<sup>3</sup>, M. Bellato<sup>2</sup>, A. Gadea<sup>1</sup>, D. Bazzacco<sup>2</sup>, J. M. Blasco<sup>3</sup>, D. Bortolato<sup>2</sup>, F. J. Egea<sup>1,3</sup>, R. Isocrate<sup>2</sup>, A. Pullia<sup>4</sup>, G. Rampazzo<sup>2</sup>, E. Sanchis<sup>3</sup>, A. Triossi<sup>2</sup>

<sup>1</sup>Instituto de Fisica Corpuscular (CSIC-UV), Valencia, Spain

<sup>2</sup>Istituto de Fisica Nucleare (INFN), Sezione di Padova, Padova, Italy

<sup>3</sup>Departamento Ingeniera Electronica, Universitat de Valencia, Valencia, Spain

<sup>4</sup>Istituto de Fisica Nucleare (INFN), Sezione di Milano, Milano, Italy

The Advanced GAmma Tracking Array (AGATA) is a latest-generation gamma-ray spectrometer composed of segmented High-Purity Germanium (HPGe) detectors; it uses Pulse Shape Analysis (PSA) and gamma-ray tracking techniques to achieve high efficiency and resolution. This requires an accurate determination of the energy, time and position of every interaction within the detector volume, which is achieved by concurrent digitization at 100 Msamples/s of each 36-fold segmented detector crystal of the array. At present, a fully operational electronics system is acquiring data during the experimental campaigns. However, rapid improvements in electronic devices make it possible to redesign the system, preserving its specifications while improving compactness, power consumption and cost. In this work, the novel control card for the digitizer boards of the system is presented. The unit is responsible for communicating with the pre-processing electronics, through four optical links, and with four digitizer units, through a custom backplane. From the optical links, the unit receives the sampling clock from the Global Trigger and Synchronization (GTS) system. Another two bidirectional optical links are provided for latency measurements and slow control purposes. The aim of this board is to receive the clock, to clean it and to broadcast it with the same latency to four digitizer units. It also has to broadcast the signals for measuring the latency, as well as the slow control signals needed to control each digitizer unit.

In order to perform the tasks described above, the card mounts a Xilinx Spartan-6 Field Programmable Gate Array (FPGA). Ethernet, mini-USB and SMB connectors have been added so that the card can be used without the optical interface. The design and qualification processes for the card are presented in this work, including a detailed description of the design, simulation and performance tests.

#### PS2-29: Evolution and Performance of Electron and Photon Triggers in ATLAS in the Year 2011

A. Tricoli<sup>1</sup>, T. Kono<sup>2</sup>, <u>V. Solovyev<sup>3</sup></u> <sup>1</sup>CERN, Geneva, Switzerland <sup>2</sup>DESY, Hamburg, Germany <sup>3</sup>B.P. Konstantinov Petersburg Nuclear Physics Institute, Leningrad, Russia

The electron and photon triggers are among the most widely used triggers in ATLAS physics analyses.

In 2011, the increasing luminosity and pile-up conditions demanded higher and higher thresholds and the use of tighter and tighter selections for the electron triggers. Optimizations were performed at all three levels of the ATLAS trigger system. At the high-level trigger (HLT), many variables from the calorimeters and tracking detectors are used to achieve high efficiency and large rejection power. At L1, the thresholds were raised and optimised to account for $\eta$-dependence and hadronic isolation was implemented.

In addition to physics triggers, dedicated triggers for collecting a large number of control samples of J/psi->ee, W->enu and jet background, for calibration, efficiency and fake rate measurements were developed.

This contribution summarizes the algorithms and performance of ATLAS electron and photon triggers used in 2011 data taking.

#### **PS2-2:** Low Power, Accurate Time Synchronization MAC Protocol for Real-Time Wireless Data Acquisition

J. Zhang<sup>1,2</sup>, J. Wu<sup>1,2</sup>, Z. Han<sup>1,2</sup>, L. Liu<sup>1,2</sup>, K. Tian<sup>1,2</sup>

<sup>1</sup>Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui, China <sup>2</sup>State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, Hefei, Anhui, China

The issues of real-time wireless data acquisition, in particular how to design and manage thousands of sensor nodes located over a vast geographical area, have received much attention in recent years. Energy efficiency, time synchronization and other requirements for real-time processing make such systems a great challenge. This paper proposes a real-time wireless data acquisition MAC protocol meeting the requirements of high throughput, low end-to-end latency, low energy consumption and accurate time synchronization. A hybrid approach, combining the advantages of Time Division Multiple Access (TDMA) and Frequency Hopping Spread Spectrum (FHSS), is adopted for anti-jamming and collision prevention. The hopping sequences of FHSS are carefully selected to reduce interference to a minimum. The packets of commands and data are delivered in a "bucket brigade"-like manner for optimum bandwidth utilization and low end-to-end latency. Experiments show that the bandwidth utilization exceeds 25%, and the mean latency per hop is kept to 8 to 9 milliseconds. Correspondingly, we propose a two-step time synchronization approach to balance synchronization performance against energy consumption. First, the microcontroller of the sensor node uses its internal low-power, low-precision RC oscillator to set up and manage the network connection. The frequency difference between the local clock and the parent's clock is estimated by periodically receiving time-stamped beacons. Then, when acquisition is about to start, the sensor nodes enable the phase-locked loop (PLL) circuits, which use the microcontroller's counter, a software low-pass filter, a digital-to-analog converter (DAC) and a voltage-controlled crystal oscillator (VCXO) to generate the low-jitter synchronous clock. At the end of acquisition, the sensor nodes return to the first step by turning off the PLL circuits to save power. In this way, the average per-hop synchronization (RBS) and Timing-sync Protocol for Sensor Networks (TPSN).
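
The first synchronization step, estimating the frequency difference from time-stamped beacons, can be sketched as a straight-line fit of local receive times against the parent's beacon timestamps. The abstract does not say which estimator is used; the least-squares fit and the function name below are assumptions chosen for illustration.

```python
def estimate_clock_skew(parent_times, local_times):
    """Least-squares fit of local ≈ skew * parent + offset; returns (skew, offset).

    parent_times: timestamps carried in the beacons (parent's clock).
    local_times: the node's own clock readings when each beacon arrived.
    """
    n = len(parent_times)
    mean_p = sum(parent_times) / n
    mean_l = sum(local_times) / n
    cov = sum((p - mean_p) * (l - mean_l)
              for p, l in zip(parent_times, local_times))
    var = sum((p - mean_p) ** 2 for p in parent_times)
    skew = cov / var
    return skew, mean_l - skew * mean_p
```

A skew of 1.0001 means the local oscillator runs 100 ppm fast relative to the parent, which is the kind of correction a low-precision RC oscillator would need between beacons.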

#### PS2-6: Waveform Timing Algorithms with a 5 GS/s Fast Pulse Sampling Module

J. Wang<sup>1,2</sup>, L. Zhao<sup>1,2</sup>, C. Feng<sup>1,2</sup>, Y. Zhang<sup>1,2</sup>, S. Liu<sup>1,2</sup>, Q. An<sup>1,2</sup>

<sup>1</sup>State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, Hefei, Anhui, China <sup>2</sup>Anhui Key Laboratory of Physical Electronics, Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui, China

There are several algorithms for extracting the arrival time of detector signals as the time of a characteristic position on the sampled waveform (e.g., the center of gravity of the waveform). In this paper, we first analyze the characteristics of three timing algorithms: digital constant fraction discrimination (d-CFD), sliding window with amplitude-weighted time (SWAWT), and optimal filtering in both the time and frequency domains. We then built a fast pulse sampling module with the 4th version of the Domino Ring Sampler (DRS4) and verified the timing performance of these algorithms in some of our physics experiments. The module has six channels, with a sampling rate of up to 5 GS/s (giga-samples per second) per channel. We show that the module is capable of sub-10 ps RMS timing precision at about 5 GS/s after applying such strategies as DC offset calibration and uneven sampling interval compensation.

In our evaluation, we first evaluated the timing performance of these algorithms in the lab with reconstructed Multi-gap Resistive Plate Chamber (MRPC) pulses. MRPC signals are generated from a template and distributed to two channels with a constant delay between the channels. The time intervals are derived using the algorithms above with respect to the pulse shape, and the timing performances are all in the range of about 15 ps RMS (10.6 ps RMS per channel). Currently, we are setting up a reference start with four plastic scintillators (EJ-200), whose rise time is about 0.7 ns. We will evaluate the timing resolution of the reference start and give our considerations on the optimal timing algorithms.
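
As an illustration of the first algorithm, the sketch below implements a simple amplitude-fraction form of digital constant fraction timing: it finds the point on the rising edge where the pulse reaches a fixed fraction of its peak and interpolates linearly between samples. It assumes a positive pulse on a zero baseline; the function name and the default fraction are hypothetical, and the authors' d-CFD may instead use the delayed-and-attenuated zero-crossing form.

```python
def dcfd_time(samples, dt, fraction=0.5):
    """Time at which a positive pulse first crosses fraction * peak on its rising edge.

    samples: baseline-subtracted amplitudes; dt: sampling interval (0.2 ns at 5 GS/s).
    """
    peak = max(samples)
    level = fraction * peak
    for i in range(1, len(samples)):
        lo, hi = samples[i - 1], samples[i]
        if lo < level <= hi:
            # Linear interpolation between the two samples around the crossing.
            return (i - 1 + (level - lo) / (hi - lo)) * dt
    raise ValueError("no rising-edge crossing found")
```

Because the threshold scales with the peak, the result does not depend on pulse amplitude, which is the point of constant fraction timing. The per-channel figure quoted above is what two equal, independent channels would give: 15 ps / √2 ≈ 10.6 ps.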

#### PS2-1: High Performance FPGA-Based DMA Interface for PCIe

H. Kavianipour, S. Muschter, C. Bohm

Department of Physics, Stockholm University, Stockholm, Sweden

We present a data communication suite developed for use in the Track Engine Trigger for the IceCube Neutrino Observatory (South Pole). It is a PCIe-based system implemented in Xilinx FPGAs with a bus master DMA on a 4-lane, generation 2 link. The suite contains DMA controller hardware IPs, test benches, a Linux driver and a user application for DMA and PIO transfers into memory modules and FIFOs. The Linux driver uses streaming mapping, vector write functionality, race condition controllers, page-wise memory allocation, wait queues and Message Signaled Interrupts (MSI) to achieve high performance and throughput. The DMA, which is based on the Xilinx bus master DMA, achieves measured transfer speeds of up to 680 MB/s (read) and 720 MB/s (write) using a HiTech Global Virtex6 board. The hardware has been verified on different platforms with different FPGAs. Besides the original IceCube application, the suite has also been used for the development of readout electronics for particle physics experiments. Other applications are also being considered.

#### PS2-12: Real-Time Data Analysis Using the WaveDREAM Data Acquisition System

H. Friederich<sup>1,2</sup>, G. Davatz<sup>1,2</sup>, U. Gendotti<sup>1</sup>, H. Meyer<sup>1</sup>, D. Murer<sup>1</sup>, <sup>1</sup>Arktis Radiation Detectors Ltd, Zurich, Switzerland <sup>2</sup>ETH Zurich, Institute for Particle Physics, Zurich, Switzerland

The WaveDREAM data acquisition (DAQ) system, based on the DRS4 waveform digitizing chip, provides 1-5 Giga-samples per second (GSPS) digitization in a region of interest together with continuous sampling of the input signals at 120 Mega-samples per second (MSPS). In the field-programmable gate array (FPGA), the 120 MSPS signal can be used to build complex trigger logic and to perform data analysis with hard real-time constraints in the microsecond range. Variable gain amplifiers (VGAs) allow the input signal to be scaled with 5-17 dB amplification so that it optimally matches the dynamic range of the DRS4 IC. As an application example, the FPGA firmware for this general purpose system has been optimized for the readout of an array of high-pressure 4He fast neutron detectors. For best performance, the VGA amplification has been matched to the photomultiplier (PMT) signals to allow efficient triggering on the event start without sacrificing the ability to count single photoelectrons (SPE). The trigger in the FPGA employs coincidence logic between the two PMT signals of a detector vessel to effectively filter out PMT dark counts. Real-time data analysis includes energy deposit measurements and a pulse shape discrimination (PSD) algorithm to reject events originating from unwanted gamma radiation, thus greatly reducing the data rate to be processed offline and enabling operation in high gamma rate environments as well. The high-resolution GSPS signal of the DRS4 readout is used to obtain precise event timing information, thereby enabling neutron time of flight (ToF) measurements with nanosecond precision. In addition, this signal can be used during offline data processing to strengthen the analysis results. Thus, the WaveDREAM DAQ provides an excellent tradeoff between efficiency (real-time data analysis) and precision (GSPS signal for offline analysis) at low cost.

#### PS2-15: VHDL Design of Digital Adaptive Filters for PANDA Signal Processing

<u>M. Greco</u>, M. P. Bussa, M. Destefanis, M. Maggiora, S. Spataro University of Torino and INFN, Turin, Italy

The PANDA (antiProton ANnihilation in Darmstadt) experiment at the new Facility for Antiproton and Ion Research (FAIR) will study interactions between protons and antiprotons in the momentum range 1.5-15 GeV/c. The physics program is very demanding and requires an efficient and flexible triggering system that can handle a data rate in the range 40 to 200 GB/s due to an interaction rate of over 10 MHz. Serial Peripheral Interface firmware was fully developed and implemented in VHDL to interface the clocks on board the digital processing unit and the connected ADC/DAC modules. Running operation was tested successfully. Digital least-mean-square (LMS) adaptive filters were designed and implemented for real-time filtering in data acquisition at working frequencies higher than 100 MHz to cope with the foreseen high rate of the PANDA experiment.
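
For readers unfamiliar with the LMS update, the following floating-point sketch shows the standard algorithm that such a filter implements. It is a software illustration only, not the VHDL design: the function name, tap count and step size `mu` are assumptions, and a hardware version would use fixed-point arithmetic and pipelining.

```python
def lms_filter(x, d, n_taps, mu):
    """Run an LMS adaptive FIR filter; return (outputs, errors, final weights).

    x: input samples; d: desired (reference) samples; mu: adaptation step size.
    """
    w = [0.0] * n_taps
    buf = [0.0] * n_taps
    outputs, errors = [], []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]  # shift register holding the most recent inputs
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = dn - y
        # LMS update: move each weight along the instantaneous error gradient.
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]
        outputs.append(y)
        errors.append(e)
    return outputs, errors, w
```

Each sample needs only one multiply-accumulate pass for the output and one for the weight update, which is what makes the algorithm attractive for FPGA implementation at high clock rates.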

#### PS2-11: A High-Resolution Time-to-Digital Converter Based on Multi-Phase Clock Implement in Field-Programmable-Gate-Array

Z. Yin<sup>1,2</sup>, S. Liu<sup>1,2</sup>, X. Hao<sup>1,2</sup>, S. Gao<sup>1,2</sup>, Q. An<sup>1,2</sup>

<sup>1</sup>Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui, China <sup>2</sup>State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, Hefei, Anhui, China

Laser mapping is widely used in topographic mapping projects. In such a project, the experimental plane flies over the area to be surveyed, points a laser at the ground and detects the reflected light. The interval between the trigger pulse and the echo pulse represents the distance between the plane and the ground, and can be measured with a high-resolution TDC. Scanning the area at a fixed frequency maps the topography with high precision. The experiment needs a laser matrix to scan the area, and each laser device needs an independent TDC. For example, an 8\*8 laser matrix needs 64 TDCs, and the resolution of each TDC should be better than 1 ns. This kind of experiment therefore requires a highly integrated, high-precision TDC.

This paper introduces a time-to-digital converter (TDC) based on 4 multi-phase clocks, implemented in a Xilinx Virtex-4 FPGA. Its high precision, high level of integration and large dynamic range fit these demands very well.

Thanks to PLL technology, the clock phase can be adjusted precisely. In this design, 4 multi-phase clocks with phase shifts of 0, 90, 180 and 270 degrees are generated, so that the rising edge of each clock has the same delay relative to the previous clock. Based on these 4 clocks, one clock period is divided into 4 equal parts, so the bin size of the TDC is improved to 1/4 of the clock period. This is a new approach to realizing time interpolation within one clock period. A TDC based on multi-phase clocks needs fewer logic resources than other kinds of TDC. The performance of the multi-phase clock based TDC was tested. The bin size (resolution) of each channel is 0.757 ns and the RMS (precision) is less than 0.5 ns. The dynamic range is longer than 1 second. 64 TDC channels are realized in a single FPGA on a 15 cm \* 15 cm board. The test results demonstrate that the multi-phase clock based TDC meets the needs of the laser mapping project and can be used in many other situations that require a high level of integration and a precision of hundreds of picoseconds.
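
The interpolation scheme can be sketched as a coarse clock counter combined with the index of the first clock phase that registered the hit. The function and argument names below are hypothetical, and the encoding of the phase index is an assumption; the abstract does not describe the decoding logic.

```python
def multiphase_timestamp(coarse_count, phase_index, clock_period_ns, n_phases=4):
    """Timestamp in ns from a coarse clock count plus a phase index.

    phase_index 0..n_phases-1 selects which phase-shifted clock (0, 90, 180 or
    270 degrees for n_phases=4) first sampled the hit within the period.
    """
    if not 0 <= phase_index < n_phases:
        raise ValueError("phase_index out of range")
    bin_ns = clock_period_ns / n_phases  # one bin is 1/n_phases of a period
    return (coarse_count * n_phases + phase_index) * bin_ns
```

The trigger-to-echo interval is the difference of two such timestamps. If the bin is exactly a quarter period, the reported 0.757 ns bin implies a clock period of about 3.03 ns, and one bin corresponds to roughly 11 cm of one-way range (c·t/2).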

#### **PS2-17:** Optimization of the detection of very inclined showers using a spectral DCT trigger in arrays of surface detectors

Z. Szadkowski

Department of Physics and Applied Informatics, University of Lodz, Lodz, Poland

The DCT trigger allows recognition of ADC traces with a very short rise time and fast exponential attenuation related to a narrow, flat muon component of very inclined extensive air showers generated by hadrons and starting their development early in the atmosphere. Very inclined showers generate Cherenkov light falling directly mostly on two PMTs. A probability of 3-fold coincidences of direct light corresponding to a standard Auger trigger is low. Much more probable are 2-fold coincidences of a direct light. The 3rd PMT is next hit by reflected light, but with some delay. By fast sampling (80 MHz) this delay gives signal in the next time bin.

Two-fold coincidences of DCT coefficients allow triggering on signals that are currently ignored, either because the amplitude threshold is too high or because of their de-synchronization in time caused by the tank geometry. Three DCT engines implemented in an EP3C40F32417 FPGA, using all DSP blocks, generate the spectral trigger when, in at least 2 channels, 8 DCT coefficients are simultaneously inside the acceptance lanes. An additional veto signal (analyzing the amplitude) controls the trigger rate to avoid saturating the transmission channel. Both lab and long-term field measurements on the test tank confirm a high efficiency of recognition of the expected patterns of ADC traces.
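The trigger condition above (8 DCT coefficients inside their acceptance lanes in at least 2 of 3 channels) can be illustrated with a minimal software sketch. It is not the FPGA implementation: the scaling of each coefficient to the first AC coefficient, the choice of coefficients 2 to 9, the lane tolerance and all function names are assumptions made for illustration only.

```python
import math

def dct2(samples):
    """Unnormalised DCT-II of a list of ADC samples."""
    n = len(samples)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(samples))
        for k in range(n)
    ]

def normalized_coefficients(trace):
    """Coefficients scaled to the first AC coefficient (assumed scaling),
    which makes the pattern independent of the pulse amplitude."""
    coeffs = dct2(trace)
    if abs(coeffs[1]) < 1e-9:  # no usable shape information
        return None
    return [c / coeffs[1] for c in coeffs]

def lanes_from_reference(reference_trace, tolerance, indices=range(2, 10)):
    """Hypothetical helper: build 8 acceptance lanes around a reference pulse shape."""
    ratios = normalized_coefficients(reference_trace)
    return {k: (ratios[k] - tolerance, ratios[k] + tolerance) for k in indices}

def channel_passes(trace, lanes):
    """True if every monitored coefficient lies inside its acceptance lane."""
    ratios = normalized_coefficients(trace)
    if ratios is None:
        return False
    return all(lo <= ratios[k] <= hi for k, (lo, hi) in lanes.items())

def spectral_trigger(traces, lanes, min_channels=2):
    """Two-fold coincidence: at least min_channels PMT traces match the pattern."""
    return sum(channel_passes(t, lanes) for t in traces) >= min_channels
```

Because the lanes are defined on scaled coefficients, a pulse with the same shape but a different amplitude falls in the same lanes, which is the property that lets a spectral trigger work without an amplitude threshold.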

#### PS2-19: COTS Real Time Quench Detection System for Superconducting Magnets

<u>R. Rajagopal</u><sup>1</sup>, S. Wunder<sup>2</sup> <sup>1</sup>Controls, Verivolt LLC., Berkeley, CA, United States <sup>2</sup>Sales, National Instruments, Austin, TX, United States

A new approach to real-time quench detection in superconducting magnets is presented. A common denominator of all superconducting magnet applications is that a large amount of energy is stored in the form of a magnetic field. External conditions or internal structural events can cause the superconducting cable to snap out of the superconducting state (quench), converting all the stored energy into localized heat. This heat can potentially melt some of the internal elements and destroy the magnet. For this reason, a system that detects quenches in real time with extreme reliability and triggers energy extraction is crucial to protect these very expensive assets. High-resolution sensors from Verivolt, together with DAQ and digital controls from National Instruments, were used to build a Commercial-Off-The-Shelf (COTS) quench detection system for superconducting magnets. The signals from the Verivolt sensors are digitized and processed in the FPGA on the backplane of a cRIO chassis, keeping all critical elements of the system running in hardware, independent of CPU variability.

#### PS2-28: FPGA Implementation of the 32-Point DFT for a Wavelet Trigger of Cosmic Rays Experiments

Z. Szadkowski

Department of High Energy Astrophysics, University of Lodz, Lodz, Poland

For the observation of ultra high-energy cosmic rays (UHECRs) by the detection of their coherent radio emission, an FPGA-based wavelet trigger is being developed. Using radio detection, the electromagnetic part of an air shower in the atmosphere may be studied in detail, thus providing information complementary to that obtained by water Cherenkov detectors, which are predominantly sensitive to the muonic content of an air shower at ground. For an extensive radio detector array, a sophisticated self-trigger is necessary because of the limited communication data rate. The wavelet trigger, which investigates the signal power online, is promising; however, its implementation requires some optimizations. The digitized signals are converted from the time domain to the frequency domain by a standard Altera library FFT procedure, then multiplied by wavelet transforms and finally converted back to the time domain. Altera FFT routines convert ADC data in blocks of $2^N$ samples. FFT coefficients are provided in a serial stream in $2^N$ time bins. The estimated signal power depends strongly on the relative positions of the FFT(data) and the wavelet transforms in the frequency domain. An additional procedure has to find the most efficient selection of the sample block to reach a response corresponding to the maximal signal power. If a set of FFT coefficients were available in each clock cycle, the signal power could also be estimated in each clock cycle and the additional tuning procedure would not be necessary. The paper describes an implementation of the 32-point FFT algorithm in an Altera FPGA providing all 32 complex DFT coefficients for the wavelet trigger.
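The processing chain described above (DFT, multiplication by a wavelet spectrum, inverse transform, power estimate) can be sketched in software as follows. This is a plain O(N²) DFT evaluated on every 32-sample window, meant only to mirror the idea of having all 32 coefficients available at each clock cycle; it is not the FPGA pipeline. The wavelet spectrum passed in is a hypothetical input and the function names are illustrative.

```python
import cmath
import math

N = 32  # DFT length used by the trigger

def dft(x):
    """Direct N-point DFT returning all complex coefficients."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT back to the time domain."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def wavelet_power(block, wavelet_spectrum):
    """Power of one block after weighting by the wavelet's frequency response."""
    weighted = [s * w for s, w in zip(dft(block), wavelet_spectrum)]
    return sum(abs(v) ** 2 for v in idft(weighted))

def power_per_clock(samples, wavelet_spectrum):
    """Power estimate for every 32-sample window, one new window per sample."""
    return [wavelet_power(samples[i:i + N], wavelet_spectrum)
            for i in range(len(samples) - N + 1)]
```

Evaluating every window removes the dependence on how a fixed block happens to be aligned with the pulse, which is the motivation given in the abstract for providing the coefficients in each clock cycle.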

#### PS2-22: The ATLAS Muon Trigger Performance in Proton-Proton Collisions at Sqrt(s)=7 TeV

K. Nagano<sup>1</sup>, K. Black<sup>2</sup>, <u>T. Matsushita<sup>3</sup></u> <sup>1</sup>KEK, Tsukuba, Japan <sup>2</sup>Boston University, Boston, US <sup>3</sup>Kobe University, Kobe, Japan

The ATLAS experiment at CERN's Large Hadron Collider (LHC) took data with colliding beams at instantaneous luminosities of up to $3.65 \times 10^{33}$ cm<sup>-2</sup> s<sup>-1</sup> in the 2011 run period. Such high-luminosity runs required sophisticated triggers that preserve the highest physics output while effectively reducing the event rate.

The ATLAS Muon trigger successfully adapted to the changing environment in the 2011 runs. The selection strategy has been optimized for the various physics analyses involving muons in the final state. This includes, for example, combined trigger signatures with electron and jet trigger objects, and so-called full-scan triggers, which make use of the full event information to search for di-lepton signatures, seeded by single lepton objects.

The L1 muon trigger system gets its input from fast muon trigger detectors. Fast sector logic boards select muon candidates, which are passed via an interface board to the central trigger processor and then to the High Level Trigger (HLT). The Muon HLT is purely software based and encompasses a level 2 trigger followed by an event filter for a staged trigger approach. It has access to the data of the precision muon detectors and other detector elements to refine the muon hypothesis.

This presentation reports on the efficiency, resolution, and general performance of the muon trigger in the 2011 runs and in the context of the physics goals of ATLAS.

#### UPG1: Upgrades 1

Tuesday, June 12 17:10-18:10 Crystal Ballroom

#### UPG1-1: MEP V2, the New Event Building Protocol for the Upgraded LHCb Experiment

R. Schwemmer, N. Neufeld, G. Liu

CERN, Geneva, Switzerland

During the long shutdown of the LHC in 2018, the LHCb collaboration plans to significantly upgrade the LHCb detector. This upgrade foresees a trigger-less read-out. The detector data is sent directly into a read-out network at the LHC rate of 40 MHz. The data is then reconstructed and processed by a computing farm. The expected data rate will be in the order of 4 TB/s, making it the largest read-out network in HEP to date.

The data flow in this network is rather atypical for HPC networks. The network is divided into O(500) source and O(3000) sink nodes. All data that constitutes an event is initially distributed as fragments among the sources. These fragments have to be sent to one particular sink node for assembly and further processing. One of the biggest challenges is to control the data flow through the network to ensure that the output port to a single sink node is not being overloaded, because the fragments of one event all become available at exactly the same time.
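The figures quoted above imply a rough per-event and per-node budget, worked out in the short sketch below. It assumes decimal units (1 TB = 10^12 bytes), takes O(500) and O(3000) as exactly 500 and 3000, and assumes data and events are spread uniformly over sources and sinks; these simplifications and the function name are illustrative, not part of the protocol.

```python
def event_building_budget(total_rate_bytes_s=4e12, event_rate_hz=40e6,
                          n_sources=500, n_sinks=3000):
    """Back-of-the-envelope sizes for a uniformly loaded event-building network."""
    event_size = total_rate_bytes_s / event_rate_hz
    return {
        "event_size_bytes": event_size,                     # one full event
        "fragment_size_bytes": event_size / n_sources,      # one source's share
        "per_sink_event_rate_hz": event_rate_hz / n_sinks,  # events assembled per sink
        "per_sink_bandwidth_bytes_s": total_rate_bytes_s / n_sinks,
    }

budget = event_building_budget()
```

Under these assumptions an event is about 100 kB built from fragments of about 200 bytes each, and every sink port must absorb about 1.3 GB/s in bursts of 500 fragments that become available simultaneously, which is why flow control toward a single sink matters.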

We will present the lessons learned from the MEP v1 protocol currently in use and a first draft of the MEP v2 protocol, which will be used in the upgraded network. Since we do not yet know with certainty which technologies will be available for such a network in the future, the new version of MEP will adhere more strictly to the OSI model. The protocol will be constrained to one layer only and will be a more generic protocol for aggregating data fragments from many sources on one particular target.

#### UPG1-2: A New Readout Control System for the LHCb Upgrade at CERN

<u>F. Alessio</u>, R. Jacobsson CERN, Geneva, Switzerland

The LHCb experiment has proposed an upgrade towards a full 40 MHz readout system in order to run between five and ten times its initial design luminosity. The various sub-systems in the readout architecture will need to be upgraded in order to cope with higher sub-detector occupancies, higher rate and 40 MHz software event filtering. In this paper, we describe the new architecture, new functionalities and the first hardware implementation of the new LHCb Readout Control and Event Management system for the upgraded LHCb experiment, together with first results on the validation of the system.

The system is based on a single new Readout Supervisor instantiating several masters to allow partitioning of the LHCb sub-detectors. The communication with the readout electronics boards, which receive the data from the LHCb detector, is ensured by a shared high-speed optical link network for both the distribution of timing and synchronous control information and the communication of trigger throttle signals. A Readout Interface board, with fan-out capabilities for timing and synchronous information and fan-in capabilities for throttle and rate regulation of the system, interfaces the Readout Supervisor to the readout boards. Moreover, the Readout Interface board is responsible for distributing the timing and synchronous control information. This board also takes care of the exchange of Experiment Control System information with the Front-End electronics. The new architecture of the system allows hybrid operation by supporting the old readout control system in parallel with the new system. This will allow the LHCb experiment to perform a staged upgrade of the readout architecture, by improving the performance of the old system up to the completion of the upgraded system.

We present how this system is being implemented within a common project in LHCb that aims to develop a shared generic hardware framework, based on ATCA technology, for both data readout and readout control. In this paper, we outline the real-time implementations of the new Readout Control system, together with solutions and validation tests on how to handle the synchronous distribution of timing and synchronous information in such a complex system, which is entirely based on FPGAs and optical links, in order to control the whole upgraded LHCb readout architecture. The development work plan for the readout control system takes into account the requirement that the system should be available in a mature state, with high-level control, at a very early stage for the validation of the Front-End electronics and the readout before production is launched.

#### UPG1-3: Topological and Central Trigger Processor for 2014 LHC luminosities

J. T. Childers, G. Anders, B. Bauss, D. Berge, V. Büscher, R. Degele, E. Dobson, A. Ebling, N. Ellis, P. Farthouat, C. Gabaldon, B. Gorini, S. Haas, W. Ji, M. Kaneda, S. Maettig, A. Messina, C. Meyer, S. Moritz, T. Pauly, R. Pottgen, U. Schäfer, <u>E. Simioni</u>, R. Spiwoks, S. Tapprogge, T. Wengler, V. Wengler

CERN, Meyrin, Geneva, Switzerland

The ATLAS experiment is located at the European Organization for Nuclear Research (CERN) in Switzerland. It is designed to observe phenomena that involve highly massive particles produced in collisions at the Large Hadron Collider (LHC), the world's largest and highest-energy particle accelerator. Event triggering and data acquisition is one of the extraordinary challenges faced by the detectors at the high-luminosity LHC.

During 2011, the LHC reached instantaneous luminosities of $4 \times 10^{33}$ cm<sup>-2</sup> s<sup>-1</sup> and produced events with up to 24 interactions per colliding proton bunch. This places stringent operational and physical requirements on the ATLAS Trigger, which must reduce the 40 MHz collision rate to a manageable event storage rate of 400 Hz while selecting those events considered interesting. The Level-1 Trigger is the first rate-reducing step in the ATLAS Trigger, with an output rate of 75 kHz and a decision latency of less than 2.5 µs. It is primarily composed of the Calorimeter Trigger, the Muon Trigger, the Central Trigger Processor (CTP) and, by 2014, a completely new electronics module: the Topological Processor (TP).

The TP will make it possible, for the first time, to concentrate detailed information from subdetectors in a single Level-1 module. This allows the determination of angles between jets and/or leptons, or even more complex observables such as muon isolation or invariant mass. This requires receiving a total bandwidth of about 1 Tb/s on a single module and processing the data in less than 100 ns. In order to accept this new information from the TP, the CTP will be upgraded to process double the number of trigger inputs and logical combinations of these trigger inputs. These upgrades also address the growing needs of the complete Level-1 trigger system as the LHC luminosity increases. During the LHC shutdown in 2013, the TP and the upgraded CTP will be installed. We present the justification for such an upgrade, the proposed upgrade to the CTP, and tests on the TP demonstrator and prototype, emphasizing the characterization of the high-speed links and tests of the latency and logic utilization of the topological algorithms.
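As an illustration of the kind of observable mentioned above, the sketch below computes the azimuthal angle difference and the invariant mass of two trigger objects from their transverse energy, pseudorapidity and azimuth. It uses the standard massless-object approximation $m^2 = 2 E_{T1} E_{T2} (\cosh\Delta\eta - \cos\Delta\phi)$ in floating point; the actual TP algorithms run in FPGA logic, and the function names and cut parameters here are hypothetical.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into (-pi, pi]."""
    d = (phi1 - phi2) % (2 * math.pi)
    return d - 2 * math.pi if d > math.pi else d

def invariant_mass(et1, eta1, phi1, et2, eta2, phi2):
    """Invariant mass of two massless objects given (ET, eta, phi)."""
    m2 = 2.0 * et1 * et2 * (math.cosh(eta1 - eta2) - math.cos(delta_phi(phi1, phi2)))
    return math.sqrt(max(m2, 0.0))

def topological_accept(obj1, obj2, min_mass, min_abs_dphi):
    """Hypothetical topological selection on a pair of (ET, eta, phi) tuples."""
    mass = invariant_mass(*obj1, *obj2)
    dphi = abs(delta_phi(obj1[2], obj2[2]))
    return mass >= min_mass and dphi >= min_abs_dphi
```

For example, two back-to-back objects of 45.6 GeV each at the same pseudorapidity give an invariant mass of 91.2 GeV.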

#### **PS2: Poster Session 2**

Tuesday, June 12 15:50-17:10 Boiler room

#### PS2-1: High Performance FPGA-Based DMA Interface for PCIe

H. Kavianipour, S. Muschter, C. Bohm

Department of Physics, Stockholm University, Stockholm, Sweden

We present a data communication suite developed for use in the Track Engine Trigger for the IceCube Neutrino Observatory (South Pole). It is a PCIe-based system implemented in Xilinx FPGAs with a bus master DMA on a 4-lane, generation 2 link. The suite contains DMA controller hardware IPs, test benches, a Linux driver and a user application for DMA and PIO transfers into memory modules and FIFOs. The Linux driver uses streaming mapping, vector write functionality, race condition controllers, page-wise memory allocation, wait queues and Message Signaled Interrupts (MSI) to facilitate high performance and throughput. The DMA, which is based on the Xilinx bus master DMA, achieves measured transfer speeds of up to 680 MB/s (read) and 720 MB/s (write) using a HiTech Global Virtex6 board. The hardware has been verified on different platforms with different FPGAs. Besides the original IceCube application, the suite has also been used for the development of readout electronics for particle physics experiments. Other applications are also being considered.
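To put the measured speeds in context, the sketch below compares them with the raw limit of a 4-lane generation 2 link. It uses the standard PCIe Gen2 figures (5 GT/s per lane with 8b/10b encoding) and ignores packet header and flow-control overhead, so the result is an upper bound rather than an achievable rate; the function names are illustrative.

```python
def pcie_gen2_limit_mb_s(lanes):
    """Per-direction limit after 8b/10b encoding, in MB/s (10^6 bytes/s)."""
    transfers_per_s = 5e9       # 5 GT/s per lane for generation 2
    encoding_efficiency = 8 / 10
    return lanes * transfers_per_s * encoding_efficiency / 8 / 1e6

def link_utilization(measured_mb_s, lanes=4):
    """Fraction of the encoded link limit reached by a measured transfer speed."""
    return measured_mb_s / pcie_gen2_limit_mb_s(lanes)

read_fraction = link_utilization(680)   # about 0.34
write_fraction = link_utilization(720)  # about 0.36
```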

#### PS2-2: Low Power, Accurate Time Synchronization MAC Protocol for Real-Time Wireless Data Acquisition

J. Zhang<sup>1,2</sup>, J. Wu<sup>1,2</sup>, Z. Han<sup>1,2</sup>, L. Liu<sup>1,2</sup>, K. Tian<sup>1,2</sup>

<sup>1</sup>Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui, China <sup>2</sup>State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, Hefei, Anhui, China

The problem of real-time wireless data acquisition, namely how to design and manage thousands of sensor nodes spread over a vast geographical area, has received much attention in recent years. Energy efficiency, time synchronization and the other requirements of real-time processing make such a system a great challenge. This paper proposes a real-time wireless data acquisition MAC protocol meeting the requirements of high throughput, low end-to-end latency, low energy consumption and accurate time synchronization. A hybrid approach, combining the advantages of Time Division Multiple Access (TDMA) and Frequency Hopping Spread Spectrum (FHSS), is adopted for anti-jamming and collision prevention. The hopping sequences of FHSS are carefully selected to reduce interference to a minimum. The packets of commands and data are delivered in a "bucket brigade"-like manner for optimum bandwidth utilization and low end-to-end latency. Experiments show that the bandwidth utilization exceeds 25%, and the mean latency per hop is kept to 8 to 9 milliseconds. In addition, we propose a two-step time synchronization approach to balance synchronization performance against energy consumption. First, the microcontroller of the sensor node uses its internal low-power, low-precision RC oscillator to set up and manage the network connection. The frequency difference between the local clock and the parent's clock is estimated by periodically receiving time-stamped beacons. Then, when acquisition is about to start, the sensor nodes enable the phase-locked loop (PLL) circuits, which use the microcontroller's counter, a software low-pass filter, a digital-to-analog converter (DAC) and a voltage-controlled crystal oscillator (VCXO) to generate a low-jitter synchronous clock. At the end of acquisition, the sensor nodes return to the first step by turning off the PLL circuits to save power. In this way, the average per-hop synchronization error is in the 0.5 µs range, which is markedly better than that of widely used algorithms such as Reference-Broadcast Synchronization (RBS) and the Timing-sync Protocol for Sensor Networks (TPSN).
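The first synchronization step estimates the frequency difference between the local clock and the parent's clock from time-stamped beacons. One simple way to do this, shown below as an assumption rather than as the authors' method, is a least-squares line fit of the parent's beacon timestamps against the local reception times; the slope gives the relative clock rate. Function names are illustrative.

```python
def estimate_clock_skew(local_times, parent_times):
    """Fit parent_time = slope * local_time + offset by least squares."""
    n = len(local_times)
    mean_l = sum(local_times) / n
    mean_p = sum(parent_times) / n
    cov = sum((l - mean_l) * (p - mean_p) for l, p in zip(local_times, parent_times))
    var = sum((l - mean_l) ** 2 for l in local_times)
    slope = cov / var
    offset = mean_p - slope * mean_l
    return slope, offset

def skew_ppm(slope):
    """How fast the local clock runs relative to the parent, in parts per million."""
    return (1.0 / slope - 1.0) * 1e6

def to_parent_time(local_time, slope, offset):
    """Convert a local timestamp to the parent's time base."""
    return slope * local_time + offset
```

A node whose RC oscillator runs 50 ppm fast would see the fitted slope fall slightly below 1, and `skew_ppm` would report about 50.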

#### PS2-3: A MAC Layer Congestion Control Method to Achieve High Network Performance for EAST Tokamak

J. Luo<sup>1</sup>, <u>K. Shi<sup>2,3</sup></u>, Y. Shu<sup>3</sup>, S. Lin<sup>2</sup> <sup>1</sup>Institute of Plasma Physics, Academia Sinica, Hefei, China <sup>2</sup>Tianjin University of Technology, Tianjin, China <sup>3</sup>Tianjin University, Tianjin, China

Many applications require fast data transfer in Wireless Local Area Networks (WLANs). A representative example is the retrieval of EAST experiment data by physics researchers using the Transmission Control Protocol (TCP). However, because of the high contention and high error rate in wireless networks, packets may be lost for wireless reasons rather than because of congestion, which greatly degrades TCP performance. On the one hand, wireless packet loss is not congestion, but traditional TCP assumes that every packet drop indicates congestion and decreases its congestion window accordingly. On the other hand, because of the MAC layer retransmission policy of the IEEE 802.11 DCF mechanism, packets lost at the MAC layer are retransmitted several times, which increases the waiting time of packets in the MAC layer queue. So if all packet loss for wireless reasons is ignored, as other improved mechanisms do, network congestion is aggravated and performance degrades. To alleviate the impact of wireless packet loss on TCP in WLANs, this paper proposes a MAC layer congestion control method that is implemented at the end wireless nodes and based on the IEEE 802.11b DCF mechanism. First, we propose the concept of a MAC layer congestion window: the MAC layer sends all the packets in a window when it gets access to the wireless channel, rather than sending only one packet as the traditional DCF mechanism does. Then our congestion control mechanism adjusts the MAC layer congestion window based on the contention degree and the MAC layer packet loss rate. If the MAC layer contention degree or packet error rate is high, we increase the congestion window to improve the successful transmission rate, and we decrease the congestion window when the packet loss rate is lower than the average wireless packet loss rate. We also use a threshold to limit the increase of the congestion window. The threshold is set according to the number of wireless nodes. By performing wireless congestion control at the MAC layer, our mechanism can mitigate the effect of wireless loss on TCP and therefore improve TCP performance. The simulation and experimental results show that our mechanism performs better than traditional MAC layer mechanisms in WLANs.
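The window adjustment rule described above can be written down as a small decision function. The sketch below follows the stated rule (grow on high contention or high loss, shrink when loss is below the average wireless loss rate, cap at a threshold); the step size of one packet, the numeric cut values and the parameter names are assumptions, since the abstract does not give them.

```python
def adjust_window(window, contention, loss_rate, *,
                  contention_high, loss_high, avg_wireless_loss, threshold):
    """Return the new MAC layer congestion window (in packets).

    threshold is assumed to be precomputed from the number of wireless nodes.
    """
    if contention > contention_high or loss_rate > loss_high:
        # High contention or error rate: send more packets per channel access.
        return min(window + 1, threshold)
    if loss_rate < avg_wireless_loss:
        # Loss below the wireless average: shrink the window, but keep at least one packet.
        return max(window - 1, 1)
    return window
```

A node would call this once per measurement interval and then transmit up to `window` packets each time it wins access to the channel.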

#### PS2-4: High Performance Event Building with InfiniBand Network in CBM Experiment

S. Linev

GSI Helmholtzzentrum fuer Schwerionenforschung, Darmstadt, Germany

The main challenge of the CBM (Compressed Baryonic Matter) experiment at FAIR (Facility for Antiproton and Ion Research, Darmstadt, Germany) will be the measurement of rare (1E-6 to 1E-9) probes at high (1E7 1/s) interaction rates. Because of the complex signature of the events of interest, the front-end electronics will measure all signals in self-triggered mode and push data to the FLES (First Level Event Selection) computing farm, the first place where an event selection decision can be made. The central part of the FLES will be a high-performance network fabric, which must sort and distribute 1 TB/s of original data over the computing nodes in real time.

InfiniBand is a high-throughput, low-latency interconnect technology with low CPU consumption and, also very importantly, affordable prices. For several years InfiniBand has been considered the most probable candidate for use in the CBM FLES. A number of tests were performed to verify the throughput capabilities of an InfiniBand fabric for the traffic patterns expected in the FLES. Recent tests were performed on the LOEWE-CSC cluster (http://csc.uni-frankfurt.de), which consists of about 800 nodes equipped with QDR InfiniBand host adapters and connected via a half fat-tree switch fabric.

The main approach of the tests was to schedule data transfers in such a way that no congestion is produced in the network. The crucial point is to explore and use the fabric interconnect topology, in which many physical paths exist between two nodes. To allow exact timing of data transfers, time synchronization between computing nodes with sub-microsecond precision was required. Dedicated software was implemented to execute such scheduled data transfers and showed very promising results. The code will be integrated into the next version of DABC (http://dabc.gsi.de), a general-purpose framework for DAQ software development.
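A common way to build a transfer schedule in which no receiver is addressed by two senders at once is a shifted (round-robin) all-to-all pattern. The sketch below shows that pattern only as an illustration of the scheduling idea; the abstract does not state which schedule was used, and the sketch ignores the path selection inside the fat-tree fabric. The function name is hypothetical.

```python
def shifted_schedule(n_nodes):
    """All-to-all schedule as a list of time slots.

    In slot s (1 <= s < n_nodes) node i sends to node (i + s) mod n_nodes,
    so every node sends once and receives once per slot.
    """
    return [[(src, (src + slot) % n_nodes) for src in range(n_nodes)]
            for slot in range(1, n_nodes)]

# Example: 4 nodes need 3 slots to exchange data between every ordered pair.
for slot_number, transfers in enumerate(shifted_schedule(4), start=1):
    print(slot_number, transfers)
```

Executing such a schedule on real hardware requires the nodes to agree on slot boundaries, which is where the sub-microsecond time synchronization mentioned above comes in.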

#### PS2-5: Modulator-Based, High Bandwidth Optical Links for HEP Experiments

W. S. Fernando<sup>1</sup>, R. W. Stanek<sup>1</sup>, <u>D. G. Underwood<sup>1</sup></u>, D. Lopez<sup>2</sup>

<sup>1</sup>High Energy Physics Division, Argonne National Lab, Argonne, IL, United States <sup>2</sup>Center for Nanoscale Materials, Argonne National Lab, Argonne, IL, United States

Optical links will be an integral part of intelligent tracking systems at various scales, from coupled sensors through intra-module and off-detector communication. These links will be particularly useful if they utilize light modulators that are very small, low power, high bandwidth and very radiation hard. Out of concern for the reliability, bandwidth and mass of future optical links in LHC experiments, we are investigating CW lasers and light modulators as an alternative to VCSELs.

We have constructed a test system with 3 such links, each operating at 10 Gb/s. We present the quality of these links (jitter, rise and fall time, BER) and eye mask margins (10GbE) for 3 different types of modulators: LiNbO3-based, InP-based, and Si-based. We present the results of radiation hardness measurements with up to $\sim 10^{12}$ protons/cm<sup>2</sup> and $\sim$65 krad total ionizing dose (TID), confirming no single event effects (SEE) at 10 Gb/s with any of the 3 types of modulators.

We have used a Si-based photonic transceiver to build a complete 40 Gb/s bi-directional link (10 Gb/s in each of four fibers) for a 100m run and have characterized it to compare with standard VCSEL-based optical links. Some future developments of optical modulator-based high bandwidth optical readout systems, and applications based on both fiber and free space data links, such as local triggering and data readout and trigger-clock distribution, are also discussed.

#### PS2-6: Waveform Timing Algorithms with a 5 GS/s Fast Pulse Sampling Module

J. Wang<sup>1,2</sup>, L. Zhao<sup>1,2</sup>, C. Feng<sup>1,2</sup>, Y. Zhang<sup>1,2</sup>, S. Liu<sup>1,2</sup>, Q. An<sup>1,2</sup>

<sup>1</sup>State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, Hefei, Anhui, China <sup>2</sup>Anhui Key Laboratory of Physical Electronics, Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui, China

There are several algorithms that extract the arrival time of detector signals as the time of a characteristic position on the sampled waveform (e.g., the center of gravity of the waveform). In this paper, we first analyze the characteristics of three timing algorithms: digital constant fraction discrimination (d-CFD), sliding window with amplitude-weighted time (SWAWT), and optimal filtering in both the time and frequency domains. We then built a fast pulse sampling module with the 4th version of the Domino Ring Sampler (DRS4) and verified the timing performance of these algorithms in some of our physics experiments. The module has a total of six channels with a sampling rate of up to 5 GS/s (giga samples per second) per channel. We show that the module is capable of sub-10 ps RMS timing precision at about 5 GS/s after applying strategies such as DC offset calibration and uneven sampling interval compensation.

In our evaluation, we first measured the timing performance of these algorithms in the lab with reconstructed pulses of a Multi-gap Resistive Plate Chamber (MRPC). MRPC signals are generated from a template and distributed to two channels with a constant delay between the channels. The time intervals are derived with the algorithms above according to the pulse shape, and the timing performances are all in the range of about 15 ps RMS (10.6 ps RMS per channel). Currently, we are setting up a reference start with four plastic scintillators (EJ-200), the rise time of which is about 0.7 ns. We will evaluate the timing resolution of the reference start and give our assessment of the optimal timing algorithms.
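As a minimal illustration of digital constant fraction timing on sampled waveforms, the sketch below uses the fraction-of-peak variant: the arrival time is where the leading edge crosses a fixed fraction of the pulse maximum, found by linear interpolation between samples. It assumes a baseline already calibrated to zero and uniform sampling; the fraction value and function names are illustrative, and the authors' exact d-CFD formulation may differ. The second function reproduces the per-channel figure quoted above, assuming both channels contribute equally and independently to the interval jitter.

```python
import math

def dcfd_time(samples, dt, fraction=0.3):
    """Leading-edge time at a constant fraction of the peak amplitude.

    samples: baseline-subtracted waveform, dt: sampling interval
    (0.2 ns at 5 GS/s). Returns None if no crossing is found.
    """
    peak_index = max(range(len(samples)), key=lambda i: samples[i])
    threshold = fraction * samples[peak_index]
    for i in range(1, peak_index + 1):
        if samples[i - 1] < threshold <= samples[i]:
            # Linear interpolation between the two samples around the crossing.
            step = (threshold - samples[i - 1]) / (samples[i] - samples[i - 1])
            return (i - 1 + step) * dt
    return None

def per_channel_rms(interval_rms):
    """Single-channel RMS from the RMS of a two-channel time interval."""
    return interval_rms / math.sqrt(2)
```

Because the threshold scales with the peak, the extracted time does not move when the pulse amplitude changes, which is the point of constant fraction timing.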

#### PS2-7: The Study of Multi-Channel High Precision Pulse Synchronizer

<u>F. Li<sup>1,2</sup></u>, L. Chen<sup>1,2</sup>, F. Liang<sup>1,2</sup>, G. Jin<sup>1,2</sup>

<sup>1</sup>Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui, China <sup>2</sup>State Key Laboratory of Particle Detection and Electronics, Hefei, Anhui, China

In recent years, measurement systems have placed demanding requirements on trigger pulse sources. With the wide use of high-precision measuring instruments, an experiment often contains dozens of instruments that need to be triggered simultaneously. Many high-bandwidth oscilloscopes and spectrometers require the trigger pulse to have a rise time of about 150 ps to ensure measurement precision. Therefore, to obtain high-quality trigger pulses for high-precision measuring instruments, we studied methods and technology for generating multi-channel trigger pulses, that is, a high-precision pulse synchronizer. A pulse synchronizer structure is introduced that generates 48 channels of synchronous pulses meeting different requirements. It consists mainly of a high-precision clock fan-out circuit and broadband transistor circuits. It can provide positive and negative pulses with a rise time of about 150-300 ps and an amplitude of about 5-15 V. The positive and negative pulse generating circuits have different designs, and all output pulses have been tested and shown to have a skew and jitter of less than 20 ps. The differences in pulse output time between channels are inherent and can be corrected.

#### PS2-8: Sophisticated Online Analysis in ADC Boards

P. Wuestner<sup>1</sup>, A. Erven<sup>1</sup>, W. Erven<sup>1</sup>, G. Kemmerling<sup>1</sup>, H. Kleines<sup>1</sup>, P. Kulessa<sup>2</sup>, P. Marciniewski<sup>3</sup>, H. Ohm<sup>4</sup>, K. Pysz<sup>4</sup>, V. Serdyuk<sup>4</sup>, S. van Waasen<sup>1</sup>, P. Wintz<sup>4</sup>

<sup>1</sup>ZEL, Research Centre Juelich, Juelich, Germany <sup>2</sup>Institute of Nuclear Physics PAN, Krakow, Poland <sup>3</sup>Physics and Astronomy, Uppsala University, Uppsala, Sweden <sup>4</sup>IKP, Research Centre Juelich, Juelich, Germany

For the readout of the calorimeter of the WASA at COSY experiment a QDC board using sampling ADCs and FPGAs to perform the pulse integration was developed. In the initial version only a simple pulse finding algorithm was implemented in order to avoid delay cables by storing the digitized signals in a pipeline of a few microseconds.

Recently a new version with 16 ADCs at a sample rate of 240 MHz and 12-bit resolution was developed for tests of cylindrical drift chambers (straw tubes) in the Straw Tube Tracker of the PANDA experiment. The goal was to measure the energy loss by charge readout in addition to the drift time measurement. Because of the irregular cluster structure of the straw signals, complex algorithms for pulse finding, pulse feature extraction and triggering were implemented.

The algorithm can detect pile up and find pulse groups. Integration is possible over single pulses and over complete groups. Different methods for the calculation of starting time, including constant fraction, are implemented. Internal trigger generation depends on various criteria, e.g. slew rate, amplitude or sum of consecutive samples.

#### PS2-9: A High Density Time-to-Digital Converter Prototype Module for BESIII End-Cap TOF Upgrade

H. Fan<sup>1,2</sup>, C. Feng<sup>1,2</sup>, W. Sun<sup>1,2</sup>, C. Yin<sup>1,2</sup>, S. Liu<sup>1,2</sup>, Q. An<sup>1,2</sup>

<sup>1</sup>State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, Hefei, Anhui, China <sup>2</sup>Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui, China

A high-precision, high-density time-to-digital converter (TDIG) module is described in this paper. The end-cap time-of-flight (ETOF) system of the Beijing Spectrometer (BESIII) will be upgraded to improve its total time resolution. After the upgrade, the ETOF will be built using Multigap Resistive Plate Chambers (MRPC) and will have 1728 readout channels. The readout electronics must achieve high density because of the large number of readout channels. The time resolution of the readout electronics is required to be better than 25 ps, and the time resolution of the TDC is required to be better than 20 ps.

A 9U VME module with TDC function is designed for the ETOF upgrade. The signals, which are produced by the detector and then amplified and discriminated by the front-end electronics, are sent to the TDIG module to be digitized. The TDIG module uses the CERN HPTDC to achieve high-precision time-to-digital conversion. Each module uses nine HPTDC chips, programmed into the very high resolution mode, to realize 72 time measurement channels. The primary hit measurements from the HPTDC chips are forwarded to one Cyclone field-programmable gate array (FPGA) to be processed. Finally, the data are sent to the DAQ server by Ethernet. The VME interface logic is implemented in an Altera CPLD. The TDIG module can also accept an external trigger and pick out the expected events that are correlated with the given trigger. A series of experimental tests shows that the time resolution of the TDIG module is better than 20 ps.

#### PS2-10: Development of White Rabbit Interface for Synchronous Data Acquisition and Timing Control

Q. Du, G. Gong, W. Pan, H. Lu Tsinghua University, Beijing, China

In large-scale physics experiments such as the Large High Altitude Air Shower Observatory (LHAASO), timing distribution with sub-nanosecond accuracy is required for thousands of detector DAQ front-ends. Recent advances in the White Rabbit (WR) protocol provide a novel solution for such synchronous data acquisition applications. We demonstrate a compact design of a WR slave in FMC (FPGA Mezzanine Card) format that can work as a Gigabit Ethernet interface to each detector readout circuit and provide stabilized frequency distribution and timestamp synchronization over the data link.

#### PS2-11: A High-Resolution Time-to-Digital Converter Based on Multi-Phase Clock Implement in Field-Programmable-Gate-Array

<u>Z. Yin</u><sup>1,2</sup>, S. Liu<sup>1,2</sup>, X. Hao<sup>1,2</sup>, S. Gao<sup>1,2</sup>, Q. An<sup>1,2</sup>

<sup>1</sup>Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui, China

<sup>2</sup>State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, Hefei, Anhui, China

Laser mapping is widely used in topographic mapping projects. In such a project, an experimental plane flies over the area to be surveyed, points a laser at the ground and detects the reflected light. The interval between the trigger pulse and the echo pulse represents the distance between the plane and the ground, and it can be measured with a high-resolution TDC. Scanning the area at a fixed frequency maps the topography with high precision. The experiment needs a laser matrix to scan the area, and each laser device needs an independent TDC. For example, an 8 × 8 laser matrix needs 64 TDCs, and the resolution of each TDC should be better than 1 ns. This kind of experiment therefore requires a highly integrated, high-precision TDC.

This paper introduces a time-to-digital converter (TDC) based on four multi-phase clocks that is implemented in a Xilinx Virtex-4 FPGA. Its high precision, high level of integration and large dynamic range fit these demands very well.

Thanks to PLL technology, the clock phase can be adjusted precisely. In this design, four clocks with phase shifts of 0, 90, 180 and 270 degrees are generated, so that the rising edge of each clock is delayed by the same amount with respect to the previous clock. These four multi-phase clocks divide one clock period into four equal parts, which improves the bin size of the TDC to 1/4 of the clock period. This is a new approach to time interpolation within one clock period, and a TDC based on multi-phase clocks needs fewer logic resources than other kinds of TDC. The performance of the multi-phase clock based TDC was tested. The bin size (resolution) of each channel is 0.757 ns and the RMS (precision) is less than 0.5 ns. The dynamic range is longer than 1 second. 64 TDC channels are implemented in a single FPGA on a 15 cm \* 15 cm board. The test results demonstrate that the multi-phase clock based TDC meets the needs of the laser mapping project and can be used in many other situations that need a high level of integration and a precision of hundreds of picoseconds.
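The interpolation scheme described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the authors' firmware: the clock period of 3.028 ns is inferred from four bins of 0.757 ns, and the function names are hypothetical. A coarse counter counts whole clock periods, and the four phase-shifted clocks resolve the quarter of the period in which the hit arrived.

```python
import math

# Assumed values: four phases and a period equal to 4 x 0.757 ns.
N_PHASES = 4
CLOCK_PERIOD_NS = 3.028
BIN_NS = CLOCK_PERIOD_NS / N_PHASES


def timestamp_bins(t_ns: float) -> int:
    """Return the time of a hit in units of one TDC bin.

    The coarse part counts full clock periods; the fine part is the index
    of the quarter period (0..3) selected by the phase-shifted clocks.
    """
    coarse = int(t_ns // CLOCK_PERIOD_NS)
    fine = int((t_ns - coarse * CLOCK_PERIOD_NS) // BIN_NS)
    fine = min(fine, N_PHASES - 1)  # guard against floating-point rounding
    return coarse * N_PHASES + fine


def measure_interval_ns(start_ns: float, stop_ns: float) -> float:
    """Interval between trigger and echo pulses as the TDC would report it."""
    return (timestamp_bins(stop_ns) - timestamp_bins(start_ns)) * BIN_NS


def coarse_counter_bits(dynamic_range_ns: float) -> int:
    """Coarse counter width needed to cover a given dynamic range."""
    return math.ceil(math.log2(dynamic_range_ns / CLOCK_PERIOD_NS))
```

In this model the reported interval differs from the true one by less than one bin, and a dynamic range of 1 second would need a 29-bit coarse counter at the assumed clock period.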

## PS2-12: Real-Time Data Analysis Using the WaveDREAM Data Acquisition System

<u>H. Friederich</u><sup>1,2</sup>, G. Davatz<sup>1,2</sup>, U. Gendotti<sup>1</sup>, H. Meyer<sup>7</sup>, D. Murer<sup>1,</sup> <sup>1</sup>Arktis Radiation Detectors Ltd, Zurich, Switzerland <sup>2</sup>ETH Zurich, Institute for Particle Physics, Zurich, Switzerland

The WaveDREAM data acquisition (DAQ) system, based on the DRS4 waveform digitizing chip, provides digitization at 1-5 giga-samples per second (GSPS) in a region of interest, together with continuous sampling of the input signals at 120 mega-samples per second (MSPS). In the field-programmable gate array (FPGA), the 120 MSPS signal can be used to build complex trigger logic and to perform data analysis with hard real-time constraints in the microsecond range. Variable gain amplifiers (VGAs) scale the input signal with 5-17 dB of amplification so that it optimally matches the dynamic range of the DRS4 IC. As an application example, the FPGA firmware for this general-purpose system has been optimized for the readout of an array of high-pressure 4He fast neutron detectors. For best performance, the VGA amplification has been matched to the photomultiplier tube (PMT) signals to allow efficient triggering on the event start without sacrificing the ability to count single photoelectrons (SPE). The trigger in the FPGA employs coincidence logic between the two PMT signals of a detector vessel to effectively filter out PMT dark counts. Real-time data analysis includes energy deposit measurements and a pulse shape discrimination (PSD) algorithm to reject events originating from unwanted gamma radiation, thus greatly reducing the data rate to be processed offline and enabling operation in high gamma rate environments. The high-resolution GSPS signal of the DRS4 readout is used to obtain precise event timing information, thereby enabling neutron time-of-flight (ToF) measurements with nanosecond precision. In addition, this signal can be used during offline data processing to strengthen the analysis results. Thus, the WaveDREAM DAQ provides an excellent tradeoff between efficiency (real-time data analysis) and precision (GSPS signal for offline analysis) at low cost.
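The abstract does not say which PSD algorithm the firmware uses. As a rough illustration only, the sketch below shows a two-PMT coincidence check and a tail-to-total charge comparison, a commonly used PSD technique. The function names, the tail start index and the choice of which side of the cut is kept are all assumptions.

```python
def coincidence(pmt_a, pmt_b, threshold):
    """True if both PMT traces exceed the threshold in the same sample."""
    return any(a > threshold and b > threshold for a, b in zip(pmt_a, pmt_b))


def tail_fraction(samples, baseline=0.0, tail_start=3):
    """Fraction of the baseline-subtracted pulse charge found in the tail."""
    pulse = [s - baseline for s in samples]
    total = sum(pulse)
    if total <= 0:
        return 0.0
    return sum(pulse[tail_start:]) / total


def passes_psd(samples, cut, baseline=0.0, tail_start=3):
    """Keep an event if its tail fraction is above a (hypothetical) cut.

    Whether slow or fast pulses are the ones to reject depends on the
    detector; the direction chosen here is only a placeholder.
    """
    return tail_fraction(samples, baseline, tail_start) > cut
```

Both functions only use additions and comparisons per sample, plus one division per event, which is the kind of arithmetic that maps well onto streaming FPGA logic.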

## PS2-13: DSP Based Smart Sensorless Stepping Motor Driver for LHC Collimators

<u>A. Masi</u>, M. Butcher, R. Losito, R. Picatoste Ruilope CERN, Geneva, Switzerland

More than 500 stepping motors are used to move the LHC collimators' jaws precisely. Since the environment surrounding the motors is highly radioactive and drive electronics are damaged by this radioactivity, the drives are placed in radiation-safe zones at a distance of up to 1 km from the motors. The motors must therefore be connected to the drives via long cables. Furthermore, the electromagnetic interference (EMI) emissions of the drives must be minimal at low frequencies in order to avoid interference with neighboring electronics. Pulse Width Modulated (PWM) control voltages are used to increase power efficiency; however, they generate significant EMI emissions. High PWM chopping frequencies must therefore be used to shift the emissions to higher frequencies. These high-frequency voltage signals, nonetheless, cause the long cables to act as transmission lines and produce a ringing phenomenon in the currents on the drive side of the cable, the side where measurements are possible. This ringing has required special consideration in the design of the drives' current controllers. Good motor positioning repeatability is of great importance. It is thus necessary to have real-time knowledge of the motors' position, so that compensatory action can be taken to correct any misalignments (i.e. lost steps). Radiation-hard resolvers are used to measure the motors' position and to detect lost steps. It is, nonetheless, desirable to have sensor redundancy. Additionally, even if the stepping motors work at nominal torque, chosen by design to be at least twice the nominal load torque, having an estimate of the real load torque can be useful to warn about mechanical degradation. All the LHC collimators have passed acceptance tests in which the load torques over the entire axis strokes have been measured and verified. Load torque warning thresholds can be easily determined for each collimator axis according to the collimator's type and orientation.
A sensorless position and torque estimation solution based on Kalman filtering has been developed, having first determined an accurate model of the cable and motor used in the LHC collimators. The Kalman observer has been implemented on a Texas Instruments TMS320F28335 DSC. In this paper, starting from the identified model, the application of the Kalman filter is described, focusing on its real-time implementation and the optimizations necessary to achieve the required performance. The estimation algorithm has been tested on a test bench based on a real collimator and a DAQ system developed ad hoc to compare the estimation results with the real torque and position measured on each collimator axis. A wide set of experimental measurements will be shown to validate the proposed approach.
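The motor and cable model used by the authors is not given in the abstract, so the following is only a generic illustration of the predict/update cycle of a Kalman filter, reduced to a single slowly varying state (for example a load torque estimate). The random-walk process model and the noise variances `q` and `r` are assumptions.

```python
def kalman_1d(measurements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a slowly drifting quantity.

    q: assumed process noise variance (how fast the state may drift)
    r: assumed measurement noise variance
    Returns the state estimate after each measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is modelled as a random walk, so only the
        # uncertainty grows between samples.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates
```

The scalar form needs only a handful of multiply-add operations and one division per sample; a multi-state observer replaces these with small matrix operations, which is where optimization for a real-time DSP implementation typically matters.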

## PS2-14: High Accuracy Reading Algorithm for Ironless Linear Position Sensor

A. Masi, A. Danisi, M. Di Castro, R. Losito

CERN, Geneva, Switzerland

Linear Variable Differential Transformer (LVDT) sensors are widely used in the radioactive environment of particle accelerators to survey the linear axes of beam-intercepting devices such as scrapers, targets and collimators. The required reading accuracy can vary from some hundreds of micrometers down to a few micrometers in a challenging environment characterized by an integrated ionization dose of up to several MGrays and a neutron fluence of up to 10^12 p/cm2/s, and by electromagnetic interference produced by high-current cables passing near the sensors. LVDTs are used for these applications because of their contactless sensing, easy-to-implement radiation resistance, robustness and intrinsically infinite resolution. As an example, more than 750 LVDT sensors have been used to survey the LHC collimators' axis positions in real time. Using a proper reading algorithm, based on sine-fit interpolation, a reading accuracy of a few micrometers has been reached even with cable lengths of several hundreds of meters. In spite of such performance, LVDTs are typically sensitive to external magnetic fields. In particular, a DC or slowly varying magnetic field can cause significant position errors (e.g. several hundreds of micrometers). To allow their use also in the presence of magnetic fields, we developed a novel ironless sensing structure that is intrinsically immune to external DC or low-frequency magnetic fields due to the absence of ferromagnetic materials. In addition, being based on inductive coupling between non-contact coils, the new sensor keeps the key properties of LVDT sensors. The position of the moving coil can be extracted, as in standard LVDTs, by a differential reading of the fundamental harmonic of the two sense coil voltages. Even if this harmonic is not affected by a DC or slowly varying magnetic field, the sense coils can in principle capture interfering signals coming from time-varying magnetic fields.
A novel algorithm has been conceived to reach a reading accuracy of a few micrometers. It is characterized by high selectivity around the sensor excitation frequency, high immunity to interfering tones even at ultra-low frequencies, and high immunity to temperature variation. In this paper, after a description of the new sensor, the reading algorithm is fully detailed, with particular focus on its optimization for a real-time implementation. A complete characterization in simulation is also presented. Experimental validations will complete the work.
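The novel algorithm itself is not described in the abstract. The sketch below only illustrates the differential reading of the fundamental harmonic mentioned above: the amplitude at the known excitation frequency is extracted by correlating each sense coil voltage with sine and cosine references over an integer number of periods, and the two amplitudes are combined ratiometrically. The ratiometric formula and all names are assumptions, following common LVDT practice.

```python
import math


def fundamental_amplitude(samples, freq_hz, sample_rate_hz):
    """Amplitude of the component at freq_hz.

    Exact when the record spans an integer number of excitation periods,
    in which case DC offsets and other harmonics are rejected.
    """
    n = len(samples)
    w = 2.0 * math.pi * freq_hz / sample_rate_hz
    in_phase = sum(s * math.cos(w * k) for k, s in enumerate(samples))
    quadrature = sum(s * math.sin(w * k) for k, s in enumerate(samples))
    return 2.0 * math.hypot(in_phase, quadrature) / n


def ratiometric_position(v1, v2, freq_hz, sample_rate_hz):
    """Normalized position in [-1, 1] from the two sense coil voltages."""
    a1 = fundamental_amplitude(v1, freq_hz, sample_rate_hz)
    a2 = fundamental_amplitude(v2, freq_hz, sample_rate_hz)
    return (a1 - a2) / (a1 + a2)
```

Dividing by the sum of the amplitudes makes the result insensitive to a common change of excitation level, which is one reason this form is popular. Rejecting interfering tones close to the excitation frequency would require a more selective method than this single-bin correlation.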

## PS2-15: VHDL Design of Digital Adaptive Filters for PANDA Signal Processing

M. Greco, M. P. Bussa, M. Destefanis, M. Maggiora, S. Spataro University of Torino and INFN, Turin, Italy

The PANDA (antiProton ANnihilation in Darmstadt) experiment at the new Facility for Antiproton and Ion Research (FAIR) will study interactions between protons and antiprotons in the momentum range 1.5-15 GeV/c. The physics program is very demanding and requires an efficient and flexible triggering system that can handle a data rate in the range of 40 to 200 GB/s, due to an interaction rate of over 10 MHz. Serial Peripheral Interface firmware was fully developed and implemented in VHDL to interface the clocks on board the digital processing unit with the connected ADC/DAC modules. Its operation was tested successfully. Digital least-mean-square (LMS) adaptive filters were designed and implemented for real-time filtering in data acquisition at working frequencies higher than 100 MHz, to cope with the foreseen high rate of the PANDA experiment.
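For readers unfamiliar with LMS adaptive filters, a minimal floating-point reference model is sketched below. It is not the VHDL design described in the abstract: the tap count, the step size `mu` and the system-identification demo are illustrative assumptions.

```python
import random


def lms(x, d, n_taps, mu):
    """Least-mean-square adaptive FIR filter.

    x: input samples, d: desired samples, mu: adaptation step size.
    Returns (filter outputs, errors, final tap weights).
    """
    weights = [0.0] * n_taps
    delay_line = [0.0] * n_taps
    outputs, errors = [], []
    for xn, dn in zip(x, d):
        delay_line = [xn] + delay_line[:-1]          # shift in the new sample
        y = sum(w * v for w, v in zip(weights, delay_line))
        e = dn - y
        # Steepest-descent update of every tap using the instantaneous error.
        weights = [w + mu * e * v for w, v in zip(weights, delay_line)]
        outputs.append(y)
        errors.append(e)
    return outputs, errors, weights


def demo():
    """Identify a hypothetical 2-tap system d[n] = 0.5 x[n] + 0.25 x[n-1]."""
    rng = random.Random(1)
    x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
    d = [0.5 * x[n] + (0.25 * x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
    return lms(x, d, n_taps=2, mu=0.05)[2]
```

A hardware implementation would typically use fixed-point arithmetic and pipeline the multiply-accumulate and update steps, so a software model like this one mainly serves as a bit-accuracy reference during simulation.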

# **PS2-16:** Experience with the Custom-Developed ATLAS Trigger Monitoring and Reprocessing Infrastructure

<u>V. Bartsch<sup>1</sup></u>, S. George<sup>2</sup>, M. zur Nedden<sup>3</sup> <sup>1</sup>University of Sussex, Brighton, United Kingdom <sup>2</sup>Royal Holloway University of London, London, United Kingdom <sup>3</sup>Humboldt-Universitaet zu Berlin, Berlin, Germany

We describe the framework to test and validate new trigger releases, the data quality assessment which is done shortly after a run has finished and the treatment of events where no high level trigger decision could be derived.

ATLAS has had two long data-taking periods, from spring to winter in 2010 and in 2011. During that time the ATLAS trigger system has been improved to accommodate changes in the luminosity and the detector conditions, as well as general improvements to the algorithms. It has become important to check new developments in the trigger software efficiently, to monitor the quality of the produced data reliably and efficiently, and to communicate the outcome of the data quality assessment so that it can be considered in physics analysis.

The trigger monitoring can be roughly divided into online and offline monitoring. The online monitoring calculates and displays all rates at every level of the trigger and evaluates up to 3000 data quality histograms. Online data quality information is checked and recorded automatically. The offline trigger monitoring provides information after a run has finished. Experts check this information, guided by algorithms that compare the current histograms with a reference. The experts record their assessment in a so-called data quality defects database, which is used to build a good run list of data good enough for physics analysis. In the first half of 2011, about three percent of all data had an intolerable defect resulting from the ATLAS trigger system.

To keep the percentage of data with defects low, any changes to trigger algorithms or menus must be tested reliably. A recent run with sufficient statistics, taken without any high level trigger decision, is reprocessed to check that the changes do not introduce any unexpected side effects. The reprocessed datasets are checked in the same offline trigger monitoring framework that is used for the offline trigger data quality. The current system has proved to work very reliably, and all potential problems could be addressed.

In addition, events for which the trigger could not make a decision are written out and recovered if possible. The events are available for physics analysis if successfully recovered. All of these events are analysed in detail to establish why the trigger could not make a decision. Feedback to the trigger developers is provided if necessary.

# **PS2-17**: Optimization of the detection of very inclined showers using a spectral DCT trigger in arrays of surface detectors

Z. Szadkowski

Department of Physics and Applied Informatics, University of Lodz, Lodz, Poland

The DCT trigger allows recognition of ADC traces with a very short rise time and fast exponential attenuation, related to the narrow, flat muon component of very inclined extensive air showers that are generated by hadrons and start their development early in the atmosphere. Very inclined showers generate Cherenkov light that falls directly mostly on two PMTs. The probability of 3-fold coincidences of direct light, corresponding to a standard Auger trigger, is low. Two-fold coincidences of direct light are much more probable. The third PMT is subsequently hit by reflected light, but with some delay. With fast sampling (80 MHz), this delay places the signal in the next time bin.

Two-fold coincidences of DCT coefficients allow triggering on signals that are currently ignored, either because the amplitude threshold is too high or because of their de-synchronization in time caused by the tank geometry. Three DCT engines, implemented in an EP3C40F32417 FPGA and using all of its DSP blocks, generate the spectral trigger when 8 DCT coefficients are simultaneously inside the acceptance lane in at least 2 channels. An additional veto signal (analyzing the amplitude) controls the trigger rate to avoid saturation of the transmission channel. Both lab and long-term field measurements on the test tank confirm a high efficiency of the recognition of the expected patterns of ADC traces.
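The trigger condition can be illustrated with a short software model. This is a sketch under assumptions rather than the firmware logic: the 16-sample window, the scaling of each coefficient to the zeroth coefficient, and the way the acceptance lanes are derived from a reference pulse are all hypothetical choices made for the example.

```python
import math


def dct2(x):
    """Unnormalized DCT-II of a sample window."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]


def scaled_coefficients(trace, n_coeffs=8):
    """Coefficients 1..n_coeffs divided by coefficient 0 (amplitude independent)."""
    c = dct2(trace)
    if abs(c[0]) < 1e-12:
        return None
    return [ck / c[0] for ck in c[1:n_coeffs + 1]]


def lanes_from_reference(reference, n_coeffs=8, tol=0.05):
    """Build (low, high) acceptance lanes around a reference pulse shape."""
    return [(s - tol, s + tol) for s in scaled_coefficients(reference, n_coeffs)]


def in_lanes(trace, lanes):
    scaled = scaled_coefficients(trace, len(lanes))
    if scaled is None:
        return False
    return all(lo <= s <= hi for s, (lo, hi) in zip(scaled, lanes))


def spectral_trigger(channels, lanes, min_channels=2):
    """Fire when enough channels have all coefficients inside the lanes."""
    return sum(in_lanes(trace, lanes) for trace in channels) >= min_channels
```

Because the coefficients are scaled, two pulses with the same shape but different amplitudes satisfy the same lanes, which is what lets a spectral trigger accept signals that an amplitude threshold would miss.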

## PS2-18: Data Formatter System for the ATLAS Fast TracKer

J. Olsen<sup>1</sup>, T. Liu<sup>1</sup>, B. Penning<sup>1</sup>, H. L. Li<sup>2</sup>

<sup>1</sup>Fermi National Accelerator Laboratory, Batavia, Illinois, United States <sup>2</sup>University of Chicago, Chicago, Illinois, United States

Collisions in the LHC occur with an instantaneous luminosity of 1E34 cm^-2 s^-1. The ATLAS detector trigger system must reject the vast majority of these events, and only 200 events per second can be stored for later analysis. Instantaneous luminosity is expected to increase to 3E34 cm^-2 s^-1, with an average of 75 proton-proton interactions per crossing. Under these conditions the existing ATLAS trigger is strained and the need for a tracking trigger is clear. The Fast TracKer (FTK) upgrade adds a hardware-based level-2 track trigger to the ATLAS DAQ system. Complications arise from the fact that the ATLAS inner detector was not designed for track triggering. Inner detector modules are not organized into the symmetric eta-phi towers that the track finder algorithms require.

The FTK system requires a Data Formatter hardware layer to perform data compression, remapping and repackaging of inner detector hits. The Data Formatter hardware accepts data delivered over fiber links from over 200 Readout Drivers (ROD). The first stage involves compressing pixel hits using an FPGA-based 2D clustering algorithm. Pixel clusters and SCT strip hits are then exchanged between Data Formatter boards, repackaged and sent downstream to the FTK core processing crates.
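The abstract does not specify the clustering algorithm. As a software illustration of what 2D clustering of pixel hits means, the sketch below groups hits that touch each other (including diagonally) and reduces each group to a centroid. The 8-neighbour connectivity and the unweighted centroid are assumptions.

```python
def cluster_hits(hits):
    """Group (column, row) pixel hits into clusters of adjacent pixels."""
    remaining = set(hits)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, stack = [seed], [seed]
        while stack:
            col, row = stack.pop()
            # Visit the 8 neighbours of the current pixel.
            for dc in (-1, 0, 1):
                for dr in (-1, 0, 1):
                    neighbour = (col + dc, row + dr)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        cluster.append(neighbour)
                        stack.append(neighbour)
        clusters.append(sorted(cluster))
    return sorted(clusters)


def centroid(cluster):
    """Unweighted centre of a cluster, the compressed form sent downstream."""
    n = len(cluster)
    return (sum(c for c, _ in cluster) / n, sum(r for _, r in cluster) / n)
```

Replacing each group of adjacent hits with one centroid is what provides the data compression: six hits in the test pattern `[(0, 0), (0, 1), (1, 1), (5, 5), (9, 2), (9, 3)]` reduce to three clusters.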

Prior to settling on any particular hardware platform, the Data Formatter system was extensively simulated using high-level tools written in C++ and Python. A balance between board density, physical I/O limitations, and backplane complexity was sought, and achieved, with the baseline design comprising 32 boards with up to eight fiber link inputs per board. While the simulation tools determined the ideal arrangement of crates, boards and input fibers, it quickly became clear that the number of data paths between boards was irregular. A given Data Formatter board must exchange data with between eight and nineteen other boards. Designing specific inter-board connections into a custom backplane was possible, but a hard-wired backplane severely limits the possibility of future expansion. The Advanced Telecommunications Computing Architecture (ATCA) full mesh backplane provides the ideal hardware solution: every board in the shelf can exchange data directly over high-speed serial interconnects. The baseline design comprises four 14-slot ATCA shelves; each shelf contains eight Data Formatter boards, and six empty slots are available for future expansion. Data Formatter boards are hardware-based, incorporating large FPGAs with many multi-gigabit SERDES components. While the Data Formatter is designed for a level-2 trigger, the architecture also lends itself to scalable, high-performance level-1 trigger systems.

ATCA hardware is designed for high availability with particular emphasis on redundant power and a robust management interface. The Data Formatter is our first design to target the ATCA platform and this paper chronicles our design process from conception to first prototype.

## PS2-19: COTS Real Time Quench Detection System for Superconducting Magnets

R. Rajagopal<sup>1</sup>, S. Wunder<sup>2</sup>

<sup>1</sup>Controls, Verivolt LLC., Berkeley, CA, United States <sup>2</sup>Sales, National Instruments, Austin, TX, United States

A new approach to real-time quench detection in superconducting magnets is presented. A common denominator of all superconducting magnet applications is that a large amount of energy is stored in the form of a magnetic field. External conditions or internal structural events can cause the superconducting cable to snap out of the superconducting state (quench), causing all the stored energy to be converted into localized heat. This heat can potentially melt some of the internal elements and destroy the magnet. For this reason, a system that detects quenches in real time with extreme reliability and triggers energy extraction is crucial to protect these very expensive assets. High-resolution sensors from Verivolt, together with DAQ and digital controls from National Instruments, were used to build a Commercial-Off-The-Shelf (COTS) quench detection system for superconducting magnets. The signals from the Verivolt sensors are digitized and processed in the FPGA on the backplane of a cRIO chassis, keeping all critical elements of the system running in hardware, independent of CPU variability.

#### PS2-20: Commissioning and Performance of a Fast Level-2 Trigger System at VERITAS

B. Zitzer<sup>1</sup>, A. Weinstein<sup>2</sup>, M. Schroedter<sup>3</sup>, M. Orr<sup>3</sup>, M. Oberling<sup>1</sup>, A. Kreps<sup>1</sup>, F. Krennrich<sup>2</sup>, G. Drake<sup>1</sup>, K. Byrum<sup>1</sup>, J. T. Anderson<sup>1</sup>

<sup>1</sup>HEP Division, Argonne National Laboratory, Lemont, IL, United States

<sup>2</sup>Iowa State University, Ames, IA, United States

<sup>3</sup>Smithsonian Astrophysical Observatory, Amado, AZ, United States

We have built a new three-stage FPGA-based high-speed camera-level pattern trigger for VERITAS, an array of ground-based imaging atmospheric Cherenkov telescopes (IACTs) located in Arizona. This trigger has the ability to recognize patterns of Cherenkov light generated by atmospheric air showers initiated by incident extraterrestrial gamma rays. The new trigger has programmable coincidence recognition timing and programmable delay compensation over 499 pixel channels in an IACT camera. Measurement of and compensation for system timing variations is achieved through the use of an FPGA-based time-to-digital converter (TDC) and FPGA-based programmable delay elements. The trigger pattern is the coincidence of any three adjacent pixels within the camera. Night-sky background is suppressed as the ratio of the squares of the coincidence gate widths. The tighter coincidence width achieved by the new system therefore permits operation at a lower discriminator threshold. The new trigger has now been successfully installed on all four of the IACTs of VERITAS, replacing the previous system. We present measurements of the performance of this new trigger in comparison with that of the previous system and of the effect of the new trigger upon overall array performance.

## PS2-21: The ATLAS Hadronic Tau Trigger

C. Cuenca Almenar Department of Physics, Yale University, New Haven, United States

Hadronic tau decays play a crucial role in the search for physics beyond the Standard Model as well as in Standard Model measurements. However, hadronic tau decays are difficult to identify and trigger on due to their resemblance to QCD jets. Given the large production cross section of QCD processes, designing and operating a trigger system with the capability to efficiently select hadronic tau decays, while maintaining the rate within the bandwidth limits is a difficult challenge.

The ATLAS trigger is a complex system, structured in three levels, each of which accesses more precise information, has more allocated time and runs more sophisticated algorithms. These algorithms not only have to reconstruct and identify hadronic tau products very fast, but also need to reject backgrounds to keep the output rate of the trigger within the allocated bandwidth.

This contribution will summarize the status and performance of the ATLAS tau trigger system during the 2011 data taking period, and the upgrades put in place for the current 2012 run. Special emphasis will be placed on the key role of identification and rejection capabilities of the different sub-detectors of ATLAS and the algorithms used. Finally, first results and prospects on the performance in 2012 will be presented.

## **PS2-22:** The ATLAS Muon Trigger Performance in Proton-Proton Collisions at Sqrt(s)=7 TeV

K. Nagano<sup>1</sup>, K. Black<sup>2</sup>, T. Matsushita<sup>3</sup>

<sup>1</sup>KEK, Tsukuba, Japan <sup>2</sup>Boston University, Boston, US <sup>3</sup>Kobe University, Kobe, Japan

The ATLAS experiment at CERN's Large Hadron Collider (LHC) took data with colliding beams at instantaneous luminosities of up to $3.65 \times 10^{33}$ cm<sup>-2</sup> s<sup>-1</sup> in the 2011 run period. Such high-luminosity runs required sophisticated triggers that preserve the highest physics output while effectively reducing the event rate.

The ATLAS muon trigger successfully adapted to the changing environment in the 2011 runs. The selection strategy has been optimized for the various physics analyses involving muons in the final state. This includes, for example, combined trigger signatures with electron and jet trigger objects, and so-called full-scan triggers, which make use of the full event information to search for di-lepton signatures, seeded by single lepton objects.

The L1 muon trigger system gets its input from fast muon trigger detectors. Fast sector logic boards select muon candidates, which are passed via an interface board to the central trigger processor and then to the High Level Trigger (HLT). The Muon HLT is purely software based and encompasses a level 2 trigger followed by an event filter for a staged trigger approach. It has access to the data of the precision muon detectors and other detector elements to refine the muon hypothesis.

This presentation reports on the efficiency, resolution, and general performance of the muon trigger in the 2011 runs and in the context of the physics goals of ATLAS.

## PS2-24: Multifunction-Timing Card ITTEV2 for CoDaC Systems of Wendelstein 7-X

J. Schacht<sup>1</sup>, J. Skodzik<sup>2</sup>

<sup>1</sup>CoDaC/Machine Control, Max-Planck-Institute for Plasmaphysics, Greifswald, Germany <sup>2</sup>Institute for Applied Microelectronic, University Rostock, Rostock, Germany

The timing system is a crucial element of the CoDaC (Control, Data Acquisition and Communication) system of the steady-state fusion experiment Wendelstein 7-X (W7-X). Its main task is the synchronization of all clocks with sufficient accuracy. Furthermore, it is able to send, receive, and process event messages and to offer a wide range of time-related functions, e.g., time capturing, pulse generation, realization of time delays, and sending and receiving of trigger signals. The overall timing system consists of a central timing system and a considerable number of local timing systems. Most of the technical systems, such as the heating system, power supplies, gas inlet, and all diagnostic systems, include a local timing system in a so-called control station. So far, there are two different types of local timing systems: the local Trigger Time Event card (ITTEV1) for control stations with real-time requirements and the local Time to Digital Converter card (TDC) for control stations used for data acquisition. Both card types have a standard parallel PCI or cPCI bus interface. A revision of the ITTEV1 and TDC cards is necessary, as many components used for their fabrication are no longer available. Furthermore, the state-of-the-art bus interface is the serial PCIe bus. The need for a new bus interface with long-term availability has led to the decision to use a GBit Ethernet interface. It will connect the new TTE card (ITTEV2), the successor of the ITTEV1 and TDC, with a host PC. Additionally, DDR3 memory is integrated to allow for the realization of high-resolution time capture processes. By choosing a more powerful FPGA device (Xilinx Virtex 6), it was possible to increase the time resolution by a factor of two. Starting with a short introduction to the W7-X timing system, this contribution describes the key properties and all extended as well as new features of the ITTEV2 card that address new requirements regarding data acquisition.
The current state of the development is given.

## **PS2-25: The ATLAS Jet Trigger**

M. Campanelli<sup>1</sup>, <u>L. Lopes</u><sup>2</sup> <sup>1</sup>*University College London, London, United Kingdom* <sup>2</sup>*Laboratorio de Instrumentacao e Fisica Experimental de Particulas (PT), Lisbon, Portugal* 

The ATLAS jet trigger system has a 3-level structure, and is based on the concept of Region Of Interest, where only regions of the detector around interesting Level-1 objects are reconstructed at the higher levels. This strategy is not well-suited for multi-jet events since it leads to pathologies and efficiency losses. This philosophy has been changed for the jet trigger during 2011, and we now have the possibility of unpacking the full calorimeter at Event Filter. For 2012, full calorimeter unpacking will also be possible (for a small subset of the events) at an intermediate level between Level-1 and Level-2. We also moved to the use of calibrated scale at trigger level, and to the application of noise cuts to reduce rate spikes. We will present the performance of the jet trigger in 2011 and from the first runs of 2012.

### PS2-26: HAWC TeV Gamma Ray Observatory Trigger System

M. DuVernois

University of Wisconsin, Madison, WI, United States

The High Altitude Water Cherenkov (HAWC) experiment is currently under construction at 4100 m above sea level in a valley between Sierra Negra and Orizaba near Puebla, Mexico. The experiment is intended as an all-sky TeV gamma ray observatory with a significant temporal and data overlap with the Fermi gamma ray satellite observatory. The detector array will consist of 300 water tanks instrumented with 1200 photomultiplier tubes sensitive to cosmic ray air showers (background) and electromagnetic showers from primary gammas (signal). We present here the design, implementation, and performance of the FPGA-based digital trigger system for the HAWC experiment. It performs majority logic on 1200 channels of time-over-threshold (ToT) data and compares the number of tubes above threshold as a function of time over scales from 25 ns to 1000 ns. The trigger is implemented in Altera FPGAs along with data simulators for testing of the data acquisition readout, which is centered on 1200 channels of multihit time-to-digital conversion.
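The majority logic described above can be modelled in a few lines. This is a sketch under assumptions, not the HAWC firmware: each channel is represented as a list of hypothetical `(start_ns, end_ns)` time-over-threshold intervals, and a tube is counted if it was above threshold at any moment within the look-back window.

```python
def multiplicity(tot_channels, t_ns, window_ns):
    """Number of tubes above threshold at any time in [t_ns - window_ns, t_ns]."""
    window_start = t_ns - window_ns
    return sum(
        1
        for intervals in tot_channels
        if any(start <= t_ns and end >= window_start for start, end in intervals)
    )


def majority_trigger(tot_channels, times_ns, window_ns, min_tubes):
    """Return the evaluation times at which the multiplicity condition is met."""
    return [
        t for t in times_ns
        if multiplicity(tot_channels, t, window_ns) >= min_tubes
    ]
```

Evaluating the same data with several window lengths, for example 25 ns and 1000 ns, lets a short window pick out tightly bunched hits while a long window accumulates hits spread over a longer time.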

## PS2-27: Development of the Control Card for the Digitizers of the Second Generation Electronics of AGATA

D. Barrientos<sup>1,2,3</sup>, V. Gonzalez<sup>3</sup>, M. Bellato<sup>2</sup>, A. Gadea<sup>1</sup>, D. Bazzacco<sup>2</sup>, J. M. Blasco<sup>3</sup>, D. Bortolato<sup>2</sup>, F. J. Egea<sup>1,3</sup>, R. Isocrate<sup>2</sup>, A. Pullia<sup>4</sup>, G. Rampazzo<sup>2</sup>, E. Sanchis<sup>3</sup>, A. Triossi<sup>2</sup>

<sup>1</sup>Instituto de Fisica Corpuscular (CSIC-UV), Valencia, Spain

<sup>2</sup>Istituto de Fisica Nucleare (INFN), Sezione di Padova, Padova, Italy

<sup>3</sup>Departamento Ingeniera Electronica, Universitat de Valencia, Valencia, Spain

<sup>4</sup>Istituto de Fisica Nucleare (INFN), Sezione di Milano, Milano, Italy

The Advanced GAmma Tracking Array (AGATA) is a latest-generation gamma-ray spectrometer composed of segmented High-Purity Germanium (HPGe) detectors that uses Pulse Shape Analysis (PSA) and gamma-ray tracking techniques in order to achieve high efficiency and resolution. For that purpose, an accurate determination of the energy, time and position of every interaction within the detector volume is required, which is implemented with concurrent digitization at 100 Msamples/s of each 36-fold detector crystal of the array. At present, a fully operational electronics system is acquiring data during the experimental campaigns. However, rapid improvements in electronic devices make it possible to redesign the system, preserving specifications but gaining in compactness, power consumption and cost. In this work, the novel control card for the digitizer boards of the system is presented. The unit is in charge of communicating with the pre-processing electronics, through four optical links, and with four digitizer units, through a custom backplane. From the optical links, the unit receives the sampling clock from the Global Trigger and Synchronization (GTS) system. Another two bidirectional optical links are provided for latency measurements and slow control purposes. The aim of this board is to receive the clock, to clean it and to broadcast it with the same latency to four digitizer units. It also has to broadcast the signals for measuring the latency, as well as the slow control signals needed to control each digitizer unit.

In order to perform the tasks described above, the card carries a Spartan-6 Field Programmable Gate Array (FPGA) from Xilinx. Ethernet, mini-USB and SMB connectors have been added so that the card can be used without the optical interface. The design and qualification processes for the card are presented in this work, including a detailed description of the design, simulation and performance tests.

# **PS2-28:** FPGA Implementation of the 32-Point DFT for a Wavelet Trigger of Cosmic Rays Experiments

Z. Szadkowski

Department of High Energy Astrophysics, University of Lodz, Lodz, Poland

An FPGA-based wavelet trigger is being developed for the observation of ultra-high-energy cosmic rays (UHECRs) through the detection of their coherent radio emission. Using radio detection, the electromagnetic part of an air shower in the atmosphere may be studied in detail, thus providing information complementary to that obtained by water Cherenkov detectors, which are predominantly sensitive to the muonic content of an air shower at ground level. For an extensive radio detector array, a sophisticated self-trigger is necessary because of the limited communication data rate. The wavelet trigger, which investigates the signal power online, is promising; however, its implementation requires some optimizations. The digitized signals are converted from the time domain to the frequency domain by a standard Altera library FFT procedure, then multiplied by the wavelet transforms and finally converted back to the time domain. Altera FFT routines convert ADC data as blocks of 2N samples. FFT coefficients are provided in a serial stream in 2N time bins. The estimated signal power depends strongly on the relative positions of the FFT(data) and the wavelet transforms in the frequency domain. Additional procedure has to calculate a most efficient selection of the signal power could be estimated also in each clock cycle and additional tuning procedure would not be necessary. The paper describes an implementation of the 32-point FFT algorithm in an Altera FPGA that provides all 32 complex DFT coefficients for the wavelet trigger.
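To make the role of the 32 complex DFT coefficients concrete, the sketch below computes them directly for the most recent 32 samples and forms a weighted power estimate for every new sample. It is a plain software reference, not the FPGA architecture of the paper: the direct O(N²) evaluation and the `weights` list, which stands in for the frequency response of a wavelet, are assumptions.

```python
import cmath

N = 32
# Precomputed twiddle factors exp(-2*pi*i*k/N).
TWIDDLES = [cmath.exp(-2j * cmath.pi * k / N) for k in range(N)]


def dft32(samples):
    """All 32 complex DFT coefficients of a 32-sample block."""
    if len(samples) != N:
        raise ValueError("dft32 expects exactly 32 samples")
    return [
        sum(samples[n] * TWIDDLES[(k * n) % N] for n in range(N))
        for k in range(N)
    ]


def sliding_power(stream, weights):
    """Weighted spectral power of the latest 32 samples, one value per new sample."""
    powers = []
    for end in range(N, len(stream) + 1):
        coeffs = dft32(stream[end - N:end])
        powers.append(sum(abs(c * w) ** 2 for c, w in zip(coeffs, weights)))
    return powers
```

Producing one power value per sample, rather than one per block, removes the dependence on where the block boundaries fall relative to the pulse.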

## PS2-29: Evolution and Performance of Electron and Photon Triggers in ATLAS in the Year 2011

A. Tricoli<sup>1</sup>, T. Kono<sup>2</sup>, <u>V. Solovyev<sup>3</sup></u> <sup>1</sup>CERN, Geneva, Switzerland <sup>2</sup>DESY, Hamburg, Germany <sup>3</sup>B.P. Konstantinov Petersburg Nuclear Physics Institute, Leningrad, Russia

The electron and photon triggers are among the most widely used triggers in ATLAS physics analyses.

In 2011, the increasing luminosity and pile-up conditions demanded higher and higher thresholds and the use of tighter and tighter selections for the electron triggers. Optimizations were performed at all three levels of the ATLAS trigger system. At the high-level trigger (HLT), many variables from the calorimeters and tracking detectors are used to achieve high efficiency and large rejection power. At L1, the thresholds were raised and optimised to account for $\eta$-dependence and hadronic isolation was implemented.

In addition to physics triggers, dedicated triggers for collecting a large number of control samples of J/psi->ee, W->enu and jet background, for calibration, efficiency and fake rate measurements were developed.

This contribution summarizes the algorithms and performance of ATLAS electron and photon triggers used in 2011 data taking.

#### PS2-30: Advanced Light Source Control System Upgrade Intelligent Local Controller Redesign

E. Norum

Lawrence Berkeley National Laboratory, Berkeley, USA

As part of the control system upgrade at the Advanced Light Source the existing intelligent local controller (ILC) modules are being replaced. These modules provide real-time updates of control setpoints and monitored values. This paper describes the architecture and performance of the 'ILC Replacement Modules' which have been developed to take on the duties of the existing modules. The new modules use a 100BaseT network connection to communicate with the ALS Experimental Physics and Industrial Control System (EPICS) and are based on a commercial FPGA evaluation board running a microcontroller-like application.

The IRM application software is compiled to run directly on the MicroBlaze processor embedded in the FPGA with no intervening operating system code. This allows for rapid response to timer and network events. Performance testing shows that over 90% of timer interrupts are acknowledged within 20 microseconds and that the maximum response time is less than 180 microseconds. Setpoint update requests from the ALS EPICS control system are thus handled well within the 1 millisecond required response time. The effect of network load on response times is minimized by placing the IRMs and their controlling EPICS Input/Output Controller (IOC) on a private network segment. To further reduce the effects of network stack response on the transfer of data between the IRMs and the IOC, a simple UDP-based publish/subscribe protocol is used. The application software in the IRMs and the IOC provides error detection and command retransmission rather than relying on the network stacks to provide this function. The result is a system that has been shown to meet the real-time response requirements of the instruments controlled by the IRMs.
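The idea of handling error detection and retransmission in the application, rather than in the network stack, can be sketched as follows. This is a minimal illustration under stated assumptions: the datagram layout (sequence number, channel, value, CRC-32), the function names and the simulated lossy channel are all hypothetical and are not taken from the IRM protocol itself. A function stands in for the UDP round trip so that the example runs without a network.

```python
import struct
import zlib

HEADER = struct.Struct("!IHf")  # sequence number, channel, setpoint value

def encode(seq, channel, value):
    """Build a datagram payload: header followed by a CRC-32 of the header."""
    body = HEADER.pack(seq, channel, value)
    return body + struct.pack("!I", zlib.crc32(body))

def decode(datagram):
    """Return (seq, channel, value), or None if the datagram is damaged."""
    if len(datagram) != HEADER.size + 4:
        return None
    body = datagram[:HEADER.size]
    (crc,) = struct.unpack("!I", datagram[HEADER.size:])
    if zlib.crc32(body) != crc:
        return None
    return HEADER.unpack(body)

def send_with_retry(channel_fn, seq, channel, value, max_tries=5):
    """Send one command and retransmit until a matching acknowledgement arrives.

    channel_fn stands in for a UDP round trip: it takes a datagram and
    returns the reply datagram, or None when the packet was lost.
    Returns the number of attempts used."""
    for attempt in range(1, max_tries + 1):
        reply = channel_fn(encode(seq, channel, value))
        decoded = decode(reply) if reply is not None else None
        if decoded is not None and decoded[0] == seq:
            return attempt
    raise TimeoutError("no acknowledgement after %d tries" % max_tries)

def make_lossy_echo(drop_first):
    """Simulated module that loses the first `drop_first` datagrams, then echoes."""
    state = {"seen": 0}
    def channel_fn(datagram):
        state["seen"] += 1
        return None if state["seen"] <= drop_first else datagram
    return channel_fn

attempts = send_with_retry(make_lossy_echo(2), seq=7, channel=3, value=1.25)
```

Matching the acknowledgement on the sequence number lets the sender tell a fresh reply from a late duplicate, which UDP by itself does not guarantee.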

Each IRM provides four analog inputs and four analog outputs. All have a range of 10V and a resolution of 16 bits. The analog gain and offset calibration factors for each channel are stored in on-board flash memory. This allows modules to be swapped at any time without the need for recalibration or operator intervention. A front panel OLED display provides local indication of analog and digital I/O values.

Approximately 125 IRMs will be installed once the upgrade is complete. This will require the addition of three network switches to connect the IRMs to the EPICS IOC. The IRMs will communicate with the switches over copper (100BaseT) network connections. The link between the switches and the IOC will be made with fiber or copper gigabit network links.

This paper presents results of timing and throughput tests of a prototype module as well as a detailed description of the hardware and software design.

## : Design Considerations of Ad-Hoc Wireless Building Radiation Monitoring Network for Nuclear Accident Emergency Response Applications

H.-H. Tseng<sup>1</sup>, H.-I. Lin<sup>2</sup>, T.-P. Wang<sup>3</sup>, T.-C. Hung<sup>4</sup>

<sup>1</sup>Nuclear Instrument Division, Institute of Nuclear Energy Research, Longtan, Taoyuan, Taiwan

<sup>2</sup>Graduate Institute of Automation Technology, National Taipei University of Technology, Taipei, Taiwan

<sup>3</sup>Graduate Institute of Computer and Communication Engineering, National Taipei University of Technology, Taipei, Taiwan

<sup>4</sup>Department of Mechanical Engineering, National Taipei University of Technology, Taipei, Taiwan

Emergency response to a complete (AC/DC) blackout of a nuclear power plant, as happened in the Fukushima accident, requires battery-powered ad-hoc wireless personal area networks (WPANs) inside reactor buildings for precise accident data tracking and for control of worker activity and radiation exposure. Such networks are vital for choosing among options and for assessing current and future risk potentials and remedial actions. ZigBee wireless technology and its underlying IEEE 802.15.4 standard offer an attractive low-cost, low-power solution for these applications, which require self-healing, fast and easy deployment, low data rate, long battery life, and secure networking.

In this paper, a prototype system with real-time location capability is designed on the Radio-Pulse ZigBee Single Chip MG2455. It is a System-on-Chip that combines a high-performance RF transceiver, an enhanced 8-bit 8051 MCU with 96 KB of internal flash and 8 KB of RAM for the user application program and data, a hardwired MAC, and 128-bit AES-based encryption security keys.

The Location Engine implements a distributed computation algorithm that uses received signal strength indicator (RSSI) values from known reference nodes, which are ZigBee Coordinators (ZC) with known coordinates. The other nodes are mobile radiation monitors, called blind nodes or ZigBee End Devices (ZED), whose coordinates need to be estimated; each contains a radiation detector and just enough functionality to talk to the ZC.
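One common way to turn RSSI readings from reference nodes into a position estimate is sketched below. It is a stand-in under stated assumptions, not the Location Engine's actual algorithm: it uses a log-distance path-loss model with illustrative parameters, a weighted centroid for coordinates, and the strongest reference node for a room-level answer. All node coordinates, room labels and parameter values are hypothetical.

```python
import math

def rssi_to_distance(rssi_dbm, rssi_at_1m=-45.0, path_loss_exponent=3.0):
    """Log-distance path-loss model: distance in metres from an RSSI reading.
    Both default parameters are illustrative assumptions, not measured values."""
    return 10 ** ((rssi_at_1m - rssi_dbm) / (10.0 * path_loss_exponent))

def estimate_position(readings):
    """Weighted centroid of reference-node coordinates.

    readings: list of ((x, y), rssi_dbm), one per reference node heard by the
    blind node. Nodes with a smaller estimated distance get a larger weight."""
    weights = [1.0 / rssi_to_distance(rssi) for _, rssi in readings]
    total = sum(weights)
    x = sum(w * pos[0] for w, (pos, _) in zip(weights, readings)) / total
    y = sum(w * pos[1] for w, (pos, _) in zip(weights, readings)) / total
    return x, y

def nearest_room(readings, room_of_node):
    """Room-level location: the room of the reference node with the strongest RSSI."""
    best_pos, _ = max(readings, key=lambda r: r[1])
    return room_of_node[best_pos]

# Example with three hypothetical reference nodes.
readings = [((0.0, 0.0), -50.0), ((10.0, 0.0), -70.0), ((0.0, 10.0), -70.0)]
rooms = {(0.0, 0.0): "101", (10.0, 0.0): "102", (0.0, 10.0): "103"}
position = estimate_position(readings)
room = nearest_room(readings, rooms)
```

Indoor RSSI is noisy, so a room-level decision of this kind is usually more robust than the raw coordinate estimate.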

It is demonstrated that, by integrating a radiation detector with a ZigBee real-time location system (RTLS), both the real-time radiation exposure and the precise room-number-based location of a monitor can be obtained using multiple reference nodes, with a location range covering a 40-room, 8-floor building.

#### DAQ1: Data Acquisition 1 / Medical Imaging

Wednesday, June 13 08:30-10:40 Crystal Ballroom

#### DAQ1-1: The Trend of Data Path Structures for Data Acquisition Systems (DAQ) in Positron Emission Tomography (PET) Systems

E. Kim<sup>1,2,3</sup>, P. D. Olcott<sup>4,3</sup>, K. J. Hong<sup>1,2,3</sup>, J. Y. Yeom<sup>1,2,3</sup>, C. S. Levin<sup>1,2,4,5,3</sup>

<sup>1</sup>Radiology, Stanford University, Stanford, USA

<sup>2</sup>Electrical Engineering, Stanford University, Stanford, USA

<sup>3</sup>Molecular Imaging Program at Stanford, Stanford University, Stanford, USA

<sup>4</sup>Bio Engineering, Stanford University, Stanford, USA

<sup>5</sup>Physics, Stanford University, Stanford, USA

Data extraction of photon detector signals is an important task in PET systems. The data path structures of DAQ systems in PET have been changing with technological advances in communication devices and photon-detecting devices. New requirements due to advanced features of PET, such as time-of-flight (ToF) capability or magnetic resonance imaging (MRI) compatibility, have also been driving further changes. In this talk, I will describe the current trends in the data path structures for PET DAQ systems, which handle the PET signal processing chain from detector modules to the data acquisition PC for real-time data extraction. The design of data path structures is dependent on the detector modules, which may be built using different types of detectors such as photo-multiplier tubes (PMT), avalanche photo-diodes (APD), or silicon photo-multipliers (SiPM) with different readout multiplexing schemes. Further, we will discuss how advances in communication devices, such as high-speed serial buses and optical transceivers, can be utilized for interconnections within the DAQ system. We will also describe how new PET systems control of incoming block detectors and jitter between synchronized DAQ boards must be minimized. In order to make a PET system able to operate simultaneously inside an MRI scanner, we will describe changes to the photon detection devices and the DAQ system that minimize interference with the MRI system. Different multiplexing schemes and cabling are needed to connect the detector modules and the DAQ system.

# DAQ1-2: Design of a real-time FPGA-based DAQ architecture for the LabPET II, an APD-based Scanner dedicated to small animal PET imaging

L. Njejimana<sup>1</sup>, M.-A. Tetrault<sup>1</sup>, L. Arpin<sup>1</sup>, A. Burghgraeve<sup>1</sup>, P. Maille<sup>1</sup>, J.-C. Lavoie<sup>1</sup>, C. Paulin<sup>1</sup>, K. C. Koua<sup>1</sup>, H. Bouziri<sup>1</sup>, S. Panier<sup>1</sup>, M. W. Ben Attouch<sup>1</sup>, M. Abidi<sup>1</sup>, J.-F. Pratte<sup>1</sup>, R. Lecomte<sup>2</sup>, R. Fontaine<sup>1</sup>

<sup>1</sup>Department of Electrical and Computer Engineering, Universite de Sherbrooke, Sherbrooke, Quebec, Canada <sup>2</sup>Department of Nuclear Medicine and Radiobiology, Universite de Sherbrooke, Sherbrooke, Quebec, Canada

A 64-channel mixed-signal Application-Specific Integrated Circuit (ASIC) has recently been designed to extract data in real time from the LabPET II detector modules, which were developed to achieve submillimetric spatial resolution. Each detection block consists of 2 arrays of 4 x 8 avalanche photodiodes (APD) individually coupled to an 8 x 8 scintillator array, to form 64 independent and parallel DAQ channels. The ASIC is expected to receive 3000 PET events/sec per channel. A real-time FPGA-based digital DAQ system has been designed to interface with the ASICs and allow event harvesting, processing and transmission to a distant computer for image reconstruction. Real-time event processing embedded in the DAQ includes energy calculation using a time-over-threshold (TOT) conversion scheme, a timing correction which is a function of the energy value, and a sorting tree. A real-time coincidence engine analyzes events and keeps only relevant information to minimize data throughput and post-acquisition data processing. The architecture consists of 3 layers of FPGA-based electronics wired through gigabit links: a Front-End board that extracts timing and energy along with a pixel address, a Hub board that sorts events chronologically, and a coincidence board that copes with random estimation as well as coincident events. The real-time digital architecture can sustain a throughput of ~111M events per second for a ~37000-channel scanner configuration.
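The chronological sorting and coincidence steps can be illustrated in software with a short sketch. It is a simplified illustration, not the LabPET II firmware: the event tuple layout, the window value and the example data are assumptions, and a real engine would also handle multiple coincidences and random-rate estimation. The last line checks the quoted throughput figure from the per-channel rate and channel count.

```python
import heapq

def merge_streams(streams):
    """Merge per-board event streams, each already sorted by timestamp,
    into one chronological stream (the role the abstract gives the Hub board)."""
    return list(heapq.merge(*streams, key=lambda ev: ev[0]))

def find_coincidences(events, window_ns):
    """Pair consecutive events from different pixels whose timestamps differ
    by at most window_ns. Events are (timestamp_ns, pixel, energy_keV) tuples."""
    pairs = []
    i = 0
    while i + 1 < len(events):
        first, second = events[i], events[i + 1]
        if second[0] - first[0] <= window_ns and first[1] != second[1]:
            pairs.append((first, second))
            i += 2  # both events consumed
        else:
            i += 1
    return pairs

# Hypothetical data from two front-end boards.
board_a = [(100, 1, 511), (500, 2, 505), (900, 3, 498)]
board_b = [(104, 40, 509), (700, 41, 512), (903, 42, 515)]
merged = merge_streams([board_a, board_b])
pairs = find_coincidences(merged, window_ns=10)

# Throughput check: 3000 events/s per channel over about 37000 channels.
events_per_second = 3000 * 37000
```

Because the merged stream is already time-ordered, the coincidence search only needs to look at neighbouring events, which is what makes a streaming hardware implementation practical.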

#### DAQ1-3: A Building Block for Nuclear Medicine Imaging Systems Data Acquisition

<u>T. K. Lewellen<sup>1</sup>, R. S. Miyaoka<sup>1</sup>, D. DeWitt<sup>1</sup>, S. Hauck<sup>2</sup></u>

<sup>1</sup>*Radiology, University of Washington, Seattle, United States*

<sup>2</sup>Electrical Engineering, University of Washington, Seattle, United States

Developing new detector designs for PET and SPECT imaging systems often leads to problems in adapting existing data acquisition electronics to the requirements of the new devices. In our own laboratory, we have different detector designs for several applications and require new acquisition electronics to support them. After considering both commercial and open source alternatives, we determined that we needed more computational power at the detector interface to support our needs. We had previously developed a system based on FireWire (developed for the MiCES pre-clinical scanner), and based our new system on lessons learned working with that scanner. The new system is built around a digital processing board (termed the Phase II board) that takes on many roles for different acquisition topologies. The core design approach was to move as much pulse processing as we could into a large FPGA on the Phase II board and keep the analog electronics to a minimum. Algorithms we have implemented in the FPGA include: 1) basic pulse integration with pile-up correction; 2) timing by comparing a pulse to a high resolution reference pulse (and tools to generate the reference pulse using the relatively low speed ADCs); 3) baseline restoration; and 4) a statistical estimator for determining the location (x,y,z) of an event in a monolithic crystal.
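Two of the listed steps, baseline restoration and basic pulse integration, can be sketched in a few lines. This is a deliberately simple illustration under stated assumptions: the baseline is taken as the mean of a pre-trigger region, the threshold and the example trace are hypothetical, and no pile-up correction is attempted. It is not the Phase II FPGA implementation.

```python
def estimate_baseline(samples, n_pre):
    """Mean of the first n_pre samples, assumed to precede the pulse."""
    return sum(samples[:n_pre]) / n_pre

def integrate_pulse(samples, n_pre, threshold):
    """Baseline-subtracted pulse integral.

    Sums every sample after the pre-trigger region that exceeds the baseline
    by more than `threshold` ADC counts. Pile-up correction is not attempted."""
    baseline = estimate_baseline(samples, n_pre)
    return sum(s - baseline for s in samples[n_pre:] if s - baseline > threshold)

# Hypothetical ADC trace: baseline near 100 counts, then one pulse.
trace = [100, 101, 99, 100, 100, 130, 180, 150, 120, 104, 100, 99]
energy = integrate_pulse(trace, n_pre=4, threshold=5)
```

Subtracting a per-pulse baseline before integrating keeps slow drifts in the analog front end from shifting the measured energy.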

In our implementation, a basic system has an analog interface board that converts the detector signals to differential analog signals to be processed by the Phase II board. The Phase II board currently has 64 channels of 65 MHz ADCs and one ADC running at greater than 300 MHz. Any or all of the ADC pads can be converted with a jumper board to route differential serial signals from devices like digital SiPMs to serial receivers in the FPGA. The card also includes four general-purpose SPI bus connectors for control of system components and low speed messaging between cards, as well as a pair of general-purpose daughter board connectors to add functionality to the card as needed. The FPGA family (Altera) that we selected also allows several different capacity FPGAs that have the same pin layout to be used for different applications without having to change the board design. To support different acquisition system topologies, the cards can be configured as master controllers (including sending start/stop commands, any mechanical motion control, and local coincidence processing); as local acquisition nodes that connect other Phase II cards to the communication network (each card can support four other cards in a star topology); or as a basic acquisition node, either stand-alone or as part of a network. The original design assumed FireWire as the network bus to pass data/commands between the host and the Phase II cards. The next revision of the card will support USB 2.0/3.0 and we are evaluating a general-purpose I/O adapter board to support a range of network bus technologies.

#### DAQ1-4: 3D Ultrasound Computer Tomography for Breast Cancer Diagnosis

M. Balzer, M. Birk, R. Dapp, A. Menshikov, M. Zapf, H. Gemmeke, N. Ruiter

*IPE, KIT, Karlsruhe, Germany*

Breast cancer is the most common type of cancer among women in Europe and North America. Unfortunately, breast cancer is frequently diagnosed after metastases have developed. A sensitive and reliable imaging method enabling early detection could enhance the survival probability of the women substantially. Ultrasound computer tomography (USCT) is a promising candidate for sensitive imaging of breast cancer. A clinical study is planned for this year.

We built a worldwide unique 3D USCT prototype realizing the benefits of a full 3D system for the first time.

The 3D USCT apparatus has a semi-ellipsoidal aperture with 628 emitters and 1413 receivers. The transducers, developed and manufactured in-house, are mounted on a semi-ellipsoidal transducer holder and are grouped into transducer arrays with embedded amplifiers and emitter electronics. The transducer holder can be rotated and translated to increase the number of virtual transducers, improving the reconstructed images. Data acquisition is carried out with 480 parallel channels digitizing at 20 MHz with 12 bit resolution. The massively parallel acquisition system includes 80 Field Programmable Gate Arrays (FPGA) for data acquisition, processing and storage. One total scan acquires 3.5 million A-scans with 20 GByte of raw data for one breast volume. The duration of these fully automatic scans at 4 aperture positions is 40 seconds.

We are working on two reconstruction approaches: reconstruction with transmission signals and a compressive sampling algorithm, and the Synthetic Aperture Focusing Technique (SAFT) using reflectivity signals. The favored SAFT algorithm enables a position resolution better than 0.2 mm. The complexity of the image reconstruction is mainly defined by an inversion of a 3.5x10<sup>6</sup> x 1024<sup>3</sup> matrix. So far we have succeeded in reducing the reconstruction time from one month to one day. We now aim for a further reduction to 30 minutes. The use of massively parallel FPGAs and GPUs is a promising approach. First results will be presented.
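The core of a SAFT reconstruction is a delay-and-sum over all emitter-receiver pairs, which the following sketch illustrates in two dimensions. It is a minimal sketch under stated assumptions, not the 3D USCT reconstruction code: the geometry, the constant speed of sound and the synthetic single-echo A-scans are hypothetical, while the 20 MHz sampling rate is the figure quoted above.

```python
import math

def saft_pixel(point, ascans, sound_speed, sample_rate):
    """Delay-and-sum value for one image point.

    ascans: list of (emitter_xy, receiver_xy, samples). For each A-scan the
    sample at the emitter -> point -> receiver travel time is accumulated."""
    total = 0.0
    for emitter, receiver, samples in ascans:
        path = math.dist(emitter, point) + math.dist(point, receiver)
        index = round(path / sound_speed * sample_rate)
        if 0 <= index < len(samples):
            total += samples[index]
    return total

def synthetic_ascan(emitter, receiver, scatterer, n, sound_speed, sample_rate):
    """A-scan containing a single unit echo from a point scatterer."""
    samples = [0.0] * n
    path = math.dist(emitter, scatterer) + math.dist(scatterer, receiver)
    samples[round(path / sound_speed * sample_rate)] = 1.0
    return samples

C = 1500.0   # m/s, approximate speed of sound in water (assumed)
FS = 20e6    # Hz, the digitization rate quoted in the abstract
scatterer = (0.02, 0.03)
pairs = [((0.0, 0.0), (0.1, 0.0)), ((0.0, 0.0), (0.0, 0.1)), ((0.1, 0.0), (0.0, 0.1))]
ascans = [(e, r, synthetic_ascan(e, r, scatterer, 4000, C, FS)) for e, r in pairs]
on_target = saft_pixel(scatterer, ascans, C, FS)
off_target = saft_pixel((0.07, 0.07), ascans, C, FS)
```

Every image point is computed independently of the others, which is why this kind of reconstruction maps well onto massively parallel hardware.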

# DAQ1-5: FPGA-Based Multi-Channel DAQ Systems with External PCI Express Link to GPU Compute Servers

<u>T. Bergmann<sup>1</sup></u>, D. Bormann<sup>1</sup>, M. A. Howe<sup>2</sup>, M. Kleifges<sup>1</sup>, A. Kopmann<sup>1</sup>, N. Kunka<sup>1</sup>, A. Menshikov<sup>1</sup>, D. Tcherniakhovski<sup>1</sup>

<sup>1</sup>Karlsruhe Institute of Technology, Karlsruhe, Germany

<sup>2</sup>University of North Carolina, Chapel Hill, North Carolina, USA

In this presentation we describe a new approach to integrating GPU computing into a well-established data acquisition (DAQ) system. Besides technical details, we present first results of this new concept and discuss the expanded field of applications we intend to cover with the new architecture.

DAQ systems are a key technology of scientific research. At KIT, DAQ systems have been developed for many years, mainly for large physics experiments like the KATRIN neutrino experiment or the Auger cosmic ray observatory. These experiments required DAQ systems with a high sampling rate (12 bit ADCs @10-40 MHz, up to 480 channels per system), one or two levels of trigger algorithms (running on up to 81 programmable FPGAs) detecting the physically relevant events, and a programmable event builder facility (embedded Linux system on a computer module) connected by a PCI bus to the FPGAs and by an Ethernet link to an external DAQ computer. Typically the data output of the trigger system is below 1 MB/s. The DAQ data chain roughly consists of ADCs, a real-time two-stage hardware trigger, event builder, storage and offline analysis. Graphics processing units (GPUs) provide several hundred computing processors and can speed up calculations by parallel computing, most efficiently on vector-shaped data like detector data time series. Nowadays, packing several GPUs in a standard PC yields a high performance compute server at affordable cost. Several efficient real-time monitoring systems for scientific applications using GPUs have been developed by KIT in recent years.

We added an additional step to the DAQ data chain and utilize high performance computing for a third level trigger or online analysis. To exploit the GPU computing power, it was essential to speed up the data throughput and provide a fast data link to the external PC. To achieve this, we developed an adapter board with a PCI to PCI Express (PCIe) bridge that replaces the existing computer module. The adapter connects an external PC directly to the DAQ system with a single-lane PCIe cable. PCIe is software compatible with PCI, so we can use the same Linux driver for the external PC and the internal PrPMC module.

The new adapter increases the flexibility of the DAQ setup. Depending on the application, simple embedded processors with fast or gigabit Ethernet can be selected. With the high-bandwidth PCIe readout adapter, advanced trigger systems and/or analysis steps can be added using powerful compute servers. Together, programmable DAQ electronics and powerful compute servers form a flexible platform for future experiments.

# DAQ1-6: Field-Programmable Gate Array (FPGA) Firmware for the Fermilab E906 (SeaQuest) Trigger

J. Wu<sup>1</sup>, S.-H. Shiu<sup>2</sup>

<sup>1</sup>Particle Physics Division, FNAL, Batavia, IL <sup>2</sup>Institute of Physics, Academia Sinica, Taipei, Taiwan

Trigger firmware for scintillating hodoscopes has been implemented in a field-programmable gate array (FPGA) on a commercial off-the-shelf 6U VMEbus module for the Fermilab E906 (SeaQuest) experiment. The FPGA receives up to 96 input channels and digitizes the leading-edge times at 1 ns (LSB) resolution using time-to-digital converter (TDC) blocks in the firmware. Digital processing of the TDC outputs includes adjusting channel delays individually in 1 ns steps, setting the coincidence range, and re-aligning with the accelerator bucket clock. The re-aligned hits are further processed in trigger matrices. E906 uses four different stations of scintillating hodoscopes, and various 3-out-of-4 (or 4-out-of-4) majority coincidence logics are used to generate valid track information as trigger primitives to form the global trigger. Zero-suppressed TDC data are read out for each event, and the module can be used as a 96-channel TDC when the trigger matrices are disregarded. The requirements of large channel count and fast digital processing, together with limits on the number of logic elements and memory bits available and on power supply capability (which restricts the use of a high clock frequency to only a small portion of the FPGA), force us to design the firmware with extra care. Experiences, design practices and techniques are discussed in this paper for firmware projects facing similar challenges.
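The delay adjustment, bucket re-alignment and 3-out-of-4 majority coincidence described above can be modelled in software as follows. This is a behavioural sketch only, not the E906 firmware: the function names, the hit times, the per-station delays and the bucket period of 19 ns are hypothetical values chosen for illustration.

```python
def align_to_bucket(time_ns, delay_ns, bucket_period_ns):
    """Apply a per-channel delay (1 ns steps) and return the index of the
    accelerator bucket the hit falls into. The period is a caller-supplied
    assumption; the abstract does not quote it."""
    return (time_ns + delay_ns) // bucket_period_ns

def trigger_primitive(hits, delays, bucket_period_ns, required=3):
    """hits: {station: leading-edge time in ns} for stations that fired.
    Returns the set of buckets in which at least `required` stations
    have a re-aligned hit (3-out-of-4 majority by default)."""
    buckets = {}
    for station, t in hits.items():
        b = align_to_bucket(t, delays.get(station, 0), bucket_period_ns)
        buckets.setdefault(b, set()).add(station)
    return {b for b, stations in buckets.items() if len(stations) >= required}

# Hypothetical hit times on four stations; station 4 arrives one bucket late.
hits = {1: 102, 2: 99, 3: 105, 4: 125}
delays = {1: 0, 2: 3, 3: -2, 4: 0}
fired = trigger_primitive(hits, delays, bucket_period_ns=19)
fired_strict = trigger_primitive(hits, delays, bucket_period_ns=19, required=4)
```

Relaxing the requirement from four stations to three trades some purity for tolerance of a single inefficient or mistimed station.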

#### UPG2: Upgrades 2

Wednesday, June 13 11:05-12:05 Crystal Ballroom

## UPG2-1: The Generic Evaluation Tool for the LHCb Event Builder Network Upgrade

G. Liu, N. Neufeld

CERN, Geneva, Switzerland

LHCb has proposed an upgrade to allow operation at higher luminosity. In order to improve the trigger efficiencies, all sub-detectors will be read out at 40 MHz and data will be sent to a large processing farm which performs event building and filtering. The builder network will need to provide an aggregate bandwidth of about 32 Tb/s.

To build such a large network, evaluations of different technologies are important for the network design. A generic test tool and methodology are needed to compare their performance. In this paper, we describe our event builder network benchmarking tool. This tool consists of several parts: core unit, device readout unit, transport unit and measurement unit. The main services of the core unit are information dispatching, exception handling, etc. The device readout unit provides access to the network; a new protocol or device can be supported by implementing a new plug-in. The measurement unit generates various traffic patterns and measures the performance.

Several architectures and switch technologies have been considered for the event builder network. In our lab, we have set up a small test environment for 10 Gb Ethernet and Infiniband with 8 servers. The results of performance tests using this tool will be outlined.

## UPG2-2: Upgrade Project and Plans for the ATLAS Detector and Trigger

F. Pastore, R. Vari

Physics, Royal Holloway University of London, London, United Kingdom

In the coming years, different phases of upgrades for the LHC complex are foreseen, which will extend the physics potential of its experiments. Through two different phases (namely phase-1 and phase-2), the average luminosity will be increased by a factor of 5-10 above the design luminosity. Consequently, the detectors and the infrastructure of the DAQ system of the experiments will need to be upgraded as well, to take into account the increased radiation level and particle rates foreseen at such high luminosity.

In this paper we describe the changes to the ATLAS detector and its trigger system needed to face the increased number of interactions per collision, which will cause a higher level of pile-up and increased rates at each level of the trigger. The trigger detectors will improve their selectivity by benefiting from the increased granularity available at the trigger level, which will allow for a higher resolution. The use of the tracking system in the lower levels of the trigger selection is also discussed. It is foreseen that the second level trigger will be helped by a new Fast Tracking. The addition of tracking information at the first trigger level during the LHC upgrade phase-2 is currently under discussion. Different scenarios are compared, bearing in mind the requirements to achieve the expected physics potential of ATLAS in this high luminosity regime.

#### UPG2-3: Associative Memories for L1 Track Triggering in LHC Environment

D. Magalotti<sup>1</sup>, E. Pedreschi<sup>2</sup>, A. Annovi<sup>3</sup>, P. Giannetti<sup>2</sup>, M. Piendibene<sup>2,4</sup>, G. Broccolo<sup>5</sup>, F. Palla<sup>2,5</sup>, R. Dell'Orso<sup>2</sup>, F. Ligabue<sup>5</sup>, S. Taroni<sup>1,6</sup>, L. Servoli<sup>1</sup>, A. Nappi<sup>1,6</sup>

<sup>1</sup>INFN, Perugia, Italy <sup>2</sup>INFN, Pisa, Italy <sup>4</sup>Università di Pisa, Pisa, Italy <sup>5</sup>Scuola Normale Superiore, Pisa, Italy

<sup>6</sup>Universita' degli studi di Perugia, Perugia, Italy

Modern high energy physics experiments search for extremely rare processes hidden in much larger background levels. Experience at high luminosity hadron collider experiments shows that controlling trigger rates can be extremely challenging as the luminosity increases, physics goals change in response to new discoveries, and the detector ages. It is thus essential that the trigger system be flexible and robust, with redundancy and significant operating margin. This has certainly been the case in the CDF experiment, where the Silicon Vertex Trigger (SVT) has significantly extended the experiment's physics capability. Tracking information enhances the trigger rejection capabilities while retaining high efficiency for interesting physics events. The design of a tracking-based trigger for the High Luminosity LHC (HL-LHC) is an extremely challenging task, and requires the identification of high-momentum particle tracks as a part of the Level 1 Trigger. Simulation studies show that this can be achieved by correlating hits on two closely spaced silicon strip sensors, and reconstructing tracks at Level 1 by employing an Associative Memory approach. Associative Memories compare the tracker information of each event to pre-calculated "expectations" (pattern matching) in a very short time and contribute to the trigger decision. This is done in several trigger sectors in parallel, thus reducing the execution time and remaining within the L1 latency.
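The pattern-matching idea behind an Associative Memory can be emulated in software with a short sketch. It is a serial illustration under stated assumptions, not the AM chip logic: the coarse-binning scheme, the four-layer pattern bank and the example hits are hypothetical, and real AM hardware compares all stored patterns in parallel as the hits arrive.

```python
def coarse_address(strip, strips_per_bin):
    """Reduce a full-resolution strip number to a coarse pattern-bank bin."""
    return strip // strips_per_bin

def match_patterns(event_hits, pattern_bank, strips_per_bin, min_layers=None):
    """Return indices of patterns ("roads") matched by the event.

    event_hits: one list of strip numbers per detector layer.
    pattern_bank: list of tuples, one coarse address per layer.
    A pattern fires when at least min_layers of its layers contain a hit at
    the stored address (all layers by default)."""
    coarse = [{coarse_address(s, strips_per_bin) for s in layer} for layer in event_hits]
    needed = len(coarse) if min_layers is None else min_layers
    roads = []
    for index, pattern in enumerate(pattern_bank):
        matched = sum(1 for layer, addr in enumerate(pattern) if addr in coarse[layer])
        if matched >= needed:
            roads.append(index)
    return roads

# Hypothetical 4-layer bank with three precalculated patterns.
bank = [(1, 1, 2, 2), (5, 5, 6, 6), (1, 1, 2, 3)]
event = [[17, 80], [20], [35, 90], [40, 41]]
roads = match_patterns(event, bank, strips_per_bin=16)
roads_majority = match_patterns(event, bank, strips_per_bin=16, min_layers=3)
```

The matched roads are coarse by construction; a later full-resolution fit inside each road would be needed to obtain track parameters.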

In this contribution we describe a first test that uses state-of-the-art hardware to process the simulated information coming from the upgraded CMS tracker for the reconstruction of tracks at L1. The existing hardware has been developed for other experiments: the AMBslim mother board consisting of 4 smaller boards, the Local Associative Memory Banks (LAMB), each hosting 32 AM chips to contain the stored patterns with the readout logic. However, despite the encouraging results, the ability of a single AMBslim (even with newer AM chips) to process a single event is much less than the amount of input data foreseen for the CMS case, and the latency strongly depends on the time necessary to load the data in the AM system and to process a single event. One possible solution is to parallelize the event processing inside the AMBslim board, assigning each event to one LAMB. We describe the firmware implementation of this concept in the current hardware, the obtained results, and a possible modification of the LAMB hardware in order to obtain the minimum latency for processing events. Finally, we propose to use an AM system as a coprocessor for the offline reconstruction of events: the AM is able to provide the CPUs (or GPUs) with powerful seeds (roads) for a local high resolution track finding.

#### MO4: Mini-orals 4

Wednesday, June 13 12:05-12:25 Crystal Ballroom

## PS3-1: Readout Electronics and Data Acquisition of a Time of Flight Detector for Positron Emission Tomography

J. Y. Yeom<sup>1</sup>, V. Spanoudaki<sup>1</sup>, K. J. Hong<sup>1</sup>, C. S. Levin<sup>2,3</sup>

<sup>1</sup>Molecular Imaging Program at Stanford, Department of Radiology, Stanford University, Stanford, CA, United States <sup>2</sup>Department of Physics, Stanford University, Stanford, CA, United States <sup>3</sup>Department of Electrical Engineering, Stanford University, Stanford, CA, United States

Time-of-Flight (ToF) information in Positron Emission Tomography (PET) can contribute to a significant improvement in the reconstructed image signal-to-noise ratio, enabling image contrast improvement, a reduction in patient radiation dose, and/or shorter scan times. We have recently developed a multi-element SiPM (silicon photomultiplier) based block detector module for ToF PET. In this study, the detector, readout electronics and data acquisition are described, and a preliminary characterization of the detector module is presented. The detector module is based on a 4 x 4 array of LYSO-SiPM elements (Hamamatsu MPPC S10931-050P) read out by individual wideband RF amplifiers to maximize timing performance. To preserve the fast signal waveform of the detector and extract relevant information from the data, each element is digitized with a channel of the high speed CAEN V1742 (32 channels, 5 GHz sampling, 12-bit amplitude resolution) waveform digitizer. As the digitizer is unable to trigger on itself, a trigger board has also been fabricated; it outputs a fast pulse that triggers the digitizer whenever any pixel of the detector detects a signal.

To assess the performance of one of the modules, a 4 x 4 LYSO scintillator array (3 x 3 x 5 mm<sup>3</sup> elements) was coupled with optical grease to the photodetectors and energy resolution measurements were performed using a Ge-68 source. The energy spectra for each channel were acquired and the photopeak resolution versus overvoltage was measured. The energy resolution, not corrected for non-linearity effects, varied from 14.0 ± 0.8 % to 7.7 ± 1.6 % for an overvoltage range from 0.8 V to 1.6 V. Results from one channel have been compared for the case of a high speed oscilloscope and the CAEN digitizer. The largest variation in energy resolution is 4.7 % between those two cases. We will present results for the timing resolution of the detector module used in conjunction with the CAEN V1742 digitizer.

### PS3-3: Design of the Trigger Interface and Distribution Board for CEBAF 12 GeV Upgrade

W. Gu, D. Abbott, C. Cuevas, G. Heyes, E. Jastrzembski, B. Moffit, B. Raydo, J. Wilson, H. Dong, S. Kaneta, N. Nganga, C. Timmer, V. Gyurjyan

Physics, Jefferson Lab, Newport News, Virginia, United States

The design of the Trigger Interface and Distribution (TID) board for the 12 GeV upgrade at the Continuous Electron Beam Accelerator Facility (CEBAF) at TJNAL is described. The TID board distributes a low jitter system clock, synchronized trigger, and synchronized multi-purpose SYNC signal. The TID also initiates data acquisition for the crate. With the TID boards, a multi-crate system can be set up for experiment test and commissioning. The TID board can be selectively populated as a Trigger Interface (TI) board or a Trigger Distribution (TD) board for the experiments. When the TID is populated as a TI, it can be located in the VXS crate and distribute the CLOCK/TRIGGER/SYNC through the VXS P0 connector; it can also be located in the standard VME64 crate and distribute the CLOCK/TRIGGER/SYNC through the VME P2 connector or front panel. It initiates the data acquisition for the front crate in which the TI is positioned. When the TID is populated as a TD, it fans out the CLOCK/TRIGGER/SYNC from the trigger supervisor (TS) to the front crates through optical fibres. The TD monitors the trigger processing on the TIs and gives feedback to the TS for trigger flow control. Field Programmable Gate Arrays (FPGA) are utilised on the TID board to provide programmability. The TID boards were intensively tested on the bench and in various setups.

### PS3-14: The Readout Electronics of the Micromegas-Based Large Time Projection Chamber Prototype for the International Linear Collider

D. Calvet, D. Attie, D. Besin, P. Colas, R. Joannes, A. Le Coguie, S. Lhenoret, I. Mandjavidze, M. Riallot, W. Wang, E. Zonca CEA-IRFU, Saclay, France

This work presents the design, implementation and test of prototype modules of a Time Projection Chamber based on Micromegas amplification technology which was built in view of the future International Linear Collider. The main goals of this development are to investigate the performance of the detector and to demonstrate the feasibility of extremely compact and low power readout electronics. We based the front-end electronics on the AFTER chip, a 72-channel ASIC originally built for the T2K experiment, and devised new hardware, mechanics and cooling to read out the 1728 channels of a detector module while staying confined in the available area of ~220 cm<sup>2</sup>. Using a multi-board layer structure with high density solderless connectors, ASIC die wire-bonding and other space-saving techniques, we reach a density of ~8 channels per square centimetre. The thickness of the readout electronics for one module is around 4 cm. The digital part of a module is based on a Xilinx Virtex-5 FPGA which interfaces to the 24 AFTER chips used for the readout. It receives the data digitized from the front-end by a 6-channel ADC, temporarily buffers data, applies zero-suppression and transfers event data to a remote data concentrator over a 2 Gbit/s optical link. To simplify design and reduce the overall cost, we use a commercial Virtex-5 FPGA characterization platform customized with specific add-ons to build a data concentrator with 12 optical ports. Software running on the embedded PowerPC processor of the FPGA performs front-end configuration, event data gathering and transfer to the data acquisition PC over a standard Gigabit Ethernet link. We present the operation of a vertical slice of the complete detector and readout system along with results obtained in a test beam. In particular, our tests show that using a resistive layer in the Micromegas detector improves tracking resolution and allows operation without the spark protection circuit normally required on each readout channel.
This simplifies design and is more compact. Finally, we explain how this setup is now being scaled-up to build a 7-module detector prototype which is the final goal of this R&D program.
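
The channel count and density quoted in this abstract can be cross-checked with a few lines of Python, and the zero-suppression step can be illustrated in the same sketch. The constants come from the abstract; the fixed-threshold `zero_suppress` function, its parameters and the sample values are hypothetical, since the abstract does not describe the algorithm implemented in the FPGA.

```python
from typing import List, Tuple

CHANNELS_PER_CHIP = 72    # AFTER ASIC channels (from the abstract)
CHIPS_PER_MODULE = 24     # AFTER chips per module (from the abstract)
MODULE_AREA_CM2 = 220.0   # approximate available area (from the abstract)


def channel_density() -> Tuple[int, float]:
    """Return (channels per module, channels per square centimetre)."""
    channels = CHANNELS_PER_CHIP * CHIPS_PER_MODULE
    return channels, channels / MODULE_AREA_CM2


def zero_suppress(samples: List[int], pedestal: int, threshold: int) -> List[Tuple[int, int]]:
    """Keep (time_bin, adc) pairs whose pedestal-subtracted value exceeds threshold.

    Hypothetical fixed-threshold scheme, for illustration only.
    """
    return [(t, adc) for t, adc in enumerate(samples) if adc - pedestal > threshold]


if __name__ == "__main__":
    channels, density = channel_density()
    print(channels, round(density, 1))
    # Illustrative samples only, not measured data.
    print(zero_suppress([250, 252, 300, 410, 260], pedestal=250, threshold=20))
```

With 24 chips of 72 channels each, the module total matches the 1728 channels stated above, and dividing by ~220 cm<sup>2</sup> gives a figure just under 8 channels per square centimetre.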

## PS3-18: Design of an Optical Uplink with 10GBit/s Link Between PCIe and MicroTCA

H. Kleines, P. Wüstner, A. Ackens, M. Drochner, P. Kämmerling, S. van Waasen, M. Ramm ZEL, Forschungszentrum Jülich, Juelich, Germany

In the context of developments for the PANDA detector system, an optical uplink from MicroTCA to PCIe is being designed. The link is based on X2 transceivers with a nominal speed of 10 GBit/s. The PCIe board has already been produced and is currently under test. It is based on a Xilinx Virtex 5 (XC5VLX30T) FPGA. For the implementation of the XAUI interface to the X2 transceiver, a PM8358 with a parallel interface to the FPGA is used. The corresponding AMC module, which is under development, is based on the same components. Open issues regarding the implementation of the PCIe root complex functionality on this module will be discussed.

#### **PS3-21: Development of an AMC Module MMC**

<u>P. Kaemmerling</u>, M. Drochner, H. Kleines, S. van Waasen, M. Ramm, A. Ackens ZEL, Forschungszentrum Juelich, Juelich, Germany

The MMC (Module Management Controller) of an AMC module communicates with the CMC (Carrier Management Controller) and the ShMC (Shelf Management Controller). It handles the xTCA-FRU hardware signals, negotiates inventory, power load and diagnosis, receives IPMI commands and sends state and sensor data. We designed an AMC PCB and used a newly introduced PIC32MX460 with a MIPS32 125DMIPS microcontroller for the MMC. Together with the new MPLAB X gcc-toolchain we chose the open source software coreIPM / coreBMC as a starting point and adapted it to the PIC32 and our board hardware. We experienced a very dynamic deployment of IDE, compiler, library and example versions for the PIC32 family. We extracted hardware-related code to an extra library and implemented some extensions such as a more reliable I2C stack for the PIC32.

## PS4-10: New strategy for the control of low frequency large band mechanical suspensions and inertial platforms

F. Barone<sup>1,2</sup>, F. Acernese<sup>1,2</sup>, R. De Rosa<sup>3,2</sup>, G. Giordano<sup>1</sup>, R. Romano<sup>1,2</sup> <sup>1</sup>Dept. Scienze Farmaceutiche e Biomediche, Universita' di Salerno, Fisciano (Salerno), Italy <sup>2</sup>Sezione di Napoli, Istituto Nazionale di Fisica Nucleare, Napoli, Italy <sup>3</sup>Dept. Scienze Fisiche, Universita' di Napoli Federico II, Napoli, Italy

Low frequency seismic suspensions (attenuators) and inertial platforms require a careful design not only of the mechanical attenuation stages but also of the control system, especially if a residual horizontal motion better than 10<sup>-15</sup> m/sqrt(Hz) in the band 0.01 - 100 Hz is a requirement. One of the most important elements of the control system is the typology of the sensors, whose accuracy, stability, sensitivity and band may constitute a real limitation for the improvement of their performances, especially if very large seismic attenuations are required in the low frequency band. In particular, the present most effective control systems, based on accelerometric sensors (force feed-back configuration), are mainly limited by the sensor electronics. To try to improve the performances of low frequency suspensions (attenuators) and inertial platforms, we introduced a new control philosophy: the control system directly acquires the instantaneous relative positions of the mechanical components through monolithic folded pendulum sensors without any force feed-back (seismometer configuration). In the paper we discuss this new control architecture and the results of the tests on a state-of-the-art mechanical suspension.

## PS3-39: SEUs Tolerance in FPGAs Based Digital LLRF System for XFEL

M. K. Grecki

MSK, DESY, Hamburg, Hamburg, Germany

The rapidly developing semiconductor technology allows sophisticated digital control to be implemented in programmable device platforms (FPGAs, CPUs). However, the increasing size and performance of the circuits also has a drawback in failure sensitivity, in particular to soft errors due to ionizing radiation. The sensitivity to single event upsets (SEUs) is related to the critical charge, which strongly depends on the transistor dimensions and supply voltage. The sensitivity to ionizing radiation increases faster than the circuit complexity that follows from Moore's law. Therefore life-critical systems and systems operating in radioactive environments have to deal with soft errors. The countermeasures can be special design techniques that introduce redundancy into the algorithms and/or circuit design, allowing errors to be detected and corrected. Recently, semiconductor manufacturers have also begun providing tools to help designers achieve the highest reliability in their designs. The system designer can use these tools but is not limited to them. Even at the algorithm and implementation levels it is possible to apply general or customized countermeasures against failures. But this is sometimes costly and/or induces performance limitations. The goal is to find a compromise between cost, performance and reliability. The LLRF control system for XFEL will use sophisticated digital systems based on FPGAs and DSPs. It will be installed in close proximity to the accelerator pipe since the accelerator is constructed using the single channel concept. Therefore electronic circuits will be exposed to gamma and neutron radiation. The electronics will be built using normal COTS components, therefore normal radiation resistance is expected. The racks with electronic systems will be partially shielded against radiation, but a moderate radiation level will be present during machine operation. Therefore soft errors are expected and must be taken into account.
In order to evaluate the possible consequences of radiation for the LLRF control, experiments at the FLASH accelerator have been performed. The paper presents some techniques used to improve the tolerance against SEUs applied in the LLRF system and presents results of experiments performed in the FLASH accelerator tunnel. The cost and efficiency of these methods (smart algorithms, spatial and time redundancy etc.) are also discussed.
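
As an illustration of what spatial redundancy can look like, the following Python sketch models triple modular redundancy (TMR) with a bitwise majority voter. This is a generic textbook technique chosen for illustration; the abstract does not state which redundancy schemes are used in the LLRF system, and the function names and register value are hypothetical.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)


def upset_detected(a: int, b: int, c: int) -> bool:
    """True if the three copies disagree, i.e. at least one was corrupted."""
    return not (a == b == c)


def flip_bit(word: int, bit: int) -> int:
    """Model a single event upset by inverting one bit of a stored word."""
    return word ^ (1 << bit)


if __name__ == "__main__":
    stored = 0xB2                      # hypothetical register content
    corrupted = flip_bit(stored, 5)    # one of the three copies is upset
    print(upset_detected(stored, corrupted, stored))
    print(hex(majority_vote(stored, corrupted, stored)))
```

A single upset in any one copy is outvoted by the other two, at the cost of roughly three times the storage or logic, which is the kind of cost versus reliability trade-off the abstract refers to.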

## PS3-41: Multiple Register Synchronization with a High-Speed Serial Link Using the Aurora Protocol

D. Barrientos<sup>1,2,3</sup>, V. Gonzalez<sup>3</sup>, M. Bellato<sup>2</sup>, A. Gadea<sup>1</sup>, D. Bazzacco<sup>2</sup>, J. M. Blasco<sup>3</sup>, D. Bortolato<sup>2</sup>, F. J. Egea<sup>1,3</sup>, R. Isocrate<sup>2</sup>, A. Pullia<sup>4</sup>, G. Rampazzo<sup>2</sup>, E. Sanchis<sup>3</sup>, A. Triossi<sup>2</sup>

<sup>1</sup>Instituto de Fisica Corpuscular (CSIC-UV), Valencia, Spain

<sup>2</sup>Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Padova, Padova, Italy

<sup>3</sup>Departamento de Ingenieria Electronica, Universitat de Valencia, Valencia, Spain

<sup>4</sup>Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Milano, Milano, Italy

The synchronization of general purpose registers in the framework of distributed hardware systems becomes essential when the number of programmable integrated circuits rises dramatically. In this work, we propose the synchronization of user-controlled registers between two Field Programmable Gate Arrays (FPGAs) through a high-speed serial link at 2.5 Gbps using the Aurora protocol. Aurora is an open, lightweight and scalable protocol that performs 8B/10B encoding, flow control, clock correction, etc. On top of that, a set of VHDL modules manages the synchronization between a variable number of registers, whose length is also variable and can be configured before the synthesis of the code. Thus, a final hardware core is provided, allowing the user to implement a specific register configuration of up to 254 registers with 8-bit width.

The development and validation of the code has included a simulation process for each developed module and several hardware testbenches, using Virtex-5 and Virtex-6 FPGAs from Xilinx. In addition, Bit Error Ratio (BER) tests for the whole firmware and hardware system have been performed. From those tests, some characteristics of the core have been quantified, such as the maximum frequency for updating the registers as a function of the number of synchronized registers, and the latency of the link from the local to the remote user interface. For that purpose, a generator of pseudo-random values using a Linear Feedback Shift Register (LFSR) in both FPGAs has been used, allowing the BER of the whole setup to be measured.
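
The LFSR-based BER measurement described above can be sketched in Python as follows. The idea is that both ends run the same generator from the same seed, so the receiver can compare each received word against its own prediction and count differing bits. The 16-bit register width, the tap positions (16, 14, 13, 11) and the seed are assumptions made for this sketch; the abstract does not give the LFSR parameters used in the FPGAs.

```python
from typing import Iterator, List


def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Fibonacci LFSR by one step (taps 16, 14, 13, 11; assumed)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def lfsr_words(seed: int, count: int) -> Iterator[int]:
    """Yield `count` successive LFSR states, starting from a non-zero seed."""
    state = seed
    for _ in range(count):
        yield state
        state = lfsr16_step(state)


def bit_errors(sent: List[int], received: List[int]) -> int:
    """Count differing bits between the expected and the received words."""
    return sum(bin(s ^ r).count("1") for s, r in zip(sent, received))


def bit_error_ratio(sent: List[int], received: List[int], width: int = 16) -> float:
    """Bit errors divided by the total number of bits compared."""
    return bit_errors(sent, received) / (len(sent) * width)


if __name__ == "__main__":
    expected = list(lfsr_words(0xACE1, 1000))   # receiver-side prediction
    received = list(expected)
    received[10] ^= 0b101                       # inject two bit errors in one word
    print(bit_errors(expected, received), bit_error_ratio(expected, received))
```

In hardware the comparison would run continuously on the link; here the "channel" is simply a list in which errors are injected by hand.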

The work has been developed on a general basis, in order to make it fully compatible with several possible implementations. However, it has also been validated for the first use that was conceived: the slow control system in the second generation of electronics for the Advanced GAmma Tracking Array (AGATA). Furthermore, the firmware has been included as a peripheral in a microprocessor embedded in one of the FPGAs, while its partner was linked to a serial 2-wire bus. The registers in the created peripheral have been programmed with an application software layer, written in C code, using bit-banging techniques. The modularity of the C code also provides the possibility of encapsulating the serial protocols, providing, to the high-level user, read and write functions in a fully transparent way.

Distributed slow control systems managing remote devices, using bit-banging techniques, or register-dependent protocols could take advantage of the versatility of the developed core without the need to embed a microprocessor for that purpose. Furthermore, the low resource utilization and the small user interface make it easily portable and usable in a custom user application.

#### PS3-42: Graphical User Interface for Serial Protocols Through a USB Link

D. Barrientos<sup>1,2,3</sup>, V. Gonzalez<sup>3</sup>, M. Bellato<sup>2</sup>, A. Gadea<sup>1</sup>, D. Bazzacco<sup>2</sup>, J. M. Blasco<sup>3</sup>, D. Bortolato<sup>2</sup>, F. J. Egea<sup>1,3</sup>, R. Isocrate<sup>2</sup>, A. Pullia<sup>4</sup>, G. Rampazzo<sup>2</sup>, E. Sanchis<sup>3</sup>, A. Triossi<sup>2</sup>

<sup>1</sup>Instituto de Fisica Corpuscular (CSIC-UV), Valencia, Spain

<sup>2</sup>Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Padova, Padova, Italy

<sup>3</sup>Departamento de Ingenieria Electronica, Universitat de Valencia, Valencia, Spain

<sup>4</sup>Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Milano, Milano, Italy

Within the last decade, the trend towards smaller systems with greater functionality in the electronics community has been widely accepted. This fact has led integrated circuit designers into levels of integration and complexity barely imagined a few years ago. However, the price paid has been the increased number of integrated circuits in the boards. In addition, as the physical space has usually been reduced, most of the configuration interfaces for these circuits are performed with 2-wire or 3-wire serial links. Under these circumstances, the qualification of complex board prototypes becomes a hard task when different protocol and computer interfaces are needed.

In this work, we have developed a Graphical User Interface (GUI), for Windows operating systems, that provides an intuitive and fully transparent way to interact with several devices. The software has been developed using Dynamic-link libraries (DLL), linked at run-time, encapsulating the hardware USB interface and three implemented protocols. The GUI is composed of four tabs that correspond to the "Port setup", "I2C bus", "SPI bus" and "uWire bus". In the first tab, the USB port is set up and the signal bonding for the four available ports can be configured. Once the user has selected a protocol, read and write operations are available in the corresponding tab. The integrated circuit chosen as bridge is the USB to UART bridge CP2103, from Silicon Laboratories. This chip, as well as the USB to UART conversion, provides four General Purpose Input Output (GPIO) ports used for the aims of this work within a QFN-28 package. As presented, the development of a GUI integrating the commonly used 2-wire and 3-wire serial protocols provides a portable and friendly interface for the configuration of several devices, which is very useful during the prototype validation stage.

### **EXC: Excursion**

Wednesday, June 13 12:30-18:00 Crystal Ballroom

#### DAQ2: Data Acquisition 2 / Fusion

Thursday, June 14 08:30-10:40 Crystal Ballroom

#### DAQ2-1: Trends on Control and Data Acquisition in Fusion Devices: Towards High Availability

B. Goncalves

Associação Euratom-IST, Instituto de Plasmas e Fusão Nuclear-Laboratório Associado, Instituto Superi, Lisbon, Portugal

Experimental fusion technology has now reached a point where experimental devices will be able to produce as much energy as is expended in heating the plasma, and a roadmap for the development of fusion energy has been proposed. The next generation fusion experiments are envisioned to be more than an order of magnitude larger than those of today, will be highly complex, raise new challenges in the field of control and data acquisition systems and demand a well integrated, interoperable set of tools with a high degree of automation. The immediate next step in this roadmap is the construction and operation of the International Thermonuclear Experimental Reactor (ITER). The ITER tokamak will demonstrate the physics understanding and several key technologies necessary to maintain burning plasmas. Control systems in nuclear fusion reactors act as a plasma control tool and an operation supervisor. Both thrusts of real-time control activities will be steered by the need to satisfy regulatory requirements while addressing how to effectively control burning plasmas. Operation supervisory tools will likely be similar to the ones in use in fission reactors, with a slow control-system response time of the order of hundreds of milliseconds. However, fusion reactors are expected to explore more advanced operation scenarios capable of sustaining a long duration, steady-state plasma and to suppress plasma instabilities almost completely. ITER will be capable of exploring advanced tokamak (AT) modes of operation, characterized by high plasma pressure, long confinement times, and low levels of inductively driven plasma current, which allows steady-state operation. These advanced modes rely heavily on active control to develop and maintain high performance plasmas with sufficient plasma density, temperature, and confinement to maintain a self-sustaining fusion reaction for long durations. For fusion burn control it is essential to integrate simultaneously multiple measurements from different sensors, real-time plasma modelling from several tools and multiple actuators in fast control loops with time constraints of the order of tenths of microseconds. Tokamaks are high order, distributed parameter, nonlinear systems with a large number of instabilities, which requires many extremely challenging mathematical modelling and control problems to be solved. Fast control plant systems based on embedded technology with higher sampling rates and more stringent real-time requirements (feedback loops with sampling rates > 1 kHz) will be demanded. Furthermore, in ITER, it is essential to ensure that all control systems have high availability and that loss of control is a very unlikely event. Providing robust, fault tolerant, reliable, maintainable and operable control systems will be a crucial challenge that future reactors have to face. This contribution will address the real-time control needs of a fusion experiment, the present solutions, the existing problems and the broad scientific and technical questions that need to be addressed on the path to a highly-available fusion power plant.

# DAQ2-2: Feedforward Power Distortion Correction in RF Power Delivery Systems for Plasma Processing Systems

D. J. Coumou

MKS, ENI Products, Rochester, New York, United States

Many critical technologies starting from the past and continuing into the present century rely and will continue to rely on processes that utilize plasma based material processing. Plasma processing is a cornerstone technology for the semiconductor industry, and plays a critical role in the continued adherence of technology advances to the now famous Moore's Law. Paramount to the continuum of plasma processing advances for thin-film manufacturing are reliable and repeatable RF power delivery systems used for RF plasma discharges. In this paper, we outline the present state of the art for controlling RF power delivery and contrast this to our scheme of centralized control at the RF source. In conventional RF power delivery systems, the RF power supply performs local control to regulate power for the required power level. An impedance tuning network resides between the RF power supply and the RF discharge. In a manner analogous to the RF power supply, this impedance tuning network, with dual actuators for load and tune compensation, adjusts variable impedance devices (i.e. variable capacitors) with motor-controlled actuators to adapt the network for maximum power transfer from the RF source to the plasma discharge. When power transfer is not at a maximum, some portion of the power is reflected by the load back to the RF source. We consider this power loss a distortion that requires a correction to achieve an optimal power transfer condition. Our approach is economical and achieves the vexing performance objectives necessary for advancing thin-film manufacturing. In our scheme, power regulation is also conducted with autonomous feedback control in the power supply. An RF sensor in the power supply serves a dual purpose of (1) coupling feedback to the power controller, and (2) providing a quantitative power distortion measurement of the RF power delivery system.
We centralize control and remove systematic redundancy with a feedforward controller, also in the RF power supply, to correct the power distortion by adjusting elements in the impedance tuning network. A second instantiation of our feedforward controller demonstrates impedance tuning operation with a frequency-agile RF power supply. By adjusting frequency, power distortion in the RF power delivery system is corrected in a manner similar to the tune element in the impedance matching network. A frequency tuning RF power supply has the innate capability to achieve an optimal power condition with greater speed by orders of magnitude than the tune element in the impedance matching network. Our feedforward framework is a significant and substantial departure from the industry practice of using heuristic methods for impedance tuning to overcome RF power distortion. We demonstrate fast frequency tuning operation that is required by short-cycle thin-film processes and RF pulsing applications.
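
The notion of reflected power as a distortion can be made concrete with the standard reflection coefficient, Γ = (Z_L − Z_0)/(Z_L + Z_0), whose squared magnitude is the fraction of incident power reflected by the load. The Python sketch below evaluates it for a toy series R-L-C load and finds, by a brute-force sweep, the frequency with the least reflection. The load model, component values and sweep are assumptions for illustration only; this is not the feedforward controller described in the abstract.

```python
import math
from typing import Iterable

Z0 = 50.0  # reference impedance in ohms (assumed)


def reflection_coefficient(z_load: complex, z0: float = Z0) -> complex:
    """Standard voltage reflection coefficient of a load on a z0 line."""
    return (z_load - z0) / (z_load + z0)


def reflected_fraction(z_load: complex, z0: float = Z0) -> float:
    """Fraction of incident power reflected back to the source."""
    return abs(reflection_coefficient(z_load, z0)) ** 2


def series_rlc(freq_hz: float, r: float, l: float, c: float) -> complex:
    """Impedance of a toy series R-L-C load (hypothetical plasma-load stand-in)."""
    w = 2.0 * math.pi * freq_hz
    return complex(r, w * l - 1.0 / (w * c))


def best_frequency(freqs: Iterable[float], r: float, l: float, c: float) -> float:
    """Brute-force sweep: the frequency with the smallest reflected fraction."""
    return min(freqs, key=lambda f: reflected_fraction(series_rlc(f, r, l, c)))


if __name__ == "__main__":
    l = 1e-6                                          # 1 uH, illustrative
    c = 1.0 / ((2.0 * math.pi * 13.56e6) ** 2 * l)    # resonant at 13.56 MHz
    sweep = [13.0e6 + 0.01e6 * k for k in range(113)]
    print(best_frequency(sweep, 50.0, l, c))
```

Because the reactance of the toy load changes with frequency, shifting the source frequency plays the same role as moving a tune capacitor, which is the intuition behind frequency-agile tuning.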

#### DAQ2-3: Prototyping Control and Data Acquisition for the ITER Neutral Beam Test Facility

P. Simionato<sup>1</sup>, E. Zampiva<sup>1</sup>, <u>A. Luchetta<sup>1</sup></u>, G. Manduchi<sup>1</sup>, A. Soppelsa<sup>1</sup>, C. Taliercio<sup>1</sup>, F. Paolucci<sup>2</sup>, F. Sartori<sup>2</sup>, P. Barbato<sup>1</sup>, M. Breda<sup>1</sup>, R. Capobianco<sup>1</sup>, F. Molon<sup>1</sup>, M. Moressa<sup>1</sup>, S. Polato<sup>1</sup>

<sup>1</sup>Consorzio RFX - CNR, Padova, Italy <sup>2</sup>Fusion for Energy, Barcelona, Spain

The ITER Neutral Beam Test Facility is being established to execute R&D on heating neutral beam injectors (HNB) for fusion research operating with negative ions. Its mission is to develop the technology to build an HNB prototype injector capable of meeting all the stringent HNB requirements (16.5 MW injection power, 1 MeV acceleration energy, 40 A ion current and 1 hour continuous operation). Two test-beds will be constructed in sequence in the facility: first the ion source test-bed, referred to as SPIDER, to optimize the negative ion source performance; second the actual prototype injector, referred to as MITICA, to optimize ion beam acceleration and neutralization. Control and data acquisition will facilitate the execution of the experimental activity, contribute to system availability and reliability and help to increase knowledge and understanding of the physics phenomena. The SPIDER control and data acquisition system, referred to as SPIDER CODAS, is under design. To validate the main architectural choices, a system prototype has been assembled and performance tests have been executed to assess the prototype's capability to fulfill the control and data acquisition system requirements. The prototype is based on open source software frameworks running under Linux. EPICS is the slow control engine, MDSplus is the data handler and MARTe is the fast control manager. The prototype addresses low and high-frequency data acquisition, 10 kS/s and 10 MS/s respectively, camera image acquisition, data archiving, data streaming, data retrieval and visualization, real time fast control with a 100 µs control cycle and supervisory control. The paper will discuss SPIDER CODAS design choices with reference to system requirements and will show how the proposed architecture can address all SPIDER CODAS requirements.

This work was set up in collaboration with and under financial support from Fusion for Energy.

## DAQ2-4: Real-Time Processing System for the JET Hard X-Ray and Gamma-Ray Profile Monitor Enhancement

<u>A. M. Fernandes<sup>1</sup></u>, R. C. Pereira<sup>1</sup>, A. Neto<sup>1</sup>, D. F. Valcarcel<sup>1</sup>, J. Sousa<sup>1</sup>, B. B. Carvalho<sup>1</sup>, V. Kiptily<sup>2</sup>, B. Syme<sup>2</sup>, P. Blanchard<sup>3</sup>, A. Murari<sup>4</sup>, C. M. B. A. Correia<sup>5</sup>, C. A. F. Varandas<sup>1</sup>, JET-EFDA Contributors<sup>6</sup>

<sup>1</sup>Associacao EURATOM/IST, Instituto de Plasmas e Fusao Nuclear, Instituto Superior Tecnico, Universidade Tecnica de Lisboa, 1049-001 Lisboa, Portugal

<sup>2</sup>EURATOM/CCFE Fusion Association, Culham Centre for Fusion Energy, Culham Science Centre, Abingdon, OX14 3DB, UK
<sup>3</sup>Association EURATOM-Confederation Suisse, Ecole Polytechnique Federale de Lausanne (EPFL), CRPP, CH-1015 Lausanne, Switzerland
<sup>4</sup>Euratom-ENEA Association, Consorzio RFX, 35127 Padova, Italy

<sup>5</sup>Centro de Instrumentacao, Dept. de Fisica, Universidade de Coimbra, 3004-516 Coimbra, Portugal

<sup>6</sup>JET-EFDA, Culham Science Centre, OX14 3DB, Abingdon, UK, See the Appendix of F. Romanelli et al., Proceedings of the 23rd IAEA Fusion Energy Conference 2010, Daejeon, Korea

The Joint European Torus (JET) is currently undertaking an enhancement program, in which one of the objectives is to test relevant diagnostics for the International Thermonuclear Experimental Reactor (ITER), the reference for the next generation of fusion experiments. One of the challenges in ITER is the provision of real-time data analysis and compression capabilities, to sustain the expected long duration discharges and the high acquisition rates achieved by recent data acquisition systems. Foreseeing this real-time requirement, a new system was developed and installed at JET for the gamma-ray and hard X-ray profile monitor diagnostic. The new system, which is connected to 19 CsI(Tl) photodiodes in order to obtain the line-integrated profiles of the gamma-ray and hard X-ray emissions, was designed to overcome the data acquisition limitations of the present fast electron Bremsstrahlung diagnostic (FEB), while exploiting the required real-time features. This paper presents the developed real-time processing architecture for the JET gamma-ray and hard X-ray profile monitor. The system hardware, based on the Advanced Telecommunication Computer Architecture (ATCA) standard, includes reconfigurable digitizer modules with embedded Field Programmable Gate Array (FPGA) devices capable of acquiring and simultaneously processing data in real-time from the 19 detectors. A suitable algorithm was developed and implemented in the FPGAs, which are able to deliver the corresponding energy of the acquired pulses and their associated occurrence times. The real-time processed data is sent periodically, during the discharge, through the JET real-time Asynchronous Transfer Mode (ATM) network, and stored in the JET scientific databases at the end of the pulse. Publishing the processed data in the ATM network enables it to be used for machine control purposes (e.g. the information about the line-integrated emissions of the hard X-rays in real time can be used to determine the lower hybrid current drive deposition before the main heating phase). Additionally, the real-time processed data is used for local calibration, using embedded radioactive sources to build the 19-channel spectra in real-time. The acquired raw data is also stored in the digitizer modules' local memory and retrieved after the pulse to the JET database, where it can be post-processed offline to validate the real-time algorithms. The interface between the ATCA digitizers, the JET Control and Data Acquisition System (CODAS) and the JET real-time network is provided by the Multithreaded Application Real-Time executor (MARTe). From the experimental results it was concluded that it is possible to measure in real-time the line-integrals of both hard X-ray and gamma-ray emissions, covering an energy range from ~200 keV to 8 MeV. This allows us to meet two of the major milestones: the ability to process and supply high volume data rates in real-time, and to do so over a wide energy range.

### DAQ2-5: Study of Radiation Damage in Front-End Electronics Components

<u>T. Higuchi<sup>1</sup></u>, M. Nakao<sup>1</sup>, R. Itoh<sup>1</sup>, S. Y. Suzuki<sup>1</sup>, E. Nakano<sup>2</sup> <sup>1</sup>High Energy Accelerator Research Organization, Ibaraki, Japan <sup>2</sup>Osaka City University, Osaka, Japan

Despite the great success of the Standard Model (SM) of particle physics, there remain questions about our Universe that the SM leaves unanswered. Several theoretical and experimental studies imply the existence of new physics (NP) beyond the SM around the O(1 TeV) scale that can answer these questions. To elucidate the NP model, we will start the Belle II experiment in 2015 as an upgraded version of the Belle experiment. In the Belle experiment, detector hit signals were digitized by signal digitizers located about 10 m away from the detector, with interconnections between the detectors and the digitizers made by analog cables. In the Belle II experiment, because the number of readout channels will be about doubled with respect to Belle to make the detector design more hermetic and granular, we move the signal digitizers much closer, in or on the detector, to avoid a large amount of analog cabling. The digitized signals in/on the detector will be transmitted to receiver modules via optical fibers.

The location of the Belle II signal digitizers will be exposed to harsh ambient neutron and gamma-ray fluxes due to beam backgrounds, where we roughly estimate 10<sup>11</sup>/cm<sup>2</sup> neutron counts and 100 Gy gamma-ray irradiation per year. We have been carrying out radiation damage studies on the electrical components of the signal digitizers: FPGA chips, optical transceivers, and voltage regulators. In our previous studies, we found that optical transceivers are very weak against gamma-ray radiation. All transceivers we tested were killed by a gamma-ray dose equivalent to 3 years of Belle II operation, while we plan to run the Belle II experiment for >10 years. Furthermore, it is known that a low dose rate of gamma rays will boost damage integration in electronics components. If this is the case, the transceivers' lifetime will become shorter.

To investigate this problem, we systematically study several optical transceivers while varying the dose rate, in order to find one that can survive the equivalent of >10 years of Belle II operation. At the conference, we report results of the systematic investigation, together with a summary of our past studies.

## DAQ2-6: Readout Hardware and Firmware Architecture of the HFT PXL Detector at STAR

J. Schambach<sup>1</sup>, L. Greiner<sup>2</sup>, T. Stezelberger<sup>2</sup>, X. Sun<sup>2</sup>, M. Szelezniak<sup>2,3</sup>, C. Vu<sup>2</sup> <sup>1</sup>University of Texas at Austin, Austin, TX, United States <sup>2</sup>Lawrence Berkeley National Laboratory, Berkeley, CA, United States

<sup>3</sup>IPHC (Institut Pluridisciplinaire Hubert Curien), Strasbourg, France

The Heavy Flavor Tracker (HFT) is an approved micro-vertex detector upgrade to the STAR experiment at RHIC, consisting of three subsystems with various technologies of silicon sensors arranged in 4 concentric cylinders, to be installed in STAR by 2014. This new vertex detector will improve the track-pointing resolution in STAR to a DCA of ~30 µm in order to allow for a direct and full topological reconstruction of heavy quark meson decays and a better determination of the heavy quark meson spectra. The two innermost layers of the HFT close to the beam pipe, the Pixel (PXL) subsystem, employ CMOS monolithic active pixel sensor (MAPS) technology that integrates the sensor, front-end electronics, and zero-suppression circuitry in one silicon wafer. The PXL layers of the HFT will consist of 400 MAPS sensors arranged in 40 ladders (10 ladders at 2.5 cm and 30 ladders at 8 cm from the beam), each containing 10 of these sensors. This talk will present selected design characteristics of the PXL detector part of the HFT and the hardware and firmware architecture of the proposed readout system for this detector, as well as its integration into the existing STAR framework. A prototype of this readout system has recently been used at CERN to take data from a telescope consisting of 7 sensors arranged in parallel planes and shown to be fully functional. Selected results from this beam test will be presented as well.

#### DAQ3: Data Acquisition 3

#### Thursday, June 14 11:05-12:25 Crystal Ballroom

## DAQ3-1: The Belle II Pixel Detector Data Acquisition and Reduction System

B. Spruck<sup>1</sup>, T. Gessler<sup>1</sup>, W. Kuehn<sup>1</sup>, S. Lange<sup>1</sup>, H. Lin<sup>2</sup>, Z. Liu<sup>2</sup>, D. Muenchow<sup>1</sup>, H. Xu<sup>1,2</sup>, J. Zhao<sup>2</sup> <sup>1</sup>II. Physics Institute, University Giessen, Giessen, Germany

<sup>2</sup>*IHEP*, Institute of High Energy Physics, Beijing, China

The upcoming Belle II experiment is designed to work at 40 times higher luminosity than its predecessor. Due to the high luminosity and the small distance of 14 mm to the interaction region in Belle II, the inner pixel detector with its $\sim$8 million channels will deliver ten times as much data as all other sub-detectors together. A data rate of $\sim$22 GB/s is expected for a trigger rate of $\sim$30 kHz and an estimated pixel detector occupancy of $\sim$3 %, which far exceeds the specifications of the Belle II event builder system. Therefore a reduction by a factor of >30 is needed.

A hardware platform capable of processing this amount of data is the ATCA based Compute Node (CN). Each node consists of an xTCA carrier board and four AMC/µTCA daughter boards. The carrier board supplies the high bandwidth connectivity between the daughter boards and the other CNs in the shelf by Rocket-IO links. In the current prototype design, each AMC board is equipped with a Virtex5FX70T, 4 GB of memory, GBit Ethernet and two optical links which allow for high data transfer rates. IPMI control of mother- and daughter board is foreseen. One ATCA shelf containing 10 motherboards/40 daughter boards is sufficient to process the data from the 40 FEE boards.

The data reduction on the CN is done in two steps. First, the data delivered by the front-end electronics via optical links has to be stored in memory until the high level trigger (HLT) decision has been made. Depending on the event topology, this might take up to three seconds. This decreases the event rate by more than a factor of three. In a second step, the pixel data of the positively triggered events is reduced with the help of regions of interest (ROI), calculated by the HLT by projecting trajectories back to the pixel detector plane. The design allows additional ROI inputs computed from the silicon vertex strip detector tracklet data as well as from hit cluster properties. The final data reduction is achieved by sending only data within these ROIs to the main event builder.
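As a rough plausibility check, the figures quoted in this abstract can be combined in a few lines of Python. The per-event size, the bytes per hit and the reduction left for the ROI step are derived here and are not stated by the authors; the two quoted factors (">30" overall, "more than three" from the HLT) are treated as exact values purely for illustration.

```python
# Back-of-the-envelope check using only numbers quoted in the abstract.
RAW_RATE_GB_S = 22.0      # expected pixel detector data rate (GB/s)
TRIGGER_RATE_HZ = 30e3    # trigger rate (Hz)
CHANNELS = 8e6            # pixel channels
OCCUPANCY = 0.03          # estimated occupancy (3 %)
TOTAL_REDUCTION = 30.0    # required overall reduction (quoted as ">30")
HLT_REDUCTION = 3.0       # event-rate reduction by the HLT (quoted as "more than three")

# Derived quantities (assumptions of this sketch, not figures from the abstract).
bytes_per_event = RAW_RATE_GB_S * 1e9 / TRIGGER_RATE_HZ
hits_per_event = CHANNELS * OCCUPANCY
bytes_per_hit = bytes_per_event / hits_per_event
roi_reduction = TOTAL_REDUCTION / HLT_REDUCTION
output_rate_gb_s = RAW_RATE_GB_S / TOTAL_REDUCTION

print(f"event size        ~ {bytes_per_event / 1e3:.0f} kB")
print(f"hits per event    ~ {hits_per_event:.0f}")
print(f"bytes per hit     ~ {bytes_per_hit:.2f}")
print(f"ROI step must cut ~ x{roi_reduction:.0f}")
print(f"rate to builder   ~ {output_rate_gb_s:.2f} GB/s")
```

Under these assumptions the ROI selection has to contribute roughly a further factor of ten on top of the HLT decision.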

The pixel-ROI selection algorithm as well as the buffer management for random access have been implemented in VHDL. A hardware implementation of UDP for GBit Ethernet is used to overcome the limits of the slow software stack.
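The ROI selection step can be illustrated with a minimal software sketch. The authors implement it in VHDL; the Python below only shows the idea of keeping hits that fall inside at least one region of interest. The `ROI` and hit layouts (sensor index plus row/column bounds) and the function name `select_hits` are hypothetical and chosen for this example only.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class ROI(NamedTuple):
    """Rectangular region of interest on one sensor (hypothetical layout)."""
    sensor: int
    row_min: int
    row_max: int
    col_min: int
    col_max: int


Hit = Tuple[int, int, int]  # (sensor, row, column), also a hypothetical layout


def select_hits(hits: Iterable[Hit], rois: Iterable[ROI]) -> List[Hit]:
    """Keep only the hits that lie inside at least one ROI of their sensor."""
    rois_by_sensor: Dict[int, List[ROI]] = {}
    for roi in rois:
        rois_by_sensor.setdefault(roi.sensor, []).append(roi)

    kept: List[Hit] = []
    for sensor, row, col in hits:
        for roi in rois_by_sensor.get(sensor, ()):
            if roi.row_min <= row <= roi.row_max and roi.col_min <= col <= roi.col_max:
                kept.append((sensor, row, col))
                break  # one matching ROI is enough
    return kept


example_hits = [(0, 5, 5), (0, 50, 50), (1, 5, 5)]
example_rois = [ROI(sensor=0, row_min=0, row_max=10, col_min=0, col_max=10)]
selected = select_hits(example_hits, example_rois)
print(selected)
```

Only the first hit survives in this example: the second lies outside the region, and the third belongs to a sensor that has no ROI.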

A full featured Linux system, including remote secure shell access as well as a full gcc compiler suite, allows for direct development of the slow control software part on the embedded processor (PowerPC). A test bench system, using an additional Linux PC which sends data by Ethernet to the CN and receives the processed data, has proven the main functionality of our prototype system. Test results of buffer management, optical links, hardware Ethernet stack and overall system performance will be presented. This work is supported in part by BMBF under grant 05H10RG8.

#### DAQ3-2: Design Concepts for a Hierarchical Synchronized Data Acquisition Network for CBM

F. Lemke, U. Bruening

ZITI, Computer Architecture Group, University of Heidelberg, Mannheim, Germany

The Compressed Baryonic Matter (CBM) experiment at the Facility for Antiproton and Ion Research (FAIR) in Darmstadt investigates highly compressed nuclear matter using nucleus-nucleus collisions. Detecting various particles requires different types of detectors, which are positioned in dense arrangements and read out by the front-end electronics (FEE). The FEE components are connected to the Data Acquisition (DAQ) network. Due to the different types of detectors, placement constraints and other requirements, the DAQ system must allow flexible build-up variants, efficient data aggregation schemes including speedup, precise time synchronization, and dense interconnection solutions. In addition, it must be able to handle a data flow from the detector of up to several TB/s. The FEE uses a self-triggered approach. The event selection is done after event building in the compute cluster. This paper presents the concepts and design details of the planned DAQ structure and the final protocol version for the CBM network and its physical layer (PHY), which integrates all traffic classes into one link. The CBMnet protocol and PHYs are used within all parts of the network, directly in FEE ASICs and in all FPGAs included in the different readout hierarchy levels. Thus, no protocol conversion is required within the DAQ system. Furthermore, the paper focuses on the concepts and plans for using an ASIC for early data aggregation and control of the FEE close to the detector. Different FPGA implementation variants have been used successfully during several beam times, and FEE ASICs with an integrated CBM network protocol are on the way. FPGAs are flexible and ideal for use in a read-out chain, but due to radiation close to the detector and the requirement of an early control and aggregation stage, a fault-tolerant ASIC design is required in the final DAQ system.
This ASIC must not only handle the traffic classes for data, control and synchronization in the direction of the cluster, towards the data processing board, but also support at least 32 link connections to FEE boards (FEBs) within the detector. The FEBs need to be synchronized and controlled by the ASIC. Furthermore, rebooted or reinitialized FEBs must be resynchronized so that they can be integrated back into the read-out chain during runtime. The single-link bandwidth is 0.5 Gb/s; to support different read-out variants and bandwidth differences between inner and outer detectors, the ASIC supports up to 4 links connected to each FEB, resulting in a maximum bandwidth of 2 Gb/s per FEB. The ASIC aggregates the data and sends it to the next stage with a minimum of 5 Gb/s on each link. An electrical-to-optical conversion stage placed close to the ASIC combines the links into dense 12x ribbon fiber connections delivering at least 60 Gb/s. This compact and flexible read-out hierarchy structure provides the capability for the multi-TB/s DAQ system required for CBM.
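The link figures quoted above fit together as shown in the short calculation below. The number of uplinks needed per ASIC is derived here under the assumptions that all 32 FEE links are saturated and that protocol overhead is ignored; it is not a figure given in the abstract.

```python
import math

# Figures quoted in the abstract.
LINK_GBPS = 0.5           # bandwidth of a single FEE link (Gb/s)
MAX_LINKS_PER_FEB = 4     # links that can be bundled per FEB
FEE_LINKS_PER_ASIC = 32   # minimum number of FEE links per ASIC
UPLINK_GBPS = 5.0         # minimum bandwidth of one uplink (Gb/s)
FIBERS_PER_RIBBON = 12    # links combined in one 12x ribbon

max_feb_gbps = LINK_GBPS * MAX_LINKS_PER_FEB   # per-FEB maximum
ribbon_gbps = UPLINK_GBPS * FIBERS_PER_RIBBON  # per-ribbon capacity

# Derived under the assumptions stated in the lead-in (not from the abstract).
asic_input_gbps = LINK_GBPS * FEE_LINKS_PER_ASIC
uplinks_needed = math.ceil(asic_input_gbps / UPLINK_GBPS)

print(f"max per FEB     : {max_feb_gbps} Gb/s")
print(f"per 12x ribbon  : {ribbon_gbps} Gb/s")
print(f"ASIC input (max): {asic_input_gbps} Gb/s")
print(f"uplinks needed  : {uplinks_needed}")
```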

## DAQ3-3: Electromagnetic Calorimeter Trigger for PANDA Experiment

Z.-A. Liu<sup>1</sup>, Q. Wang<sup>1</sup>, H. Xu<sup>1</sup>, J. Zhao<sup>1</sup>, H. Lin<sup>1</sup>, T. Gessler<sup>2</sup>, S. Lange<sup>2</sup>, D. Muenchow<sup>2</sup>, B. Spruck<sup>2</sup>, W. Kuehn<sup>2</sup> <sup>1</sup>EPC, Inst. of High Energy Physics, Chinese Academy of Sciences, Beijing, China <sup>2</sup>II. Physikalisches Institut, Justus-Liebig-Universität Giessen, Giessen, Germany

The Electromagnetic Calorimeter (EMC), consisting of cooled PbWO4 crystals, is one of the crucial components of the PANDA spectrometer at FAIR, Germany. PANDA is a multi-purpose detector for tracking, calorimetry and particle identification that employs antiproton annihilations to investigate non-perturbative QCD aspects, in particular in the charmonium region. PANDA will run at high luminosities, providing up to $2 \times 10^7$ interactions/s and a data rate of more than 200 GB/s. A specially designed trigger and DAQ system will separate relevant data from background to reduce the data rate. To handle such rates, a dedicated DAQ system without hardware triggers is foreseen that is able to process the full stream of raw data. Pre-processing and event filtering will be done using ATCA-based Compute Nodes (CN) with FPGAs serving as processing engines. This paper describes the EMC part of the trigger and data acquisition system in general and its implementation, with particular emphasis on our CN hardware architecture. The CN features, including a system-on-a-programmable-chip architecture and the data flow, will be discussed. A detailed description of the EMC cluster finding, local maximum searching, and feature extraction algorithms will be presented. Finally, a test system will be discussed and results with simulated data as input for this system will be shown.
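The abstract does not specify how the local maximum search is defined. A minimal software sketch of one common choice is given below, assuming a rectangular grid of crystal energies, a 3x3 neighbourhood and a seed threshold. All of these are assumptions for illustration, and the actual algorithm runs on FPGAs rather than in Python.

```python
from typing import List, Sequence, Tuple


def find_local_maxima(energy: Sequence[Sequence[float]],
                      threshold: float) -> List[Tuple[int, int]]:
    """Return (row, col) of cells above threshold that exceed all 8 neighbours."""
    rows, cols = len(energy), len(energy[0])
    maxima: List[Tuple[int, int]] = []
    for r in range(rows):
        for c in range(cols):
            e = energy[r][c]
            if e < threshold:
                continue
            # Collect the neighbours that exist (edges have fewer than 8).
            neighbours = [
                energy[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(e > n for n in neighbours):
                maxima.append((r, c))
    return maxima


grid = [
    [0, 1, 0, 0],
    [1, 5, 1, 0],
    [0, 1, 0, 3],
    [0, 0, 0, 0],
]
seeds = find_local_maxima(grid, threshold=2)
print(seeds)
```

Each seed found this way could then serve as the starting point of a cluster for the feature extraction step.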

## DAQ3-4: Implementation and First Results of the Real-Time Computing System for the Gamma Ray Energy Tracking in-Beam Nuclear Array (GRETINA)

<u>C. M. Campbell</u><sup>1</sup>, I.-Y. Lee<sup>1</sup>, M. Cromaz<sup>1</sup>, D. Doering<sup>1</sup>, C. Lionberger<sup>1</sup>, D. C. Radford<sup>2</sup>, T. Stezelberger<sup>1</sup>, S. Zimmermann<sup>1</sup> <sup>1</sup>Lawrence Berkeley National Laboratory, Berkeley, CA, United States <sup>2</sup>Oak Ridge National Laboratory, Oak Ridge, TN, United States

The Gamma Ray Energy Tracking In-Beam Nuclear Array (GRETINA), a germanium detector system capable of measuring the energy and position (within 2 mm) of gamma-ray interaction points and of tracking multiple gamma-ray interactions, has been built and tested. GRETINA is composed of seven detector modules, each with four high-purity germanium crystals. Each crystal has 36 segments and one central contact instrumented by charge-sensitive amplifiers. Custom Digitizer/DSP boards convert the analog information with 14-bit analog-to-digital converters operating at 100 MS/s, and digitally process the data to determine the energy and timing information of the gamma interactions with a crystal. The computing system is composed of VME readout CPUs running VxWorks, which communicate with a farm of 60 dual-processor nodes (each processor with four cores) through a 10 Gb/s Ethernet switch. The CPUs read out the digitizer/DSPs and send the data to the farm. Digitized pulses from each of the crystal segments are analyzed in real time by the processor farm to determine the energy, time, and three-dimensional positions of all gamma-ray interactions. This sub-segment interaction information is then utilized, together with the characteristics of Compton scattering and pair-production processes, to track the scattering sequences of the gamma rays. The processor farm is capable of processing in real time the positions of more than 20,000 gamma rays per second. Tracking arrays will give higher efficiency, better peak-to-total ratio and much higher position resolution than current arrays used in nuclear physics research. Particularly, the capability of reconstructing the position of the facilities. In this paper we will present the details of real-time processing by the processor farm and results of initial in-beam tests of the computing system of GRETINA.

## DAQ4: Data Acquisition 4

## Thursday, June 14 13:40-14:40 Crystal Ballroom

## DAQ4-1: Extending the IceCube DAQ System by Integration of the Generic, High-Speed Sorter Module TESS

C. C. W. Robson<sup>1</sup>, K. Hanson<sup>2</sup>

<sup>1</sup>Physics, Stockholms universitet, Stockholm, Sweden

<sup>2</sup>Service de Physique des Particules Élémentaires, Université Libre de Bruxelles, Brussels, Belgium

In the extreme environment of Antarctica at the South Pole, the IceCube experiment, the world's first kilometer-scale neutrino telescope, collects cosmic ray events. IceCube consists of over 5000 digital optical sensor modules (DOMs) deployed on 86 instrumentation lines, each extending 2.5 km deep into the Antarctic ice. The array of optical modules monitors the Cherenkov light emitted by passing radiation, which, when digitized and timestamped to nanosecond precision, is used as input to sophisticated reconstruction algorithms that determine the direction, energy, and type of the incident cosmic ray event. In order to achieve this goal, the IceCube data acquisition system merges the digital data streams from each photodetector into a single time-ordered list, which is presented to online triggers that determine, in real time, whether a given pattern of hits is noise or signal. At the present time, the data provided to the triggers is limited by the performance of the sorting and merging algorithms: the 500 Hz raw data rate from each sensor (2.5 MHz array aggregate rate) is beyond the capability of the central sort and merge. The current solution adopted by the IceCube detector is to impose a hardware-based pre-trigger coincidence on hits emanating from the DOMs, which reduces the rate by a factor of 20. While this pre-trigger coincidence has negligible impact on the detector sensitivity for the principal goal of high-energy neutrinos from galactic or extragalactic sources, other low-energy physics searches are affected. This presentation details work done to develop and implement a system, TESS, which is capable of merging the full raw data stream produced by the IceCube DOMs. TESS is designed as a pipelined architecture with three major modules, the server, the selector and the client, glued together by circular buffers.
The three modules run in only three threads, and since the architecture is self-synchronizing and uses no data copying, maximum performance can be achieved for global sorting of payloads. The TESS sorting architecture was originally designed to provide a globally sorted data stream for triggers targeting low-energy events from the annihilation of hypothesized dark-matter particles; however, its utility generalizes to any IceCube trigger that requires inspection of the full data stream. The IceCube online supernova detection system is a notable example. Moreover, the architecture is generic to any system involving multiple, independently sorted data streams which must be merged into a single sorted data stream.
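The generic problem named in the last sentence, merging several independently time-sorted streams into one sorted stream, can be sketched with a heap-based k-way merge from the Python standard library. This is not the TESS design, whose pipelined, copy-free architecture is not reproduced here. The `(timestamp, source)` payload layout is an assumption made for the example.

```python
import heapq
from typing import Iterable, Iterator, Tuple

Hit = Tuple[int, str]  # (timestamp, source id); hypothetical payload layout


def merge_streams(streams: Iterable[Iterable[Hit]]) -> Iterator[Hit]:
    """Lazily merge time-sorted streams into one time-sorted stream.

    Each input must already be sorted by timestamp; heapq.merge then needs
    only one pending element per stream at any time.
    """
    return heapq.merge(*streams, key=lambda hit: hit[0])


stream_a = [(1, "A"), (4, "A")]
stream_b = [(2, "B"), (3, "B")]
stream_c = [(0, "C")]
merged = list(merge_streams([stream_a, stream_b, stream_c]))
print(merged)
```

Because the merge is lazy, it also works on unbounded generators, which is closer to a continuous data acquisition setting than the finite lists used here.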

## DAQ4-2: Readout of GEM Stacks with the CERN SRS System

M. L. Purschke

Physics Dept., Brookhaven National Laboratory, Upton, NY, United States

The PHENIX Collaboration at the Relativistic Heavy Ion Collider (RHIC) is considering a GEM tracker as part of a major upgrade of the detector. The CERN RD51 collaboration has developed a "Scalable Readout System" (SRS) to facilitate a standard readout of GEM stacks. For the tests we are currently conducting, we have implemented the readout of the SRS in our benchtop DAQ system, "RCDAQ". RCDAQ is a lightweight, yet powerful and versatile data acquisition system for smaller test bench setups (less than about 100,000 readout channels). It is format-compatible with the main PHENIX data acquisition system, and uses the standard ROOT-based online monitoring and analysis machinery which we use in PHENIX. It comes with the standard amenities of a modern DAQ system, such as support for different event types, automatic bookkeeping features, Elog support, and the ability to script fully automated acquisitions. In addition to showing some early data from our GEM test detector, we will introduce the concepts behind RCDAQ, show the implementation of the SRS system, and highlight the main features of the system.

## DAQ4-3: GPS Timing and Control System for the HAWC Experiment

A. U. Abeysekara, D. Edmunds, J. T. Linnemann, T. N. Ukwatta Physics and Astronomy, Michigan State University, East Lansing, United States

We present an FPGA-based GPS Timing and Control (GTC) system for the High Altitude Water Cherenkov (HAWC) experiment. HAWC is a next-generation TeV gamma ray observatory currently under construction at 4100 m altitude at Sierra Negra, Mexico. HAWC will survey the sky for TeV radiation from pulsars, supernova remnants, active galactic nuclei, gamma ray bursts, primordial black holes or potentially new astrophysical sources. Installation of HAWC has begun, and by 2014 we plan to install 300 steel water tanks, each instrumented with 4 photomultiplier tubes. The HAWC DAQ consists of 10 or more commercial VME 128-channel TDCs and another ~25 VME boards in the Scaler and Trigger subsystems. All boards need to be operated synchronously so that event fragments can be assembled into a coherent stream of events. The DAQ system will be triggered at up to 40 kHz, and read out in blocks of triggers to maximize throughput. The total data rate is expected to be up to 500 MB/s. The GTC system distributes trigger and flow control signals to all modules to keep them in sync. In addition, the GTC system provides each HAWC trigger with a unique time stamp with an absolute accuracy of one microsecond, sufficient to support synchronization with astrophysical phenomena of interest, such as pulsars or gamma ray bursts. Time strings are acquired from an inexpensive commercial GPS unit and buffered for readout by encoding the timestamps as inputs to the TDC channels. The 10 MHz GPS clock is also used to derive the 40 MHz low-jitter clock signal that is used as the global clock of the HAWC experiment. The system is implemented using two VME card species: a VME FPGA card that handles clocks, timing, and control logic, and a card that handles fan-out, fan-in and level shifting to interface between the control system and the individual DAQ modules. We present the design of the hardware and firmware, and early experience with the GTC system.

#### MO6: Mini-orals 6

#### Thursday, June 14 14:40-15:40 Crystal Ballroom

# PS3-24: Development of a Clock Distribution System for Sub-Nanosecond Time Synchronization over Long Distances

#### Y. Yang, K. Hanson, T. Meures

Interuniversity Institute for High Energies (IIHE), Brussels, Brussels, Belgium

The Askaryan Radio Array (ARA) is a new detector deployed at the South Pole, designed to detect ultrahigh-energy neutrinos using radio frequency signals emitted by neutrino-induced cascades in the glacial ice. The whole array will contain 37 stations which cover a surface area of O(100) km². Each station consists of four two-hundred-meter-deep holes spaced 20 meters apart, with 2 horizontally polarized and 2 vertically polarized antennas in each hole at depths ranging from 180 to 200 m. A custom-designed ASIC nominally located in close proximity to the antennas is used for high-speed digitization of the induced RF signals. In order to perform the complex particle reconstructions, each antenna signal must be recorded with a time precision of 50 ps relative to other antennas in the same station. In addition, the digital data stream from the digitizer must be transmitted from the hole bottom to logic on the surface. This note describes our group's solution to both challenges, which uses commercially available high-speed transceivers and the clock data recovery functions built into these ICs. This includes both a CAT5E twisted pair version and an optical fiber version. While this application is discussed in particular, the technology has potential applications in many fields: any system that requires ultra-high precision synchronization of two or more remote clocks could benefit from the system described herein.

#### PS3-25: Development of the Data Acquisition System of a Large TPC for the ILC

<u>G. W. P. De Lentdecker<sup>1</sup></u>, E. Verhagen<sup>1</sup>, Y. Yang<sup>1</sup>, L. Jonsson<sup>2</sup>, B. Lundberg<sup>2</sup>, U. Mjornmark<sup>2</sup>, A. Oskarsson<sup>2</sup>, L. Osterman<sup>2</sup>, E. Stenlund<sup>2</sup> <sup>1</sup>Universite Libre de Bruxelles, Brussels, Belgium

<sup>2</sup>Lund University, Lund, Sweden

A large Time Projection Chamber (TPC) is proposed as part of the tracking system for a detector at the future electron positron linear collider ILC. The Linear Collider TPC (LCTPC) Collaboration is currently studying a large TPC prototype (60 cm long, with an outer radius of 77 cm), offering some modularity to investigate various gas amplification systems (GEM or Micromegas), pad sizes and geometries as well as different read-out systems. This prototype has already been extensively and successfully tested during more than 10 weeks, with 6 GeV electron beams. The readout electronics of the ILC large TPC prototype is based on the ALICE ALTRO ADC chip in combination with a newly developed charge pre-amplifier, PC16, which is programmable with respect to shaping time, gain, decay time and polarity. The preamplifier was specially developed as a first step towards the final electronics for the ILC TPC. The data acquisition system of the prototype is also based on the ALICE data acquisition system, using the Detector Data Link (DDL) and the PCI Detector Read Out Receiver Card (DRORC) both developed by the ALICE Collaboration.

For the use of the TPC at the ILC, the current readout and data acquisition systems have to be upgraded in different aspects: size of the frontend electronics, power-pulsing capability, improved digital signal processing and higher bandwidth communication technology for the data acquisition. In this note we will mainly report on the latest developments concerning the front-end electronics and the new data acquisition system: we will report on the status of the design of the new Multi-Chip-Module (MCM) board that can house up to 8 new sALTRO16 chips as well as on the development of a first micro-TCA Advanced Mezzanine Card (AMC) prototype to replace the DRORC.

# **PS3-26:** Real-Time Performance of Commercial Intel-Based VME Controllers for the CODA Data Acquisition System

B. J. Moffit

Physics, Jefferson Lab, Newport News, VA, United States

We have evaluated the performance of several Intel-based VME controllers for use in data acquisition systems (DAQ) at Jefferson Lab. In the 12 GeV era, PPC-based VME controllers running VxWorks will be replaced with Intel-based controllers running Linux. This is facilitated by the use of FPGAs on the VME modules to perform trigger logic and to communicate trigger information over serial and fiber connections throughout the DAQ. The need for a hard real-time operating system on the VME controller is removed from the equation, as the readout of the digitized data from the VME modules (using VME-2eSST) is done in a threaded environment with multiple cores while digitization is taking place in the buffered, pipelined system. In this paper, we briefly discuss the 12 GeV Hall D DAQ and the requirements of the VME controller. We present results from baseline testing of various models from different vendors using different Linux kernels, including results from a kernel compiled with the CONFIG_PREEMPT_RT patch.

## PS3-27: A Readout System Utilizing the APV25 ASIC for the Forward GEM Tracker in STAR

<u>G. J. Visser</u><sup>1</sup>, J. T. Anderson<sup>2</sup>, B. Buck<sup>3</sup>, A. S. Kreps<sup>2</sup>, T. Ljubicic<sup>4</sup> <sup>1</sup>CEEM, Indiana University, Bloomington, IN, United States <sup>2</sup>Argonne National Laboratory, Lemont, IL, United States <sup>3</sup>Bates R&E Center, Massachusetts Institute of Technology, Middleton, MA, United States <sup>4</sup>Physics, Brookhaven National Laboratory, Upton, NY, United States

We have developed a modular readout system for the 30,720 channel Forward GEM Tracker recently installed in the STAR Experiment at RHIC, BNL. The modular architecture is based on a passive compact PCI backplane running a custom protocol, not PCI, connecting 6 readout modules to a readout controller module. The readout modules provide all necessary functions, including isolated power supplies, to operate up to 24 APV25 chips per module with high-impedance ground isolation. The frontend boards contain a minimal set of components as they are located inside the STAR TPC inner field cage and are inaccessible except during long shutdown periods. The frontend boards connect to the readout modules with cables up to 24 m in length, carrying unbuffered analog readout signals from the APV25 as well as power, trigger, clock and control. The readout module digitizes the APV analog samples to 12 bits at 37.532 MHz, and zero-suppresses and buffers the data. The readout controller distributes trigger and clock from the central trigger system, gathers the data over the backplane, and ships it to a Linux PC via a 2.125 Gbps optical data link (DDL from ALICE). The PC gathers data from multiple readout controllers and dispatches it to the STAR event builders. The readout modules, controllers, and backplanes are housed in a common crate together with the GEM HV bias power supplies.

## PS3-28: A Comprehensive Zero-Copy Architecture for High Performance Distributed Data Acquisition over Advanced Network Technologies for the CMS Experiment

<u>A. Petrucci<sup>1</sup></u>, G. Bauer<sup>2</sup>, U. Behrens<sup>3</sup>, J. Branson<sup>4</sup>, S. Bukowiec<sup>1</sup>, O. Chaze<sup>1</sup>, S. Cittolin<sup>5</sup>, J. A. Coarasa Perez<sup>1</sup>, C. Deldicque<sup>1</sup>, M. Dobson<sup>1</sup>, A. Dupont<sup>1</sup>, S. Erhan<sup>6</sup>, D. Gigi<sup>1</sup>, F. Glege<sup>1</sup>, R. Gomez-Reino<sup>1</sup>, C. Hartl<sup>1</sup>, A. Holzner<sup>4</sup>, L. Masetti<sup>1</sup>, F. Meijers<sup>1</sup>, E. Meschi<sup>1</sup>, R. Mommsen<sup>7</sup>, C. Nunez-Barranco-Fernandez<sup>1</sup>, V. O'Dell<sup>7</sup>, L. Orsini<sup>1</sup>, C. Paus<sup>2</sup>, M. Pieri<sup>4</sup>, G. Polese<sup>1</sup>, A. Racz<sup>1</sup>, O. Raginel<sup>2</sup>, H. Sakulin<sup>1</sup>, M. Sani<sup>4</sup>, C. Schwick<sup>1</sup>, A. C. Cristian Spataru<sup>1</sup>, F. Stoeckli<sup>2</sup>, K. Sumorok<sup>2</sup>

<sup>1</sup>CERN, Geneva, Switzerland

<sup>2</sup>Massachusetts Institute of Technology, Cambridge, Massachusetts, USA

<sup>3</sup>DESY, Hamburg, Germany

<sup>4</sup>University of California, San Diego, San Diego, California, USA

<sup>5</sup>Eidgenössische Technische Hochschule, Zurich, Switzerland

<sup>6</sup>University of California, Los Angeles, Los Angeles, California, USA

This paper outlines a software architecture where zero-copy operations are used comprehensively at every processing point from the Application layer to the Physical layer. The proposed architecture is being used during feasibility studies on advanced networking technologies for the CMS experiment at CERN. The design relies on a homogeneous peer-to-peer message passing system, which is built around memory pool caches allowing efficient and deterministic latency handling of messages of any size through the different software layers. In this scheme, portable distributed applications can be programmed to process input-to-output operations by pointer arithmetic and DMA operations only. Combined with the open fabric protocol stack (OFED), the approach makes it possible to attain near wire-speed message transfer at the application level. The architecture supports full portability of user applications by encapsulating the protocol and network details in modular peer transport services. Transparent replacement of the underlying protocol facilitates deployment of the framework and avoids the difficult couplings that would otherwise have to be dealt with when the underlying communication infrastructure changes. We demonstrate the feasibility of this approach by giving efficiency and performance measurements of the software in the context of the CMS distributed event building studies.
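The idea of handing messages between layers by reference into a pre-allocated memory pool, rather than by copying, can be illustrated with a small Python stand-in. This is not the framework described in the abstract: the `BufferPool` class and its methods are hypothetical, and Python's `memoryview` merely plays the role that pointers into pool memory would play in a native implementation.

```python
from typing import List, Tuple


class BufferPool:
    """Fixed-size blocks carved out of one pre-allocated bytearray (hypothetical)."""

    def __init__(self, block_size: int, n_blocks: int) -> None:
        self._storage = bytearray(block_size * n_blocks)
        self._view = memoryview(self._storage)
        self._block_size = block_size
        self._free: List[int] = list(range(n_blocks))

    def acquire(self) -> Tuple[int, memoryview]:
        """Hand out a block as a view into pool memory; no bytes are copied."""
        index = self._free.pop()
        start = index * self._block_size
        return index, self._view[start:start + self._block_size]

    def release(self, index: int) -> None:
        """Return a block to the pool so that it can be reused."""
        self._free.append(index)


pool = BufferPool(block_size=16, n_blocks=2)
index, block = pool.acquire()

block[:5] = b"hello"   # a "lower layer" writes the message in place
payload = block[:5]    # an "upper layer" receives a view, not a copy

block[0] = ord("J")    # a later in-place change is visible through the view
print(bytes(payload))  # the bytes are copied only here, for printing

pool.release(index)
```

Slicing a `memoryview` yields another view onto the same storage, so every layer in this sketch works on the same bytes for as long as the block is held.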

<sup>7</sup>FNAL, Chicago, Illinois, USA

# PS1-29: Phase and Amplitude Drift Calibration of the RF Detectors in a MTCA.4 Based LLRF System

<u>J. Piekarski</u><sup>1</sup>, K. Czuba<sup>1</sup>, M. Hoffmann<sup>2</sup>, W. Jalmuzna<sup>3</sup>, F. Ludwig<sup>2</sup>, H. Schlarb<sup>2</sup>, C. Schmidt<sup>2</sup>, B. Yang<sup>2</sup> <sup>1</sup>Institute of Electronic Systems, Warsaw, Poland <sup>2</sup>Deutsches Elektronen-Synchrotron, Hamburg, Germany <sup>3</sup>Department of Microelectronics and Computer Science, Technical University of Lodz, Lodz, Poland

One of the key components of Low-Level RF (LLRF) systems in Free Electron Lasers (FELs) is the RF field detector that converts the detected cavity field signal to an intermediate frequency (IF) for digital sampling. Amplitude and phase drifts appearing in RF field detectors significantly limit the system precision, and they cannot be corrected automatically by digital control loops based on standard signals. To solve this problem, a drift calibration scheme was developed to measure exact drift values and correct them during LLRF system operation. Because present-day FELs operate in pulsed mode, there is a period of time when the detection chain can be characterized by injecting a reference signal. In order to achieve high accuracy, drifts have to be measured just before the normal operation of the RF detector. For that purpose, a special RF Drift Calibration Module (DCM) has been designed which cooperates with the MTCA.4-based LLRF system. In this paper we present the drift calibration method and the DCM design. Laboratory results and tests at the Cryo-Module Test Bench (CMTB) are demonstrated as well.
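The principle of reference-based drift correction can be sketched numerically. The abstract does not detail the method, so the model below is an assumption: the detector drift is taken to be a single multiplicative complex gain (amplitude factor and phase offset) that affects the injected reference and the cavity signal equally. The function names are hypothetical.

```python
import cmath


def drift_correction(reference_expected: complex,
                     reference_measured: complex) -> complex:
    """Complex factor that maps the measured reference back to its known value."""
    return reference_expected / reference_measured


def apply_correction(sample: complex, correction: complex) -> complex:
    """Remove the estimated detector drift from a measured field sample."""
    return sample * correction


# Assumed drift: +2 % in amplitude and +10 mrad in phase.
drift = cmath.rect(1.02, 0.010)

# Between pulses: inject a reference of known amplitude and phase.
reference_expected = cmath.rect(1.0, 0.0)
reference_measured = reference_expected * drift
correction = drift_correction(reference_expected, reference_measured)

# During the pulse: the cavity field passes through the same drifting detector.
cavity_true = cmath.rect(0.8, 0.5)
cavity_measured = cavity_true * drift
cavity_corrected = apply_correction(cavity_measured, correction)

print(abs(cavity_corrected), cmath.phase(cavity_corrected))
```

In this idealized model the correction recovers the true amplitude and phase exactly. A real detector chain would add noise and would only satisfy the common-gain assumption approximately.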

# **PS3-35:** Implementation of Intelligent Data Acquisition Systems for Fusion Experiment Using EPICS and FlexRIO Technology

<u>D. Sanz</u><sup>1</sup>, M. Ruiz<sup>1</sup>, R. Castro<sup>2</sup>, J. Vega<sup>2</sup>, J. M. Lopez<sup>1</sup>, E. Barrera<sup>1</sup>, N. Utzel<sup>3</sup>, P. Makijarvi<sup>3</sup> <sup>1</sup>CAEND-UPM-CSIC, Universidad Politecnica de Madrid, Madrid, Spain <sup>2</sup>Asociacion EURATOM/CIEMAT, Madrid, Spain <sup>3</sup>ITER Organization, St. Paul lez Durance Cedex, France

The data acquisition systems used for fusion experiments have the following requirements: a large number of mutually synchronized analog input channels, high sampling rates, pre-processing capabilities with real-time constraints, interfaces for carrying out control loops, and data archiving to store data and process/display it off-line. In addition, some other features are becoming relevant: the generation of hardware events and the timestamping of the data with the maximum accuracy. Meeting this list of requirements implies the use of input/output devices that can be reconfigured for the specific diagnostic. These functionalities are in general not available in general-purpose multifunction data acquisition devices. The main objective of this work has been to propose and implement a methodology based on: a) having a reconfigurable data acquisition system customized to the requirements of the scientist in charge, b) providing multifunction data acquisition with timestamping functionalities, c) simplifying the implementation of scalable systems, and d) providing integration in EPICS, a distributed control system framework. The complete solution has been achieved by developing an EPICS asynDriver device support and a design model for configuring RIO FPGA-based devices. This EPICS device support is able to manage every RIO/FlexRIO device using a ruleset, making it possible to obtain the features presented above. The design model requires following a workflow with these steps: 1) The scientist lists the features he or she needs in a purpose-designed spreadsheet. 2) The spreadsheet automatically generates the db file used to create an EPICS IOC application to control the RIO device. 3) Using LabVIEW, following the design model defined by this project and the spreadsheet filled in by the scientist, the design is compiled to obtain the bitfile to program the FPGA.
4) With the bitfile, LabVIEW tools generate a header file with the mapping of the resources in the FPGA. 5) The user uses the bitfile, the header file, the db file, and the EPICS device support to create the IOC application capable of controlling and managing the RIO/FlexRIO device configured with the specific features. The resulting system is a complete and easily reconfigurable data acquisition system which makes it possible to achieve, in a short period of time, a ready-to-use solution for a new type of experiment. The relevant aspects of the proposed solution will be presented, with its main advantages and limitations.

# PS4-1: A General Self-Organization Tree-Based Energy-Balance Routing Protocol for Wireless Sensor Network

<u>Z. Han</u><sup>1,2</sup>, J. Wu<sup>1,2</sup>, J. Zhang<sup>1,2</sup>, L. Liu<sup>1,2</sup>, K. Tian<sup>1,2</sup>

<sup>1</sup>Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui, China, 230026 <sup>2</sup>State Key Laboratory of Particle Detection & Electronics, University of Science and Technology of China, Hefei, Anhui, China, 230026

A wireless sensor network (WSN) is composed of a large number of low-cost micro-sensors that collect various kinds of messages and send them to a base station (BS). WSNs have a wide range of applications, including military surveillance, disaster prediction and environment monitoring, and therefore attract a lot of attention. Since battery replacement is not an option for networks with thousands of physically embedded nodes, an energy-efficient routing protocol must be employed to achieve a long network lifetime. To achieve that, we need not only to minimize energy consumption but also to balance the WSN load. Researchers have proposed many elegant protocols. LEACH, HEED, PEGASIS and PEDAP are typical protocols based on data fusion. However, LEACH and HEED consume energy heavily in the head nodes, so the head nodes tend to die early. PEGASIS, which is known as a chain-based energy-efficient protocol, has a long time delay. PEDAP constructs a minimum spanning tree that has nearly optimal cost, but such a static protocol needs the BS to build the topology. Moreover, PEGASIS and PEDAP are not suitable for the case in which a relay node must transmit both its own message and those of its children because the data cannot be fused. LEACH and HEED adapt to this case to a certain extent. Both are cluster-based and try to balance the load in this case, but the nodes farther from the BS still die first.

In this paper, a general self-organization tree-based energy-balance routing protocol (GSTEB) is proposed. The protocol assumes that each node can obtain its coordinates by GPS or other means. By sending query packets within a certain radius, nodes can obtain their neighbors' information, such as coordinates and energy level (EL). EL is a parameter for load balancing; it is a relative, estimated energy value rather than a true one. In each round, the BS assigns a new root and broadcasts it to all nodes. After that, each node selects its parent in parallel using the EL and coordinate information. The selection criteria are: 1) The distance between the parent node and the root is smaller than that between the node itself and the root. 2) If the root is the BS, the parent node should have the largest EL among the neighbors; if the root is a general node, the EL of the parent node should not be smaller than the node's own. 3) The chosen parent node should lead to the least energy consumption. A MATLAB simulation shows that, using the same model as PEGASIS, GSTEB spends only 0.5% more energy per round than PEDAP. Because GSTEB is a dynamic and parallel protocol, it can change the root and reconstruct the routing tree with shorter delay and less overhead according to the criteria mentioned above, so a better load balance is achieved, especially for densely deployed nodes. For this model, GSTEB improves the death round of the first node by 150% compared with PEGASIS. For the other situation, in which data cannot be fused, we compare GSTEB with HEED; the results show that GSTEB improves the death round of the first node by 100% compared with HEED.
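The three parent-selection criteria can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes that transmission energy grows with the squared distance between a node and its parent, that "largest EL" is evaluated among the neighbors that already satisfy criterion 1, and that a node with no eligible neighbor sends directly to the root. The `Node` type and all names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    x: float
    y: float
    el: int  # estimated, relative energy level


def dist(a, b):
    return math.hypot(a.x - b.x, a.y - b.y)


def choose_parent(node, neighbors, root, root_is_bs):
    """Pick a parent for `node` following criteria 1) to 3)."""
    # Criterion 1: the parent must be closer to the root than the node itself.
    d_self = dist(node, root)
    closer = [n for n in neighbors if dist(n, root) < d_self]

    # Criterion 2: EL condition depends on whether the root is the BS.
    if root_is_bs:
        top = max((n.el for n in closer), default=None)
        eligible = [n for n in closer if n.el == top]
    else:
        eligible = [n for n in closer if n.el >= node.el]

    if not eligible:
        return root  # assumption: fall back to direct transmission

    # Criterion 3: least energy, modelled here as squared link distance.
    return min(eligible, key=lambda n: dist(node, n) ** 2)
```

Because each node needs only its neighbors' coordinates and EL values, every node can run this selection independently, which is what allows the tree to be rebuilt in parallel each round.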

#### PS4-9: Experiences with the MTCA.4 Solution for the EuXFEL Clock and Control System

E. Motuk, M. Postranecky, M. Warren, M. Wing

Department of Physics and Astronomy, University College London, London, United Kingdom

The clock and control (CC) system for the EuXFEL mega-pixel detectors consists of a multi-purpose MTCA.4 AMC card with a Xilinx FPGA and a custom-designed Rear Transition Module (RTM) which provides the CC functionality. The system resides in an MTCA.4 crate with the Timing Receiver (TR) board and synchronises the DAQ system to the general EuXFEL timing. This paper presents the experiences with the prototype system in addition to describing the RTM hardware and the CC system firmware in detail. The tests that have been performed to validate the basic functionality and the functionality related to the MTCA.4 specification are presented first. The next stage of tests involves confirming the system functionality by using the TR board as it would be used in the EuXFEL DAQ system and a development board to simulate a Front End Electronics (FEE) unit. The performance metrics in terms of jitter and bit error rates for FEE communication are presented. As a result of the performance tests, the improvements and modifications to the current hardware for the final system are outlined in the conclusions.

### **PS4-12: Timing and Triggering System for the European XFEL Project - a Double Sized AMC Board**

<u>A. Hidvegi<sup>1</sup></u>, P. Gessler<sup>2</sup>, H. Kay<sup>3</sup>, K. Rehlich<sup>3</sup>, C. Bohm<sup>1</sup>

<sup>1</sup>Physics Dept., Stockholm University, Stockholm, Sweden <sup>2</sup>European X-Ray Free Electron Laser Facility GmbH, Hamburg, Germany <sup>3</sup>Deutsches Elektronen-Synchrotron (DESY), Hamburg, Germany

The European X-Ray Free Electron Laser (XFEL) [1] is a 3.4 km long linear accelerator that will enable new scientific research by studying structures and events on the nanoscale.

For such a complex machine to operate properly, precise timing and trigger information must be distributed throughout the entire accelerator. This information is used by the monitoring equipment of the accelerator and also by the experiment stations. The phase stability (jitter) of the clock signals must be better than 5 ps (RMS), including drifts due to changes of propagation delay in fiber cables caused by temperature variations.

The system was developed in several steps, starting with an evaluation board to test key concepts, followed by a single-size AMC prototype for a $\mu$TCA system in two revisions and finally a double-size AMC board for a $\mu$TCA system.

The double-size AMC board is intended for the final system and incorporates all the functionality that different user groups have requested as well as the experience gained from the previous prototypes. To reduce the overall system cost, some parts were implemented on daughter boards. For customization to different applications, the possibility of using rear transition modules (RTMs) has been added. This presentation will mainly focus on the new double-size AMC board, give an architectural overview and report some performance measurements.

## PS4-14: Directive Multi-Channel Beta Probe for Detecting Small Tumors

S. J. Jeon, J. H. Park, <u>K. S. Joo</u> Physics, Myongji University, Yongin, Gyeonggido, South Korea

Devices referred to as beta probes have been developed to assist surgeons in locating tumors or tumor remnants during surgery. This study developed a compact multi-channel beta probe based on silicon photomultipliers and BCF-12 optical fiber scintillators. The probe produced for this experiment is able to distinguish the beta-ray count rate from the annihilation gamma-ray background count rate. Each detection channel consists of a BCF-12 scintillator, 1 mm in diameter and 15 cm long, and a silicon photomultiplier. In order to separate the beta-ray signal from the gamma-ray background, each detection channel is paired with an adjacent channel shielded from beta rays by 200 $\mu$m thick lead. The detection performance of the probe was evaluated using two radioisotopes, Na-22 (1 $\mu$Ci) and Cs-137 (1 $\mu$Ci). To obtain spatial information, the line response function of the probe was measured by stepping a Na-22 source in 0.5 mm increments, with the source placed 0.5 mm from the front of the probe. The probe has a good detection efficiency: 15% for Na-22 and 16% for Cs-137, measured with 1 $\mu$Ci sources. The annihilation gamma-ray background is eliminated by a subtraction method, which improves the position sensitivity of the probe. The FWHM of the line response function decreased by about 40%, from 4.57 mm without subtraction to 2.79 mm with gamma-ray background subtraction. This result demonstrates the potential of the probe to localize small tumors more accurately. The beta probe has been made physically compact and can efficiently detect small tumors as a supporting surgical instrument in nuclear medicine.
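The background subtraction and the FWHM evaluation of a line response function can be sketched as below. This is a generic illustration, not the authors' analysis code: the net beta signal is taken as the open-channel counts minus the counts of the paired shielded channel, and the FWHM is found by linear interpolation at half maximum. All function names are hypothetical and no measured data are reproduced.

```python
def subtract_background(open_counts, shielded_counts):
    """Net beta counts per scan position; negative values are clipped to zero."""
    return [max(o - s, 0) for o, s in zip(open_counts, shielded_counts)]


def fwhm(positions, profile):
    """Full width at half maximum of a single-peaked profile.

    The half-maximum crossings are located by linear interpolation
    between neighbouring scan positions.
    """
    half = max(profile) / 2.0
    above = [i for i, v in enumerate(profile) if v >= half]
    lo, hi = above[0], above[-1]

    def crossing(i, j):
        x0, x1 = positions[i], positions[j]
        y0, y1 = profile[i], profile[j]
        return x0 + (half - y0) * (x1 - x0) / (y1 - y0)

    left = positions[lo] if lo == 0 else crossing(lo - 1, lo)
    right = positions[hi] if hi == len(profile) - 1 else crossing(hi, hi + 1)
    return right - left
```

For a scan in 0.5 mm steps, `positions` would be `[0.0, 0.5, 1.0, ...]` and `profile` the output of `subtract_background` at each step.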

## PS4-20: High-Precision Accelerator RF Control for the European XFEL

H. Schlarb<sup>1</sup>, F. Ludwig<sup>1</sup>, M. Hoffmann<sup>1</sup>, T. Jezynski<sup>1</sup>, J. Branlard<sup>1</sup>, C. Schmidt<sup>1</sup>, M. Grecki<sup>1</sup>, V. Ayvazyan<sup>1</sup>, S. Pfeiffer<sup>1</sup>, K. Czuba<sup>2</sup>, A. Piotrowski<sup>3</sup>, O. Hensler<sup>1</sup>, W. Jalmuzna<sup>3</sup>, D. Makowski<sup>3</sup>, L. Butkoswki<sup>2</sup>, W. Cichalewski<sup>3</sup>, I. Kudla<sup>1</sup>, J. Piekarski<sup>2</sup>, K. Przygoda<sup>3</sup>, I. Rutkowski<sup>2</sup>, D. Sikora<sup>2</sup>, J. Szewinksi<sup>1</sup>, W. Wierba<sup>1</sup>, B. Yang<sup>1</sup>, L. Zembala<sup>2</sup>, S. B. Habib<sup>2</sup> <sup>1</sup>MSK, DESY, Hamburg, Germany <sup>2</sup>ISE, WUT, Warsaw, Poland <sup>3</sup>DMCS, Uni of Lodz, Lodz, Poland

Fourth-generation light sources based on linear-accelerator-driven Free Electron Lasers (FELs) open new research opportunities in single-molecule imaging, material science, atomic physics, biology and extremely short timescale X-ray science. Currently, the largest FEL project under construction is the 3.5 km long European XFEL in Hamburg, targeted towards a high photon pulse production rate (30000 pulses/sec) with an unrivaled brilliance in the Angstrom wavelength range. The large number of photon pulses is achievable by accelerating the electron beam to 17.5 GeV in a pulsed superconducting accelerator comprising 100 cryogenic modules, each containing eight nine-cell niobium cavities cooled to 2 K. To make a cost-effective, reliable, maintainable and scalable system that meets industrial standards, a new development of the RF controls based on the MTCA.4 architecture was started. While most RF controls are realized in an external 19-inch chassis in order to achieve the very challenging RF field detection precision, we could demonstrate that, when the appropriate precautions are taken, field detection, RF generation and RF distribution, together with the digital DAQ system and the high-speed real-time processing, can be entirely embedded in the MTCA.4 crate system. This groundbreaking result of embedding ultra-high-precision analog electronics for detection on the Rear Transition Module (RTM) together with the high-power digital processing units on the AMC opens up entirely new possibilities for MTCA.4 and is particularly relevant for Free Electron Lasers, where the acceleration field precision should be well below 0.01% in amplitude and 0.01 deg (equivalent to 20 femtoseconds) in phase. In this paper, we present the architecture of the superconducting RF control system with various pre-, main- and post-processing entities for the 2500 RF channels and give an overview of the firmware structure, software architecture and automation.
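The quoted equivalence between 0.01 deg of phase and about 20 femtoseconds can be checked with a short calculation. The abstract does not state the RF frequency; the sketch assumes 1.3 GHz, the operating frequency of the superconducting cavities used at the European XFEL. The function names are illustrative.

```python
RF_FREQUENCY_HZ = 1.3e9  # assumption: not stated in the abstract


def phase_to_time(phase_deg, freq_hz=RF_FREQUENCY_HZ):
    """Time shift in seconds corresponding to a phase shift in degrees."""
    return phase_deg / 360.0 / freq_hz


def cavity_count(modules=100, cavities_per_module=8):
    """Total number of cavities from the module figures in the abstract."""
    return modules * cavities_per_module


if __name__ == "__main__":
    print(f"0.01 deg at 1.3 GHz = {phase_to_time(0.01) * 1e15:.1f} fs")
    print(f"cavities: {cavity_count()}")
```

Under this assumption 0.01 deg corresponds to roughly 21 fs, consistent with the rounded 20 fs in the text, and the module figures give 800 cavities in total.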

# PS3-22: Minimizing Dead Time of the Belle II Data Acquisition System with Pipelined Trigger Flow Control

<u>M. Nakao</u><sup>1</sup>, C. Lim<sup>2</sup>, M. Friedl<sup>3</sup>, T. Uchida<sup>1</sup> <sup>1</sup>IPNS, KEK, High Energy Accelerator Research Organization, Tsukuba, Ibaraki, Japan <sup>2</sup>Department of Physics, Yonsei University, Seoul, Korea <sup>3</sup>HEPHY, Austrian Academy of Sciences, Vienna, Austria

The Belle II experiment at the SuperKEKB energy-asymmetric e<sup>+</sup>e<sup>-</sup> storage ring at KEK, Tsukuba, Japan, is now under construction to search for physics beyond the Standard Model in B meson, charm meson and $\tau$ lepton decays. The detector consists of seven sub-detector systems, for which the data acquisition scheme is unified, with the exception of the innermost pixel detector (PXD). For the unified systems, the data generated at the frontend upon the level-1 trigger, which is distributed by the timing distribution system, are transmitted to a COmmon Pipelined Platform for Electronics Readout (COPPER) system with a homemade protocol which we call Belle2link. For the design luminosity of 8×10<sup>35</sup> cm<sup>-2</sup> s<sup>-1</sup>, we expect a 1 kHz rate each for B, charm and $\tau$ production, or 10 kHz for the total physics event rate. Including backgrounds, we design our data acquisition system to be able to handle a 30 kHz level-1 trigger rate. In order to minimize the dead time, the frontend digitization system is operated in a pipelined manner. We introduce a pipelined trigger flow control scheme to minimize the dead-time fraction (or garbage-event fraction) while avoiding data collapse in the data flow. In this report, we describe the design of the trigger flow control of Belle II, the trigger distribution and status collection scheme to minimize their latency, the simulation results on the dead-time fraction for various parameters, and the measurement of the dead-time fraction in a realistic setup. The trigger flow control design is largely driven by the silicon vertex detector (SVD) readout scheme, which has a fixed pipeline length in its APV25 readout chip. We find an operation scheme which generates less than 1% dead-time fraction at the 30 kHz trigger rate. We present the parameter dependence of the dead-time fraction by using a simple simulation program.
For other detectors, the data buffers are inside the FPGA which also handles the timing signal, and can be flexibly designed. The entire system has to be controlled by a single source of the trigger distribution tree, and therefore it is crucial to minimize the latency. We developed a serial data handling scheme that minimizes the overhead of encoding and decoding. Finally, using a dummy trigger generator, timing distribution modules and prototype frontend readout boards, we demonstrate this pipelined trigger flow control scheme.
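How the dead-time fraction depends on buffer depth can be illustrated with a toy Monte Carlo. This is not the Belle II simulation program: it models a single frontend with a bounded event buffer, a fixed readout time per event and Poisson-distributed triggers, and counts a trigger as dead time when the buffer is full. The 20 µs readout time and the buffer depths are made-up parameters.

```python
import random
from collections import deque


def dead_time_fraction(rate_hz, readout_s, depth, n_triggers=200_000, seed=1):
    """Fraction of triggers vetoed because the event buffer is full.

    rate_hz   : mean (Poisson) trigger rate
    readout_s : fixed time to read one event out of the buffer
    depth     : number of events the buffer can hold
    """
    rng = random.Random(seed)
    now = 0.0
    finish_times = deque()  # readout completion times of buffered events
    last_finish = 0.0       # time at which the readout path becomes free
    vetoed = 0
    for _ in range(n_triggers):
        now += rng.expovariate(rate_hz)
        # Drop events whose readout has completed by now.
        while finish_times and finish_times[0] <= now:
            finish_times.popleft()
        if len(finish_times) >= depth:
            vetoed += 1  # buffer full: this trigger is lost
            continue
        start = max(now, last_finish)  # events are read out one at a time
        last_finish = start + readout_s
        finish_times.append(last_finish)
    return vetoed / n_triggers


if __name__ == "__main__":
    for depth in (1, 2, 5):
        frac = dead_time_fraction(30e3, 20e-6, depth)
        print(f"depth {depth}: dead-time fraction {frac:.3f}")
```

With a single-event buffer the loss follows the familiar ρ/(1+ρ) form for load ρ = rate × readout time, and deeper buffers reduce it sharply, which is the qualitative motivation for pipelining.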

### **PS4-2: Real Time Control System of Active Reflector of FAST**

X.-C. Deng<sup>1,2</sup>, W.-Q. Wu<sup>1,2</sup>, M.-C. Luo<sup>1,2</sup>, H.-T. Shen<sup>3</sup>, L.-C. Zhu<sup>3</sup>, P.-Y. Tang<sup>1,2</sup>, J.-J. Liu<sup>1,2</sup>, <u>F. Li<sup>1,2</sup></u>, G. Jin<sup>1,2</sup>, J. Wang<sup>1,2</sup>

<sup>1</sup>Univ. of Sci. & Tech. of China, Hefei, Anhui, China <sup>2</sup>State Key Laboratory of Technologies of Particle Detection and Electronics, Hefei, Anhui, China <sup>3</sup>National Astronomical Observatories, Beijing, China

The Five-hundred-meter Aperture Spherical radio Telescope (FAST) is a Chinese mega-science project to build the largest single-dish radio telescope in the world, as shown in Fig. 1. It uses a karst depression as its site, which is large enough to host the 500-meter telescope and deep enough to allow a zenith angle of 40 degrees. As a huge scientific device, the supporting structure of FAST has special requirements beyond those of conventional structures. The most prominent one is that the supporting structure should enable the reflector surface to be deformed from a sphere into a paraboloid in real time through active control. The main objective of the control system is to build a control network in the karst depression, consisting of a master control computer, layered control units and 2300 actuator control nodes, that can drive the actuators to form a paraboloid from the spherical reflector surface in real time. EPICS (Experimental Physics and Industrial Control System), initially developed by LANL and ANL, is widely known as a good and easy-to-use framework for real-time control systems. EPICS is now used in many scientific facilities, such as large accelerators and large telescopes. We selected EPICS for the design of the ARS control system. The ARS control system is organized in three layers. At the lowest level are the control nodes, which control the motors that drive the cables, and the sampling points, which sample the cable tension and node position. The middle level consists of the area control units, each of which controls about 200 nodes in a defined area, where the control nodes share one field bus such as RS485 or CAN. The top level is the ARS Master Control Unit, which obtains the position benchmark data from the benchmark system, interfaces with the upper-level system, called the Central Control System (CCS), and controls and manages the components in the lower levels.
In the ARS control system, the design is based on EPICS. The lowest level comprises the control nodes and sampling points, corresponding to motor control, node position sampling and cable tension sampling. In the middle level, an IOC is designed for each area control unit, which covers about 200 nodes. We have completed the IOC in the laboratory, designed a simulator for the control nodes, implemented the Master Control Unit interface with the CCS, and tested the system in a laboratory environment.

### PS3-20: Upgrading the Backend of the Pipeline Readout System for Belle II

<u>S. Y. Suzuki</u>, T. Higuchi, M. Nakao, R. Itoh, Y. Igarashi *KEK, Tsukuba, Ibaraki, Japan* 

The Belle II experiment, the successor of the Belle experiment at KEK for the study of CP violation, will start operation in a couple of years. One of the challenges for the data acquisition system is the expected high trigger rate, which is about 40 times higher than that of Belle. The Belle experiment used a large number of pipelined TDCs on COmmon Pipelined Platform for Electronics Readout (COPPER) modules. Each COPPER module consists of a 9U-size baseboard, TDC mezzanine cards in the homemade form factor called FINESSE, a trigger receiving module in the PMC form factor (IEEE 1386.1), and a processor PMC module (PrPMC). At Belle II, the digitization electronics are moved further toward the frontend of the data stream, but we still reuse the COPPER modules as the backend of the pipeline readout system.

The average trigger rate is about 30 kHz and we expect that the largest detector requires a bandwidth of 30 MB/s per COPPER module. To handle this bandwidth, both the interface speed and the processing power have to be sufficiently high. The task of the PrPMC is not only to transmit data from the baseboard to the network, but also to perform data error detection, consistency checks and removal of redundant data headers. The items to be checked depend on the detector subsystem, and the software may vary. In order to use the offline software modules without modification on the online processor, the COPPER system requires the x86 architecture for the PrPMC.

The original COPPER board, using the EPC-6315 PrPMC of Radisys Corporation with an 800 MHz Pentium III processor, does not meet the requirement. The Fast Ethernet interface of the EPC-6315 clearly cannot handle the bandwidth, but we found that using the Gigabit Ethernet on the COPPER baseboard is still slow because of the bus bridge latency. As no suitable board was found on the market, we decided to develop a new PrPMC with a Gigabit Ethernet interface and sufficient processing power for the COPPER systems of Belle II.
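The bandwidth argument can be made concrete with a back-of-the-envelope calculation from the figures quoted above (30 kHz trigger rate, 30 MB/s per COPPER module). The link capacities are raw line rates that ignore protocol overhead, so real throughput is lower; the function names are illustrative.

```python
TRIGGER_RATE_HZ = 30e3      # average level-1 trigger rate from the abstract
REQUIRED_MB_PER_S = 30.0    # required bandwidth per COPPER module


def bytes_per_event(bandwidth_mb_s, rate_hz):
    """Average event fragment size implied by bandwidth and trigger rate."""
    return bandwidth_mb_s * 1e6 / rate_hz


def link_capacity_mb_s(line_rate_mbit_s):
    """Raw link capacity in MB/s, ignoring all protocol overhead."""
    return line_rate_mbit_s / 8.0


if __name__ == "__main__":
    print(f"fragment size: {bytes_per_event(REQUIRED_MB_PER_S, TRIGGER_RATE_HZ):.0f} bytes")
    for name, mbit in (("Fast Ethernet", 100), ("Gigabit Ethernet", 1000)):
        cap = link_capacity_mb_s(mbit)
        verdict = "sufficient" if cap >= REQUIRED_MB_PER_S else "insufficient"
        print(f"{name}: {cap:.1f} MB/s raw, {verdict}")
```

Fast Ethernet tops out at 12.5 MB/s even before overhead, well below the 30 MB/s requirement, whereas Gigabit Ethernet offers 125 MB/s raw.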

The new PrPMC is equipped with the Intel Atom Z530 processor, the Poulsbo chipset and a Gigabit Ethernet interface. The Poulsbo chipset has two PCI Express paths; one is dedicated to the Gigabit Ethernet, and the other is connected to the PCI bus bridge. The latter will be fully used to receive the data from the FINESSE modules.

We rewrote the device drivers that operate the COPPER systems for this new PrPMC, and we confirmed that the bus reading speed reaches 120 MB/s under the 30 kHz trigger rate. The previous drivers performed most device operations in interrupt-handling context, which made the system unstable under a high trigger rate. Now most device operations are moved out to process context using the WorkQueue feature of the Linux kernel. With this rewrite, the maximum acceptable trigger rate exceeds 70 kHz.

We will report further test results of the COPPER readout system using our new processor module and the updated software.

## **PS3-31: Communication Architecture of DAQ-Middleware**

<u>Y. Nagasaka<sup>1</sup></u>, H. Sendai<sup>2</sup>, E. Inoue<sup>2</sup>, T. Koutoku<sup>3</sup>, N. Ando<sup>3</sup>, S. Ajimura<sup>4</sup>, M. Wada<sup>5</sup> <sup>1</sup>Hiroshima Institute of Technology, Hiroshima, Japan <sup>2</sup>High Energy Accelerator Research Organization, Ibaraki, Japan <sup>3</sup>The National Institute of Advanced Industrial Science and Technology, Ibaraki, Japan <sup>4</sup>Osaka University, Osaka, Japan <sup>5</sup>Bee Beans Technologies Co. Ltd., Ibaraki, Japan

DAQ-Middleware is a software framework for network-distributed data acquisition systems for small and medium-sized experiments. The framework was developed on the basis of Robot Technology Middleware (RT-Middleware), which is an international standard of the Object Management Group (OMG) and is intended not only for robotics but also for embedded systems.

The framework is developed with object-oriented technology and uses the Common Object Request Broker Architecture (CORBA) for communication between objects. CORBA communication is sufficient for robotics, but a communication method not based on CORBA is also required for a DAQ system framework.

We developed a new communication architecture for DAQ-Middleware based on ordinary socket communication. The communication method can be selected in the configuration file. The performance of DAQ-Middleware using the new communication architecture was measured and compared with CORBA. The throughput improves for transfers of data larger than about 256 kByte.
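A plain socket transfer between two components can be sketched as follows. The abstract does not describe the wire format used by DAQ-Middleware, so the 4-byte length prefix and every function name here are assumptions chosen only to show the general idea of moving a data block over a socket without CORBA marshalling.

```python
import socket
import struct
import threading

HEADER = struct.Struct("!I")  # assumed framing: 4-byte big-endian payload length


def send_frame(sock, payload):
    """Send one length-prefixed data block."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, looping over partial reads."""
    chunks = []
    while n > 0:
        chunk = sock.recv(min(n, 65536))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock):
    """Receive one length-prefixed data block."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


def transfer(payload):
    """Send one frame between two in-process endpoints and return what arrived."""
    upstream, downstream = socket.socketpair()
    with upstream, downstream:
        # The sender runs in its own thread so that a large payload cannot
        # block on a full socket buffer while nobody is reading.
        sender = threading.Thread(target=send_frame, args=(upstream, payload))
        sender.start()
        received = recv_frame(downstream)
        sender.join()
    return received
```

In a real deployment the two endpoints would be separate components connected over TCP, with the choice between this path and CORBA taken from the configuration file.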

## PS3-34: Advanced Linux PCI Services (ALPS) for Rapid Prototyping of PCI-Based DAQ Electronics

S. A. Chilingaryan, M. Caselle, A. Kopmann, U. Stevanovic, M. Vogelgesang

IPE, Karlsruhe Institute of Technology, Karlsruhe, Germany

Writing stable and performant drivers and keeping them up to date with the latest Linux kernel is a complex and tedious task. It is especially difficult to synchronize the parallel development of hardware and software. However, many components of a PCI driver are standard. In the development phase, hardware engineers often need only access to the device registers and the ability to transfer data between device and host memory in a few different modes. This functionality can be provided uniformly for most devices. We developed a universal PCI driver and a debugging tool to facilitate hardware development. A few basic ideas lie behind the design. A universal driver is used during the development phase; if necessary, a dedicated driver can be implemented when the hardware is ready. To simplify maintenance across new kernel versions, we split the DMA implementation into two parts. The kernel module is kept as small as possible and is responsible for memory management only. The actual implementation of the DMA engine and most other features are realized in user space. Finally, the design of the driver allows fine-grained scripting. For example, it is possible to start the DMA engine, set some registers to initiate a DMA transfer, read data from the DMA engine, attempt to process it and, if wrong data is returned, analyze the status registers to find the signature of the error. Thus, the hardware design is not blocked by missing or malfunctioning software, and no software modifications are required for hardware debugging. The PCI board is identified by the vendor and device IDs, which are specified as module parameters. The register model is defined by a simple XML file. The driver is able to operate in two non-DMA modes: with plain PCI memory mapping and with FIFO registers. DMA engines depend on the FPGA implementation and are supported by plugins. Along with the driver, we provide an SDK and a command-line tool.
To simplify integration with distributed data acquisition systems, we plan to enhance ALPS with a web-service interface. The universal driver has been used successfully for the development of a high-throughput camera platform at KIT.
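The idea of an XML register model combined with scripted register access can be illustrated with a small stand-alone sketch. It does not use the ALPS SDK or its file format: the XML element and attribute names, the register names and the `MappedDevice` class are all hypothetical, and a `bytearray` stands in for a memory-mapped PCI region.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical register model; element and attribute names are illustrative only.
MODEL_XML = """
<registers>
  <register name="dma_ctrl"   offset="0x00"/>
  <register name="dma_status" offset="0x04"/>
</registers>
"""


def load_model(xml_text):
    """Map register names to byte offsets."""
    root = ET.fromstring(xml_text.strip())
    return {r.get("name"): int(r.get("offset"), 16) for r in root.iter("register")}


class MappedDevice:
    """Stand-in for a memory-mapped PCI region, backed by a bytearray."""

    def __init__(self, model, size=64):
        self.model = model
        self.mem = bytearray(size)

    def write(self, name, value):
        # 32-bit little-endian register write at the modelled offset.
        struct.pack_into("<I", self.mem, self.model[name], value)

    def read(self, name):
        return struct.unpack_from("<I", self.mem, self.model[name])[0]


if __name__ == "__main__":
    dev = MappedDevice(load_model(MODEL_XML))
    dev.write("dma_ctrl", 1)          # scripted step: start the engine
    print(hex(dev.read("dma_status")))  # scripted step: inspect the status
```

Addressing registers by name from a script is what lets a hardware engineer probe status registers after a failed transfer without touching driver code.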

### PS3-37: A 16-Channel 15 ps TDC Implemented in a 65 nm FPGA

L. Zhao<sup>1,2</sup>, X. Hu<sup>1,2</sup>, S. Liu<sup>1,2</sup>, J. Wang<sup>1,2</sup>, Q. An<sup>1,2</sup>

<sup>1</sup>State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, Hefei, Anhui, China <sup>2</sup>Anhui Key Laboratory of Physical Electronics, Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui, China

We present the implementation of a high-resolution Time-to-Digital Converter (TDC) targeting a Field Programmable Gate Array (FPGA) from the Xilinx Virtex-5 family. There are 16 channels in total, each with a timing performance of about 15 ps RMS and a bin size of 30 ps.

The design of the TDC is based on a counter and an interpolator. A counter tracks the time elapsed since the TDC was enabled and gives the coarse time. This approach also offers a large dynamic range that is limited only by the number of counter bits. Dedicated carry lines in the CARRY4 blocks of the Virtex-5 FPGA are used for time interpolation, which gives fine time measurements within a system clock period. There are many approaches to implementing time interpolation: the Vernier method with two tuned ring oscillators, the pure tapped delay line (TDL), the Wave Union TDL and the Vernier TDL. We focus on the pure TDL method after weighing the trade-offs among resolution, flexibility, resource utilization and dead time. Our simulation shows that the delay from CIN to COUT in a CARRY4 block is as large as 104 ps. Thus we need to subdivide the delay of CARRY4 into finer taps to obtain a higher resolution. Temperature, supply voltage and process variations are common causes of inhomogeneous delay cells. In addition, the uneven delays introduced by the subdivision need to be calibrated. Multiple strategies are applied to calibrate the non-uniformity of the delay cells and to enhance the TDC resolution. The starting point is a code density test. A bin-by-bin calibration look-up table can be built inside the FPGA from the test results and used to compensate for temperature and voltage instability.
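A code density test and the resulting bin-by-bin look-up table can be sketched as follows. The principle is standard: hits that are uncorrelated with the clock fall into each delay-line bin with a probability proportional to its width, so the histogram of fine codes yields the bin widths. The sketch is not the authors' firmware; it simplifies the timestamp to coarse time plus fine time and ignores the direction in which the fine time is measured relative to the clock edge. The function names are illustrative.

```python
def build_lut(hit_counts, clock_period_ps):
    """Derive bin widths and bin-centre times from a code density histogram.

    hit_counts[i] is the number of random hits recorded in fine bin i.
    Returns (widths, lut), where lut[i] is the calibrated time of bin i
    measured from the start of the delay line, in picoseconds.
    """
    total = sum(hit_counts)
    widths = [clock_period_ps * c / total for c in hit_counts]
    lut, edge = [], 0.0
    for w in widths:
        lut.append(edge + w / 2.0)  # use the bin centre as the calibrated time
        edge += w
    return widths, lut


def timestamp_ps(coarse_count, fine_bin, lut, clock_period_ps):
    """Combine the coarse counter value with the calibrated fine time."""
    return coarse_count * clock_period_ps + lut[fine_bin]
```

For example, a four-bin line with counts `[100, 300, 200, 400]` over a 1000 ps period has widths of 100, 300, 200 and 400 ps, which shows how uneven taps are absorbed by the table instead of degrading linearity.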

Because the delay lines in the Virtex-5 FPGA differ from the carry resources of previous Xilinx FPGA families, an extra effort is needed to flatten the inhomogeneous delays. This can be done either with software calibration or directly with hardware compensation. For software calibration, we obtain the asymmetric distribution of the bin widths using MATLAB and analyze its influence on the linearity of the TDC. The asymmetric distribution is, to a large extent, normal. We can easily compensate for this delay variation during offline processing of the time data. For hardware compensation, we derive the asymmetric delay distribution inside CARRY4 and exploit this asymmetry to balance the asymmetric delay using the inherent tapped delay line. Meanwhile, we also apply Place and Route (PAR) constraints to fit our TDC design. Two different configurations are used to obtain the tap points of the TDL for comparison. Hardware compensation requires no extra resources and is more attractive and efficient.

We designed an evaluation board to verify the performance of the TDC. The board is not only set up for bench-top tests but also has potential for modular use, as it is physically implemented in the 6U PXI format.

# **PS3-5: Design and Implementation of DAQ Readout System for Daya Bay Reactor Neutrino Experiment**

X. Ji, F. Li, K. Zhu

Institute of High Energy Physics, Chinese Academy of Sciences, Beijing, China

The Daya Bay Reactor Neutrino Experiment will consist of seventeen separate detector subsystems distributed in three underground experimental halls. There will be eight PMT-based anti-neutrino detectors (ADs), six water Cherenkov detectors and three RPC detector subsystems. Each detector will be read out using an independent VME crate. The data acquisition (DAQ) readout system (ROS) reads data fragments from the electronics and trigger modules, then concatenates them into an event in each crate. A detailed design and implementation of the DAQ readout system for the Daya Bay Reactor Neutrino Experiment will be presented.

## PS4-21: The Application of Embedded System in CSNS Experimental Control System

J. Zhuang, K. Zhu, Y. Chu, L. Hu, J. Li, D. Jin

Division for Experimental Physics, Institute of High Energy Physics, CAS, Beijing, China

CSNS (China Spallation Neutron Source) is a large scientific facility that will be located in China; its construction is planned to be carried out over the next 6.5 years. The control system of CSNS is a large-scale open-source distributed control system (DCS). The way in which the front-end controllers are integrated into the DCS is critical to the control system. Traditionally, a single-board computer running vxWorks in a VME crate is used as the IOC (Input/Output Controller) to integrate control devices into the DCS. Now, the emergence of SoC chips makes lower-cost and more flexible IOCs possible. In addition, to reduce cost, real-time Linux is another option for the IOC operating system.

There are two kinds of tasks on an IOC: the information exchange task and the control task. Their requirements differ. The control task must execute at exactly the scheduled time, while the information exchange task should execute as frequently as it can. The control task may be disturbed by a frequently executing information exchange task. To guarantee the balance between these two tasks, we generally use CPU time planning. An upper limit on network access is set to guarantee the performance of the control task. Through CPU tests, network tests and application tests, the performance of the embedded CPU is well studied and the limit can be obtained. The measured performance and limits are useful for system design. After these tests, different embedded CPUs are selected for different applications.

The tests can be standardized for other applications in our system. The test tools are free and are useful for other systems.

## **PS3-16: An FPGA Based GEMROC ASIC Readout System**

B. Mindur, W. Dabrowski, T. Fiutowski, P. Wiacek, A. Zielinska

AGH University of Science and Technology, Krakow, Poland

A Gas Electron Multiplier Readout Chip (GEMROC) is an Application Specific Integrated Circuit (ASIC) dedicated to processing signals generated inside a GEM detector. The GEMROC ASIC is to be used as part of a Proton Range Radiography (PRR) system being developed at CERN. Before the frontend chips can be employed in the final application, however, their performance has to be verified in a variety of tests. The tests should be done in an environment very similar to the one in which the chips will work; therefore, a dedicated readout system is needed. In this paper we present a compact FPGA- and Ethernet-based readout system dedicated to GEMROC ASICs. The main requirements for the ASIC data acquisition (DAQ) system are as follows: a) simultaneous readout of up to 8 ASICs (4 per detector plane), b) transfer of the data to a host PC over a commercially available interface, c) provision of the slow control signals as well as clock synchronization, d) online data processing and event reconstruction, e) compactness of the overall system at an affordable price. These constraints led us to use a commercially available FPGA mezzanine board, the FXT70 Mini Module Plus from Silica, which is plugged into a custom ADC baseboard hosting a fast ADC with interconnections to two dedicated ASIC boards. The ADC and ASIC boards were designed so that they fit each other well and can be connected directly to the GEM detector. The signals acquired by the GEM detector are transferred to the GEMROC ASICs, where amplification and shaping are performed independently for all input channels. The amplitudes of those signals are stored in analogue FIFOs and thereafter transferred to the ADC, while the derandomization and zero suppression operations take place in the digital FIFOs. The digital data are sent out of the GEMROC over an 8-bit wide LVDS data bus connected to the FPGA. A single FPGA processes in parallel the digital data from four ASICs together with the digitized analogue signal amplitudes from the ADC. The system runs with a 125 MHz main clock. Two such boards are used in order to read out two GEM detector planes. The signal amplitudes from the so-called X and Y detector planes, together with timestamp information with an 8 ns LSB, allow 2D imaging (using a centre-of-gravity algorithm for event reconstruction) of the ionizing particles passing through the inner active volume of the GEM detector. The overall system consists of two synchronized FPGA-ADC boards connected to 4 ASIC mezzanines housing 8 GEMROCs. Each of the two FPGAs is connected to the host PC by a 1 Gbps Ethernet link. The DAQ PC is equipped with dedicated C++ software responsible for configuring the FPGA and ASIC settings, storing all incoming data, and online reconstruction of the 2D events.
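The centre-of-gravity reconstruction mentioned above can be sketched generically: the hit position in each plane is the amplitude-weighted mean of the strip indices in the cluster. This is an illustration in Python, not the authors' C++ software; the strip pitch, the cluster representation as a dictionary and the function names are assumptions. Only the 8 ns timestamp LSB, which matches one period of the 125 MHz clock, comes from the abstract.

```python
TIMESTAMP_LSB_NS = 8.0  # from the abstract: one period of the 125 MHz clock


def centre_of_gravity(amplitudes, pitch=1.0):
    """Amplitude-weighted mean position of a strip cluster.

    amplitudes maps strip index to signal amplitude; pitch converts
    strip units to a length (assumed value, in arbitrary units).
    """
    total = sum(amplitudes.values())
    if total <= 0:
        raise ValueError("cluster has no signal")
    return pitch * sum(strip * a for strip, a in amplitudes.items()) / total


def reconstruct(x_cluster, y_cluster, timestamp_ticks):
    """Return (x, y, time in ns) for one event seen in both planes."""
    return (
        centre_of_gravity(x_cluster),
        centre_of_gravity(y_cluster),
        timestamp_ticks * TIMESTAMP_LSB_NS,
    )
```

Weighting by amplitude gives a position resolution finer than the strip pitch whenever the charge is shared between neighbouring strips.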

## PS3-23: Development of New Data Acquisition System at Super-Kamiokande for Nearby Supernova Bursts

T. Tomura<sup>1</sup>, Y. Hayato<sup>1</sup>, M. Ikeno<sup>2</sup>, M. Nakahata<sup>1</sup>, S. Nakayama<sup>1</sup>, Y. Obayashi<sup>3</sup>, K. Okumura<sup>4</sup>, M. Shiozawa<sup>1</sup>, S. Y. Suzuki<sup>2</sup>, T. Uchida<sup>2</sup>, S. Yamada<sup>5</sup>, T. Yokozawa<sup>1</sup>

<sup>1</sup>Kamioka Observatory, Institute for Cosmic Ray Research, University of Tokyo, Kamioka, Gifu, Japan

<sup>2</sup>High Energy Accelerator Research Organization (KEK), Tsukuba, Ibaraki, Japan

<sup>3</sup>Kavli Institute for the Physics and Mathematics of the Universe, University of Tokyo, Kashiwa, Chiba, Japan

<sup>5</sup>Research Center for Neutrino Science, Tohoku University, Sendai, Miyagi, Japan

Super-Kamiokande (SK), a 50-kiloton water Cherenkov detector, is one of the most sensitive neutrino detectors. SK can also be used for supernova observations by detecting the neutrinos generated in a supernova. To improve the performance of the detector for supernovae, we are developing two new features, one for recording all information within one minute and the other for recording calorimetric information for nearby supernovae.

The current SK data acquisition (DAQ) system reads out all the photomultiplier tube (PMT) hits, including the dark noise, and applies a software trigger to select events to record. Therefore, the PMT hits caused by very low energy events below the threshold are not stored. Since a supernova burst is a very rare phenomenon and the details of the burst mechanism are not yet known, all possible data should be recorded without any bias from the trigger system. To accomplish this, we are adding a new feature to the DAQ system that records all the PMT hit information for about one minute, covering the time before and after the burst occurs.

According to the simulation study based on the Livermore model, the neutrino burst from a supernova farther than about 1300 light years can be recorded without loss of data by the current DAQ system. However, if a supernova burst occurred within a few hundred light years, the neutrino event rate could exceed 30 MHz and the system could record only about the first 20% of the events. To overcome this inefficiency, we are developing a new DAQ system that can handle such high-rate neutrino events. This new DAQ system records the number of hit PMTs so that we can count the neutrinos and obtain a time profile of the number of neutrinos emitted by the supernova.

We will present the implementation of these improvements. The results of the tests with the final prototype before the mass production will be shown.

## PS4-11: Superconducting Cavities Automatic Loaded Quality Factor Control at FLASH

W. Cichalewski<sup>1</sup>, J. Branlard<sup>2</sup>, H. Schlarb<sup>2</sup>, N. Walker<sup>2</sup>, J. Carwardine<sup>3</sup>

<sup>1</sup>Technical University of Lodz, Lodz, Poland

<sup>2</sup>MSK, Deutsches Elektronen Synchrotron, Hamburg, Germany

<sup>3</sup>Argonne National Laboratory, Argonne, USA

The free-electron laser accelerator in Hamburg (FLASH) consists of superconducting TESLA cavities controlled through their vector sum. In this approach, the Low Level Radio Frequency (LLRF) control system includes one feedback controller driving a single microwave klystron that provides RF power to 8-16 cavities. The main goal of the LLRF controller is to optimize the accelerating field parameters for the best beam acceleration in the superconducting structures. This task is challenging both from the control theory point of view and because of real system limitations and the cavity-to-cavity spread in operating parameters. Making use of other actuators, such as the cavity loaded quality factor, Q<sub>L</sub>, can be beneficial for optimizing the field parameters in each resonator. The paper focuses on the control of the Q<sub>L</sub> of the superconducting cavities by means of automatic adjustment of the input power antenna, which couples the RF power into an individual cavity. Tuning Q<sub>L</sub> for each cavity allows better control of the individual accelerating fields. The paper includes a description of the approach that has been used at FLASH to implement automatic loaded-Q tuning algorithms in the LLRF control system of a single RF station. Issues concerning both the coupler antenna control and the quality factor optimization are also discussed. Additionally, test results are presented together with a description of the operational experience with Q<sub>L</sub> tuning algorithms during a regular accelerator run.
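Two quantities in this abstract can be illustrated with a short Python sketch: the vector sum of the cavity fields that the single controller regulates, and the standard relation between loaded quality factor and cavity half-bandwidth, f<sub>1/2</sub> = f<sub>0</sub> / (2 Q<sub>L</sub>). This is not the FLASH LLRF code. The function names are invented for the example, and the Q<sub>L</sub> value used in the usage note is an assumed, illustrative number rather than one taken from the abstract.

```python
import cmath
import math
from typing import Sequence


def vector_sum(amplitudes_mv: Sequence[float], phases_deg: Sequence[float]) -> complex:
    """Complex sum of per-cavity field vectors (amplitude in MV, phase in degrees)."""
    return sum(a * cmath.exp(1j * math.radians(p))
               for a, p in zip(amplitudes_mv, phases_deg))


def half_bandwidth_hz(f0_hz: float, q_loaded: float) -> float:
    """Cavity half-bandwidth f_1/2 = f0 / (2 * Q_L)."""
    return f0_hz / (2.0 * q_loaded)
```

With the 1.3 GHz resonance frequency of TESLA cavities and an assumed Q<sub>L</sub> of 3 x 10<sup>6</sup>, `half_bandwidth_hz(1.3e9, 3e6)` gives roughly 217 Hz. A lower Q<sub>L</sub> widens the bandwidth of that cavity, which is why per-cavity Q<sub>L</sub> adjustment changes how each resonator responds to the common drive.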

# **PS3-2**: A Prototype of Underground Muon Counters Triggered from the Water Cherenkov Surface Detectors Built on Unified Altera Platform

Z. Szadkowski

Department of Physics and Applied Informatics, University of Lodz, Lodz, Poland

The aim of the additional underground muon counters is to investigate Extensive Air Showers at energies lower than those accessible with the standard Auger array, where the transition from galactic to extragalactic sources is expected. The paper describes the prototype of a Master/Slave synchronous data acquisition system (the standard Auger surface detector triggers the underground muon counters) with 80/320 MHz sampling in the surface/underground segments, built on a unified Altera platform: CycloneIII/CycloneIV FPGAs with NIOS processors implemented in each segment. The NIOS processors eliminate external micro-controllers and make it possible to generate the necessary interfaces (SDRAM controller, UART, SPI, DMA), which were previously implemented from logic elements. Moving several slow tasks from the logic block (coded in AHDL) to the NIOS (coded in C) dramatically simplified the system and increased its flexibility. The 100 Hz T1 trigger rate leaves a sufficient time margin for all processes managed by the soft-core NIOS. Splitting the 64 input channels, just after the fast input FPGA registers clocked at 320 MHz, into a 128-bit bus at half the clock rate gives a global registered performance of 160 MHz for the entire trigger/memory circuitry. The NIOS processors communicate with each other via the UART protocol, but over LVDS signalling. The underground CycloneIV FPGA is programmed remotely via an additional MAXII CPLD with non-volatile programming memory. Tests showed that a full synchronous cycle is successful: transfer of the trigger with a time stamp from the surface detector to the underground segment via a galvanically isolated dedicated line, freezing of data from 64 channels at 320 MHz sampling in internal DPRAMs, writing/reading of data to/from external SDRAM, extraction of the physical data identified by GPS time stamps sent from the Central Data Acquisition System (CDAS), and its transfer from the underground NIOS via the surface NIOS to CDAS.

<sup>4</sup>Research Center for Cosmic Neutrinos, Institute for Cosmic Ray Research, University of Tokyo, Kashiwa, Chiba, Japan

# **PS3-32: Implementation of the Disruption Predictor APODIS in JET Real Time Network Using the MARTe Framework**

<u>J. M. Lopez</u><sup>1</sup>, J. Vega<sup>2</sup>, D. Alves<sup>3</sup>, S. Dormido-Canto<sup>4</sup>, A. Murari<sup>5</sup>, J. M. Ramirez<sup>4</sup>, R. Felton<sup>6</sup>, M. Ruiz<sup>1</sup>, G. D. Arcas<sup>1</sup>, and JET-EFDA Contributors<sup>7</sup>

<sup>1</sup>CAEND, Universidad Politecnica de Madrid, Madrid, Spain

<sup>2</sup>Asociacion EURATOM CIEMAT para Fusion, Madrid, Spain

<sup>3</sup>Associacao EURATOM/IST, Instituto de Plasmas e Fusao Nuclear, Instituto Superior Tecnico, Univ. Tecnica de Lisboa, Portugal

<sup>4</sup>Dpto. Informatica y Automatica, Universidad de Educacion a Distancia, Madrid, Spain

<sup>5</sup>Consorzio RFX-Associazione EURATOM ENEA per la Fusione, Padova, Italy

<sup>6</sup>EURATOM/CCFE Fusion Association, Culham Science Center OX14 3DB, Abingdon, United Kingdom

<sup>7</sup>See Appendix of F. Romanelli et al Proc. 23rd IAEA Fusion Energy Conference 2010, Daejeon, Korea

Disruptions in tokamak devices are unavoidable, and they can have a significant impact on machine integrity, so it is very important to have mechanisms to predict this phenomenon. Disruption prediction is a very complex task, not only because it is a multi-dimensional problem, but also because, to be effective, it has to detect the actual disruptive event well in advance so that mitigation strategies can be applied successfully. With these constraints in mind, a real-time disruption predictor has been developed for use in the JET tokamak. The predictor has been designed to run in the Multithreaded Application Real-Time executor (MARTe) framework. The predictor, the Advanced Predictor Of DISruptions (APODIS), is based on Support Vector Machines (SVM). The implementation uses seven relevant measurements, e.g. plasma current and mode lock amplitude. These signals are processed using 32 ms time windows with a sampling frequency of 1 kHz. Various features are calculated (mean value and standard deviation of the FFT, without the first component). The real-time implementation has been validated using signals from the JET database, obtaining a performance equivalent to that of the off-line prediction algorithm. These results show that the system is able to predict a disruption 30 ms in advance with a hit rate of 90%. It is estimated that 30 ms is sufficient time to take protective actions. The system has been implemented on a six-core x86 architecture with an Ethernet Network Interface Card (NIC) for remote administration and introspection and an Asynchronous Transfer Mode (ATM) NIC handling all real-time I/O within JET's Real Time Data Network (RTDN). It is a user-space application running on a mainstream vanilla Linux kernel and implemented using MARTe. Real-time performance has been achieved by combining the available Central Processing Unit (CPU) isolation and Interrupt ReQuest (IRQ) routing mechanisms. Preliminary results on the system's prediction and real-time performance will be presented, as well as the influence of the MARTe framework on the development and integration of the system into JET's distributed philosophy for real-time experiment control.
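The windowed feature extraction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the APODIS or MARTe implementation: it assumes the features are the mean and the (population) standard deviation of the DFT magnitude spectrum with the first (DC) component dropped, and the function names are hypothetical. The 32 ms window and 1 kHz sampling rate, which give 32 samples per window, come from the abstract.

```python
import cmath
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

SAMPLE_RATE_HZ = 1000                                   # from the abstract
WINDOW_MS = 32                                          # from the abstract
WINDOW_SAMPLES = SAMPLE_RATE_HZ * WINDOW_MS // 1000     # 32 samples per window


def dft_magnitudes(samples: Sequence[float]) -> List[float]:
    """Magnitude of each DFT bin, computed directly (fine for 32 points)."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples)))
            for k in range(n)]


def window_features(samples: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, standard deviation) of the spectrum without the DC bin."""
    if len(samples) != WINDOW_SAMPLES:
        raise ValueError("expected one 32 ms window sampled at 1 kHz")
    spectrum = dft_magnitudes(samples)[1:]   # drop the first component
    return mean(spectrum), pstdev(spectrum)
```

In a predictor of this kind, the feature pairs from each of the seven signals would be concatenated into the vector passed to the SVM classifier. That step is omitted here because the abstract does not describe it.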

#### **PS4-18: Recent Developments in Control Software for Optical Synchronization Applications at DESY**

P. Prędki, T. Kozak, A. Napieralski

Department of Microelectronics and Computer Science, Technical University of Lodz, Lodz, Poland

Proper operation of FELs such as the Free-Electron Laser in Hamburg (FLASH) and the European X-Ray Free-Electron Laser (XFEL), which is currently under construction in Hamburg at DESY, requires many specific subsystems to be synchronized with a precision better than 10 femtoseconds. Those components are often separated by several hundred meters or even kilometers, as in the case of the European XFEL. Such distances make it extremely difficult to use only conventional RF signal distribution in coaxial cables for synchronization because of high losses and phase drifts. Electromagnetic interference is also an issue. As an alternative solution, a laser-based synchronization scheme can be employed in parallel. In this case, the signals are transmitted via stabilized optical fibers. Such an architecture is currently being used at FLASH and will also be the main means of synchronization at the European XFEL. The hardware for such a synchronization system consists of many optical elements such as commercial lasers and self-built free-space and fiber optic setups. However, a significant part of it is also the electronics responsible for control, diagnostics and signal processing. Currently, the VME standard is used throughout FLASH for the majority of the control system digital hardware infrastructure. For the European XFEL, however, an architecture with a high level of reliability and availability is required. For that reason, the Micro Telecommunications Computing Architecture (µTCA) has been chosen. It is a fairly new standard that provides significantly better performance and employs modern technological solutions, making it more suitable than the older VME architecture.

This paper focuses on the development of specialized control software applied to phase-lock the various lasers and fiber link stabilization units used in the laser-based synchronization system at FLASH. The presented software solutions are hardware-independent and the code is portable to any architecture able to support the Distributed Object Oriented Control System (DOOCS) used at DESY. Therefore tests of the software can be thoroughly performed at FLASH and later seamlessly moved to operate in the European XFEL environment. In this article, the authors first describe the basic block used in all the applications which is a proportional-integral-derivative (PID) regulator implemented in a Texas Instruments Digital Signal Processor (DSP). Later, they focus on the more advanced features of the phase-locking software such as automatic switching between reference signals, error-recovery routines, automatic signal discovery, and tuning. The software can also be used to measure and characterize features of the optical hardware such as the timing jitter of the locked lasers or the arrival time delay for the link distribution units. The measurement results for some of the equipment used are also presented.
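The basic building block named above, a PID regulator, can be sketched as follows. This is a generic discrete-time PID in Python for illustration only. It is not the DSP firmware described by the authors, and the class name, gains, time step and output limits are all hypothetical.

```python
class PID:
    """Discrete PID regulator with output clamping (illustrative sketch)."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float,
                 out_min: float = -1.0, out_max: float = 1.0) -> None:
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = None   # no derivative term on the first sample

    def update(self, setpoint: float, measured: float) -> float:
        """Advance one time step and return the clamped actuator value."""
        error = setpoint - measured
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(self.out_min, min(self.out_max, out))
```

In a phase-locking loop of the kind described, `measured` would be the detected phase error between the laser and its reference and the return value would drive the laser's tuning actuator. Features such as reference switching or error recovery would sit in a supervisory layer above this block.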

### FERT1: FPGA and Electronics Applied to Realtime Systems 1

### Thursday, June 14 17:20-18:20 Crystal Ballroom

## FERT1-1: Real-time measurement and adjustment of random phase in frequency-nondegenerate entanglement swapping experiment

<u>Z. Sang</u><sup>1,2</sup>, X. Jiang<sup>2,3</sup>, F. Li<sup>1,2</sup>, H. Zhang<sup>2,3</sup>, T. Zhao<sup>2,3</sup>, G. Jin<sup>1,2</sup>

<sup>1</sup>State Key Laboratory of Particle Detection and Electronics, Hefei, Anhui, China

<sup>2</sup>Dept. of Modern Physics, University of Science and Technology of China, Hefei, Anhui, China

<sup>3</sup>Hefei National Laboratory for Physical Science at Microscale, Hefei, Anhui, China

The frequency-uncorrelated photon entanglement swapping experiment is an advanced strategy for quantum communication. It entangles frequency-uncorrelated photons that have never interacted. In the experiment we take two pairs of frequency-uncorrelated polarization-entangled photons and subject one photon from each pair to a Bell-state measurement, together with a time interval measurement. The other two photons are coupled into fibers and guided to a phase adjustment component, which compensates the phase between the frequency-uncorrelated photons in order to verify their entangled state. The flying photons are not released until the phase adjustment component is ready, which is determined by the time interval measurement. The key to the experiment is to combine the time interval measurement with real-time calculation and feedback control. A special circuit has been developed for the experiment. The circuit, which converts the uncertain time interval into a level between -1 V and +1 V that is output to the Electro-Optic Modulator (EOM), should run as fast as possible because of the optical fiber loss experienced by the flying photons. The EOM is the phase adjustment component in the optical path and uses the Pockels effect. In addition, the precision of the time-to-digital converter (TDC) determines the contrast ratio of the experimental results. Therefore, the circuit should provide both a high-precision TDC and a very short response time to achieve the experimental target. This article reports the layout and characterization of the fast phase adjustment circuit. A precision TDC based on a field programmable gate array (FPGA) is implemented with a Least Significant Bit of about 40 ps (an RMS resolution of less than 20 ps). The response time of the whole system is less than 110 ns.
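The relation between the quoted 40 ps LSB and the RMS resolution can be checked with a short Python sketch. It models an ideal TDC only. The function names are hypothetical, and the actual FPGA TDC architecture is not described in the abstract. For an ideal quantizer the error is uniformly distributed over one bin, so the RMS quantization error is LSB / sqrt(12).

```python
import math

LSB_PS = 40.0   # least significant bit quoted in the abstract


def tdc_code(interval_ps: float, lsb_ps: float = LSB_PS) -> int:
    """Ideal TDC: number of whole LSBs contained in the time interval."""
    return int(interval_ps // lsb_ps)


def code_to_interval_ps(code: int, lsb_ps: float = LSB_PS) -> float:
    """Best estimate of the interval for a given code (centre of the bin)."""
    return (code + 0.5) * lsb_ps


def ideal_quantization_rms_ps(lsb_ps: float = LSB_PS) -> float:
    """RMS error of an ideal uniform quantizer: LSB / sqrt(12)."""
    return lsb_ps / math.sqrt(12.0)
```

For a 40 ps LSB the ideal quantization error is about 11.5 ps RMS, which is compatible with the stated resolution of less than 20 ps once bin non-uniformity and jitter in a real device are allowed for.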

## **FERT1-2: A Compact Dosimeter for Space Applications**

C. Deneau<sup>1</sup>, J.-R. Vaille<sup>1,2</sup>, F. Bezerra<sup>3</sup>, E. Lorfevre<sup>3</sup>, R. Ecoffet<sup>3</sup>, L. Dusseau<sup>1</sup>

<sup>1</sup>Institut d'Electronique du Sud, Universite Montpellier 2, Montpellier, France

<sup>2</sup>Universite de Nimes, Nimes, France

<sup>3</sup>Centre National d'Etudes Spatiales, Toulouse, France

Since the discovery of the radiation belts in 1958 by James Van Allen and the loss of several satellites, such as Telstar in 1962, radiation effects have become a major concern during the development of space vehicles. To address this concern, Universite Montpellier 2 is developing a dosimeter dedicated to the measurement of the space radiation environment.

This radiation sensor, the so-called "OSL sensor", is able to measure online both Total Ionizing Dose (TID) and Displacement Damage Dose (DDD). It is based on the Optically Stimulated Luminescence (OSL) properties of a radiation-sensitive rare-earth-doped alkaline sulfide. When an ionizing particle passes through this material, it creates a large number of trapped carriers. Some charges remain trapped on the localized defects after irradiation. An InfraRed (IR) Light Emitting Diode (LED) stimulates the trapped species, releasing them. Finally, recombination of a fraction of the trapped charge induces a green light emission proportional to the dose, which is collected by a photodiode, making it possible to evaluate the energy absorbed by the dosimeter. The photodiode current is amplified by a front-end measurement chain. The main characteristics of the sensor are a relatively small DLL package (33 x 20 x 14 mm<sup>3</sup>), a power consumption close to zero except during the 4-second reading time, and TID and DDD measurement thresholds below 1 mGy and 2x10<sup>4</sup> MeV g<sup>-1</sup>, respectively.

The sensor is made of Components Off The Shelf (COTS). The LED is very sensitive to displacement damage dose, whereas the photodiode does not exhibit any significant radiation effect. The amplifiers were qualified to radiation up to 300 Gy. To harden the sensor from a "circuit" point of view, it is necessary to compensate for the degradation of the LED. To do so, a feedback loop was implemented. Monitoring the current through the LED yields information on the DDD received.

The OSL sensor is currently onboard various in-flight experiments, such as JASON-2 for more than 4 years and SAC-D more recently. This recent flight experience suggested that significant improvements could be made. The OSL material was therefore encapsulated and a new package was designed to make the OSL sensor more reliable and protect it from ambient parasitic light. A PIC microcontroller was also integrated to make the sensor autonomous, flexible and easier to use for any end user. The addition of a blue excitation LED gives the sensor an internal self-testing capability to verify proper operation after integration or during flight. Finally, a temperature sensor was added, which provides accurate input for correcting temperature effects on the dose measurements.

#### FERT1-3: A Low-Resolution, GSa/s Streaming Digitizer for a Correlation-Based Trigger System

K. Nishimura<sup>1</sup>, M. Andrew<sup>1</sup>, Z. Cao<sup>1</sup>, M. Cooney<sup>1</sup>, P. Gorham<sup>1</sup>, L. Macchiarulo<sup>1</sup>, L. Ritter<sup>1</sup>, A. Romero-Wolf<sup>2</sup>, G. Varner<sup>1</sup>

<sup>1</sup>Department of Physics and Astronomy, University of Hawaii at Manoa, Honolulu, HI, United States

<sup>2</sup>Jet Propulsion Laboratory, Pasadena, CA, United States

Searches for radio signatures of ultra-high energy neutrinos and cosmic rays could benefit from improved efficiency by using real-time beamforming or correlation triggering. For missions with power limitations, such as the ANITA-3 Antarctic balloon experiment, full-speed, high-resolution digitization of incoming signals is not practical. To this end, the University of Hawaii has developed the Realtime Independent Three-bit Converter (RITC), a 3-channel, 3-bit, streaming analog-to-digital converter implemented in the IBM-8RF 0.13 µm process. RITC is primarily designed to digitize broadband radio signals produced by the Askaryan effect, and thus targets an analog bandwidth of >1 GHz, with a sample-and-hold architecture capable of storing up to 2.6 gigasamples per second. An array of flash analog-to-digital converters performs 3-bit conversion of sets of stored samples while acquisition continues elsewhere in the sampling array. A serial interface is provided to access an array of on-chip digital-to-analog converters that control the digitization thresholds for each channel as well as the overall sampling rate. Demultiplexed conversion outputs are read out simultaneously for each channel via a set of 36 LVDS links, each running at 650 Mb/s. We describe the design architecture of RITC and report on current testing and performance results of the ASIC, including prospects for the use of this architecture as the analog half of a novel triggering system for the ANITA-3 ultra-high energy neutrino experiment.
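The figures quoted above can be cross-checked with a short Python sketch, which also shows how a 3-bit flash conversion works in principle. Two assumptions are made here that the abstract does not state explicitly: that 2.6 GSa/s is the rate per channel, and that the 36 LVDS links are the total across the three channels. The threshold values in the usage note are hypothetical, and the code models only the arithmetic, not the ASIC.

```python
from typing import Sequence

CHANNELS = 3
BITS_PER_SAMPLE = 3
SAMPLE_RATE_GSPS = 2.6     # assumed to be per channel
LVDS_LINKS = 36            # assumed to be the total over all channels
LINK_RATE_MBPS = 650


def raw_data_rate_gbps() -> float:
    """Bit rate produced by streaming every sample of every channel."""
    return CHANNELS * BITS_PER_SAMPLE * SAMPLE_RATE_GSPS


def readout_capacity_gbps() -> float:
    """Aggregate capacity of the LVDS readout links."""
    return LVDS_LINKS * LINK_RATE_MBPS / 1000.0


def flash_convert(voltage: float, thresholds: Sequence[float]) -> int:
    """3-bit flash conversion: count how many of 7 thresholds are exceeded."""
    if len(thresholds) != 2 ** BITS_PER_SAMPLE - 1:
        raise ValueError("a 3-bit flash converter uses 7 thresholds")
    return sum(voltage > t for t in thresholds)
```

Under these assumptions both rates come out at 23.4 Gb/s, so the link budget matches the streaming rate exactly. With thresholds of, say, `[-3, -2, -1, 0, 1, 2, 3]` (arbitrary units), `flash_convert(0.5, ...)` returns code 4.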

#### **PS3: Poster Session 3**

#### Thursday, June 14 16:00-17:20 Boiler room

# PS3-1: Readout Electronics and Data Acquisition of a Time of Flight Detector for Positron Emission Tomography

J. Y. Yeom<sup>1</sup>, V. Španoudaki<sup>1</sup>, K. J. Hong<sup>1</sup>, C. S. Levin<sup>2,3</sup>

<sup>1</sup>Molecular Imaging Program at Stanford, Department of Radiology, Stanford University, Stanford, CA, United States

<sup>2</sup>Department of Physics, Stanford University, Stanford, CA, United States

<sup>3</sup>Department of Electrical Engineering, Stanford University, Stanford, CA, United States

Time-of-Flight (ToF) information in Positron Emission Tomography (PET) can contribute to a significant improvement in the reconstructed image signal-to-noise ratio, enabling image contrast improvement, a reduction in patient radiation dose, and/or shorter scan times. We have recently developed a multi-element SiPM (silicon photomultiplier) based block detector module for ToF PET. In this study, the detector, readout electronics and data acquisition are described, and a preliminary characterization of the detector module is presented. The detector module is based on a 4 x 4 array of LYSO-SiPM elements (Hamamatsu MPPC S10931-050P), each read out by an individual wideband RF amplifier to maximize timing performance. To preserve the fast signal waveform of the detector and extract relevant information from the data, each element is digitized with a channel of the high-speed CAEN V1742 waveform digitizer (32 channels, 5 GHz sampling, 12-bit amplitude resolution). As the digitizer is unable to trigger on itself, a trigger board has also been fabricated that outputs a fast pulse to trigger the digitizer whenever any pixel of the detector detects a signal.

To assess the performance of one of the modules, a 4 x 4 LYSO scintillator array (3 x 3 x 5 mm<sup>3</sup> elements) was coupled with optical grease to the photodetectors and energy resolution measurements were performed using a Ge-68 source. The energy spectra for each channel were acquired and the photopeak resolution versus overvoltage was measured. The energy resolution, not corrected for non-linearity effects, varied from 14.0 ± 0.8 % to 7.7 ± 1.6 % over an overvoltage range from 0.8 V to 1.6 V. Results from one channel have been compared for the case of a high-speed oscilloscope and the CAEN digitizer. The largest variation in energy resolution between those two cases is 4.7 %. We will present results for the timing resolution of the detector module used in conjunction with the CAEN V1742 digitizer.

### PS3-3: Design of the Trigger Interface and Distribution Board for CEBAF 12 GeV Upgrade

W. Gu, D. Abbott, C. Cuevas, G. Heyes, E. Jastrzembski, B. Moffit, B. Raydo, J. Wilson, H. Dong, S. Kaneta, N. Nganga, C. Timmer, V. Gyurjyan

Physics, Jefferson Lab, Newport News, Virginia, United States

The design of the Trigger Interface and Distribution (TID) board for the 12 GeV upgrade of the Continuous Electron Beam Accelerator Facility (CEBAF) at the Thomas Jefferson National Accelerator Facility (TJNAF) is described. The TID board distributes a low-jitter system clock, a synchronized trigger, and a synchronized multi-purpose SYNC signal. The TID also initiates data acquisition for the crate. With the TID boards, a multi-crate system can be set up for experiment testing and commissioning. The TID board can be selectively populated as a Trigger Interface (TI) board or a Trigger Distribution (TD) board for the experiments. When the TID is populated as a TI, it can be located in a VXS crate and distribute the CLOCK/TRIGGER/SYNC through the VXS P0 connector; it can also be located in a standard VME64 crate and distribute the CLOCK/TRIGGER/SYNC through the VME P2 connector or the front panel. It initiates the data acquisition for the front-end crate in which the TI is located. When the TID is populated as a TD, it fans out the CLOCK/TRIGGER/SYNC from the trigger supervisor (TS) to the front-end crates through optical fibres. The TD monitors the trigger processing on the TIs and gives feedback to the TS for trigger flow control. A Field Programmable Gate Array (FPGA) is used on the TID board to provide programmability. The TID boards were intensively tested on the bench and in various setups.

## **PS3-4:** A Correlation Measurement System for Ghost Imaging Experiment

L. Chen<sup>1,2</sup>, M. Zheng<sup>1,2</sup>, L. Zhang<sup>1,2</sup>, G. Jin<sup>1,2</sup>

<sup>1</sup>Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui, China

<sup>2</sup>State Key Laboratory of Particle Detection and Electronics, Hefei, Anhui, China

Ghost imaging, also known as correlated imaging, is a method of non-locally imaging an object by transmitting pairs of entangled photons through the object and a reference path, respectively. The two entangled photons, generated simultaneously, are correlated in coordinate and momentum space, so a sharp image of the object can be obtained by coincidence measurement between the signal (with object) and idler (without object) paths. Recently, the theory of entangled multiphoton ghost imaging, which can improve the spatial resolution beyond the Rayleigh diffraction limit, was put forward, so a corresponding electronic measurement device is needed.

Some special requirements arise for a measurement device designed for entangled multiphoton ghost imaging. On the one hand, the production rate of entangled multiphotons is low, so it is very important to decrease the accidental coincidence counting rate. On the other hand, to obtain high spatial resolution in the image, the scanning precision of the reference path should be taken into account. A correlation measurement system based on a high-precision FPGA TDC has been designed for the entangled multiphoton ghost imaging experiment. The time resolution of the system, on which the contrast ratio of the experimental results depends, is about 40 ps. The spatial resolution of the system can simplify the debugging process. Offline data processing can reduce the influence of accidental coincidences as far as possible.
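The role of the coincidence window in suppressing accidental counts can be illustrated with a minimal Python sketch. It is a generic two-channel coincidence counter operating on sorted timestamps, not the authors' FPGA design or offline software, and the function names are hypothetical. The accidental rate formula assumes two uncorrelated streams, for which the expected rate is the product of the two singles rates and the full width of the coincidence window.

```python
from typing import Sequence


def count_coincidences(signal_ps: Sequence[float], idler_ps: Sequence[float],
                       window_ps: float) -> int:
    """Count pairs with |t_signal - t_idler| <= window_ps.

    Both lists must be sorted; each timestamp is used in at most one pair.
    """
    i = j = count = 0
    while i < len(signal_ps) and j < len(idler_ps):
        dt = signal_ps[i] - idler_ps[j]
        if abs(dt) <= window_ps:
            count += 1
            i += 1
            j += 1
        elif dt > 0:
            j += 1       # idler event is too early, move on to the next one
        else:
            i += 1       # signal event is too early, move on to the next one
    return count


def accidental_rate_hz(rate_signal_hz: float, rate_idler_hz: float,
                       full_window_s: float) -> float:
    """Expected accidental coincidence rate for uncorrelated streams."""
    return rate_signal_hz * rate_idler_hz * full_window_s
```

Because the accidental rate scales linearly with the window width, a finer time resolution allows a narrower window and therefore fewer accidental coincidences for the same singles rates.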

# PS3-5: Design and Implementation of DAQ Readout System for Daya Bay Reactor Neutrino Experiment

X. Ji, F. Li, K. Zhu

Institute of High Energy Physics, Chinese Academy of Sciences, Beijing, China

The Daya Bay Reactor Neutrino Experiment will consist of seventeen separate detector subsystems distributed in three underground experimental halls. There will be eight PMT-based anti-neutrino detectors (ADs), six water-Cherenkov detectors, and three RPC detector subsystems. Each detector will be read out using an independent VME crate. The data acquisition (DAQ) readout system (ROS) reads data fragments from the electronics and trigger modules and then concatenates them into an event in each crate. A detailed design and implementation of the DAQ readout system for the Daya Bay Reactor Neutrino Experiment will be presented.
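The per-crate event building described above, in which fragments from several modules are concatenated into one event, can be sketched in Python as follows. The fragment layout (event number, module number, payload bytes) and the function name are assumptions for illustration only. The abstract does not specify the actual data format or software.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Detector subsystem counts from the abstract: 8 ADs, 6 water-Cherenkov, 3 RPC.
SUBSYSTEMS = {"AD": 8, "water_cherenkov": 6, "RPC": 3}

Fragment = Tuple[int, int, bytes]   # (event_id, module_id, payload), hypothetical


def build_events(fragments: Iterable[Fragment]) -> Dict[int, bytes]:
    """Group fragments by event number and concatenate them in module order."""
    by_event = defaultdict(list)
    for event_id, module_id, payload in fragments:
        by_event[event_id].append((module_id, payload))
    return {
        event_id: b"".join(payload for _, payload in
                           sorted(parts, key=lambda part: part[0]))
        for event_id, parts in sorted(by_event.items())
    }
```

Since each detector has its own VME crate, one such builder would run per crate, giving seventeen independent streams in total.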

### **PS3-6: ATLAS IBL BOC Prototype Evaluation**

N. Schroer

ZITI - University of Heidelberg, Heidelberg, Germany

In 2013 an additional layer, the Insertable B-Layer (IBL), will be added to the pixel detector of the ATLAS experiment at the LHC at CERN. For this fourth and innermost layer, 448 newly developed pixel sensor readout chips (FE-I4) are used, which will provide data from about 12 million pixels. For the readout of the IBL, new off-detector electronic components are needed, as the FE-I4s feature an increased readout bandwidth that cannot be handled by the current system. To provide a degree of backward compatibility, the new system will keep the structure of VME card pairs. The back of crate card (BOC) establishes the optical interfaces to the detector front end as well as to the read out system (ROS), while the read out driver (ROD) manages data processing and calibration. Both cards, the BOC and the ROD, have been redesigned and feature modern FPGA technology, yielding an integration four times higher than the current system. For the new BOC this is achieved by replacing custom-made optical and electrical components (e.g. ASICs) with commercially available ones and by integrating most of their functionality into the FPGAs. Based on the first prototype, we present details of the analysis of the design choices for the new BOC. The hardware components used to provide all the needed functionality are tested and results will be shown. To evaluate commercial transmitter components against the formerly used custom-made ones, both have been placed on the prototype so that comparisons can be made. The higher data rate and the support for a Fast TracKer system (FTK) made it necessary to implement the SLINK protocol, which is used to connect the BOC to the ROS, in the FPGAs. For the optical connection, Quad Small Form-factor Pluggable (QSFP) modules are evaluated as a substitute for the formerly used SFP transceivers. Furthermore, the firmware blocks used to implement all the needed tasks will be presented. Synchronization and 8b10b decoding are done for the incoming FE-I4 data stream (data from the previous FE-I3 chips is not encoded), and individual delay adjustment and BPM encoding are performed for the command channels to the detector front end in the Board Main FPGAs (BMF). The Board Control FPGA (BCF) receives control signals from the ROD via the VME backplane and manages the configuration of the other two FPGAs and the PLL for the readout clock. Tests with the BOC prototype include communication tests with the new ROD prototype and the first configuration and readout of an attached FE-I4 chip.
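The BPM encoding mentioned for the command channels can be illustrated with a small Python sketch of bi-phase mark coding. The convention used here, a transition at every bit boundary plus an extra mid-bit transition for a logical 1, is the common textbook definition and is assumed rather than taken from the abstract. The function names are hypothetical and this is not the BOC firmware.

```python
from typing import List, Sequence


def bpm_encode(bits: Sequence[int], level: int = 0) -> List[int]:
    """Encode bits as two half-bit line levels each (bi-phase mark)."""
    out = []
    for bit in bits:
        level ^= 1            # transition at the start of every bit cell
        out.append(level)
        if bit:
            level ^= 1        # additional mid-cell transition encodes a 1
        out.append(level)
    return out


def bpm_decode(halves: Sequence[int]) -> List[int]:
    """Recover bits: a cell whose two halves differ carries a 1."""
    if len(halves) % 2:
        raise ValueError("expected an even number of half-bit levels")
    return [int(halves[i] != halves[i + 1]) for i in range(0, len(halves), 2)]
```

Because every bit cell starts with a transition, the receiver can recover the clock from the data line itself, which is the usual reason for choosing this code on a combined clock and command link.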

## PS3-7: High-Speed Data Acquisition System of Microwave Reflectometry Based on LabVIEW for Long Pulse Operation

S. Li, Y. Chen, F. Wang, Y. Wang, W. Huang, X. Sun

Computer Application Division, Institute of Plasma Physics, CAS, Hefei, Anhui, China

For the 2012 EAST experimental campaign, a high-speed data acquisition system for microwave reflectometry has been newly developed to measure the density profile of the plasma. The reflectometry diagnostic offers high temporal and spatial resolution, up to 10 µs and better than 1 cm respectively. In long-pulse operation it must also run for a long time. To provide high-speed, continuous and uninterrupted data acquisition and storage for microwave reflectometry, the new system adopts the high-speed stream-to-disk technology from National Instruments (NI), so that the bottleneck for acquisition or generation is no longer the bus but the reading or writing of data to system storage. The main program of the system was developed in LabVIEW, a graphical programming language that reduces the overall complexity of system development. This paper will present the implementation of this high-speed data acquisition system and its operation results.

#### **PS3-8:** Clock Distribution Board for the $4\pi\beta\gamma$ Coincidence Counting System

<u>H. Wang</u><sup>1,2</sup>, K. Song<sup>1,2</sup>, J. Yang<sup>1,2</sup>, P. Cao<sup>1,2</sup>, K. Zhang<sup>1,2</sup>

<sup>1</sup>Department of Modern Physics, University of Science and Technology of China, Hefei, China <sup>2</sup>the State Key Laboratory of Particle Detection and Electronics, Hefei, China

Abstract: A clock distribution system has been designed for the Digital Coincidence Platform (DCP) used in the  $4\pi\beta(LSC)$ - $\gamma$  national benchmarks of the China Institute of Metrology. Traditional coincidence devices are analog signal processing systems and can only deal with slow signals. We therefore developed and implemented a high-speed DCP for the  $4\pi\beta(LSC)$ - $\gamma$  counting system, which digitizes the corresponding analog signals, stores them on a computer, and uses software for the later calculation. This method simplifies the measurement process and improves the measurement uncertainty. The digital coincidence device is based on a 3U PXI chassis and includes the data acquisition boards and the clock distribution board. Accurate clock distribution by this board is a prerequisite for meeting the coincidence specifications of the system. The primary task of the clock distribution system is to produce high-precision, low-jitter, stable clocks and send them to all acquisition boards. Under control of the slot-zero software, it also produces a trigger signal synchronous with the clock, which generates a precise synchronization signal in each acquisition board. In addition, it includes a sine wave signal generator for self-test of the acquisition boards. This paper mainly discusses the system architecture and implementation of the clock distribution system. A dedicated PLL chip provides four 62.5 MHz clock signals to four acquisition boards, two fast and two slow. The PLL chip also generates a clock for the field programmable gate array (FPGA). With this clock, the FPGA sends a synchronization signal to the acquisition boards through the PXI star trigger bus to synchronize the system sampling. The sine wave signal for system test is generated by a DDS chip, whose input clock is also generated by the PLL chip. With a total of six clock outputs, the PLL chip is thus a core part of the system.
For communication with slot zero, the system uses a PCI core provided by Altera Corporation, which avoids the use of a dedicated PCI interface chip and saves board space. The results show that the output-to-output skew of the clock is less than 50 ps and the absolute output jitter is less than 300 fs at 62.5 MHz, which meets the clock requirements of the DCP.

#### PS3-9: Implementation of High-Speed USB Interface in Data Acquisition System for KTX

<u>W. Lv</u><sup>1,2</sup>, K. Song<sup>1,2</sup>, J. Yang<sup>1,2</sup>, P. Cao<sup>1,2</sup>, L. Dong<sup>1,2</sup>

<sup>1</sup>Department of Modern Physics, University of Science and Technology of China, Hefei, China <sup>2</sup>the State Key Laboratory of Particle Detection and Electronics, Hefei, China

In this paper, a High-Speed USB implementation in the Data Acquisition unit intended for a reversed field pinch (RFP) experiment device Keda Torus eXperiment (KTX) is presented.

RFP is an important toroidal magnetic confinement device, which has been suggested as one of the attractive paths to a fusion reactor. Compared with other toroidal configurations, such as the tokamak and the stellarator, the distinctive feature of the RFP is its weak toroidal magnetic field. This weak magnetic field yields a string of potential reactor advantages, such as normal magnets, high engineering beta and high mass power density. KTX is an RFP device under construction, which is included in ITER. KTX is comparable in size to MST (in America) and second only to RFX (in Italy). Compared to MST or RFX, KTX will adopt the advanced RFP concept and ensure its unique features.

Among the auxiliary systems for KTX, the data acquisition system is an important electronic subsystem. The original design of KTX demands almost 600 channels and a maximum sampling rate of 2 MSPS, which requires real-time transmission and storage of the huge amounts of data from an acquisition unit to a PC.

There are several ways to connect an acquisition unit to a PC. Among them the USB port is attractive because of its speed and easy access: USB 2.0 supports transfer rates up to 480 Mbps and is plug-and-play. Typical Hi-Speed USB hard drives can be written to at rates around 200~240 Mbps and read from at rates of 240~336 Mbps, according to routine testing done by CNet. This makes USB 2.0 a good choice for our system.

The aim of the High-Speed USB interface implementation is to assure fast data transmission between the acquisition unit and a PC. The CY7C68013 chip from Cypress is used as the interface controller. Software has been developed on the PC for testing the application. A 64-channel prototype for KTX, with a sample rate of 16 ksps and a data length of 24 bits, was built ahead of KTX itself. The transmission speed was tested on the prototype, and we find that the stable speed of USB 2.0 transmission reaches 123 Mbps, which is not as fast as the 480 Mbps maximum of USB 2.0. The paper concludes by discussing the reasons for this and ways to reach faster transmission.
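The figures quoted for the prototype can be put side by side with a little arithmetic. The sketch below is only an illustration: it assumes that the raw payload is simply channels × sample rate × bits per sample with no framing overhead, and all names are hypothetical.

```python
# Raw payload rate of the 64-channel prototype versus the measured USB throughput.
# Assumption: no packet headers or padding are counted in the payload.
CHANNELS = 64
SAMPLE_RATE_SPS = 16_000        # 16 ksps per channel
BITS_PER_SAMPLE = 24
MEASURED_USB_BPS = 123e6        # stable throughput reported in the abstract
USB2_SIGNALLING_BPS = 480e6     # nominal USB 2.0 Hi-Speed rate


def payload_rate_bps(channels, sample_rate_sps, bits_per_sample):
    """Return the raw data rate in bits per second."""
    return channels * sample_rate_sps * bits_per_sample


prototype_bps = payload_rate_bps(CHANNELS, SAMPLE_RATE_SPS, BITS_PER_SAMPLE)
headroom = MEASURED_USB_BPS / prototype_bps
link_utilisation = MEASURED_USB_BPS / USB2_SIGNALLING_BPS

print(f"prototype payload: {prototype_bps / 1e6:.3f} Mbps")
print(f"measured link covers the prototype {headroom:.1f} times over")
print(f"measured / nominal USB 2.0 rate: {link_utilisation:.1%}")
```

Under these assumptions the prototype needs about 24.6 Mbps, so the measured 123 Mbps is ample for 64 channels even though it is only about a quarter of the nominal USB 2.0 rate.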

# **PS3-10:** An FPGA-Based Readout Module for the DAQ Subsystem of the DSSC Detector at the European XFEL

#### T. Gerlach, A. Kugel

Institute for Computer Science (ZITI), Heidelberg University, Mannheim, Germany

The DSSC collaboration is developing an instrument to detect synchrotron X-rays (E > 0.5 keV) at the European XFEL. The DEPFET-based sensors with integrated signal compression will be read out by 16 ASICs per sensor module. During the XFEL bursts (600 µs), data are acquired at a rate of up to 4.5 MHz and subsequently read out during the 99.4 ms long burst gaps. Two detector-specific, FPGA-based modules, the I/O Board (IOB, Spartan-6) and the Patch Panel Transceiver (PPT, Kintex-7), form the DAQ readout chain of the DSSC detector. Each of the 16 sensor modules is provided with an IOB in close proximity to the sensors. An IOB concentrates the ASIC data into four serial 3.125 Gb/s data links. It also performs several control tasks, such as switching the sensor voltages to minimize power consumption during the readout phase. A PPT serves as a master to the four IOBs of one detector quadrant. It receives the timing and control information from the XFEL timing system and delivers DSSC-specific control commands to the IOBs and readout ASICs. It also concentrates the 16 x 3.125 Gb/s links of four IOBs (payload data rate ~ 1.1 Gb/s per link) into four 10 Gb/s optical SFP+ links, which connect to the central XFEL DAQ system.
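The timing and link figures above fit together as a simple budget. The following sketch only restates the numbers from the abstract; the variable names are illustrative, and the comparison ignores any protocol overhead on the 10 Gb/s links.

```python
# Burst structure and link budget of one detector quadrant (one PPT).
BURST_S = 600e-6                # XFEL burst length
GAP_S = 99.4e-3                 # gap between bursts
FRAME_RATE_HZ = 4.5e6           # acquisition rate during a burst
LINKS_PER_IOB = 4               # serial links per IOB
IOBS_PER_PPT = 4                # IOBs served by one PPT
PAYLOAD_PER_LINK_GBPS = 1.1     # payload rate per 3.125 Gb/s link
SFP_LINKS = 4                   # optical output links of a PPT
SFP_RATE_GBPS = 10.0            # rate of each output link

burst_period_s = BURST_S + GAP_S
bursts_per_second = 1.0 / burst_period_s
frames_per_burst = round(FRAME_RATE_HZ * BURST_S)

input_links = LINKS_PER_IOB * IOBS_PER_PPT
payload_in_gbps = input_links * PAYLOAD_PER_LINK_GBPS
output_capacity_gbps = SFP_LINKS * SFP_RATE_GBPS

print(f"{bursts_per_second:.0f} bursts/s, up to {frames_per_burst} frames per burst")
print(f"{input_links} input links carry {payload_in_gbps:.1f} Gb/s of payload")
print(f"output capacity: {output_capacity_gbps:.0f} Gb/s")
```

With these numbers the 16 input links deliver about 17.6 Gb/s of payload into 40 Gb/s of nominal output capacity.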

We present the implementation of the IOB / PPT prototyping FPGA firmware, and the test environment.

The test setup comprises the IOB and two custom-developed FPGA boards (ATB and MPR2), which emulate basic ASIC and PPT functionality, respectively. The ATB houses a Spartan-6 FPGA, but also provides an interface to connect an ASIC prototype for realistic testing conditions. The Virtex-4 based MPR2 provides the high-speed transceivers for capturing the test data, and control interfaces for sending commands to the IOB and the ATB.

Data from the ATB are received by the input deserializer of the IOB's Spartan-6 device at a speed of 350 Mb/s per link. A readout controller captures the incoming data words, which are buffered in a FIFO and transmitted to the MPR2 via four 3.125 Gb/s high-speed transceivers (GTPs) using the Xilinx Aurora protocol. Additional controller entities, which will be used to operate hardware devices (clock buffers, FET drivers, etc.) located on both the IOB and the sensor mainboard, have also been implemented. A register bank stores information about the configuration status of the peripheral hardware and its controllers. It is accessible through a low-speed (50 MHz) command interface to the MPR2. The firmware is written for use in the final design (with small modifications).

A MicroBlaze soft-core CPU on the MPR2 FPGA provides a software user interface, which allows access to both the local and the IOB register banks for configuration. The serial high-speed data from the IOB are received by an Aurora RX core and transmitted using Xilinx 10 GbE networking cores (XGMAC, XAUI) via an optical SFP+ interface to a local PC for basic error checks.

# **PS3-11:** Development of a High Resolution PXI Based Data Acquisition System for Electron Momentum Spectrometer

<u>Y. Huang</u><sup>1,2</sup>, S. Liu<sup>1,2</sup>, J. Wang<sup>1,2</sup>, X. Hu<sup>1,2</sup>, C. Feng<sup>1,2</sup>, Q. An<sup>1,2</sup>

<sup>1</sup>Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui, China

<sup>2</sup>State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, Hefei, Anhui, China

A high-resolution PXI-based data acquisition system has been developed for a new electron momentum spectrometer (EMS) constructed at the University of Science and Technology of China (USTC). The system covers the detection of the two outgoing (scattered and ionized) electrons as well as the detection of the ion, which enables further research in collision dynamics.

The data acquisition system mainly consists of a 9-channel time measurement module and a 6-channel charge measurement module, which are housed in a 6U PXI chassis. The associated amplifiers are placed near the spectrometer.

The high-resolution charge measurement relies on a charge-sensitive preamplifier and digital peak detection. With a preamplifier built from discrete devices and the charge measurement module, the accuracy of the charge measurement reaches 0.4%. Using a TDC implemented in a field-programmable gate array (FPGA) of the Xilinx Virtex-4 family, the bin size of the time measurement is about 50 ps, while the standard deviation is less than 20 ps. Together with CFDs and fast preamplifiers, the whole time measurement system obtains a time resolution of about 50 ps RMS.
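As a plausibility check on the quoted TDC figures, the quantization error of an ideal converter with uniform bins of width w has an RMS of w/√12. The sketch below evaluates this textbook relation for the 50 ps bin size; it assumes perfectly uniform bins, which a real FPGA TDC does not have, so it is only a rough lower bound and not the authors' analysis.

```python
import math


def quantization_rms_ps(bin_size_ps):
    """RMS quantization error of an ideal TDC with uniform bins, in ps."""
    return bin_size_ps / math.sqrt(12)


BIN_SIZE_PS = 50.0  # bin size quoted in the abstract
ideal_rms_ps = quantization_rms_ps(BIN_SIZE_PS)
print(f"ideal quantization RMS for {BIN_SIZE_PS:.0f} ps bins: {ideal_rms_ps:.1f} ps")
```

An ideal 50 ps bin would contribute about 14.4 ps RMS, which lies below the quoted 20 ps standard deviation.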

## PS3-12: A Fast Data Streaming System Using PCI Express for EAST Tokamak

F. Wang, S. Li, Y. Wang, X. Sun

Institute of Plasma Physics, Chinese Academy of Sciences, Hefei, China

For long-duration, high-speed data acquisition, a continuous data acquisition system based on cPCI technology has been designed and developed for the EAST tokamak [1]. However, the sampling rate of that system is limited to 100 kSPS because of the bandwidth and the hardware environment. We have therefore upgraded the system using PCI Express technology, so that it is capable of 500 kSPS data streaming for the quasi-steady-state operation of the EAST tokamak.

The system consists of 6 data acquisition units, each with a simultaneous-sampling data acquisition card ACQ196CPCI with 96 channels. The upgrade has three main aspects. First, a new Rear Transition Module, RTM-T, is adopted for each data acquisition card to raise the sampling rate from 100 kSPS to 500 kSPS. Second, a high-performance industrial computer with PCI Express is connected to the RTM-T module using a PCIe direct link. Third, the network is upgraded from 1 Gb/s to 10 Gb/s, so that fast data streaming from the data acquisition units to data storage is possible.
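A rough estimate shows why the network upgrade goes together with the higher sampling rate. The sketch below assumes 2 bytes (16 bits) per sample and that all six units stream over one shared network path; neither assumption is stated in the abstract, and the names are illustrative.

```python
# Aggregate streaming rate of the acquisition system at the old and new sampling rates.
UNITS = 6
CHANNELS_PER_UNIT = 96
BYTES_PER_SAMPLE = 2            # assumption: 16-bit samples
OLD_NETWORK_BPS = 1e9
NEW_NETWORK_BPS = 10e9


def aggregate_rate_bps(sample_rate_sps):
    """Total raw data rate of all units in bits per second."""
    return UNITS * CHANNELS_PER_UNIT * sample_rate_sps * BYTES_PER_SAMPLE * 8


old_rate_bps = aggregate_rate_bps(100_000)
new_rate_bps = aggregate_rate_bps(500_000)

print(f"100 kSPS: {old_rate_bps / 1e9:.3f} Gb/s")
print(f"500 kSPS: {new_rate_bps / 1e9:.3f} Gb/s")
```

Under these assumptions the raw rate grows from about 0.92 Gb/s to about 4.6 Gb/s, which no longer fits a 1 Gb/s network but fits a 10 Gb/s one.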

The system has been developed and will be demonstrated in the 2012 campaign of the EAST tokamak. Detailed results will be given in this paper.

[1] Wang Feng, Li Guiming, Li Shi, Zhu Yingfei and Wang Yong, "A Continuous Data Acquisition System Based on CompactPCI for EAST Tokamak," IEEE Transactions on Nuclear Science, vol. 57, no. 2, pp. 669-672, Apr. 2010.

### **PS3-13: A High Speed High Resolution Digital Platform for the 4πβ-γ Coincidence Counting System**

<u>K. Zhang<sup>1,2</sup></u>, K. Song<sup>1,2</sup>, J. Yang<sup>1,2</sup>, P. Cao<sup>1,2</sup>, H. Wang<sup>1,2</sup>

<sup>1</sup>Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui, China <sup>2</sup>the State Key Laboratory of Particle Detection and Electronics, Hefei, Anhui, China

The  $4\pi\beta$ - $\gamma$  coincidence counting method has been widely used in the activity standardization of radionuclides. The classical analog coincidence counting (ACC) system uses a set of analog electronic modules to acquire and analyze the radiation detector signals. The ACC system requires manual variation of parameters and repeated measurements, which usually take hours to perform and thus make it very difficult to standardize short-lived radionuclides. Moreover, conventional analog modules cannot handle fast signals such as those from liquid scintillation (LS) detectors, which reduces the accuracy of the measurement.

Many National Metrology Institutes (NMIs) have been working on introducing rapidly developing digital technology into coincidence counting systems to make the data acquisition process easier and more flexible. We have developed and implemented a digital coincidence counting (DCC) platform based on PXI, which digitizes and stores the pulse trains from multiple input channels along with their corresponding timestamps. In this design we use 8-bit 500 MSPS ADCs to acquire fast signals, 16-bit 62.5 MSPS ADCs to acquire normal signals, and a carefully designed time generation and distribution system to synchronize the channels. FPGAs (field programmable gate arrays) perform real-time data compression, identifying valid pulse trains and marking them with timestamps. This greatly reduces the amount of data to be transferred and stored and makes it possible to process multiple channels simultaneously. Two DDR2 SDRAMs operate in a ping-pong scheme to buffer and transfer the compressed data, which enables the system to work continuously. The pulse-train data, combined with their timestamps, are finally transferred via the CPCI bus to a local system controller and stored for later off-line analysis.

This digital coincidence counting (DCC) platform can process 8 channels (4 fast channels and 4 normal channels) at the same time with 1 ns synchronization accuracy. The recorded high-speed, high-resolution data can be processed later in software using different parameters, which could greatly simplify the  $4\pi\beta$ - $\gamma$  coincidence counting measurement routine and improve the accuracy of the result. It could also allow newly improved techniques to be easily carried out and tested, making it an excellent hardware platform for the  $4\pi\beta$ - $\gamma$  coincidence counting system.
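To illustrate what off-line processing "using different parameters" can look like, the sketch below counts coincidences between two lists of recorded timestamps for a selectable window. It is a minimal illustration, not the platform's software: the function names and the sample timestamps are hypothetical, and the activity formula is the idealized relation N0 = Nβ·Nγ/Nc without dead-time, background or accidental-coincidence corrections.

```python
def count_coincidences(beta_ns, gamma_ns, window_ns):
    """Count beta events with at least one gamma event within +/- window_ns.

    Both timestamp lists must be sorted in ascending order. A gamma event is
    not consumed by a match, so it may pair with more than one beta event.
    """
    count = 0
    j = 0
    for t in beta_ns:
        # Skip gamma events that are too early to match this or any later beta.
        while j < len(gamma_ns) and gamma_ns[j] < t - window_ns:
            j += 1
        if j < len(gamma_ns) and gamma_ns[j] <= t + window_ns:
            count += 1
    return count


def activity_estimate(n_beta, n_gamma, n_coinc):
    """Idealized source activity N0 = N_beta * N_gamma / N_coinc (no corrections)."""
    return n_beta * n_gamma / n_coinc


# Hypothetical recorded timestamps in nanoseconds.
BETA_NS = [100, 250, 400, 900]
GAMMA_NS = [103, 405, 600, 898]

# The same recorded data can be re-analysed with different window settings.
for window in (2, 5):
    n_c = count_coincidences(BETA_NS, GAMMA_NS, window)
    print(f"window +/-{window} ns: {n_c} coincidences")
```

Because the timestamps are stored, changing the coincidence window only requires another pass over the data rather than a new measurement.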

## **PS3-14:** The Readout Electronics of the Micromegas-Based Large Time Projection Chamber Prototype for the International Linear Collider

D. Calvet, D. Attie, D. Besin, P. Colas, R. Joannes, A. Le Coguie, S. Lhenoret, I. Mandjavidze, M. Riallot, W. Wang, E. Zonca

CEA-IRFU, Saclay, France

This work presents the design, implementation and test of prototype modules of a Time Projection Chamber based on Micromegas amplification technology, built in view of the future International Linear Collider. The main goals of this development are to investigate the performance of the detector and to demonstrate the feasibility of extremely compact, low-power readout electronics. We based the front-end electronics on the AFTER chip, a 72-channel ASIC originally built for the T2K experiment, and devised new hardware, mechanics and cooling to read out the 1728 channels of a detector module while staying confined to the available area of ~220 cm². Using a multi-board layer structure with high-density solderless connectors, ASIC die wire-bonding and other space-saving techniques, we reach a density of ~8 channels per square centimetre. The thickness of the readout electronics for one module is around 4 cm. The digital part of a module is based on a Xilinx Virtex-5 FPGA, which interfaces to the 24 AFTER chips used for the readout. It receives the front-end data digitized by a 6-channel ADC, temporarily buffers the data, applies zero-suppression and transfers event data to a remote data concentrator over a 2 Gbit/s optical link. To simplify the design and reduce the overall cost, we use a commercial Virtex-5 FPGA characterization platform customized with specific add-ons to build a data concentrator with 12 optical ports. Software running on the embedded PowerPC processor of the FPGA performs front-end configuration, event data gathering and transfer to the data acquisition PC over a standard Gigabit Ethernet link. We present the operation of a vertical slice of the complete detector and readout system along with results obtained in a test beam. In particular, our tests show that using a resistive layer in the Micromegas detector improves tracking resolution and allows operation without the spark protection circuit normally required on each readout channel.
This simplifies the design and makes it more compact. Finally, we explain how this setup is now being scaled up to build a 7-module detector prototype, which is the final goal of this R&D program.
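The channel count and density quoted above follow directly from the chip count and the available area, as this short check shows (the variable names are illustrative):

```python
# Channel count and channel density of one detector module.
AFTER_CHIPS_PER_MODULE = 24
CHANNELS_PER_CHIP = 72
AVAILABLE_AREA_CM2 = 220.0      # approximate area quoted in the abstract

channels_per_module = AFTER_CHIPS_PER_MODULE * CHANNELS_PER_CHIP
density_per_cm2 = channels_per_module / AVAILABLE_AREA_CM2

print(f"{channels_per_module} channels, {density_per_cm2:.2f} channels per cm^2")
```

This gives 1728 channels and roughly 7.9 channels per square centimetre, in line with the ~8 stated in the abstract.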

#### PS3-15: Design of a Low-Noise Analog Signal Processing Circuit for CZT Detectors

B. Gan<sup>1</sup>, T. Wei<sup>1</sup>, W. Gao<sup>1</sup>, D. Gao<sup>1</sup>, Y. Hu<sup>2</sup>

<sup>1</sup>School of Computer Science and Technology, Northwestern Polytechnical University, Xi'an, Shaanxi, China <sup>2</sup>Institut Pluridisciplinaire Hubert Curien, University of Strasbourg/CNRS, Strasbourg, France

Cadmium zinc telluride (CdZnTe) is a new semiconductor material that performs well in radiation environments at room temperature. CdZnTe detectors, which have several significant advantages such as high stopping power, good energy resolution and high spatial resolution, are among the principal detectors for next-generation X-ray and  $\gamma$ -ray imagers.

As the most important part of the detector system, the low-noise front-end readout circuits have an important impact on the performance of the whole system. Since CdZnTe detectors are usually used for the detection of X-rays and  $\gamma$ -rays, the input signals processed by the readout circuit cover a wide bandwidth. The minimum signal is extremely weak and may be close to 2,000 e-. To ensure the signal-to-noise ratio (SNR) and a sufficient range of the output signals, very low noise and very high gain of the front-end readout circuits are required. Low noise is thus the most important performance requirement in the front-end ASIC design for CdZnTe detectors.

In this paper, a 32-channel low-noise analog signal processing ASIC for CdZnTe detectors is designed in TSMC 0.35 µm mixed-signal CMOS technology. The design specifications of the proposed chip are as follows.

The input range is from 2,000 e- to 40,000 e-; the signal-to-noise ratio (SNR) is larger than 10 and the equivalent noise charge (ENC) is less than 200 e-; the overall gain of the analog signal processing circuits is more than 10 V/pC, and the gain nonlinearity is less than 1%; the power consumption of each channel is less than 5 mW.

The ASIC mainly includes a charge sensitive amplifier (CSA), a CR-RC shaper amplifier and an output buffer. In this paper, the circuits of the preamplifier and the shaper are improved, and low-noise front-end readout and analog signal processing circuits have been realized. The die size of the prototype chip is 4.9 mm × 2.2 mm. The simulation results are as follows. The ENC is 75 e- + 10.25 e-/pF for a shaping time of 1.5  $\mu$ s. The power consumption is 3 mW per channel. According to these results, the proposed design can be applied in multi-channel detector readout systems. Moreover, the results suggest that the design is suitable for X-ray and  $\gamma$ -ray imaging above 10 keV, as well as for particle and nuclear physics.
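The simulated noise figure can be related to the specifications with a short calculation. The sketch below treats the quoted ENC as a linear function of the detector capacitance, ENC(C) = 75 e- + 10.25 e-/pF × C, and takes the SNR as the signal charge divided by the ENC; both readings are assumptions made for illustration, and the function names are hypothetical.

```python
# ENC as a function of detector capacitance, from the simulated figures.
ENC_OFFSET_E = 75.0             # electrons at zero input capacitance
ENC_SLOPE_E_PER_PF = 10.25      # additional electrons per picofarad
ENC_LIMIT_E = 200.0             # specification: ENC below 200 e-
MIN_SIGNAL_E = 2000.0           # smallest input signal in the specification


def enc_electrons(capacitance_pf):
    """Equivalent noise charge in electrons for a given input capacitance."""
    return ENC_OFFSET_E + ENC_SLOPE_E_PER_PF * capacitance_pf


def max_capacitance_pf(enc_limit_e):
    """Largest capacitance for which the ENC stays within enc_limit_e."""
    return (enc_limit_e - ENC_OFFSET_E) / ENC_SLOPE_E_PER_PF


def snr(signal_e, capacitance_pf):
    """Signal-to-noise ratio, taken here as signal charge divided by ENC."""
    return signal_e / enc_electrons(capacitance_pf)


c_max_pf = max_capacitance_pf(ENC_LIMIT_E)
print(f"ENC stays below {ENC_LIMIT_E:.0f} e- up to about {c_max_pf:.1f} pF")
print(f"SNR of the minimum signal at that capacitance: {snr(MIN_SIGNAL_E, c_max_pf):.1f}")
```

Read this way, the 200 e- limit corresponds to about 12 pF of input capacitance, at which point the 2,000 e- minimum signal sits exactly at the specified SNR of 10.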

## **PS3-16: An FPGA Based GEMROC ASIC Readout System**

<u>B. Mindur</u>, W. Dabrowski, T. Fiutowski, P. Wiacek, A. Zielinska

AGH University of Science and Technology, Krakow, Poland

A Gas Electron Multiplier Readout Chip (GEMROC) is an Application Specific Integrated Circuit (ASIC) dedicated to processing signals generated inside a GEM detector. The GEMROC ASIC is to be used as part of a Proton Range Radiography (PRR) system being developed at CERN. Before the front-end chips can be employed in the final application, however, their performance has to be verified in a variety of tests. The tests should be done in an environment very similar to the one in which the chips will operate, so a dedicated readout system is needed. In this paper we present a compact FPGA- and Ethernet-based readout system dedicated to GEMROC ASICs. The main requirements for the ASIC data acquisition system (DAQ) are as follows: a) simultaneous readout of up to 8 ASICs (4 per detector plane), b) transfer of the data to a host PC over a commercially available interface, c) provision of the slow control signals as well as clock synchronization, d) online data processing and event reconstruction, e) compactness of the overall system at an affordable price. These constraints led us to use a commercially available FPGA mezzanine board, the FXT70 Mini Module Plus from Silica, which is plugged into a custom ADC baseboard hosting a fast ADC with interconnections to two dedicated ASIC boards. The ADC and ASIC boards were designed so that they fit well together and can be connected directly to the GEM detector. The signals acquired by the GEM detector are transferred to the GEMROC ASICs, where amplification and shaping are performed independently for all input channels. The amplitudes of these signals are stored in analogue FIFOs and then transferred to the ADC for digitization. Simultaneously, the timestamp and the position information (in channel-number format) are kept inside digital FIFOs while the derandomization and zero suppression operations take place.
The digital data are sent out of the GEMROC over an 8-bit wide LVDS data bus connected to the FPGA. A single FPGA processes in parallel the digital data from four ASICs together with the digitized analogue signal amplitudes from the ADC. The system runs with a 125 MHz main clock. Two such boards are used to read out two GEM detector planes. The signal amplitudes of the so-called X and Y detector planes, together with the timestamp information (8 ns LSB), allow 2D imaging (using a centre of gravity algorithm for event reconstruction) of the ionizing particles passing through the inner active volume of the GEM detector. The overall system consists of two synchronized FPGA-ADC boards connected to 4 ASIC mezzanines housing 8 GEMROCs. Each of the two FPGAs is connected to the host PC via a 1 Gbps Ethernet link. The DAQ PC is equipped with dedicated C++ software, which is responsible for configuring the FPGA and ASIC settings, storing all the incoming data, and online reconstruction of the 2D events.
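The centre of gravity reconstruction mentioned above can be sketched in a few lines. This is a generic amplitude-weighted mean over the channels of a cluster, not the authors' C++ implementation; the cluster contents and function name are hypothetical. The tick length follows from the 125 MHz clock.

```python
# Timestamp resolution implied by the 125 MHz main clock.
CLOCK_HZ = 125e6
TICK_NS = 1e9 / CLOCK_HZ        # 8 ns per timestamp LSB


def centre_of_gravity(cluster):
    """Amplitude-weighted mean channel number of one cluster.

    cluster maps channel number -> digitized amplitude.
    """
    total = sum(cluster.values())
    if total <= 0:
        raise ValueError("cluster has no positive signal")
    return sum(channel * amp for channel, amp in cluster.items()) / total


# Hypothetical clusters seen in the X and Y planes for the same timestamp.
x_cluster = {10: 20, 11: 60, 12: 20}
y_cluster = {4: 30, 5: 90}

hit_xy = (centre_of_gravity(x_cluster), centre_of_gravity(y_cluster))
print(f"tick = {TICK_NS:.0f} ns, reconstructed hit at channel coordinates {hit_xy}")
```

Weighting by amplitude gives a position estimate finer than the channel pitch whenever the charge is shared between neighbouring channels.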

# PS3-17: Design and Test of a High-Speed Flash ADC Mezzanine Card for High-Resolution and Timing Performance for Nuclear Structure Experiments

<u>X. Egea Canet<sup>1,2</sup></u>, E. Sanchis<sup>2</sup>, V. Gonzalez<sup>2</sup>, A. Gadea<sup>1</sup>, J. M. Blasco<sup>2</sup>, D. Barrientos<sup>1,2</sup>, J. J. Valiente Dobon<sup>3</sup>, M. Tripon<sup>4</sup>, A. Boujrad<sup>4</sup>, C. Houarnet<sup>4</sup>, M. Jastrząb<sup>5</sup>, G. de Angelis<sup>3</sup>, M. N. Erduran<sup>6</sup>, S. Erturk<sup>1</sup>, T. Huyuk<sup>1</sup>, G. Jaworski<sup>8,9</sup>, J. Nyberg<sup>10</sup>, M. Palacz<sup>2</sup>, G. de France<sup>4</sup>, A. di Nitto<sup>11</sup>, A. Pipidis<sup>3</sup>, R. Tarnowski<sup>9</sup>, R. Wadsworth<sup>12</sup>, A. Triossi<sup>3</sup>

<sup>1</sup>IFIC (Institut de fisica corpuscular), Valencia, Spain

<sup>2</sup>Electronic Engineering Department, UV (Universitat de València), Valencia, Spain

<sup>3</sup>INFN, Laboratori Nazionali di Legnaro, Padova, Italy

<sup>4</sup>Grand Accelerateur National d'Ions Lourds, Caen, France

<sup>5</sup>Niewodniczanski Institute of Nuclear Physics, Polish Academy of Sciences, Krakow, Poland

<sup>6</sup>Faculty of Engineering and Natural Sciences, Istanbul Sabahattin Zaim University, Istanbul, Turkey

<sup>7</sup>Fen-Edebiyat Fakultesi, Fizik Bölümü, Nigde Universitesi, Nigde, Turkey

<sup>8</sup>Faculty of Physics, Warsaw University of Technology, Warsaw, Poland

<sup>9</sup>Heavy Ion Laboratory, University of Warsaw, Warsaw, Poland

<sup>10</sup>Department of Physics and Astronomy, Uppsala University, Uppsala, Sweden

<sup>11</sup>INFN, Sezione di Napoli, Napoli, Italy

<sup>12</sup>Department of physics, University of York, York, United Kingdom

When designing high-speed acquisition systems, it is extremely important to take into account issues like noise and jitter, which can significantly degrade the signal-to-noise performance. The design approach was based on the choice of a commercial ADC, followed by a clock synchronizer, which allows the generation of low-jitter sampling clocks, and finally the design of a fully differential analog stage, which adapts the signal to the ADC range with minimal noise degradation and is also capable of compensating, at a slow rate, the effect of the baseline by adding offset voltages. Besides the main parameters described above, issues like the control protocol, the input/output interface with other systems, power consumption and signal integrity have been carefully studied in order to provide optimal performance with the greatest simplicity.

Finally, a test bench based on the Xilinx ML605 evaluation board and the internal oscilloscope ChipScope Pro is proposed in order to check the global FADC performance. Several ADC tests covering linearity, noise, crosstalk, distortion and effective number of bits will be performed with this module, along with tests of other parameters such as the analog stage performance, especially in terms of linearity and noise. Further specific tests for nuclear experiments in gamma spectroscopy and gamma-neutron discrimination will be performed in order to test the FADC mezzanine for this application. The tests illustrate the FADC performance of the proposed design. This board will be part of the upgrade of the electronics for the EXOGAM2 and NEDA detectors. It addresses the problem of providing a sampling card with a remarkable resolution for recent gamma spectroscopy experiments while sampling at very high rates, preserving the pulse shape for further pulse analysis in other types of detectors such as scintillator-based gamma-neutron detectors. High resolution and high speed are two parameters that often form a trade-off, and it is hard to satisfy both. These constraints and the need for an upgrade enabling more accurate analysis in nuclear physics led to the development of this FADC mezzanine, with a sampling rate of 250 Msps and a resolution of at least 11.7 effective bits in order to satisfy the experiment demands. This work describes the design procedure, the test bench proposed for a proper high-speed ADC measurement system, and the results obtained.
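The effective number of bits is conventionally tied to the measured signal-to-noise-and-distortion ratio by ENOB = (SINAD − 1.76 dB) / 6.02 dB, a relation that holds for a full-scale sine-wave input. The sketch below applies this standard formula to the 11.7-bit figure; it is not taken from the authors' test procedure, and the function names are illustrative.

```python
def enob_from_sinad(sinad_db):
    """Effective number of bits for a full-scale sine-wave input."""
    return (sinad_db - 1.76) / 6.02


def sinad_from_enob(enob_bits):
    """SINAD in dB that corresponds to a given effective number of bits."""
    return 6.02 * enob_bits + 1.76


TARGET_ENOB = 11.7              # minimum resolution quoted in the abstract
required_sinad_db = sinad_from_enob(TARGET_ENOB)
print(f"{TARGET_ENOB} effective bits correspond to a SINAD of {required_sinad_db:.1f} dB")
```

By this relation, 11.7 effective bits correspond to a SINAD of roughly 72.2 dB, which the noise and jitter of the clock and analog stage together must not degrade.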

## PS3-18: Design of an Optical Uplink with 10GBit/s Link Between PCIe and MicroTCA

H. Kleines, P. Wüstner, A. Ackens, M. Drochner, P. Kämmerling, S. van Waasen, M. Ramm

ZEL, Forschungszentrum Jülich, Juelich, Germany

In the context of developments for the PANDA detector system, an optical uplink from MicroTCA to PCIe is being designed. The link is based on X2 transceivers with a nominal speed of 10 GBit/s. The PCIe board has already been produced and is currently under test. It is based on a Xilinx Virtex 5 (XC5VLX30T) FPGA. A PM8358 with a parallel interface to the FPGA is used to implement the XAUI interface to the X2 transceiver. The corresponding AMC module, which is under development, is based on the same components. Open issues regarding the implementation of the PCIe root complex functionality on this module will be discussed.

# **PS3-19:** Real-Time Data Acquisition for Long-Distance Reflective Ghost Imaging Experiment with Thermal Light

<u>F. Wen</u><sup>1,2</sup>, F. Li<sup>1,2</sup>, Q. Wang<sup>1,2</sup>, G. Jin<sup>1,2</sup>

<sup>1</sup>Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui, China <sup>2</sup>State Key Laboratory of Particle Detection and Electronics, Hefei, Anhui, China

This paper introduces a data acquisition module for a long-distance reflective ghost imaging experiment with thermal light. Ghost imaging, also known as correlated imaging, has been studied extensively in recent years. Though ghost imaging was first realized with entangled photons, it is now widely accepted that both classical thermal light and quantum entangled beams can be used in ghost imaging. Although there is some debate about which theory (quantum theory or statistical physics) should be used to explain the physical phenomenon, more attention has been focused on how to apply ghost imaging in practice to improve conventional imaging technology. In the reflective ghost imaging experiment with thermal light, there are two light beams from the same thermal source. One light beam is directly collected by an area detector (CCD camera). The other light beam illuminates an object about 1 kilometer away from the experimental equipment. The light reflected from the object is collected by a telescope and detected by a point detector (PMT). The image of the object is produced by correlating the intensity fluctuations of the two detectors.
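The correlation step can be illustrated with a toy simulation. The sketch below computes, for each reference pixel, the covariance between the point-detector ("bucket") value and the pixel intensity over many pulses; pixels that overlap the object show a clear positive correlation. The four-pixel object mask, the uniform random speckle and all names are assumptions made for illustration and do not describe the actual experiment.

```python
import random


def ghost_image(bucket, frames):
    """Per-pixel covariance <B*I> - <B><I> between bucket values and reference frames.

    bucket: one point-detector value per pulse.
    frames: one flat list of pixel intensities per pulse.
    """
    n = len(bucket)
    if n == 0 or n != len(frames):
        raise ValueError("need one bucket value per reference frame")
    n_pix = len(frames[0])
    mean_b = sum(bucket) / n
    mean_i = [sum(f[p] for f in frames) / n for p in range(n_pix)]
    return [
        sum((b - mean_b) * (f[p] - mean_i[p]) for b, f in zip(bucket, frames)) / n
        for p in range(n_pix)
    ]


# Toy model: a 4-pixel scene in which pixels 0 and 2 reflect light.
rng = random.Random(0)
OBJECT_MASK = [1, 0, 1, 0]
frames = [[rng.random() for _ in OBJECT_MASK] for _ in range(2000)]
# The bucket detector sees only the total light reflected by the object.
bucket = [sum(m * v for m, v in zip(OBJECT_MASK, f)) for f in frames]

image = ghost_image(bucket, frames)
print([round(v, 3) for v in image])
```

In this toy model the reflecting pixels stand out against values near zero for the others, even though the bucket detector itself has no spatial resolution.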

A data acquisition module has been developed to receive the signal from the point detector (PMT). When the thermal light source transmits a light pulse to the object, a trigger is sent to the data acquisition module. After a time delay that depends on the distance from the object to the PMT, the data acquisition module captures the pulse signal from the PMT and calculates the area of the pulse, which is directly proportional to the light intensity. In the long-distance experiment the light intensity detected by the PMT is very weak because of atmospheric scattering: the peak PMT current is no more than 25 µA and the pulse width is less than 20 ns. The input current signal is amplified, broadened and converted by a signal conditioning circuit into a voltage signal (peak: 600 mV, width: 100 ns) so that it can be sampled by a 14-bit, 250 MSPS ADC chip. In the FPGA, the ADC data are packed with timestamps that correlate them with the data from the CCD camera. Some data processing, such as peak searching and calculation of the peak area, is also performed in the FPGA. All the data are buffered in SRAM until they are sent to a computer over the USB bus. The test results show that the data acquisition module satisfies the requirements of the experiment. The module can also be used in many other light detection applications, such as laser radar and bioluminescence.
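The peak search and area calculation are done in the FPGA, and the abstract does not describe the algorithm. The following software sketch shows one simple way such a step could work, assuming a known baseline and a fixed threshold; the sample values, names and integration rule are hypothetical. At 250 MSPS each sample spans 4 ns, so the 100 ns conditioned pulse covers about 25 samples.

```python
SAMPLE_RATE_SPS = 250e6
SAMPLE_PERIOD_NS = 1e9 / SAMPLE_RATE_SPS     # 4 ns per ADC sample
SAMPLES_PER_PULSE = 100 / SAMPLE_PERIOD_NS   # 100 ns pulse -> 25 samples


def pulse_area(samples, baseline, threshold):
    """Find the highest sample and integrate the pulse around it.

    Returns (peak_index, area) with the area in ADC counts times ns, or None
    if the peak does not exceed the baseline by at least the threshold.
    """
    peak_index = max(range(len(samples)), key=lambda i: samples[i])
    if samples[peak_index] - baseline < threshold:
        return None
    # Extend the integration range on both sides while samples stay above baseline.
    lo = peak_index
    while lo > 0 and samples[lo - 1] > baseline:
        lo -= 1
    hi = peak_index
    while hi < len(samples) - 1 and samples[hi + 1] > baseline:
        hi += 1
    area = sum(s - baseline for s in samples[lo:hi + 1]) * SAMPLE_PERIOD_NS
    return peak_index, area


# Hypothetical ADC record with a baseline of 100 counts.
record = [100, 100, 120, 300, 500, 300, 120, 100, 100]
print(pulse_area(record, baseline=100, threshold=50))
```

Because the pulse area is proportional to the light intensity, a single number per trigger is enough to feed the correlation with the CCD frames.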

## PS3-20: Upgrading the Backend of the Pipeline Readout System for Belle II

S. Y. Suzuki, T. Higuchi, M. Nakao, R. Itoh, Y. Igarashi

KEK, Tsukuba, Ibaraki, Japan

The Belle II experiment, the successor of the Belle experiment at KEK for the study of CP violation, will start operation in a couple of years. One of the challenges for the data acquisition system is the expected high trigger rate, which is about 40 times higher than that of Belle. The Belle experiment used a large number of pipelined TDCs on COmmon Pipeline Platform for Electronics Readout (COPPER) modules. Each COPPER module consists of a 9U-size baseboard, TDC mezzanine cards in a homemade form factor called FINESSE, a trigger receiving module in the PMC form factor (IEEE 1386.1), and a processor PMC module (PrPMC). At Belle II, the digitization electronics are moved further toward the frontend of the data stream, but we still reuse the COPPER modules as the backend of the pipeline readout system.

The average trigger rate is about 30 kHz, and we expect that the largest detector requires a bandwidth of 30 MB/s per COPPER module. To handle this bandwidth, both the interface speed and the processing power have to be sufficiently high. The task of the PrPMC is not only to transmit data from the baseboard to the network, but also to perform data error detection, consistency checks, and removal of redundant data headers. The items to be checked depend on the detector subsystem, and the software may vary accordingly. In order to use the offline software modules without modification on the online processor, the COPPER system requires the x86 architecture for the PrPMC.

The original COPPER board, which uses the EPC-6315 PrPMC from Radisys Corporation with an 800 MHz Pentium III processor, does not meet the requirement. The Fast Ethernet interface of the EPC-6315 clearly cannot handle the bandwidth, but we found that even the Gigabit Ethernet on the COPPER baseboard is too slow because of the bus bridge latency. As no suitable board was found on the market, we decided to develop a new PrPMC with a Gigabit Ethernet interface and sufficient processing power for the COPPER systems of Belle II.

The new PrPMC is equipped with the Intel Atom Z530 processor, the Poulsbo chipset, and a Gigabit Ethernet interface. The Poulsbo chipset has two PCI Express paths; one is dedicated to the Gigabit Ethernet, and the other is connected to the PCI bus bridge. The latter will be fully used to receive the data from the FINESSE modules.

We rewrote the device drivers that operate the COPPER systems for this new PrPMC, and we confirmed that the bus reading speed reaches 120 MB/s at the 30 kHz trigger rate. The previous drivers performed most device operations in interrupt handling context, which made the system unstable at high trigger rates. Most device operations have now been moved to process context using the workqueue feature of the Linux kernel. With this rewrite, the maximum acceptable trigger rate exceeds 70 kHz.
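The idea of moving work out of the interrupt path can be illustrated with a user-space analogy in Python. This is not kernel code and does not use the Linux workqueue API (which is a C interface). The handler, worker and the doubling "readout" step are hypothetical stand-ins that only show the pattern: the handler does the minimum and defers the rest to a worker.

```python
import queue
import threading

work = queue.Queue()
processed = []

def interrupt_handler(event_id):
    """Keep the 'interrupt' path minimal: acknowledge and enqueue only."""
    work.put(event_id)

def worker():
    """'Process context': the slow device operations run here."""
    while True:
        event_id = work.get()
        if event_id is None:            # sentinel: stop the worker
            work.task_done()
            break
        processed.append(event_id * 2)  # stand-in for the real readout work
        work.task_done()

thread = threading.Thread(target=worker)
thread.start()
for i in range(1000):                   # burst of simulated triggers
    interrupt_handler(i)
work.put(None)
work.join()                             # wait until every item was handled
thread.join()
```

Because the handler never blocks on slow operations, a burst of triggers only lengthens the queue instead of stalling the interrupt path.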

We will report further test results of the COPPER readout system using our new processor module and the updated software.

## **PS3-21: Development of an AMC Module MMC**

<u>P. Kaemmerling</u>, M. Drochner, H. Kleines, S. van Waasen, M. Ramm, A. Ackens *ZEL, Forschungszentrum Juelich, Juelich, Germany* 

The MMC (Module Management Controller) of an AMC module communicates with the CMC (Carrier Management Controller) and the ShMC (Shelf Management Controller). It handles the xTCA-FRU hardware signals, negotiates inventory, power load and diagnosis, receives IPMI commands, and sends state and sensor data. We designed an AMC PCB and used a newly introduced PIC32MX460, a MIPS32 microcontroller rated at 125 DMIPS, for the MMC. Together with the new MPLAB X gcc toolchain, we chose the open source software coreIPM / coreBMC as a starting point and adapted it to the PIC32 and our board hardware. We experienced rapidly changing IDE, compiler, library and example versions for the PIC32 family. We extracted hardware-related code into a separate library and implemented some extensions, such as a more reliable I2C stack for the PIC32.

## PS3-22: Minimizing Dead Time of the Belle II Data Acquisition System with Pipelined Trigger Flow Control

<u>M. Nakao<sup>1</sup></u>, C. Lim<sup>2</sup>, M. Friedl<sup>3</sup>, T. Uchida<sup>1</sup> <sup>1</sup>IPNS, KEK, High Energy Accelerator Research Organization, Tsukuba, Ibaraki, Japan <sup>2</sup>Department of Physics, Yonsei University, Seoul, Korea <sup>3</sup>HEPHY, Austrian Academy of Sciences, Vienna, Austria

The Belle II experiment at the SuperKEKB  $e^+e^-$  energy-asymmetric storage ring at KEK, Tsukuba, Japan, is now under construction to search for physics beyond the Standard Model in B meson, charm meson and  $\tau$  lepton decays. The detector consists of seven sub-detector systems, for which the data acquisition scheme is unified, with the exception of the innermost pixel detector (PXD). For the unified systems, the data generated at the frontend upon the level-1 trigger distributed by the timing distribution system are transmitted to a COmmon Pipelined Platform for Electronics Readout (COPPER) system with a homemade protocol which we call Belle2link. For the design luminosity of  $8 \times 10^{35}$  cm<sup>-2</sup> s<sup>-1</sup>, we expect 1 kHz rate each for B, charm and  $\tau$  production, or 10 kHz for the total physics event rate. Including backgrounds, we design our data acquisition system to be able to handle a 30 kHz level-1 trigger rate. In order to minimize the dead time, the frontend digitization system is operated in a pipelined manner. We introduce a pipelined trigger flow control scheme to minimize the dead-time fraction (or garbage-event fraction) while avoiding data collapse in the data flow. In this report, we describe the design of the trigger flow control of Belle II, the trigger distribution and status collection scheme that minimizes their latency, the simulation results on the dead-time fraction for various parameters, and a measurement of the dead-time fraction in a realistic setup. The trigger flow control design is largely driven by the silicon vertex detector (SVD) readout scheme, which has a fixed pipeline length in its APV25 readout chip. We find an operation scheme which generates less than 1% dead-time fraction at the 30 kHz trigger rate. We present the parameter dependence of the dead-time fraction using a simple simulation program. For other detectors, the data buffers are inside the FPGA which also handles the timing signal, and can be flexibly designed. The entire system has to be controlled by a single source of the trigger distribution tree, and therefore it is crucial to minimize the latency. We developed a serial data handling scheme that minimizes the overhead of encoding and decoding. Finally, using a dummy trigger generator, timing distribution modules and prototype frontend readout boards, we demonstrate this pipelined trigger flow control scheme.
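To give a feel for how a dead-time fraction depends on buffer depth, the following is a minimal toy Monte Carlo in Python. It is not the authors' simulation program. It assumes Poisson-distributed triggers, a buffer of `depth` events and a fixed serial readout time per event, and it counts a trigger as lost when the buffer is full. The 20 µs readout time used below is a hypothetical value; only the 30 kHz rate comes from the abstract.

```python
import random
from collections import deque

def dead_time_fraction(rate_hz, depth, readout_s, n_triggers=200_000, seed=1):
    """Fraction of Poisson triggers vetoed because the buffer is full."""
    rng = random.Random(seed)
    t = 0.0
    pending = deque()      # completion times of events still in the buffer
    last_done = 0.0
    vetoed = 0
    for _ in range(n_triggers):
        t += rng.expovariate(rate_hz)          # next trigger arrival
        while pending and pending[0] <= t:     # drop events already read out
            pending.popleft()
        if len(pending) >= depth:              # buffer full: trigger is lost
            vetoed += 1
            continue
        start = max(t, last_done)              # readout is serial
        last_done = start + readout_s
        pending.append(last_done)
    return vetoed / n_triggers

shallow = dead_time_fraction(30e3, depth=1, readout_s=20e-6)
deep = dead_time_fraction(30e3, depth=5, readout_s=20e-6)
```

With a single-event buffer this model loses every trigger that arrives during a readout, while a deeper buffer absorbs most fluctuations in trigger spacing. A fixed-length pipeline such as the one in the APV25 would need additional constraints that this sketch does not model.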

### PS3-23: Development of New Data Acquisition System at Super-Kamiokande for Nearby Supernova Bursts

T. Tomura<sup>1</sup>, Y. Hayato<sup>1</sup>, M. Ikeno<sup>2</sup>, M. Nakahata<sup>1</sup>, S. Nakayama<sup>1</sup>, Y. Obayashi<sup>3</sup>, K. Okumura<sup>4</sup>, M. Shiozawa<sup>1</sup>, S. Y. Suzuki<sup>2</sup>, T. Uchida<sup>2</sup>, S. Yamada<sup>5</sup>, T. Yokozawa<sup>1</sup>

<sup>1</sup>Kamioka Observatory, Institute for Cosmic Ray Research, University of Tokyo, Kamioka, Gifu, Japan

<sup>2</sup>High Energy Accelerator Research Organization (KEK), Tsukuba, Ibaraki, Japan

<sup>3</sup>Kavli Institute for the Physics and Mathematics of the Universe, University of Tokyo, Kashiwa, Chiba, Japan

<sup>4</sup>Research Center for Cosmic Neutrinos, Institute for Cosmic Ray Research, University of Tokyo, Kashiwa, Chiba, Japan

<sup>5</sup>Research Center for Neutrino Science, Tohoku University, Sendai, Miyagi, Japan

Super-Kamiokande (SK), a 50-kiloton water Cherenkov detector, is one of the most sensitive neutrino detectors. SK can also be used for supernova observations by detecting the neutrinos generated in a supernova. In order to improve the performance of the detector for supernovae, we are developing two new features: one for recording all information within one minute, and the other for recording calorimetric information for nearby supernovae.

The current SK data acquisition (DAQ) system reads out all the photomultiplier tube (PMT) hits, including the dark noise, and applies a software trigger to select the events to record. Therefore, the PMT hits caused by very low energy events below the threshold are not stored. Since a supernova burst is a very rare phenomenon and the details of the burst mechanism are not yet known, all possible data should be recorded without any bias from the trigger system. To accomplish this, we are adding a new feature to the DAQ system that records all the PMT hit information for about one minute before and after the burst occurs.

According to a simulation study based on the Livermore model, the neutrino burst from a supernova farther than about 1300 light years can be recorded without loss of data by the current DAQ system. However, if a supernova burst occurred within a few hundred light years, the neutrino event rate could exceed 30 MHz and the system could record only about the first 20% of the events. To overcome this inefficiency, we are developing a new DAQ system that can handle such high-rate neutrino events. This new DAQ system records the number of hit PMTs so that we can count the neutrinos and obtain a time profile of the number of neutrinos emitted by the supernova.

We will present the implementation of these improvements. The results of the tests with the final prototype before the mass production will be shown.

## PS3-24: Development of a Clock Distribution System for Sub-Nanosecond Time Synchronization over Long Distances

Y. Yang, K. Hanson, T. Meures

Interuniversity Institute for High Energies (IIHE), Brussels, Belgium

The Askaryan Radio Array (ARA) is a new detector deployed at the South Pole, designed to detect ultrahigh-energy neutrinos using the radio frequency signals emitted by neutrino-induced cascades in the glacial ice. The whole array will contain 37 stations covering a surface area of O(100) km<sup>2</sup>. Each station consists of four two-hundred-meter-deep holes spaced 20 meters apart, with 2 horizontally polarized and 2 vertically polarized antennas in each hole at depths ranging from 180 to 200 m. A custom-designed ASIC located in close proximity to the antennas is used for high-speed digitization of the induced RF signals. In order to perform the complex particle reconstructions, each antenna signal must be recorded with a time precision of 50 ps relative to the other antennas in the same station. In addition, the digital data stream from the digitizer must be transmitted from the bottom of the hole to logic on the surface. This note describes our group's solution to both challenges, which uses commercially available high-speed transceivers and the clock data recovery functions built into these ICs. It includes both a CAT5E twisted pair version and an optical fiber version. While this application is discussed in particular, the technology has potential applications in many fields: any system that requires ultra-high precision synchronization of two or more remote clocks could benefit from the system described herein.

## PS3-25: Development of the Data Acquisition System of a Large TPC for the ILC

<u>G. W. P. De Lentdecker<sup>1</sup></u>, E. Verhagen<sup>1</sup>, Y. Yang<sup>1</sup>, L. Jonsson<sup>2</sup>, B. Lundberg<sup>2</sup>, U. Mjornmark<sup>2</sup>, A. Oskarsson<sup>2</sup>, L. Osterman<sup>2</sup>, E. Stenlund<sup>2</sup> <sup>1</sup>Universite Libre de Bruxelles, Brussels, Belgium

<sup>2</sup>Lund University, Lund, Sweden

A large Time Projection Chamber (TPC) is proposed as part of the tracking system for a detector at the future electron-positron linear collider ILC. The Linear Collider TPC (LCTPC) Collaboration is currently studying a large TPC prototype (60 cm long, with an outer radius of 77 cm), offering some modularity to investigate various gas amplification systems (GEM or Micromegas), pad sizes and geometries, as well as different read-out systems. This prototype has already been extensively and successfully tested for more than 10 weeks with 6 GeV electron beams. The read-out electronics of the ILC large TPC prototype is based on the ALICE ALTRO ADC chip in combination with a newly developed charge pre-amplifier, PC16, which is programmable with respect to shaping time, gain, decay time and polarity. The preamplifier was specially developed as a first step towards the final electronics for the ILC TPC. The data acquisition system of the prototype is also based on that of the ALICE Collaboration.

For the use of the TPC at the ILC, the current readout and data acquisition systems have to be upgraded in different aspects: size of the frontend electronics, power-pulsing capability, improved digital signal processing and higher bandwidth communication technology for the data acquisition. In this note we will mainly report on the latest developments concerning the front-end electronics and the new data acquisition system: we will report on the status of the design of the new Multi-Chip-Module (MCM) board that can house up to 8 new sALTRO16 chips as well as on the development of a first micro-TCA Advanced Mezzanine Card (AMC) prototype to replace the DRORC.

# PS3-26: Real-Time Performance of Commercial Intel-Based VME Controllers for the CODA Data Acquisition System

B. J. Moffit

Physics, Jefferson Lab, Newport News, VA, United States

We have evaluated the performance of several Intel-based VME controllers for use in data acquisition (DAQ) systems at Jefferson Lab. In the 12 GeV era, PPC-based VME controllers running vxWorks will be replaced with Intel-based controllers running Linux. This is facilitated by the use of FPGAs on the VME modules to perform trigger logic and to communicate trigger information over serial and fiber connections throughout the DAQ. A hard real-time operating system on the VME controller is no longer needed, because the readout of the digitized data from the VME modules (using VME-2eSST) is done in a threaded environment with multiple cores while digitization is taking place in the buffered, pipelined system. In this paper, we briefly discuss the 12 GeV Hall D DAQ and the requirements of the VME controller. We present results from baseline testing of various models from different vendors using different Linux kernels, including results from a kernel compiled with the CONFIG_PREEMPT_RT patch.

## PS3-27: A Readout System Utilizing the APV25 ASIC for the Forward GEM Tracker in STAR

<u>G. J. Visser<sup>1</sup></u>, J. T. Anderson<sup>2</sup>, B. Buck<sup>3</sup>, A. S. Kreps<sup>2</sup>, T. Ljubicic<sup>4</sup> <sup>1</sup>CEEM, Indiana University, Bloomington, IN, United States <sup>2</sup>Argonne National Laboratory, Lemont, IL, United States <sup>3</sup>Bates R&E Center, Massachusetts Institute of Technology, Middleton, MA, United States <sup>4</sup>Physics, Brookhaven National Laboratory, Upton, NY, United States

We have developed a modular readout system for the 30,720-channel Forward GEM Tracker recently installed in the STAR Experiment at RHIC, BNL. The modular architecture is based on a passive CompactPCI backplane running a custom protocol (not PCI), connecting 6 readout modules to a readout controller module. The readout modules provide all necessary functions, including isolated power supplies, to operate up to 24 APV25 chips per module with high-impedance ground isolation. The frontend boards contain a minimal set of components, as they are located inside the STAR TPC inner field cage and are inaccessible except during long shutdown periods. The frontend boards connect to the readout modules with cables up to 24 m in length, carrying unbuffered analog readout signals from the APV25 as well as power, trigger, clock and control. The readout module digitizes the APV analog samples to 12 bits at 37.532 MHz, then zero-suppresses and buffers the data. The readout controller distributes trigger and clock from the central trigger system, gathers the data over the backplane, and ships it to a Linux PC via a 2.125 Gbps optical data link (DDL from ALICE). The PC gathers data from multiple readout controllers and dispatches it to the STAR event builders. The readout modules, controllers, and backplanes are housed in a common crate together with the GEM HV bias power supplies.
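Zero suppression, mentioned above, can be illustrated with a short Python sketch. The abstract does not describe the actual algorithm used in the readout module, so this shows only the generic pedestal-subtract-and-threshold approach. The function name, pedestal values and threshold are assumptions.

```python
def zero_suppress(samples, pedestals, threshold):
    """Keep only channels whose pedestal-subtracted ADC value exceeds threshold.

    Returns a list of (channel, amplitude) pairs, so empty channels cost no
    bandwidth downstream.
    """
    hits = []
    for channel, (adc, pedestal) in enumerate(zip(samples, pedestals)):
        amplitude = adc - pedestal
        if amplitude > threshold:
            hits.append((channel, amplitude))
    return hits

# Four hypothetical 12-bit samples; only channel 2 carries a signal.
hits = zero_suppress([500, 512, 650, 498], [500, 510, 505, 500], threshold=20)
```

Storing the channel number next to each surviving amplitude lets the event builder reconstruct the hit positions without the suppressed samples.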

# **PS3-28:** A Comprehensive Zero-Copy Architecture for High Performance Distributed Data Acquisition over Advanced Network Technologies for the CMS Experiment

<u>A. Petrucci</u><sup>1</sup>, G. Bauer<sup>2</sup>, U. Behrens<sup>3</sup>, J. Branson<sup>4</sup>, S. Bukowiec<sup>1</sup>, O. Chaze<sup>1</sup>, S. Cittolin<sup>5</sup>, J. A. Coarasa Perez<sup>1</sup>, C. Deldicque<sup>1</sup>, M. Dobson<sup>1</sup>, A. Dupont<sup>1</sup>, S. Erhan<sup>6</sup>, D. Gigi<sup>1</sup>, F. Glege<sup>1</sup>, R. Gomez - Reino<sup>1</sup>, C. Hartl<sup>1</sup>, A. Holzner<sup>4</sup>, L. Masetti<sup>1</sup>, F. Meijers<sup>1</sup>, E. Meschi<sup>1</sup>, R. Mommsen<sup>7</sup>, C. Nunez-Barranco-Fernandez<sup>1</sup>, V. O'Dell<sup>7</sup>, L. Orsini<sup>1</sup>, C. Paus<sup>2</sup>, M. Pieri<sup>4</sup>, G. Polese<sup>1</sup>, A. Racz<sup>1</sup>, O. Raginel<sup>2</sup>, H. Sakulin<sup>1</sup>, M. Sani<sup>4</sup>, C. Schwick<sup>1</sup>, A. C. Cristian Spataru<sup>1</sup>, F. Stoeckli<sup>2</sup>, K. Sumorok<sup>2</sup>

<sup>1</sup>CERN, Geneva, Switzerland

<sup>2</sup>Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
<sup>3</sup>DESY, Hamburg, Germany
<sup>4</sup>University of California, San Diego, San Diego, California, USA
<sup>5</sup>Eidgenössische Technische Hochschule, Zurich, Switzerland
<sup>6</sup>University of California, Los Angeles, Los Angeles, California, USA
<sup>7</sup>FNAL, Chicago, Illinois, USA

This paper outlines a software architecture in which zero-copy operations are used comprehensively at every processing point from the application layer to the physical layer. The proposed architecture is being used in feasibility studies on advanced networking technologies for the CMS experiment at CERN. The design relies on a homogeneous peer-to-peer message passing system built around memory pool caches, which allows messages of any size to be handled efficiently and with deterministic latency through the different software layers. In this scheme, portable distributed applications can be programmed to process input-to-output operations using only pointer arithmetic and DMA operations. Combined with the OpenFabrics protocol stack (OFED), the approach makes it possible to attain near wire-speed message transfer at the application level. The architecture supports full portability of user applications by encapsulating the protocol and network details in modular peer transport services, while transparent replacement of the underlying protocol facilitates deployment on several networks and avoids the difficult couplings that would otherwise have to be dealt with when the underlying communication infrastructure changes. We demonstrate the feasibility of this approach by giving efficiency and performance measurements of the software in the context of the CMS distributed event building studies.
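The combination of a memory pool and pointer-style access can be illustrated in Python with `bytearray` and `memoryview`, whose slices share memory with the underlying buffer. This is only a sketch of the general idea and not the CMS software: the `BufferPool` class, block sizes and message layout are hypothetical.

```python
class BufferPool:
    """Fixed-size blocks carved out of one preallocated bytearray."""

    def __init__(self, block_size, n_blocks):
        self._store = bytearray(block_size * n_blocks)
        self._view = memoryview(self._store)
        self._block_size = block_size
        self._free = list(range(n_blocks))

    def acquire(self):
        """Hand out a view onto a free block; no allocation, no copy."""
        index = self._free.pop()
        start = index * self._block_size
        return index, self._view[start:start + self._block_size]

    def release(self, index):
        """Return the block to the pool for reuse."""
        self._free.append(index)

pool = BufferPool(block_size=4096, n_blocks=8)
index, block = pool.acquire()
block[:5] = b"hello"                      # "receive" directly into pool memory
header, payload = block[:2], block[2:5]   # slicing a memoryview copies nothing
```

In a real receiver, `socket.recv_into(block)` would fill the block directly, and the `header` and `payload` views could then be passed between layers without the message bytes ever being copied.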

# PS3-29: A Novel Data Acquisition Scheme Based on a Low-Noise Front-End ASIC and a High-Speed ADC for CZT-Based Small-Animal PET Imaging

W. Gao<sup>1</sup>, D. Gao<sup>1</sup>, B. Gan<sup>1</sup>, L. Wang<sup>1</sup>, Q. Zheng<sup>1</sup>, F. Xue<sup>1</sup>, T. Wei<sup>1</sup>, <u>Y. Hu<sup>2</sup></u> <sup>1</sup>School of Computer Science and Technology, Northwestern Polytechnical University, Xi'an, Shaanxi, China <sup>2</sup>Institut Pluridisciplinaire Hubert Curien (IPHC), UMR 7178 CNRS/UDS, Strasbourg, France

Over the last decade, cadmium zinc telluride (CdZnTe) has attracted increasing interest as a material for X-ray and gamma-ray detectors. Because of their well-known good spatial and energy resolution, and especially the three-dimensional pixelation that allows three-dimensional localization of the photon interaction, CdZnTe detectors show great potential for small animal PET scanners. The drawback of CdZnTe detectors lies in their poor timing performance. However, segmented CdZnTe detectors with bi-parametric corrections and PTF irradiation have shown both good energy resolution and good timing properties. Thus, CdZnTe detectors will be widely applied in high-performance small animal PET imaging systems. In this paper, we present a novel data acquisition scheme based on a low-noise front-end readout ASIC and a high-speed ADC for a CdZnTe small animal PET imaging system. Our objective is to develop a small animal PET imaging system with the following performance: a spatial resolution of 1 mm<sup>3</sup>, a detection efficiency of 15% and a time resolution of 1 ns. Since the system will require many hundreds or even thousands of detectors, an application-specific integrated circuit (ASIC) is needed to provide a compact platform for processing the many signals. In our solution, a charge-sensitive amplifier (CSA), a pulse shaper and a driving buffer are integrated for each CdZnTe detector pixel. The specifications of the ASIC depend on the dimensions of the CdZnTe detector. In this study, a 32-channel ASIC will be taped out. The output voltage of the shaper is sampled by a high-speed ADC. Because of the characteristics of the pulse shaper, pulses of different energies have the same peaking time (shaping time), so only the left (rising) side of the shaped pulse is sampled. In our solution, eight points of the shaped voltage are sampled and digitized.

The data from the ADC are collected by a programmable FPGA, which can run an algorithm to calculate the peak value of the shaped voltage and the trigger timing of the shaped pulses. To achieve good noise performance and to realize a flexible front-end electronic system, the front-end ASIC and the ADC are not integrated together. The two chips will be bonded directly on the board. Both the front-end readout chip and the ADC chip are designed in a TSMC 0.35 µm CMOS process. Prototypes of the front-end readout chip and the ADC chip have been fabricated and are under test. The input range of the front-end ASIC is from 2000 e- to 40000 e-. The equivalent noise charge (ENC) is below 200 e-. The shaping time is 1.5 µs. Thus, the sampling and conversion period of the ADC is 100 ns. The simulated results show that the proposed method can achieve good spatial resolution and good detection efficiency, and that the time resolution of the PET system is greatly improved.
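One way to estimate a peak value from samples taken only on the rising edge is a least-squares amplitude fit against a known pulse shape. The abstract does not describe the FPGA algorithm, so the Python sketch below is an assumption throughout. It assumes a CR-RC shaper response and a known pulse start time. Only the 1.5 µs shaping time, the 100 ns sampling period and the eight samples come from the abstract.

```python
import math

TAU_NS = 1500.0   # shaping (peaking) time from the abstract
DT_NS = 100.0     # ADC sampling period from the abstract
N_SAMPLES = 8     # number of sampled points from the abstract

def template(t_ns):
    """Assumed CR-RC shaper response, normalised to a peak of 1 at TAU_NS."""
    if t_ns <= 0:
        return 0.0
    return (t_ns / TAU_NS) * math.exp(1.0 - t_ns / TAU_NS)

# Template values at the eight sampling instants (100 ns ... 800 ns).
SHAPE = [template((k + 1) * DT_NS) for k in range(N_SAMPLES)]

def estimate_peak(samples):
    """Least-squares amplitude A minimising sum((y_k - A * s_k)^2)."""
    numerator = sum(y * s for y, s in zip(samples, SHAPE))
    denominator = sum(s * s for s in SHAPE)
    return numerator / denominator

# Noise-free check: a pulse of peak amplitude 250 (arbitrary ADC units).
samples = [250.0 * s for s in SHAPE]
peak = estimate_peak(samples)
```

Because all eight samples lie before the peak at 1.5 µs, the estimate is available well before the pulse has fully developed, at the cost of relying on the assumed shape.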

### PS3-31: Communication Architecture of DAQ-Middleware

<u>Y. Nagasaka<sup>1</sup></u>, H. Sendai<sup>2</sup>, E. Inoue<sup>2</sup>, T. Koutoku<sup>3</sup>, N. Ando<sup>3</sup>, S. Ajimura<sup>4</sup>, M. Wada<sup>5</sup> <sup>1</sup>Hiroshima Institute of Technology, Hiroshima, Japan <sup>2</sup>High Energy Accelerator Research Organization, Ibaraki, Japan <sup>3</sup>The National Institute of Advanced Industrial Science and Technology, Ibaraki, Japan <sup>4</sup>Osaka University, Osaka, Japan <sup>5</sup>Bee Beans Technologies Co. Ltd., Ibaraki, Japan

DAQ-Middleware is a software framework for network-distributed data acquisition systems for small and medium-sized experiments. The framework was developed on the basis of Robot Technology Middleware (RT-Middleware), an international standard of the Object Management Group (OMG) that is intended not only for robotics but also for embedded systems.

The framework is developed with object-oriented technology and uses CORBA (Common Object Request Broker Architecture) for communication between objects. CORBA communication is sufficient for robotics, but a DAQ system framework also requires a communication method that is not based on CORBA.

We developed a new communication architecture for DAQ-Middleware based on ordinary socket communication. The communication method can be selected in the configuration file. The performance of DAQ-Middleware with the new communication architecture was measured and compared with CORBA. The throughput improves for transfers of data larger than about 256 kByte.

# **PS3-32:** Implementation of the Disruption Predictor APODIS in JET Real Time Network Using the MARTe Framework

<u>J. M. Lopez</u><sup>1</sup>, J. Vega<sup>2</sup>, D. Alves<sup>3</sup>, S. Dormido-Canto<sup>4</sup>, A. Murari<sup>5</sup>, J. M. Ramirez<sup>4</sup>, R. Felton<sup>6</sup>, M. Ruiz<sup>1</sup>, G. D. Arcas<sup>1</sup>, and JET-EFDA Contributors<sup>7</sup>

<sup>1</sup>CAEND, Universidad Politecnica de Madrid, Madrid, Spain

<sup>2</sup>Asociacion EURATOM CIEMAT para Fusion, Madrid, Spain

<sup>3</sup>Associacao EURATOM/IST, Instituto de Plasmas e Fusão Nuclear, Instituto Superior Tecnico, Univ. Tecnica de Lisboa, Lisboa, Portugal

<sup>4</sup>Dpto. Informatica y Automatica, Universidad de Educacion a Distancia, Madrid, Spain

<sup>5</sup>Consorzio RFX-Associazione EURATOM ENEA per la Fusione, Padova, Italy

<sup>6</sup>EURATOM/CCFE Fusion Association, Culham Science Center OX14 3DB, Abingdon, United Kingdom

<sup>7</sup>See Appendix of F. Romanelli et al Proc. 23rd IAEA Fusion Energy Conference 2010, Daejeon, Korea

Disruptions in tokamak devices are unavoidable, and they can have a significant impact on machine integrity, so it is very important to have mechanisms to predict this phenomenon. Disruption prediction is a very complex task, not only because it is a multi-dimensional problem, but also because, in order to be effective, it has to detect the actual disruptive event well in advance so that mitigation strategies can be applied successfully. With these constraints in mind, a real-time disruption predictor has been developed for use in the JET tokamak. The predictor has been designed to run in the Multithreaded Application Real-Time executor (MARTe) framework. The predictor, Advanced Predictor Of DISruptions (APODIS), is based on a Support Vector Machine (SVM). The implementation uses seven relevant measurements, e.g. plasma current and mode lock amplitude. These signals are processed using 32 ms time windows with a sampling frequency of 1 kHz. Various features are calculated (mean value and standard deviation of the FFT, without the first component). The real-time implementation has been validated using signals from the JET database, obtaining a performance equivalent to that of the off-line prediction algorithm. These results show that the system is able to predict a disruption 30 ms in advance with a hit rate of 90%. It is estimated that 30 ms is sufficient time to take protective actions. The system has been implemented on a six-core x86 architecture with an Ethernet Network Interface Card (NIC) for remote administration and introspection and an Asynchronous Transfer Mode (ATM) NIC handling all real-time I/O within JET's Real Time Data Network (RTDN). It is a user-space application running on a mainstream vanilla Linux kernel and implemented using MARTe. Real-time performance has been achieved by combining the available Central Processing Unit (CPU) isolation and Interrupt ReQuest (IRQ) routing mechanisms.

Preliminary results on the system's prediction and real-time performance will be presented, as well as the influence of the MARTe framework on the development and integration of the system into JET's distributed philosophy for real-time experiment control.
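The windowed feature extraction can be sketched in Python as follows. This is one possible reading of the description, not the APODIS code: it takes the time-domain mean of each 32-sample window (32 ms at 1 kHz) and the standard deviation of the DFT magnitudes with the first (DC) component removed. The function names and the use of magnitudes rather than power are assumptions.

```python
import cmath
import math
import statistics

FS_HZ = 1000                           # sampling frequency from the abstract
WINDOW_MS = 32                         # window length from the abstract
N = FS_HZ * WINDOW_MS // 1000          # 32 samples per window

def dft_magnitudes(x):
    """Magnitudes of the one-sided DFT, bins 0 .. n/2 (plain O(n^2) DFT)."""
    n = len(x)
    return [
        abs(sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n)))
        for m in range(n // 2 + 1)
    ]

def window_features(x):
    """Return (mean of the window, std of the spectrum without the DC bin)."""
    spectrum = dft_magnitudes(x)[1:]   # drop the first (DC) component
    return statistics.fmean(x), statistics.pstdev(spectrum)

# A flat signal has no spectral spread; a 125 Hz tone concentrates in one bin.
flat_mean, flat_std = window_features([2.0] * N)
tone = [math.sin(2 * math.pi * 125 * k / FS_HZ) for k in range(N)]
tone_mean, tone_std = window_features(tone)
```

For a window this short, a direct DFT is cheap. A production implementation would more likely use an FFT library, and the resulting feature vectors from all seven signals would then be passed to the SVM classifier.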

### PS3-33: A Versatile High Speed Data Acquisition Module with Four 10G Ethernet Links

I. Sheviakov, M. Zimmer

Deutsches Elektronen-Synchrotron, Hamburg, Germany

An FPGA-based mezzanine board for high data rate applications in the range of 100 Gbit/s is presented. The concept separates all resources needed for fast data acquisition tasks from application-specific adaptations, which are located on dedicated carrier boards. It has been designed primarily for the digital readout of megapixel X-ray cameras with high frame rates at light sources, but it can also be used in other environments where multiple digital signals have to be preprocessed, buffered and transmitted to computing hardware. The design is based on a Xilinx Virtex5 XC5VFX70T FPGA and is able to handle peak data rates of more than 100 Gbit/s, implemented as 146 LVDS input pairs working in parallel at speeds of up to 800 Mbit/s each. The maximum sustained data throughput of the module is limited to 40 Gbit/s by the four 10G Ethernet links to external processing devices. For the data transfer a UDP protocol has been implemented, which allows the module to be read out by standard IT hardware. Two SO-DIMM slots can be equipped with 8 GBytes of DDR2 memory, providing a bandwidth of 80 Gbit/s for internal data sorting and processing. The board configuration is managed by a CPLD, which allows the FPGA to be booted from multiple local sources (Platform Flash RAM, 8 GByte CompactFlash card) and from external sources such as JTAG or a microcontroller interface. Several firmware versions with different functionalities and data processing algorithms can be kept locally on the board. Multiple boards can be used in a network to build scalable and online reconfigurable systems. A flexible clocking interface as well as several general-purpose I/Os and buses such as I2C and SPI simplify the integration of the board into various applications. The FPGA-internal PowerPC processor can optionally be used for additional control and monitoring tasks.

The mezzanine card (8 × 25 cm<sup>2</sup>) is plugged via two high-speed, high-density connectors (900 pins altogether) into a dedicated carrier board, where the entire application-specific infrastructure is located. Such a base board can be rather simple, just providing the necessary system voltages for the mezzanine card and a connection to the digitized detector output signals, or more sophisticated, with additional features such as A/D conversion of analogue detector signals or the slow control systems needed to operate a particular detector. This concept allows an easy and efficient adaptation of the complex digital readout circuits to various applications. We will present details of the board's system architecture and first experiences.

## PS3-34: Advanced Linux PCI Services (ALPS) for Rapid Prototyping of PCI-Based DAQ Electronics

<u>S. A. Chilingaryan</u>, M. Caselle, A. Kopmann, U. Stevanovic, M. Vogelgesang *IPE, Karlsruhe Institute of Technology, Karlsruhe, Germany* 

Writing stable and performant drivers and keeping them up to date with the latest Linux kernel is a complex and tedious task. It is especially difficult to synchronize the parallel development of hardware and software. However, many components of a PCI driver are standard. During the development phase, hardware engineers often only need access to the device registers and the ability to transfer data between device and host memory in a few different modes. This functionality can be provided uniformly for most devices. We developed a universal PCI driver and a debugging tool to facilitate hardware development. A few basic ideas lie behind the design. First, a universal driver is used during the development phase; if necessary, a dedicated driver can be implemented when the hardware is ready. Second, to simplify maintenance across new kernel versions, we split the DMA implementation into two parts: the kernel module is kept as small as possible and is responsible for memory management only, while the actual implementation of the DMA engine and most other features are realized in user space. Finally, the design of the driver allows fine-grained scripting. For example, it is possible to start the DMA engine, set some registers to initiate a DMA transfer, read data from the DMA engine, attempt to process it, and, if wrong data is returned, analyze the status registers to find the signature of the error. Thus, the hardware design is not blocked by missing or malfunctioning software, and no software modifications are required for hardware debugging. The PCI board is identified by the vendor and device IDs, which are specified as module parameters. The register model is defined by a simple XML file. The driver is able to operate in two non-DMA modes: with plain PCI memory mapping and with FIFO registers. DMA engines depend on the FPGA implementation and are supported by plugins. Along with the driver we provide an SDK and a command-line tool.

To simplify integration with distributed data acquisition systems, we plan to extend ALPS with a web-service interface. The universal driver has been used successfully for the development of a high-throughput camera platform at KIT.

# **PS3-35: Implementation of Intelligent Data Acquisition Systems for Fusion Experiment Using EPICS and FlexRIO Technology**

<u>D. Sanz<sup>1</sup>, M. Ruiz<sup>1</sup>, R. Castro<sup>2</sup>, J. Vega<sup>2</sup>, J. M. Lopez<sup>1</sup>, E. Barrera<sup>1</sup>, N. Utzel<sup>3</sup>, P. Makijarvi<sup>3</sup> <sup>1</sup>CAEND-UPM-CSIC, Universidad Politecnica de Madrid, Madrid, Spain <sup>2</sup>Asociacion EURATOM/CIEMAT, Madrid, Spain <sup>3</sup>ITER Organization, St. Paul lez Durance Cedex, France</u>

The data acquisition systems used for fusion experiments have the following requirements: a large number of mutually synchronized analog input channels, a high sampling rate, pre-processing capabilities with real-time constraints, an interface for carrying out control loops, and data archiving to store, process and display data off-line. In addition, some other features are becoming relevant, namely the generation of hardware events and the timestamping of the data with the maximum accuracy. Meeting this list of requirements implies the use of reconfigurable input/output devices adapted to the specific diagnostic. These functionalities are in general not available in general-purpose multifunction data acquisition devices. The main objective of this work has been to propose and implement a methodology based on: a) having a reconfigurable data acquisition system customized to the requirements of the scientist in charge; b) providing multifunction data acquisition with timestamping functionalities; c) simplifying the implementation of scalable systems; d) providing integration in EPICS, a distributed control system framework. The complete solution has been achieved by developing an EPICS asynDriver device support and a design model for configuring RIO FPGA-based devices. This EPICS device support is able to manage every RIO/FlexRIO device using a ruleset, making it possible to obtain the features presented above. The design model requires following a workflow with these steps: 1) The scientist lists the features he or she needs in a purpose-designed spreadsheet. 2) The spreadsheet automatically generates the db file used to create an EPICS IOC application to control the RIO device. 3) Using LabVIEW, following the design model defined by this project and the spreadsheet filled in by the scientist, the design is compiled to obtain the bitfile that programs the FPGA. 4) With the bitfile, LabVIEW tools generate a header file with the mapping of the resources in the FPGA. 5) The user combines the bitfile, the header file, the db file and the EPICS device support to create the IOC application capable of controlling and managing the RIO/FlexRIO device configured with the specific features. The result is a complete and easily reconfigurable data acquisition system that makes it possible to achieve, in a short period of time, a ready-to-use solution for a new type of experiment. The relevant aspects of the proposed solution will be presented, with its main advantages and limitations.

# PS3-36: DEAP-3600 Dark Matter Experiment Data Acquisition and Trigger System

A. J. Muir

Science, TRIUMF, Vancouver, Canada

The DEAP-3600 experiment will search for dark matter particle interactions on liquid argon at SNOLAB, located 2 km underground in Sudbury, Ontario. The DEAP-3600 experiment will be instrumented with 255 Hamamatsu PMTs, each channel being decoupled from the PMT high voltage by a set of custom analogue signal conditioning boards. The signal conditioning boards split the PMT signals into three outputs: high gain, low gain, and a twelve-channel analogue sum. Three different digitisers sampling at 250 MHz, 62.5 MHz and 50 MHz digitise the signal conditioning outputs. The 50 MHz sampler is a custom VME Stratix IV FPGA-based module that generates trigger decisions based on pulse-shape discrimination (PSD). The 250 MHz and 62.5 MHz digitisers are commercial CAEN V1720 and V1740 modules, respectively. DAQ from the digitisers to disk is handled by the MIDAS system, with additional software event filtering provided by DEAP-3600-specific algorithms. With a total mass of 3600 kg, the expected argon beta decay background is 3.6 kHz. A key feature of the DEAP-3600 electronics and DAQ system is the reduction in data volume to ~5 MB/s while recording all events of interest.

### PS3-37: A 16-Channel 15 ps TDC Implemented in a 65 nm FPGA

L. Zhao<sup>1,2</sup>, X. Hu<sup>1,2</sup>, S. Liu<sup>1,2</sup>, J. Wang<sup>1,2</sup>, Q. An<sup>1</sup>

<sup>1</sup>State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, Hefei, Anhui, China <sup>2</sup>Anhui Key Laboratory of Physical Electronics, Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui, China

We present the implementation of a high-resolution Time-to-Digital Converter (TDC) targeting a Field Programmable Gate Array (FPGA) from the Xilinx Virtex-5 family. There are a total of 16 channels, each with a timing performance of about 15 ps RMS and a bin size of 30 ps. The design of the TDC is based on a counter and an interpolator method. A counter tracks the time elapsed since the TDC was enabled and gives the coarse time. This approach also offers a large dynamic range that is limited only by the number of counter bits. Dedicated carry-in lines in the CARRY4 block of the Virtex-5 FPGA are utilized for time interpolation, which gives fine time measurements within a system clock period. There are many approaches to implementing time interpolation: the Vernier method tuning two ring oscillators, the pure tapped delay line (TDL), the Wave Union TDL and the Vernier TDL. We focus on the pure TDL method after making a careful trade-off among resolution, flexibility, resource utilization and dead time. Our simulation shows that the delay from CIN to COUT in the CARRY4 block is as large as 104 ps. Thus we need to subdivide the delay of CARRY4 into finer taps for a higher resolution. Temperature, supply voltage and process variations are common causes of inhomogeneous delay cells. In addition, the uneven delays introduced by the subdivision need to be calibrated. Multiple strategies are applied to calibrate the non-uniformity of the delay cells and to enhance the TDC resolution. The starting point is a code density test. A bin-by-bin calibration look-up table can be built inside the FPGA according to the test results and used to compensate for temperature and voltage instability.
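The code density calibration described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' firmware: the function names, the 1000 ps clock period and the hit counts are hypothetical, and the fine time is assumed to be measured from the preceding clock edge. Hits that are uncorrelated with the clock fill each bin in proportion to its width, so the histogram directly yields the bin widths, the differential non-linearity and a bin-centre look-up table.

```python
def calibrate_from_code_density(hits_per_bin, clock_period_ps):
    """Derive bin widths, DNL and a bin-centre look-up table from a code density test.

    hits_per_bin[i] is the number of random (clock-uncorrelated) hits recorded in bin i.
    """
    total = sum(hits_per_bin)
    if total == 0:
        raise ValueError("no hits recorded")
    # Each bin's width is its share of the clock period.
    widths = [clock_period_ps * n / total for n in hits_per_bin]
    mean_width = clock_period_ps / len(hits_per_bin)
    # Differential non-linearity in units of the mean bin width (LSB).
    dnl = [w / mean_width - 1.0 for w in widths]
    # Look-up table: calibrated time of each bin centre within the clock period.
    lut = []
    edge = 0.0
    for w in widths:
        lut.append(edge + w / 2.0)
        edge += w
    return widths, dnl, lut


def timestamp_ps(coarse_count, fine_bin, lut, clock_period_ps):
    """Combine the coarse counter value with the calibrated fine time (assumed convention)."""
    return coarse_count * clock_period_ps + lut[fine_bin]


# Hypothetical example: a 4-bin interpolator within a 1000 ps clock period.
widths, dnl, lut = calibrate_from_code_density([100, 300, 200, 400], 1000.0)
t = timestamp_ps(3, 2, lut, 1000.0)
```

In hardware the same table would be stored in block RAM and rebuilt when temperature or voltage drift changes the bin widths; the sketch only shows the arithmetic.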

Due to the disparity of delay lines in Virtex-5 FPGAs compared to the carry-in resources inside previous Xilinx FPGA families, an extra effort is needed to flatten the inhomogeneous delays. This can be done either with software calibration or directly with hardware compensation. For software calibration, we obtain the asymmetric distribution of the bin width using MATLAB and analyze its influence on the linearity of the TDC. The asymmetric distribution is, to a large extent, normal. We can easily compensate this delay variation during the offline processing of the time data. For hardware compensation, we derive the asymmetric delay distribution inside CARRY4 and use this asymmetry to balance the asymmetric delay of the inherent tapped delay line. Meanwhile, we also apply Place and Route (PAR) constraints to fit our TDC design. Two different configurations are used to get the tapped points of the TDL for comparison. Hardware compensation requires no extra resources and is more attractive and efficient.

We design an evaluation board to verify the performance of the TDC. This board is not only set up for bench-top test, but has potential in modularization as it is physically implemented in 6U PXI format.

# **PS3-38:** Development of High Resolution TDC Implemented in Radiation Tolerant FPGAs for Aerospace Application

X. Qin<sup>1,2</sup>, C. Feng<sup>1,2</sup>, L. Zhao<sup>1,2</sup>, D. Zhang<sup>1,2</sup>, X. Hao<sup>3</sup>, S. Liu<sup>1,2</sup>, Q. An<sup>1,2</sup>

<sup>1</sup>Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui, China

<sup>2</sup>State Key Laboratory of Technologies of Particle Detection & Electronics, Hefei, Anhui, China

<sup>3</sup>Department of Geophysics and Planetary Sciences, University of Science and Technology of China, Hefei, Anhui, China

A time measurement architecture with low power consumption, high integration, high resolution, and high radiation tolerance is presented, which could be applied in future space experiments. In many space particle experiments, such as the CLUSTER mission at the end of the 20th century and the ROSETTA mission launched in 2004, the application of time measurement has been restricted by many factors, such as the total weight of the module, power consumption, radiation tolerance, and ambient temperature. Due to the harsh space environment, most of the chips with the best time measurement performance cannot be used in space exploration. An FPGA-based TDC implemented in radiation-tolerant FPGAs is a good solution for space missions. Research on implementing TDCs in both ACTEL Flash FPGAs and anti-fuse FPGAs is described in this paper; high time resolution can be achieved with a counter and interpolation method. The simulation results show that a minimum bin size lower than 670 ps is possible in ACTEL PROASIC3E family FPGAs by employing CMOS buffers as the interpolation cells, while the result is 880 ps in the PROASIC family. In addition, the minimum bin size could be decreased to below 80 ps in the ACTEL Axcelerator family of highly radiation-tolerant anti-fuse FPGAs. With the A3PE1500 device, a time resolution of about 150 ps RMS (with a 440 ps bin size, which is smaller than the simulated value) and a dynamic range of 1.6384 ms can be obtained. The total power consumption of the TDC board is below 1 watt. Tests of the ACTEL FPGA-based TDC have been completed. Tests of the differential non-linearity (DNL), time resolution, and bin size dependence on temperature have been conducted, and the results prove that the TDC is qualified for high-precision time measurement. In the next step, the TDC architecture will be implemented in ACTEL anti-fuse FPGAs (industry grade). Furthermore, the TDC code could easily be ported to space-grade radiation-tolerant FPGAs in the future and applied in outer space experiments.

### PS3-39: SEUs Tolerance in FPGAs Based Digital LLRF System for XFEL

M. K. Grecki

MSK, DESY, Hamburg, Hamburg, Germany

Rapidly developing semiconductor technology allows sophisticated digital control to be implemented on programmable device platforms (FPGAs, CPUs). However, the increasing size and performance of the circuits also has a drawback in failure sensitivity, in particular for soft errors due to ionizing radiation. The sensitivity to SEUs is related to the critical charge, which strongly depends on the transistor dimensions and supply voltage. The sensitivity to ionizing radiation increases faster than the circuit complexity predicted by Moore's law. Therefore, life-critical systems and systems operating in radioactive environments have to deal with soft errors. The countermeasures can be special design techniques that introduce redundancy into the algorithms and/or circuit design, allowing errors to be detected and corrected. Recently, semiconductor manufacturers have also begun to provide designers with tools to help them achieve the highest reliability in their designs. The system designer can make use of these tools but is not limited to them. Even at the algorithm and implementation levels it is possible to apply general or customized countermeasures against failures. But this is sometimes costly and/or induces performance limitations. The goal is to find a compromise between cost, performance and reliability. The LLRF control system for XFEL will use sophisticated digital systems based on FPGAs and DSPs. It will be installed in close proximity to the accelerator pipe, since the accelerator is constructed using the single-tunnel concept. Therefore, the electronic circuits will be exposed to gamma and neutron radiation. The electronics will be built using normal COTS components, so normal radiation resistance is expected. The racks with electronic systems will be partially shielded against radiation, but a moderate radiation level will be present during machine operation. Therefore, soft errors are expected and must be taken into account.
In order to evaluate the possible consequences of the radiation for the LLRF control, experiments at the FLASH accelerator have been performed. The paper presents some techniques used to improve the tolerance against SEUs in the LLRF system and presents results of experiments performed in the FLASH accelerator tunnel. The cost and efficiency of these methods (smart algorithms, spatial and time redundancy, etc.) are also discussed.

# PS3-40: Maximum Likelihood Estimation and Non-Linear Least Squares Fitting with Levenberg-Marquardt Algorithm Implementation in FPGA Devices for High Resolution Hodoscopy

J. M. Blasco, E. Sanchis, V. Gonzalez, J. D. Martin, X. Egea, D. Barrientos, D. Granero Universidad de Valencia, Valencia, Spain

Applications like X-ray and medical imaging or ion-implantation for microelectronics demand high-resolution beam tracking (hodoscopy) capabilities. Knowledge of the exact position involves the calculation of the maximum point of incidence of the particle beam. To carry out this task, scintillating fibers laid in X or X and Y planes are a common practice. Light coming out from these fibers is usually converted to electrical signals by using some kind of optoelectronic devices such as photodiodes, avalanche photodiodes or SiPM. These signals are then digitized and processed to calculate the beam position.

In this work we present the implementation of two different signal processing algorithms for the estimation of the beam position for a scintillating fiber hodoscope prototype. This prototype has been built with two planes of 128x128 0.5 mm^2 Kuraray SCSF-38 square fibers. The read-out is performed with a Silicon Photodiode Array S8866-128-02, coupled to a charge integration circuit C9118 (Hamamatsu [1-2]), and controlled with an FPGA evaluation board. The set-up has been tested with good results for linearity and SNR compared with the commonly used CCD cameras, in addition to its smaller size and lower price [3-4].

The two algorithms implemented and compared are, on the one hand, a statistical model based on Maximum-Likelihood Estimation (MLE) and, on the other hand, a numerical solution to the problem of minimizing a non-linear function with the Levenberg-Marquardt Algorithm (LMA). These methods have been widely used for data processing. For several reasons, e.g. power consumption, portability and size, implementing the processing in a portable device can be very useful in some applications. The previously mentioned system allows the online visualization of the results, which implies the need to develop the algorithms in a semi-custom device such as an FPGA.
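As an illustration of the second approach, the following Python sketch fits a beam profile with a textbook Levenberg-Marquardt iteration. It is a software reference under stated assumptions and not the FPGA implementation of this work: the Gaussian profile model, the 16-fiber example, the damping schedule and all function names are hypothetical choices made for the example.

```python
import math


def gaussian(x, amp, mu, sigma):
    """Assumed beam profile: signal on a fiber at position x."""
    return amp * math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def sum_sq(xs, ys, p):
    """Sum of squared residuals for parameters p = [amp, mu, sigma]."""
    return sum((y - gaussian(x, *p)) ** 2 for x, y in zip(xs, ys))


def fit_beam_lm(xs, ys, p0, iterations=100):
    """Levenberg-Marquardt fit of (amp, mu, sigma); mu is the beam position."""
    p, lam = list(p0), 1e-3
    cost = sum_sq(xs, ys, p)
    for _ in range(iterations):
        amp, mu, sigma = p
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        for x, y in zip(xs, ys):
            e = math.exp(-0.5 * ((x - mu) / sigma) ** 2)
            g = amp * e
            # Partial derivatives of the model with respect to amp, mu, sigma.
            jac = [e, g * (x - mu) / sigma ** 2, g * (x - mu) ** 2 / sigma ** 3]
            for i in range(3):
                jtr[i] += jac[i] * (y - g)
                for j in range(3):
                    jtj[i][j] += jac[i] * jac[j]
        # Damped normal equations: (J^T J + lam * diag(J^T J)) step = J^T r.
        damped = [[jtj[i][j] + (lam * jtj[i][i] if i == j else 0.0)
                   for j in range(3)] for i in range(3)]
        trial = [pi + si for pi, si in zip(p, solve3(damped, jtr))]
        trial_cost = sum_sq(xs, ys, trial)
        if trial_cost < cost:      # accept the step and trust the model more
            p, cost, lam = trial, trial_cost, lam * 0.1
        else:                      # reject the step and increase the damping
            lam *= 10.0
    return p


# Hypothetical example: 16 fibers, beam centred between fibers 7 and 8.
xs = list(range(16))
ys = [gaussian(x, 100.0, 7.3, 1.8) for x in xs]
start = [max(ys), float(ys.index(max(ys))), 2.0]
amp_fit, mu_fit, sigma_fit = fit_beam_lm(xs, ys, start)
```

The starting point is taken from the brightest fiber, which keeps the iteration close to the solution; a hardware version would typically replace the floating-point arithmetic with fixed-point operations and a fixed iteration count.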

The precision of MLE and LMA has been calculated for Geant4 simulated data and also for real measurements with a Sr-90 radioactive source, obtaining a precision below 100 µm (worst possible case). Results of the implementation regarding time response and FPGA resources employed are presented for two Xilinx platforms: the Spartan 3ADSP originally used for the management of the read-out system, and a Virtex6 with higher performance. The implementation of the algorithm in the FPGA accurately provides the position of the beam and its maximum point of incidence in real time.

[1-2] Hamamatsu Photonics. S8866-128-02 and C9118 series. Datasheet.

[3] E.Sanchis, F.Carrio, V.Gonzalez, J.Torres, C.Marn, S.Chollet, M.Haguenauer, P.Poilleux. Evaluation of a commercial APD array (Avalanche PhotoDiode) for a readout detector in a hadrontherapy beam characterization application, NSS. October 26, 2010.

[4] F.Carrio, V.Gonzalez, E.Sanchis, J.M.Blasco. Evaluation of a commercial PhotoDiode array for radiation detectors readout, The Open Optics Journal Vol V, 2011 (62-65).

# PS3-41: Multiple Register Synchronization with a High-Speed Serial Link Using the Aurora Protocol

<u>D. Barrientos</u><sup>1,2,3</sup>, V. Gonzalez<sup>3</sup>, M. Bellato<sup>2</sup>, A. Gadea<sup>1</sup>, D. Bazzacco<sup>2</sup>, J. M. Blasco<sup>3</sup>, D. Bortolato<sup>2</sup>, F. J. Egea<sup>1,3</sup>, R. Isocrate<sup>2</sup>, A. Pullia<sup>4</sup>, G. Rampazzo<sup>2</sup>, E. Sanchis<sup>3</sup>, A. Triossi<sup>2</sup>

<sup>1</sup>Instituto de Fisica Corpuscular (CSIC-UV), Valencia, Spain

<sup>2</sup>Istituto de Fisica Nucleare (INFN), Sezione di Padova, Padova, Italy

<sup>3</sup>Departamento Ingeniera Electronica, Universitat de Valencia, Valencia, Spain

<sup>4</sup>Istituto de Fisica Nucleare (INFN), Sezione di Milano, Milano, Italy

The synchronization of general-purpose registers in distributed hardware systems becomes essential when the number of programmable integrated circuits rises dramatically. In this work, we propose the synchronization of user-controlled registers between two Field Programmable Gate Arrays (FPGAs) through a high-speed serial link at 2.5 Gbps using the Aurora protocol. Aurora is an open, lightweight and scalable protocol that performs 8B/10B encoding, flow control, clock correction, etc. On top of that, a set of VHDL modules manages the synchronization between a variable number of registers, whose length is also variable and can be configured before the synthesis of the code. Thus, a final hardware core is provided, allowing the user to implement a specific register configuration of up to 254 registers with 8-bit width.

The development and validation of the code has included a simulation process for each developed module and several hardware testbenches, using Virtex-5 and Virtex-6 FPGAs from Xilinx. In addition, Bit Error Ratio (BER) tests for the whole firmware and hardware system have been performed. From those tests, some characteristics of the core have been quantified, such as the maximum frequency for updating the registers as a function of the number of synchronized registers, and the latency of the link from the local to the remote user interface. For that purpose, a generator of pseudo-random values using a Linear Feedback Shift Register (LFSR) has been used in both FPGAs, allowing the BER to be measured for the whole setup.
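The principle of an LFSR-based BER test can be sketched as follows. This is an illustration only: the abstract does not state the LFSR length or polynomial, so the 7-bit register with polynomial x^7 + x^6 + 1 (commonly known as PRBS7), the seed and the injected errors are assumptions. Both ends run the same generator from the same seed, so the receiver can compare each received bit with the locally generated expected bit.

```python
def prbs7(seed, nbits):
    """Generate nbits from a 7-bit Fibonacci LFSR with polynomial x^7 + x^6 + 1."""
    if not 0 < seed < 128:
        raise ValueError("seed must be a non-zero 7-bit value")
    state = seed
    bits = []
    for _ in range(nbits):
        # Feedback is the XOR of the two oldest bits in the register.
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def bit_error_ratio(expected, received):
    """Fraction of received bits that differ from the locally generated sequence."""
    errors = sum(e != r for e, r in zip(expected, received))
    return errors / len(expected)


# Hypothetical link test: the receiver regenerates the sequence from the same seed.
sent = prbs7(0x2A, 1270)
received = list(sent)
for index in (10, 500, 999):   # inject three bit errors to emulate a noisy link
    received[index] ^= 1
ber = bit_error_ratio(prbs7(0x2A, 1270), received)
```

With a maximal-length polynomial the 7-bit sequence repeats every 127 bits; a real link test would normally use a longer register so that the pattern exercises more bit combinations.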

The work has been developed on a general basis, in order to make it fully compatible with several possible implementations. However, it has also been validated for the first use that was conceived: the slow control system in the second generation of electronics for the Advanced GAmma Tracking Array (AGATA). Furthermore, the firmware has been included as a peripheral in a microprocessor embedded in one of the FPGAs, while its partner was linked to a serial 2-wire bus. The registers in the created peripheral have been programmed with an application software layer, written in C code, using bit-banging techniques. The modularity of the C code also provides the possibility of encapsulating the serial protocols, providing, to the high-level user, read and write functions in a fully transparent way.

Distributed slow control systems managing remote devices, using bit-banging techniques, or register-dependent protocols could take advantage of the versatility of the developed core without the need to embed a microprocessor for that purpose. Furthermore, the low resource utilization and the small user interface make it easily portable and usable in a custom user application.

#### PS3-42: Graphical User Interface for Serial Protocols Through a USB Link

D. Barrientos<sup>1,2,3</sup>, V. Gonzalez<sup>3</sup>, M. Bellato<sup>2</sup>, A. Gadea<sup>1</sup>, D. Bazzacco<sup>2</sup>, J. M. Blasco<sup>3</sup>, D. Bortolato<sup>2</sup>, F. J. Egea<sup>1,3</sup>, R. Isocrate<sup>2</sup>, A. Pullia<sup>4</sup>, G. Rampazzo<sup>2</sup>, E. Sanchis<sup>3</sup>, A. Triossi<sup>2</sup>

<sup>1</sup>Instituto de Fisica Corpuscular (CSIC-UV), Valencia, Spain

<sup>2</sup>Istituto de Fisica Nucleare (INFN), Sezione di Padova, Padova, Italy

<sup>3</sup>Departamento Ingeniera Electronica, Universitat de Valencia, Valencia, Spain

<sup>4</sup>Istituto de Fisica Nucleare (INFN), Sezione di Milano, Milano, Italy

Within the last decade, the trend towards smaller systems with greater functionality has been widely accepted in the electronics community. This has led integrated circuit designers to levels of integration and complexity barely imagined a few years ago. However, the price paid has been the increased number of integrated circuits on the boards. In addition, as the physical space has usually been reduced, most of the configuration interfaces for these circuits use 2-wire or 3-wire serial links. Under these circumstances, the qualification of complex board prototypes becomes a hard task when different protocols and computer interfaces are needed. In this work, we have developed a Graphical User Interface (GUI) for Windows operating systems that provides an intuitive and fully transparent way to interact with several devices. The software has been developed using Dynamic-link libraries (DLLs), linked at run-time, encapsulating the hardware USB interface and the three implemented protocols. The GUI is composed of four tabs: "Port setup", "I2C bus", "SPI bus" and "uWire bus". In the first tab, the USB port is set up and the signal bonding for the four available ports can be configured. Once the user has selected a protocol, read and write operations are available in the corresponding tab. The integrated circuit chosen as bridge is the USB to UART bridge CP2103, from Silicon Laboratories. In addition to the USB to UART conversion, this chip provides four General Purpose Input Output (GPIO) ports, which are used for the purposes of this work, within a QFN-28 package.

As presented, the development of a GUI integrating the commonly used 2-wire and 3-wire serial protocols provides a portable and friendly interface for the configuration of several devices, which is very useful during the prototype validation stage.

### **PS3-43: SEU Effects on Power Consumption in Xilinx FPGAs**

A. Aloisio<sup>1,2</sup>, V. Bocci<sup>3</sup>, G. Chiodi<sup>3</sup>, R. Giordano<sup>1,2</sup>, <u>V. Izzo<sup>2</sup></u>, L. Sterpone<sup>4</sup>, M. Violante<sup>4</sup>

<sup>1</sup>Dipartimento di Scienze Fisiche, University of Naples 'Federico II' and INFN, Napoli, Italy

<sup>2</sup>Sezione di Napoli, INFN, Napoli, Italy

<sup>3</sup>Sezione di Roma, INFN, Roma, Italy

<sup>4</sup>Dipartimento di Informatica e Automatica, Politecnico di Torino, Torino, Italy

SEU effects in the configuration memory are the most important cause of faults in SRAM-based FPGAs exposed to ionizing radiation or neutrons. While the mechanism of random changes in the device resource networks has been thoroughly studied for its impact on overall logic reliability and fault analysis, much less effort has been devoted to checking the consequences for power consumption. Even if the functionality of the design is hardened by means of SEU mitigation techniques, appreciable changes in power consumption for selected domains (most notably in the logic core) are experienced. Also, the power consumption of an unconfigured FPGA will rise due to bit flips. These effects have to be taken into account when dimensioning the supply system, especially in applications with severe power budget issues like avionic equipment, satellite payloads and on-detector electronics in High Energy Physics experiments. Apart from scrubbing, no general-purpose technique or power-aware design style has been envisaged to moderate this side effect of SEUs. Moreover, very few data are available from the silicon vendors and in the published literature.

In this paper, we present a detailed analysis of the quiescent consumption of a Xilinx Virtex 5 LX50T on the CORE, AUX, MGT and IO power domains during irradiation with 62 MeV proton beams. The tests have been performed at the Superconductive Cyclotron of the LNS-INFN facility (Catania, Italy). We describe the architecture of the test bench developed on purpose and show the trends of the current drawn on each rail versus the dose. We also discuss the injection of specific bit flips in the configuration memory in order to study the effects on the power consumption of selected faults like logic clashes, power post shorts and routing issues. Results from SEU emulation on the bench and beam tests are compared and analyzed.

### **PS3-44:** Online Software Time Calibration for a Continuous Air Shower Array

<u>S. Mastroianni<sup>1</sup>, M. Iacovacci<sup>2</sup></u> <sup>1</sup>*INFN, Naples, Italy* <sup>2</sup>*Università, Naples, Italy* 

Time calibration is crucial to the performance of a shower array, which uses the time-of-flight method to reconstruct the arrival direction of the primary particle.

This paper presents a software time calibration algorithm that exploits the continuous nature of the detector and is based on the assumption of a locally flat shower front. On small portions of the detector (tens of m^2), a simple time-position fit of the arriving particles provides the time calibration constants of that part of the detector. Then, a second step is needed to measure the time offsets among the different portions, yielding the complete detector calibration.
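The first step, a time-position fit on a small detector portion, can be illustrated with a simplified Python sketch. The one-dimensional geometry, the pad positions and the function names are assumptions made for the example and do not reproduce the ARGO-YBJ code. For each event a straight line is fitted to the hit times versus pad position (the locally flat shower front), and the residuals of each pad, averaged over many events, estimate its time offset.

```python
def fit_line(xs, ts):
    """Least-squares fit t = t0 + slope * x; returns (t0, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    slope = sxt / sxx
    return mean_t - slope * mean_x, slope


def calibration_offsets(xs, events):
    """Average fit residual per pad over all events (one list of hit times per event)."""
    sums = [0.0] * len(xs)
    for ts in events:
        t0, slope = fit_line(xs, ts)
        for i, (x, t) in enumerate(zip(xs, ts)):
            sums[i] += t - (t0 + slope * x)
    return [s / len(events) for s in sums]


# Hypothetical example: five pads along one axis (positions in m, times in ns).
pads = [-2.0, -1.0, 0.0, 1.0, 2.0]
true_offsets = [1.0, -2.0, 2.0, -2.0, 1.0]
showers = [(10.0, -0.5), (20.0, 0.0), (5.0, 0.3), (7.0, 1.2)]  # (arrival time, slope)
events = [[t0 + a * x + o for x, o in zip(pads, true_offsets)] for t0, a in showers]
estimated = calibration_offsets(pads, events)
```

A fit of this kind cannot see any offset component that is itself linear in position, because the fitted front absorbs it; the example offsets are chosen without such a component. Tying the portions together is the job of the second step mentioned in the abstract.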

This algorithm is applied to the ARGO-YBJ detector with a high trigger rate (several kHz) and its performance is discussed.

#### FERT2: FPGA and Electronics Applied to Realtime Systems 2

### Friday, June 15 08:30-10:40 Crystal Ballroom

# FERT2-1: Status and Perspectives of Fast Waveform Digitizers

R. Paoletti

Dept. of Physics, University of Siena and INFN Pisa, Siena, Italy

The high energy and astroparticle physics communities are now designing next-generation experiments with stringent requirements of high temporal resolution and a large number of channels. Recent developments have focused on large-scale integration to accommodate more input channels, high-frequency sampling (1-2 Gsps or more), excellent linearity and a large dynamic range. The integration and modularity of the data acquisition systems under study impose more challenging requirements, such as cascading architectures, zero dead time processing, large memory cells, high input analog bandwidth and self-trigger functionality. The current status of waveform digitizers is reviewed and future developments of next-generation digitizers are presented, with emphasis on applications in new experiments or upgrades.

### FERT2-2: Hardware Timebase Calibration in the Multi-GSa/s LABRADOR-4 ASIC

G. S. Varner, M. Z. Andrew, Z. Cao, K. A. Nishimura, P. W. Gorham Physics and Astronomy, Hawaii Univ., Honolulu, HI, United States

In recent years, inexpensive multi-gigasample-per-second CMOS waveform samplers have become available, enabling a new generation of low-power, high-channel-count experiments in particle and astroparticle physics. Power savings over other architectures are realized by having nothing operating at the direct sampling rate of interest. Instead, the Switched Capacitor Array sampling is driven by timing generators based upon voltage-controlled delay lines. Stabilization of the timebase and the significant calibration effort required, due to the non-uniform time steps introduced by CMOS process variations in these delay lines, have limited their wider adoption in the community. In most of the CMOS processes used, the sample-to-sample time step difference is of order 10-20% and cannot be neglected in many applications. Especially for applications involving real-time processing of the waveform samples from these devices, splining and resampling the smoothed waveforms on a uniform time grid is computationally very expensive. To address this issue, in the 4th generation LABRADOR ASIC, individual time sample trim DACs have been implemented to tune out this time step variance. Performance results of this hardware-level calibration will be reported.

### FERT2-3: Time Interval Analyzer with FPGA-Based TDC for Free Space Quantum Key Distribution: **Principle and Validation with Prototype Setup**

<u>Q. Shen</u><sup>1,2</sup>, S. Liao<sup>3</sup>, S. Liu<sup>1,2</sup>, J. Wang<sup>1,2</sup>, W. Liu<sup>4</sup>, C. Peng<sup>1,3</sup>, Q. An<sup>1,2</sup>

<sup>1</sup>Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui, China

<sup>2</sup>State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, Hefei, Anhui, China

<sup>3</sup>Hefei National Laboratory for Physical Sciences at Microscale, University of Science and Technology of China, Hefei, Anhui, China

<sup>4</sup>College of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang, China

Quantum Key Distribution (QKD) can provide unconditionally secure communication between two remote parties, often called Alice and Bob. Many QKD protocols have been proposed, and the most widely used one is the decoy-state assisted Bennett-Brassard 1984 (BB84) protocol, which can use a weak coherent state instead of a single photon source and still guarantee security. In such a QKD system, it is necessary to accurately detect and record the arrival time of every photon pulse signal. Meanwhile, the two parties are often far apart and need synchronization.

Generally, a high-precision Time Interval Analyzer (TIA) can be used to effectively reduce the overall system timing jitter in time tag measurement. With the external laser-assisted synchronization scheme, it can also improve the synchronization precision. Furthermore, a QKD system with lower timing jitter can work with a higher clock rate and a smaller measurement timing window. Both will increase the bit rate and reduce the Quantum Bit Error Rate (QBER).

This paper presents a Time-to-Digital Converter (TDC) based on a Field-Programmable Gate Array (FPGA), which is well suited for use as a TIA in QKD experiments. A free space QKD system setup using the FPGA-based TDC is demonstrated in the paper. The TDC has a bin size (LSB) of 50 ps and a timing precision better than 50 ps RMS. It has a dead time of 10 ns and a dynamic range of 167 ms. The timing precision of the synchronization in the QKD system is 206 ps and the overall system timing jitter is 818 ps. With this TDC circuit, a quantum key has been distributed over a 40-km free space link at Qinghai Lake. The QBER is less than 3% with a final key rate of more than 400 bps, which proves that our TDC circuit is suitable for the free space QKD experiment.

Furthermore, the multi-channel TDC and other necessary modules, such as the random data module, the Laser Diode (LD) control module and the system control module, are integrated into a single FPGA, which greatly reduces the system's size and cost.

### FERT2-4: 128 Channels of Multi-GSa/s Waveform Sampling and Digitization in an 800 cm^3 Package

M. Z. Andrew, C. N. Lim, K. A. Nishimura, L. J. Ridley, G. S. Varner High Energy Physics Group, University of Hawaii, Honolulu, HI, United States

Extremely fast timing from Micro-Channel Plate PhotoMultiplier Tubes (MCP-PMTs) and multi-gigasample per second (GSa/s) waveform sampling ASICs will allow precision timing to play a pivotal role in the next-generation of Ring Imaging CHerenkov (RICH) detectors. We have developed a prototype of the electronics to instrument the Imaging Time of Propagation (iTOP) counter for the Belle II detector at KEK in Tsukuba, Japan. The front-end electronics modules consist of an array of waveform sampling / digitizing ASICs controlled by an FPGA. The ASICs digitize signals from an array of multi-anode MCP-PMTs coupled to a quartz radiator bar. Readout and control are done via multi-gigabit-per-second fiber optic links to a custom back-end, where Digital Signal Processors (DSPs) correct for unwanted artifacts in the data before performing feature extraction.

Variants of the modules will be used in other applications besides Belle II, including a novel tabletop neutrino detector, beam size monitoring at SuperKEKB, readout of wavelength shifting fibers for the Belle II K-long and muon system, and a Focusing Detection of Internally Reflected Cherenkov (fDIRC) prototype. Important aspects of the system include thermal management problems in a very compact module, as well as the expected lifetime of the module in the intended high radiation environment(s). Our experiences running these modules as standalone entities with a pulser/laser on the bench have fed into the design of the next version of each component in the system. Cosmic ray tests and running a full system at a Fermilab beam test in late 2011 have contributed to our understanding of needed improvements for the system as a whole.

### FERT2-5: A Stepped-Up Tree Encoder for the 10-ps Wave Union TDC

<u>X. Hu</u><sup>1,2</sup>, L. Zhao<sup>1,2</sup>, S. Liu<sup>1,2</sup>, J. Wang<sup>1,2</sup>, Q. An<sup>1,2</sup>

<sup>1</sup>State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, Hefei, Anhui, China <sup>2</sup>Anhui Key Laboratory of Physical Electronics, Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui, China

Wave Union is an effective scheme for pushing the timing resolution of an FPGA-based TDC beyond its cell delay. One of the challenges in its implementation lies in the non-thermometer-to-binary (NTH2B) encoder, especially with a long tapped delay line (TDL). The Wave Union launcher produces a pulse train every clock cycle, and the 1-0/0-1 transitions in the raw bits are encoded as the fine time. It is important to find the transitions within one clock cycle; otherwise the Wave Union launcher has to be stopped, which increases both the dead time and the complexity of the control logic. In this paper, we propose a stepped-up tree encoder (SUTE) algorithm, which has been verified in a Xilinx Virtex-4 FPGA. With this encoder scheme, we achieved sub-10 ps timing resolution at a conversion rate of up to 100 MHz for more than 200 bits of NTH2B code.

The SUTE structure is made up primarily of two sequential stages: a Pre-Encoder and a Parallel-Encoding stage. The Pre-Encoder locates the coarse position of the transition in the raw TDC bins. We cluster the delay cells and process their outputs with 4-input LUT primitives. Every four adjacent delay cells form a group, and their states are sent to four LUTs: one (FLAG-LUT) is responsible for edge detection, and the remaining three (INSIDE-LUT) are responsible for position encoding. Whenever a Hit edge falls within these four delay cells, the FLAG-LUT outputs a valid flag and the INSIDE-LUT gives the position of the Hit. In addition, a further LUT (BORDER-LUT) handles the case in which the transition falls between two adjacent groups. In the actual implementation, we floor-planned the design with place and route (PAR) constraints to avoid unpredictable signal delays. We also integrated bubble error correction (BEC). Bubble errors, i.e. an isolated zero surrounded by ones or vice versa (e.g., 000010000), result from uneven clock signal distribution, uneven propagation delays in the carry-in line and metastability problems in the FPGA.
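The grouping idea of the Pre-Encoder can be illustrated in software. The following is a minimal sketch, not the authors' firmware: it assumes a single 1-to-0 transition in the raw bits, runs sequentially where the FPGA evaluates all groups in parallel, and uses a 3-tap majority vote as a stand-in for bubble error correction. The function names `majority_filter` and `pre_encode` are hypothetical.

```python
from typing import List, Optional

GROUP = 4  # delay cells per group, as described in the abstract


def majority_filter(bits: List[int]) -> List[int]:
    """Suppress single-bit bubbles with a 3-tap majority vote (assumed BEC scheme)."""
    out = bits[:]
    for i in range(1, len(bits) - 1):
        out[i] = 1 if bits[i - 1] + bits[i] + bits[i + 1] >= 2 else 0
    return out


def pre_encode(bits: List[int]) -> Optional[int]:
    """Return the index of the last 1 before the first 1->0 transition, or None."""
    for g in range(0, len(bits), GROUP):
        cells = bits[g:g + GROUP]
        # FLAG/INSIDE role: a transition strictly inside this group of cells.
        for k in range(len(cells) - 1):
            if cells[k] == 1 and cells[k + 1] == 0:
                return g + k
        # BORDER role: a transition between this group and the next one.
        nxt = g + GROUP
        if nxt < len(bits) and cells[-1] == 1 and bits[nxt] == 0:
            return nxt - 1
    return None


if __name__ == "__main__":
    raw = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0]  # bubble at index 3
    print(pre_encode(raw), pre_encode(majority_filter(raw)))
```

Without the filter, the bubble is mistaken for the Hit edge; after filtering, the encoder reports the true transition position.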

The pre-encoded position from the first stage is delivered to the second stage, which determines the final position in a pipelined manner. The parallel encoding algorithm also shortens the encoding time, which is a major bottleneck for raising the system clock speed, and reduces the TDC dead time without performance degradation.

One typical application of this encoding strategy is a TDC that averages multiple time measurements. We verified the proposed encoder algorithm in a 9-channel Wave Union TDC module implemented in an XC4VFX60FFG1152 device from the Xilinx Virtex-4 family. The SUTE works well, and the achieved TDC timing performance is below 10 ps for all 9 Wave Union channels.

# FERT2-6: A Silicon Diode Based Detector for Radiation Measurement in High Altitude Natural Environment

<u>D. Pantel</u><sup>1</sup>, J.-R. Vaille<sup>1</sup>, F. Wrobel<sup>1</sup>, L. Dilillo<sup>2</sup>, J.-M. Galliere<sup>2</sup>, J.-L. Autran<sup>3</sup>, P. Cocquerez<sup>4</sup>, P. Chadoutaud<sup>4</sup>, F. Saigne<sup>1</sup> <sup>1</sup>IES-RADIAC, Universite Montpellier 2, Montpellier, France <sup>2</sup>LIRMM, Montpellier, France <sup>3</sup>IM2NP, Marseille, France <sup>4</sup>CNES, Toulouse, France

Because of stellar emissions, the Earth is continually exposed to high-energy particles, which produce particle showers that reach the ground. These particle showers are a major problem for aircraft electronics. Radiation causes failures in electronic devices, and the issue becomes more and more important as the size of the active gates, and especially of the memory cells, decreases. The HAMLET project started in 2007 at the University Montpellier 2 and aims at investigating the natural radiative environment and its effects on electronics. In the framework of this project, we produced a small dedicated particle detector named LAERTES, which we developed for radiative environment characterization during flight. Based on a silicon diode, the LAERTES detector measures the energy deposited during ionizing events. These can be due to direct ionization, as with protons, or to indirect ionization, as in the case of neutron impacts. The active area of the diode is 4.5 cm² with a thickness of 150 µm. A nuclear interaction in the diode generates a current pulse on the electrodes that is proportional to the deposited energy. This current pulse is then converted into a voltage by a charge preamplifier. The output of the preamplifier is routed to two devices. The first is a microcontroller that filters the input pulses using a detection threshold trigger. The second measures the amplitude of each pulse and stores it in a bin of a microcontroller. The major challenge has been to build an instrument small and light enough to be used easily on board airplanes or balloons. The latest version of LAERTES offers features such as data storage, event timestamps and an adjustable detection threshold. Moreover, the detector can perform real-time measurements as well as measurements with delayed data transmission. We calibrated the instrument using a wave generator and a Californium-252 particle source.
This allowed us to correlate the energy deposited in the diode with the microcontroller binning results. Since the two instrument boards are rather small (10 × 10 × 3 cm³) and the overall weight is below 1.5 kg, we have already been able to use the instrument on several stratospheric balloon flights. Those flights lasted about four hours each, with a maximum altitude of forty kilometers. A comparison between the flight profile of one balloon, which flew at 25 km, and the evolution of the measurement rate confirmed the previously calculated correlation. The instrument has also been tested in neutron beam experiments. We performed tests at The Svedberg Laboratory in Uppsala using a quasi-monoenergetic 50 MeV neutron beam as well as an atmospheric-spectrum neutron beam. We measured the deposited energy spectra in the diode and compared them with simulations performed with the MC-ORACLE code. The experimental results were in good agreement with the simulations.
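The threshold-and-binning chain described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the LAERTES firmware: the function name `bin_pulses`, the millivolt units, the bin width and the number of bins are all hypothetical, and pulses above the last bin are simply clamped into it.

```python
from typing import List


def bin_pulses(amplitudes_mv: List[float], threshold_mv: float,
               bin_width_mv: float, n_bins: int) -> List[int]:
    """Histogram pulse amplitudes that pass an adjustable detection threshold."""
    histogram = [0] * n_bins
    for amplitude in amplitudes_mv:
        if amplitude < threshold_mv:
            continue  # rejected by the detection threshold
        index = int((amplitude - threshold_mv) // bin_width_mv)
        histogram[min(index, n_bins - 1)] += 1  # clamp overflow into the last bin
    return histogram


if __name__ == "__main__":
    # Hypothetical pulse amplitudes in millivolts.
    print(bin_pulses([5, 12, 18, 31, 250], threshold_mv=10, bin_width_mv=10, n_bins=4))
```

A calibration such as the one mentioned in the abstract would then map each bin index to a deposited energy; that mapping is not given in the text and is left out here.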

#### PS4: Poster Session 4

Friday, June 15 11:05-11:35 Crystal Ballroom

### PS4-1: A General Self-Organization Tree-Based Energy-Balance Routing Protocol for Wireless Sensor Network

Z. Han<sup>1,2</sup>, J. Wu<sup>1,2</sup>, J. Zhang<sup>1,2</sup>, L. Liu<sup>1,2</sup>, K. Tian<sup>1,2</sup>

<sup>1</sup>Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui, China, 230026 <sup>2</sup>State Key Laboratory of Particle Detection & Electronics, University of Science and Technology of China, Hefei, Anhui, China, 230026

A wireless sensor network (WSN) is composed of a large number of low-cost micro-sensors that collect and send various kinds of messages to a base station (BS). WSNs have a wide range of applications, including military surveillance, disaster prediction and environment monitoring, and have therefore attracted a lot of attention. Since battery replacement is not an option for networks with thousands of physically embedded nodes, an energy-efficient routing protocol must be employed to achieve a long network lifetime. To achieve that, we need not only to minimize energy consumption but also to balance the WSN load. Researchers have proposed many elegant protocols. LEACH, HEED, PEGASIS and PEDAP are typical protocols based on data fusion. However, LEACH and HEED consume energy heavily in the head nodes, so the head nodes tend to die early. PEGASIS, which is known as a chain-based energy-efficient protocol, has a long time delay. PEDAP constructs a minimum spanning tree, which has nearly optimal cost, but such a static protocol needs the BS to build the topology. On the other hand, PEGASIS and PEDAP are typical for the case in which a relay node must transmit a message that includes both its own data and its children's, which cannot be fused. LEACH and HEED adapt to this case to a certain extent. They are both cluster-based and try to balance the load in this case, but the nodes farther from the BS still die first.

In this paper, a general self-organization tree-based energy-balance routing protocol (GSTEB) is proposed. This protocol assumes that each node can obtain its coordinates by GPS or other means. By sending query packets within a certain radius, nodes can obtain their neighbors' information, such as coordinates and energy level (EL). EL is a parameter for load balancing; it is a relative, estimated energy value rather than a true one. In each round, the BS assigns a new root and broadcasts this choice to all nodes. After that, each node selects its parent in parallel using the EL and coordinate information. The selection criteria are: 1) The distance between the parent node and the root is smaller than that between the node itself and the root. 2) If the root is the BS, the parent node should have the largest EL among the neighbors. If the root is a general node, the EL of the parent node should not be smaller than the node's own. 3) The parent node chosen should lead to the least energy consumption. A MATLAB simulation shows that, using the same model as PEGASIS, GSTEB spends only 0.5% more energy per round than PEDAP. Because GSTEB is a dynamic and parallel protocol, it can change the root and reconstruct the routing tree with shorter delay and less overhead according to the criteria mentioned above, so a better load balance is achieved, especially for dense node deployments. For this model, GSTEB improves the death round of the first node by 150% compared with PEGASIS. For the other situation, in which data cannot be fused, we compare GSTEB with HEED; the results show that GSTEB improves the death round of the first node by 100%.
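The three parent-selection criteria can be read as a filter followed by a minimization. The sketch below is one possible interpretation, not the authors' MATLAB model: it assumes that criterion 2 is applied to the neighbors that already satisfy criterion 1, and that the energy cost of a link grows with the squared distance to the parent (a common free-space radio assumption that the abstract does not state). The `Node` class and `select_parent` function are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    pos: Tuple[float, float]
    el: int  # relative, estimated energy level


def dist(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def select_parent(me: Node, neighbors: List[Node],
                  root_pos: Tuple[float, float], root_is_bs: bool) -> Optional[Node]:
    """Apply criteria 1-3 in order; return None if no neighbor qualifies."""
    # Criterion 1: the parent must be closer to the root than this node is.
    my_distance = dist(me.pos, root_pos)
    closer = [n for n in neighbors if dist(n.pos, root_pos) < my_distance]
    if not closer:
        return None
    # Criterion 2: the EL rule depends on whether the root is the base station.
    if root_is_bs:
        best_el = max(n.el for n in closer)
        candidates = [n for n in closer if n.el == best_el]
    else:
        candidates = [n for n in closer if n.el >= me.el]
    if not candidates:
        return None
    # Criterion 3: least energy, modeled here as the squared link distance.
    return min(candidates, key=lambda n: dist(me.pos, n.pos) ** 2)


if __name__ == "__main__":
    me = Node("me", (10.0, 0.0), 3)
    neighbors = [Node("a", (8.0, 0.0), 2), Node("b", (6.0, 0.0), 5),
                 Node("c", (7.0, 1.0), 5), Node("d", (12.0, 0.0), 9)]
    print(select_parent(me, neighbors, (0.0, 0.0), root_is_bs=True).name)
```

What a node does when no neighbor qualifies is not covered by the abstract, so the sketch leaves that decision to the caller.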

### **PS4-2: Real Time Control System of Active Reflector of FAST**

X.-C. Deng<sup>1,2</sup>, W.-Q. Wu<sup>1,2</sup>, M.-C. Luo<sup>1,2</sup>, H.-T. Shen<sup>3</sup>, L.-C. Zhu<sup>3</sup>, P.-Y. Tang<sup>1,2</sup>, J.-J. Liu<sup>1,2</sup>, <u>F. Li<sup>1,2</sup></u>, G. Jin<sup>1,2</sup>, J. Wang<sup>1,2</sup>

<sup>1</sup>Univ. of Sci. & Tech. of China, Hefei, Anhui, China

<sup>2</sup>State Key Laboratory of Technologies of Particle Detection and Electronics, Hefei, Anhui, China <sup>3</sup>National Astronomical Observatories, Beijing, China

The Five-hundred-meter Aperture Spherical radio Telescope (FAST) is a Chinese mega-science project to build the largest single-dish radio telescope in the world, as shown in Fig. 1. It uses a karst depression as its site, which is large enough to host the 500-meter telescope and deep enough to allow a zenith angle of 40 degrees. As a huge scientific device, the supporting structure of FAST has special requirements beyond those of conventional structures. The most prominent one is that the supporting structure should allow the surface to be deformed from a sphere into a paraboloid in real time through active control. The main objective of the control system is to build a control network in the karst depression, consisting of a master control computer, layered control units and 2300 actuator control nodes, that can drive the actuators to deform the reflector surface from a sphere into a paraboloid in real time. EPICS (Experimental Physics and Industrial Control System), initially developed by LANL and ANL, is a well-known and easy-to-use framework for real-time control systems. It is now used in many scientific facilities, such as large accelerators and large telescopes. We selected EPICS for the design of the ARS control system. Functionally, the ARS control system has three layers. At the lowest level are the control nodes, which control the motors that drive the cables, and the sampling points, which sample the cable tension and node position. The middle level consists of area control units, each of which controls about 200 nodes in a defined area; the control nodes in an area share one field bus such as RS485 or CAN. The top level is the ARS Master Control Unit, which obtains the position benchmark data from the benchmark system, interfaces with the upper system, called the Central Control System (CCS), and controls and manages the components in the lower levels.
In the EPICS-based design, the lowest level comprises the control nodes and sampling points described above, and at the middle level an IOC is designed for each Area Control Unit, which comprises about 200 nodes. We have finished the IOC in the lab, designed a simulator for the control nodes, implemented the Master Control Unit interface with the CCS and tested the system in a lab environment.

#### **PS4-3: IPMI Test Software for MicroTCA Developments**

M. Drochner, P. Kaemmerling, H. Kleines, S. v. Waasen FZJ/ZEL, Juelich, Germany

To support our MicroTCA (AMC) module developments, and for diagnostic investigations in MicroTCA systems, we found a need to develop tools to exercise the IPMI functions of Management Controllers within the system. Based on the freely available "ipmitool" program, we built a framework that allows communication with AMC Module Management Controllers through the Shelf Manager's network interface, without the need to tap an IPMB. It contains functions to exercise various IPMI transactions and to check the integrity and correctness of FRU (Field Replaceable Unit) information and other data structures published by Management Controllers.

The general structure of the library will be presented, as well as first results and future plans.

# **PS4-4: The Research and Design of the Data Acquisition System and the Control System of KTX**

J. An, K. Song, P. Cao, J. Yang

Modern Physics, the State Key Laboratory of Particle Detection and Electronics, Hefei, China

The objective of this paper is to design the data acquisition system and the control system for the reversed field pinch device called Keda Toroidal eXperiments (KTX), which is now being built at the University of Science and Technology of China.

KTX is a Chinese special project related to the International Thermonuclear Experimental Reactor (ITER) project. The reversed field pinch (RFP) is an important kind of toroidal magnetic confinement device that has been suggested as one of the attractive paths to a fusion reactor. The RFP scientific program can address issues relevant not only to the RFP, but also more generally to magnetic confinement fusion. The rich phenomena associated with the strong magnetic self-organization in the RFP provide an unusually close connection to a set of important problems in plasma astrophysics.

The data acquisition and control system of KTX comprises three subsystems: the master control system, the data acquisition and storage system, and the plasma control system. The master control system is the core of operation scheduling and centralized management in experiments: it monitors the status of the systems and coordinates, synchronizes and inspects the operation of each module. The data acquisition and storage system includes not only hardware modules, such as analog-to-digital converters, but also a software platform for acquisition control and data access. The plasma control system controls the plasma parameters during experiments. The design of the data acquisition and control system follows a modular approach and adopts a mainstream hardware platform that is easy to maintain and upgrade.

# **PS4-5: Conception Design and Key Issues on Remote Participation in EAST Tokamak**

X. Sun, F. Wang, S. Li, Y. Wang

Department of Computer Application, Institute of Plasma Physics, Chinese Academy of Sciences, Hefei, China

Magnetic fusion experiments keep growing in size and complexity, resulting in a growth in worldwide collaborations. As the first tokamak with fully superconducting poloidal and toroidal magnet coils, the EAST Tokamak facility attracts worldwide attention. Participating in EAST experiments through on-site representatives is very costly and time-consuming. The main objective of the remote participation system is to provide an efficient and economical means of international collaboration. This paper discusses the key technical issues and gives an overview of the current state of the design of remote participation for the EAST Tokamak. The design of the remote participation system, which is an integral part of the global control and data acquisition system for EAST, focuses on enabling safe and convenient participation from remote sites.

# PS4-6: A Prototype GUI for the Multi-Channel Sensor Data Acquisition and Monitoring System of KTX

L. Dong<sup>1,2</sup>, K. Song<sup>1,2</sup>, J. Yang<sup>1,2</sup>, P. Cao<sup>1,2</sup>, D. Mao<sup>1,2</sup>, W. Lv<sup>1,2</sup>

<sup>1</sup>Department of Modern Physics, University of Science and Technology of China, Hefei, China <sup>2</sup>State Key Laboratory of Particle Detection and Electronics, Hefei, China

The paper presents a highly responsive GUI (Graphical User Interface) for KTX, achieved through the use of a multi-threaded program architecture.

The Keda Torus eXperiment (KTX) RFP device is a supporting research project for ITER in China. The reversed field pinch (RFP) is an important toroidal magnetic confinement device, which has been suggested as one of the attractive paths to a fusion reactor. Compared with other toroidal configurations, such as the tokamak and the stellarator, the distinctive feature of the RFP is its weak toroidal magnetic field. This weak magnetic field yields a string of potential reactor advantages, such as normal magnets, high engineering beta and high mass power density. Specifically, the KTX program includes the following elements: the physical design and basic plasma diagnostics, the mechanical construction, the power supplies, the operating control system and the data acquisition system. The KTX reversed field pinch program will provide a significant opportunity to advance magnetic confinement fusion.

The data acquisition and monitoring system of KTX deals with about 600 input channels. The maximum sampling rate of each channel is 2 MSPS. Existing solutions to this task are plagued by time-consuming image processing and data monitoring steps. To improve the efficiency of the whole process, we designed a parallel architecture for image processing. Because the drawing space of a single channel is very limited, the data used in the drawing process can be fewer than the collected data. To improve the graphics quality and the drawing speed, a method of extracting data before drawing is presented in this paper. The tasks are evenly assigned to several threads, a technique called uniform partition. The necessary inter-thread communication ensures the coordination of the various threads, and the results are placed in shared memory.
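Extracting data before drawing and uniform partition can be illustrated with a short sketch. The abstract does not say which extraction method is used; the example assumes min/max decimation (one value pair per pixel column), and the names `extract_min_max`, `uniform_partition` and `extract_all` are hypothetical. In CPython the threads mainly illustrate the structure, since the global interpreter lock limits the speed-up of pure-Python work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Tuple

Pairs = List[Tuple[float, float]]


def extract_min_max(samples: List[float], n_pixels: int) -> Pairs:
    """Reduce a waveform to one (min, max) pair per pixel column."""
    n = len(samples)
    pairs: Pairs = []
    for p in range(n_pixels):
        chunk = samples[p * n // n_pixels:(p + 1) * n // n_pixels]
        if chunk:
            pairs.append((min(chunk), max(chunk)))
    return pairs


def uniform_partition(n_items: int, n_parts: int) -> List[range]:
    """Split item indices into n_parts contiguous, near-equal ranges."""
    return [range(k * n_items // n_parts, (k + 1) * n_items // n_parts)
            for k in range(n_parts)]


def extract_all(channels: List[List[float]], n_pixels: int,
                n_threads: int = 4) -> List[Optional[Pairs]]:
    """Process all channels, giving each thread an equal share of them."""
    results: List[Optional[Pairs]] = [None] * len(channels)  # shared result buffer

    def worker(indices: range) -> None:
        for i in indices:
            results[i] = extract_min_max(channels[i], n_pixels)

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(worker, uniform_partition(len(channels), n_threads)))
    return results


if __name__ == "__main__":
    print(extract_all([list(range(8)), [5, 1, 4, 2, 9, 0, 3, 7]], n_pixels=4))
```

Each thread writes only to its own slots of the shared buffer, so no locking is needed in this simplified form.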

Pipelining is an important parallel technique, and it is used in the data monitoring module. The major task of this module is divided into a series of small tasks. In the pipeline structure, as soon as a task is completed, the subsequent task can start immediately. Each stage of the pipeline can accept a new task after finishing the current one, so the pipeline increases efficiency.
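A pipeline of this kind can be sketched with one thread per stage and queues between stages. This is a generic illustration, not the KTX monitoring code: the stage functions in the usage example are placeholders, and `run_pipeline` and `stage` are hypothetical names.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_STOP = object()  # sentinel that tells each stage to shut down


def stage(func: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Apply func to every item from inbox and pass the result downstream."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(items: Iterable[Any], funcs: List[Callable[[Any], Any]]) -> List[Any]:
    """Run items through the stage functions, one thread per stage."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
               for i, f in enumerate(funcs)]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_STOP)
    results = []
    while True:
        result = queues[-1].get()
        if result is _STOP:
            break
        results.append(result)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Placeholder stages standing in for the small monitoring tasks.
    print(run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 10]))
```

While one stage handles an item, the previous stage is already free to accept the next one, which is the overlap the paragraph describes. Because each stage is a single thread reading from a FIFO queue, the output order matches the input order.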

A sound framework makes the program run efficiently and stably. At the same time, it makes full and efficient use of multicore processors. As a result, the GUI presented in this paper can meet the high demands of data collection and monitoring, and the user receives real-time feedback on settings and results.

#### **PS4-7: Axisymmetric Magnetic Control in ITER**

L. Zabeo<sup>1</sup>, G. Ambrosino<sup>2</sup>, M. Cavinato<sup>3</sup>, Y. Gribov<sup>1</sup>, D. Humphreys<sup>4</sup>, J. A. Snipes<sup>1</sup>, M. Walker<sup>4</sup>, A. Kavin<sup>5</sup>, V. Lukash<sup>6</sup>, G. Vayakis<sup>1</sup>

<sup>1</sup>ITER Organisation, Cadarache, France

<sup>2</sup>CREATE/ENEA/Euratom Association, Universita' di Napoli Federico II, Naples, Italy

<sup>3</sup>Fusion for Energy (F4E), Barcelona, Spain

<sup>4</sup>General Atomics, San Diego, USA

<sup>5</sup>D.V.Efremov Scientific Research Institute, St Petersburg, Russia

<sup>6</sup>Kurchatov Institute, Moscow, Russia

In magnetically confined fusion plasmas feedback control of plasma parameters is assuming an increasingly important role. The complexity of phenomena that occur in the plasmas and the limited number of actuators available require the implementation of sophisticated control systems to achieve adequate quality of plasma confinement. The range of requirements is wide and the relevant timescales can range from a few to hundreds of milliseconds. In addition, the high degree of coupling between control parameters increases the level of complexity that the control systems have to address.

In this respect, the ITER device, which is under construction and will be the world's largest fusion experiment, does not differ conceptually from a range of smaller-scale experiments currently in operation across the world. Nevertheless, its limited operational space, together with more demanding machine protection requirements, implies more restrictive constraints for the control systems.

One area of particular importance is that of axisymmetric magnetic control of the plasma. ITER will produce a D-shaped toroidal plasma with a tokamak configuration in which the hot plasma will be confined and controlled using magnetic fields generated by a set of superconducting coils that surround the vacuum chamber. Although magnetic control is well understood and well developed, being the basic control required for operation of a tokamak, in the case of ITER new constraints need to be taken into account in the control strategies: long response times due to the thick metal shell of the vacuum vessel, achieving an acceptable level of magnetic noise that can compromise the quality of control, and establishing an acceptable range of controllability are some of the key issues which need to be addressed.

Axisymmetric magnetic control encompasses primarily control of the plasma shape and position. Two controllers are usually envisioned for this purpose: the controller responsible for Plasma Current, Position and Shape and the controller stabilizing plasma vertical displacements which are unstable for plasmas that have an elongated poloidal cross section (Vertical Stability Control). The dynamics of the two control systems differ by almost an order of magnitude in timescale, but the two share some of the magnetic actuators. Additional constraints on the control capability are associated with the quality of the diagnostic data available for the control. Sensitivity of the measurements to the 3D non-axisymmetric components of the required accuracy on the measurements and hence on the magnetic control performance. This paper will illustrate the present status of development of the magnetic control system for ITER and identify the major outstanding issues. Some possible solutions will be presented together with open questions under investigation.

### PS4-8: Present Status of the ITER Real-Time Plasma Control System Development

A. Winter, P. Makijarvi, S. Simrock, J. Snipes, A. Wallander, L. Zabeo ITER Organization, St. Paul lez Durance, France

ITER will be the world's biggest magnetic confinement tokamak fusion device and is currently under construction in southern France. The ITER Plasma Control System (PCS) is a fundamental component of the ITER Control, Data Access and Communication system (CODAC). It will control the evolution of all plasma parameters that are necessary to operate ITER throughout all phases of the discharge. The design and implementation of the PCS pose a number of unique challenges. The timescales of the phenomena to be controlled span three orders of magnitude, ranging from a few milliseconds to seconds. Novel control schemes, which have not been implemented at present-day machines, need to be foreseen, and control schemes that are only done as demonstration experiments today will have to become routine. In addition, advances in computing technology and available physics models make it feasible to implement real-time or faster-than-real-time calculations to forecast and subsequently avoid disruptions or undesired plasma regimes. A further novel feature is a sophisticated event handling system, which provides a means to deal with plasma-related events (such as MHD instabilities or L-H transitions) or component failure. Finally, the foreseen schedule for design and implementation poses another unique challenge. The beginning of ITER operation is foreseen for late 2020, but the conceptual design activity has already commenced, as required by the on-going development of diagnostics and actuators in the domestic agencies and the need for integration and testing. In this paper, a brief overview of the functional requirements for the plasma control system will be given. The main focus will be on the requirements and possible options for a real-time framework for ITER and its interfaces to other ITER CODAC systems (networks, other applications, etc.). The limited amount of commissioning time foreseen for plasma control will make extensive testing and validation necessary.
This should be done in an environment that is as close to the PCS version running the machine as possible. Furthermore, the integration with an Integrated Modeling Framework will lead to a versatile tool that can also be employed for pulse validation, control system development and testing as well as the development and validation of physics models. An overview of the requirements and possible structure of such an environment will also be presented.

### PS4-9: Experiences with the MTCA.4 Solution for the EuXFEL Clock and Control System

E. Motuk, M. Postranecky, M. Warren, M. Wing

Department of Physics and Astronomy, University College London, London, United Kingdom

The clock and control (CC) system for the EuXFEL mega-pixel detectors consists of a multi-purpose MTCA.4 AMC card with a Xilinx FPGA and a custom-designed Rear Transition Module (RTM), which provides the CC functionality. The system resides in an MTCA.4 crate with the Timing Receiver (TR) board and synchronises the DAQ system to the general EuXFEL timing. This paper presents the experiences with the prototype system in addition to describing the RTM hardware and the CC system firmware in detail. The tests that have been performed to validate the basic functionality and the functionality related to the MTCA.4 specifications are presented first. The next stage of tests involves confirming the system functionality by using the TR board as it would be used in the EuXFEL DAQ system, with a development board simulating a Front End Electronics (FEE) unit. The performance metrics in terms of jitter and bit error rates for FEE communication are presented. As a result of the performance tests, the improvements and modifications to the current hardware for the final system are outlined in the conclusions.

# PS4-10: New strategy for the control of low frequency large band mechanical suspensions and inertial platforms

F. Barone<sup>1,2</sup>, F. Acernese<sup>1,2</sup>, R. De Rosa<sup>3,2</sup>, G. Giordano<sup>1</sup>, R. Romano<sup>1,2</sup>

<sup>1</sup>Dept. Scienze Farmaceutiche e Biomediche, Universita' di Salerno, Fisciano (Salerno), Italy <sup>2</sup>Sezione di Napoli, Istituto Nazionale di Fisica Nucleare, Napoli, Italy <sup>3</sup>Dept. Scienze Fisiche, Universita' di Napoli Federico II, Napoli, Italy

Low frequency seismic suspensions (attenuators) and inertial platforms require careful design not only of the mechanical attenuation stages but also of the control system, especially if a residual horizontal motion better than $10^{-15}$ m/$\sqrt{\mathrm{Hz}}$ in the band 0.01 - 100 Hz is required. One of the most important elements of the control system is the type of sensor, whose accuracy, stability, sensitivity and bandwidth may constitute a real limitation on performance, especially if very large seismic attenuations are required in the low frequency band. In particular, the most effective present control systems, based on accelerometric sensors (force feed-back configuration), are mainly limited by the sensor electronics. To improve the performance of low frequency suspensions (attenuators) and inertial platforms, we introduced a new control philosophy: the control system directly acquires the instantaneous relative positions of the mechanical components through monolithic folded pendulum sensors without any force feed-back (seismometer configuration). In the paper we discuss this new control architecture and the results of tests on a state-of-the-art mechanical suspension.

### PS4-11: Superconducting Cavities Automatic Loaded Quality Factor Control at FLASH

<u>W. Cichalewski</u><sup>1</sup>, J. Branlard<sup>2</sup>, H. Schlarb<sup>2</sup>, N. Walker<sup>2</sup>, J. Carwardine<sup>3</sup> <sup>1</sup>Technical University of Lodz, Lodz, Poland <sup>2</sup>MSK, Deutsches Elektronen Synchrotron, Hamburg, Germany <sup>3</sup>Argonne National Laboratory, Argonne, USA

The free-electron laser accelerator in Hamburg (FLASH) consists of superconducting TESLA cavities controlled through their vector sum. In this approach, the Low Level Radio Frequency (LLRF) control system includes one feedback controller driving a single microwave klystron that provides RF power to 8-16 cavities. The main goal of the LLRF controller is to optimize the accelerating field parameters for the best beam acceleration in the superconducting structures. This task is challenging both from the control theory point of view and because of real system limitations and the cavity-to-cavity spread in operating parameters. Making use of other actuators, such as the cavity loaded quality factor, Ql, can help optimize the field parameters in each resonator. The paper focuses on controlling the Ql of the superconducting cavities by automatically adjusting the input power antenna, which couples the RF power to each individual cavity. Tuning Ql for each cavity allows better control of the individual accelerating fields. The paper describes the approach used at FLASH for coupler antenna control as well as for quality factor optimization. Additionally, test results are presented together with a description of the operational experience with Ql tuning algorithms during a regular accelerator run.

# PS4-12: Timing and Triggering System for the European XFEL Project - a Double Sized AMC Board

<u>A. Hidvegi</u><sup>1</sup>, P. Gessler<sup>2</sup>, H. Kay<sup>3</sup>, K. Rehlich<sup>3</sup>, C. Bohm<sup>1</sup> <sup>1</sup>Physics Dept., Stockholm University, Stockholm, Sweden <sup>2</sup>European X-Ray Free Electron Laser Facility GmbH, Hamburg, Germany <sup>3</sup>Deutsches Elektronen-Synchrotron (DESY), Hamburg, Germany

The European X-Ray Free Electron Laser (XFEL) [1] is a 3.4 km long linear accelerator that will enable new scientific research by studying structures and events on the nanoscale.

For such a complex machine to operate properly, precise timing and trigger information must be distributed throughout the entire accelerator. This information is used by the monitoring equipment of the accelerator and also by the experiment stations. The phase stability (jitter) of clock signals must be better than 5 ps (RMS), including drifts due to changes of propagation delay in fiber cables caused by temperature variations.

The system was developed in several steps, starting with an evaluation board to test key concepts, followed by a single-size AMC prototype for $\mu$TCA systems in two revisions and finally a double-size AMC board for $\mu$TCA systems.

The double-size AMC board is intended for the final system and incorporates all the functionality that different user groups have requested and all the experience gained from the previous prototypes. To reduce overall system cost, some parts were implemented on daughter boards. For customization to different applications, the possibility to use rear transition modules (RTM) has been added. This presentation will mainly focus on the new double-size AMC board, give an architectural overview and report some performance measurements.

# PS4-13: Secure and Reliable Remote Access for the European XFEL Control System

R. Kammering<sup>1</sup>, <u>C. C. W. Robson<sup>2</sup></u>, C. Bohm<sup>2</sup>, K. Rehlich<sup>1</sup> <sup>1</sup>Deutsches Elektronen-Synchrotron, Hamburg, Germany

<sup>2</sup>Physics, Stockholms universitet, Stockholm, Sweden

The collaboration between Stockholm University and DESY in Hamburg on the European XFEL aims, among other things, to provide secure and reliable remote access to the accelerator control system. The requested expansion of the control system has been designed as an extra layer for control and data transmission, integrated with systems for authentication and authorization. This makes it possible for the remote access methods to work side by side with more conventional access methods in the control system. The control system can be accessed remotely with the traditional rich clients already in use, which are implemented in Java and expanded with an extra layer for the remote communication, or with other clients in the future, e.g. web applications. A key focus here is how these requirements are implemented: which parameters and security measures we use to build a reliable, secure and easy-to-use access layer, and how it is integrated into the accelerator control system. The system is designed as a Service-Oriented Architecture (SOA) and implemented with the latest web service technology currently available in order to connect the different nodes, such as control system nodes, application servers and different clients for control and monitoring, with quality of service (QoS). The architecture and the chosen access security manager (OpenSSO) for authentication and authorization will be described, together with how they are integrated. We will present our experience with the architecture so far, the design choices we have found suitable for meeting the requirements of a modern control system, and the performance we have achieved when transporting data and commands reliably.

### **PS4-14: Directive Multi-Channel Beta Probe for Detecting Small Tumors**

S. J. Jeon, J. H. Park, K. S. Joo

Physics, Myongji University, Yongin, Gyeonggido, South Korea

Devices referred to as beta probes have been developed to assist surgeons in locating tumors or tumor remnants during surgery. In this study, a compact multi-channel beta probe based on silicon photomultipliers and BCF-12 scintillating optical fiber was developed. The probe is able to distinguish the beta-ray count rate from the annihilation gamma-ray background count rate. Each detection channel consists of a BCF-12 scintillator, 1 mm in diameter and 15 cm long, and a silicon photomultiplier. In order to separate the beta-ray signal from the gamma-ray background, each detection channel is paired with an adjacent channel shielded from beta rays by 200 $\mu$m thick lead. The detection performance of the probe was evaluated using two radioisotopes, Na-22 (1 $\mu$Ci) and Cs-137 (1 $\mu$Ci). To obtain spatial information, the line response function of the probe was determined by stepping a Na-22 source in 0.5 mm increments. The source was placed 0.5 mm from the front of the probe. The probe has good detection efficiency: 15% for Na-22 and 16% for Cs-137, each with a 1 $\mu$Ci source. The annihilation gamma-ray background is eliminated by a subtraction method. The data with gamma-ray background subtraction have a FWHM of 2.79 mm, compared with 4.57 mm without subtraction. This result demonstrates the potential of the probe to localize small tumors more accurately. The beta probe is compact and efficient at detecting small tumors, making it a candidate supporting surgical instrument in nuclear medicine.
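
The subtraction method and the FWHM figure quoted above can be illustrated with a short sketch. This is not the authors' analysis code: the count values are invented, the function names are hypothetical, and the FWHM estimate assumes a single-peaked line response sampled at known positions.

```python
def subtract_background(open_counts, shielded_counts):
    """Beta-only counts: open channel minus its lead-shielded partner."""
    return [max(o - s, 0.0) for o, s in zip(open_counts, shielded_counts)]


def fwhm(positions, values):
    """Full width at half maximum, interpolating linearly at both edges."""
    half = max(values) / 2.0
    above = [i for i, v in enumerate(values) if v >= half]
    lo, hi = above[0], above[-1]

    def crossing(i_out, i_in):
        # Interpolate between a sample below half maximum and one at or above it
        x0, x1 = positions[i_out], positions[i_in]
        y0, y1 = values[i_out], values[i_in]
        return x0 + (half - y0) * (x1 - x0) / (y1 - y0)

    left = crossing(lo - 1, lo) if lo > 0 else positions[lo]
    right = crossing(hi + 1, hi) if hi < len(values) - 1 else positions[hi]
    return right - left


# Hypothetical scan in 0.5 mm steps (counts are invented for illustration)
positions_mm = [0.5 * i for i in range(9)]
open_channel = [120, 150, 260, 480, 620, 470, 250, 160, 125]
shielded_channel = [110, 115, 120, 125, 130, 125, 120, 115, 110]

beta_only = subtract_background(open_channel, shielded_channel)
print(f"FWHM without subtraction: {fwhm(positions_mm, open_channel):.2f} mm")
print(f"FWHM with subtraction:    {fwhm(positions_mm, beta_only):.2f} mm")
```

Because the gamma background adds a broad pedestal under the peak, removing it lowers the half-maximum level relative to the peak shape and narrows the measured width.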

### PS4-15: MicroTCA for the European XFEL: a Hardware and Software Report

K. Rehlich

DESY, Hamburg, Germany

After more than ten years of operation of the Free Electron Laser FLASH in Hamburg, a much bigger accelerator, the European XFEL, is currently under construction. FLASH is controlled by VME hardware. Based on this experience, it was decided to use MicroTCA for the much larger XFEL. The first complete MicroTCA systems are in operation, and excellent performance in precision analog data acquisition has been demonstrated. The new MTCA.4 specification is a key element in extending the application range of MicroTCA to analog I/O with clock and trigger distribution. Fully differential signaling significantly reduces the distortions in sensitive analog front-ends caused by the backplane traffic of older parallel buses. In addition, MicroTCA provides much higher data transfer speeds. The paper describes results from, e.g., the implementation of a complex feedback system. It also covers the software developments for the data processing chain and the remote system management.

# PS4-17: High-Performance Scalable Information Service for the ATLAS Experiment

S. Kolos

Department of Physics and Astronomy, University of California, Irvine, Irvine, California, USA

The ATLAS experiment is operated by a highly distributed computing system that constantly produces a large amount of status information, which is used to monitor the operational conditions of the experiment as well as to assess the quality of the physics data being taken. For example, the ATLAS High Level Trigger (HLT) algorithms are executed on the online computing farm consisting of about 1500 nodes. Each HLT algorithm produces a few thousand histograms, which have to be integrated over the whole farm and carefully analyzed in order to properly tune the event rejection. In order to handle such non-physics data, the Information Service (IS) facility has been developed in the scope of the ATLAS TDAQ project. The IS provides a high-performance, scalable solution for information exchange in a distributed environment. In the course of an ATLAS data taking session the IS handles about a hundred gigabytes of information, which is constantly updated with an update interval varying from a second to a few tens of seconds. The IS provides access to any information item on request and also distributes notifications to all information subscribers. In the latter case, IS subscribers receive information within a few milliseconds after it was updated. The IS can handle arbitrary types of information, including histograms produced by the HLT applications, and provides C++, Java and Python APIs. The Information Service is the primary, and in most cases the only, source of information for the majority of the online monitoring analysis and GUI applications used to control and monitor the ATLAS experiment. The Information Service provides streaming functionality allowing efficient replication of all or part of the managed information. This functionality is used to duplicate a subset of the ATLAS monitoring data to the CERN public network with a latency of the order of 1 ms, allowing efficient real-time monitoring of the data taking from outside the protected ATLAS network.
Each information item in the IS has an associated URL which can be used to access that item online via the HTTP protocol. This functionality is used by many online monitoring applications that run in a web browser, providing real-time monitoring information about the ATLAS experiment around the globe. This paper will describe the design and implementation of the IS and present performance results obtained in the ATLAS operational environment.

### **PS4-18: Recent Developments in Control Software for Optical Synchronization Applications at DESY**

<u>P. Prędki</u>, T. Kozak, A. Napieralski

Department of Microelectronics and Computer Science, Technical University of Lodz, Lodz, Poland

Proper operation of FELs such as the Free-Electron Laser in Hamburg (FLASH) and the European X-Ray Free-Electron Laser (XFEL), which is currently under construction in Hamburg at DESY, requires many specific subsystems to be synchronized with a precision exceeding 10 femtoseconds. Those components are often separated by several hundred meters or even kilometers, as in the case of the European XFEL. Such distances mean that it is extremely difficult to use only conventional RF signal distribution in coaxial cables for synchronization because of high losses and phase drifts. Electromagnetic interference is also an issue. As an alternative solution, a laser-based synchronization scheme can be employed in parallel. In this case, the signals are transmitted via stabilized optical fibers. Such an architecture is currently being used at FLASH and will also be the main means of synchronization at the European XFEL. The hardware for such a synchronization system consists of many optical elements such as commercial lasers and self-built free-space and fiber optic setups. However, a significant part of it is also the electronics responsible for control, diagnostics and signal processing. Currently, the VME standard is used throughout FLASH for the majority of the control system digital hardware infrastructure. For the European XFEL, however, an architecture with a high level of reliability and availability is required. Because of that, the Micro Telecommunications Computing Architecture ($\mu$TCA) has been chosen. It is a fairly new standard that provides significantly better performance and employs modern technological solutions, making it more suitable than the older VME architecture.

This paper focuses on the development of specialized control software applied to phase-lock the various lasers and fiber link stabilization units used in the laser-based synchronization system at FLASH. The presented software solutions are hardware-independent and the code is portable to any architecture able to support the Distributed Object Oriented Control System (DOOCS) used at DESY. Therefore tests of the software can be thoroughly performed at FLASH and later seamlessly moved to operate in the European XFEL environment. In this article, the authors first describe the basic block used in all the applications which is a proportional-integral-derivative (PID) regulator implemented in a Texas Instruments Digital Signal Processor (DSP). Later, they focus on the more advanced features of the phase-locking software such as automatic switching between reference signals, error-recovery routines, automatic signal discovery, and tuning. The software can also be used to measure and characterize features of the optical hardware such as the timing jitter of the locked lasers or the arrival time delay for the link distribution units. The measurement results for some of the equipment used are also presented.
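
The basic building block mentioned above, a PID regulator, can be sketched in a few lines. The version below is a generic textbook discrete PID in Python, not the DSP implementation described by the authors. The gains, the time step, the output limits and the first-order plant used to exercise it are all assumptions made for illustration.

```python
class PID:
    """Discrete PID regulator with output clamping (illustrative only)."""

    def __init__(self, kp, ki, kd, dt, out_min=-10.0, out_max=10.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        """Return the control output for one sample period."""
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        output = (self.kp * error
                  + self.ki * self.integral
                  + self.kd * derivative)
        # Clamp to the actuator range
        return min(max(output, self.out_min), self.out_max)


def simulate(pid, setpoint=1.0, steps=2000):
    """Drive a hypothetical first-order plant dx/dt = -x + u to the setpoint."""
    x = 0.0
    for _ in range(steps):
        u = pid.update(setpoint, x)
        x += pid.dt * (-x + u)  # forward-Euler step of the plant
    return x


final_value = simulate(PID(kp=1.0, ki=2.0, kd=0.0, dt=0.01))
print(f"plant output after {2000 * 0.01:.0f} s: {final_value:.4f}")
```

In a phase-locking application the measurement would be a phase detector reading and the output an actuator setting. Features such as reference switching and error recovery sit on top of a loop of this kind.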

### PS4-19: The New Generation of the LHC Accelerator Radiation Monitoring System

<u>A. Masi</u>, M. Brugger, M. Donze', G. Spiezia, P. Peronnard *CERN, Geneva, Switzerland* 

The Large Hadron Collider (LHC) is a complex radiation environment consisting of several particle types at different energies. The RadMon detector has been conceived to measure radiation effects on the electronics in the LHC tunnel and its adjacent shielded areas in order to anticipate possible degradation and identify instantaneous failures of the electronic equipment. For these purposes, the RadMon provides the measurement of the Total Ionizing Dose (TID) in silicon by means of RadFets, of the Displacement Damage (DD) in silicon by means of p-i-n diodes, and of the High Energy Hadron (HEH) and thermal neutron fluence by counting Single Event Upsets (SEU) of SRAM memory. The measurements are delivered over a WorldFIP fieldbus to central gateways that act as data concentrators, exposing the measurement results of each detector to the operation user interfaces via the standard CERN accelerator middleware. Currently more than 400 detectors are installed in the LHC tunnel and shielded areas and are connected via 24 WorldFIP segments to 17 gateways, providing radiation measurements at a 1 Hz reading frequency. The RadMon system is continuously evolving; new detectors are installed and/or moved during the LHC Christmas shutdowns or the bimonthly technical stops to cross-check simulation results, to measure radiation at new points where electronics has proved to be sensitive, or simply for maintenance purposes. Based on experience accumulated over the last years of operation, a significant system upgrade is being prepared. New RadMon detectors have been conceived, are being produced and will gradually replace the current detectors installed in the tunnel. They are based on a radiation-tolerant FPGA that embeds the control of the ADC for the sensor readout and a nanoFIP, the slave element for the WorldFIP bus communication.
The main innovative features of this new design are the fully remote configurability, the advanced diagnostic and testing commands, and the possibility to use different types of RadFets and SRAM memory, supplied at various voltages. The use of different sensor types allows the measurement of the TID and DD at various ranges with the necessary resolutions and, eventually, allows SEUs generated by HEH and by thermal neutrons to be distinguished. Moreover, new monitoring software is under development to increase maintainability and optimize the low-level communication. The new software architecture, based on an operational and a configuration database as well as Java tools, will facilitate the installation and/or relocation of new detectors and the maintenance and operation of the system. In this paper, the operational architecture of the new detector will be discussed, focusing mainly on the advantages provided to the LHC monitoring. The new data acquisition software infrastructure will be detailed with reference to the approach implemented to make the management and maintenance of the entire system easier.

### PS4-20: High-Precision Accelerator RF Control for the European XFEL

H. Schlarb<sup>1</sup>, F. Ludwig<sup>1</sup>, M. Hoffmann<sup>1</sup>, T. Jezynski<sup>1</sup>, J. Branlard<sup>1</sup>, C. Schmidt<sup>1</sup>, M. Grecki<sup>1</sup>, V. Ayvazyan<sup>1</sup>, S. Pfeiffer<sup>1</sup>, K. Czuba<sup>2</sup>, A. Piotrowski<sup>3</sup>, O. Hensler<sup>1</sup>, W. Jalmuzna<sup>3</sup>, D. Makowski<sup>3</sup>, L. Butkoswki<sup>2</sup>, W. Cichalewski<sup>3</sup>, I. Kudla<sup>1</sup>, J. Piekarski<sup>2</sup>, K. Przygoda<sup>3</sup>, I. Rutkowski<sup>2</sup>, D. Sikora<sup>2</sup>, J. Szewinksi<sup>1</sup>, W. Wierba<sup>1</sup>, B. Yang<sup>1</sup>, L. Zembala<sup>2</sup>, S. B. Habib<sup>2</sup> <sup>1</sup>MSK, DESY, Hamburg, Germany <sup>2</sup>ISE, WUT, Warsaw, Poland <sup>3</sup>DMCS, Uni of Lodz, Lodz, Poland

Fourth generation light sources based on linear accelerator driven Free Electron Lasers (FELs) open new research opportunities in single-molecule imaging, material science, atomic physics, biology and extremely short timescale X-ray science. Currently, the largest FEL project under construction is the 3.5 km long European XFEL in Hamburg, targeted towards a high photon pulse production rate (30000 pulses/sec) with an unrivaled brilliance in the Angstrom wavelength range. The large number of photon pulses is achievable by accelerating the electron beam to 17.5 GeV in a pulsed superconducting accelerator composed of 100 cryogenic modules, each containing 8 nine-cell niobium cavities cooled to 2 K. To make a cost-effective, reliable, maintainable and scalable system that meets industrial standards, a new development of the RF controls based on the MTCA.4 architecture was started. While most RF controls are realized in an external 19-inch chassis in order to achieve the very challenging RF field detection precision, we could demonstrate that, when the appropriate precautions are taken, field detection, RF generation and RF distribution, together with the digital DAQ system and the high-speed real-time processing, can be entirely embedded in the MTCA.4 crate system. This groundbreaking result of embedding ultra-high precision analog electronics for detection on the Rear Transition Module (RTM) together with the high-power digital processing units on the AMC opens up entirely new possibilities for MTCA.4 and is particularly relevant for Free Electron Lasers, where the acceleration field precision should be well below 0.01% and 0.01 deg (equivalent to 20 femtoseconds) in amplitude and phase. In this paper, we present the architecture of the superconducting RF control system with various pre-, main- and post-processing entities for the 2500 RF channels and give an overview of the firmware structure, software architecture and automation.
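
The equivalence between 0.01 deg of RF phase and roughly 20 femtoseconds can be checked with a one-line conversion. The sketch below assumes an RF frequency of 1.3 GHz, which is not stated in the abstract, and also works out the cavity count implied by the module figures given above.

```python
RF_HZ = 1.3e9  # assumed accelerating frequency; not stated in the abstract


def phase_to_time_fs(phase_deg, rf_hz=RF_HZ):
    """Convert an RF phase error in degrees to a timing error in femtoseconds."""
    period_s = 1.0 / rf_hz
    return phase_deg / 360.0 * period_s * 1e15


def total_cavities(modules=100, cavities_per_module=8):
    """Cavity count implied by the module figures quoted in the abstract."""
    return modules * cavities_per_module


print(f"0.01 deg at 1.3 GHz = {phase_to_time_fs(0.01):.1f} fs")
print(f"cavities: {total_cavities()}")
```

Under that assumption the result is a little over 21 fs, consistent with the rounded value of 20 fs in the text.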

### PS4-21: The Application of Embedded System in CSNS Experimental Control System

J. Zhuang, K. Zhu, Y. Chu, L. Hu, J. Li, D. Jin

Division for Experimental Physics, Institute of High Energy Physics, CAS, Beijing, China

CSNS (China Spallation Neutron Source) is a large scientific facility that will be built in China; its construction is planned to be carried out over the next 6.5 years. The control system of CSNS is a large-scale, open-source DCS. The way the front-end controller is integrated into the DCS is critical to the control system. Traditionally, a single-board computer running VxWorks in a VME crate is used as the IOC (Input/Output Controller) to integrate control devices into the DCS. Now, the emergence of SoC chips makes a lower-cost and more flexible IOC possible. In addition, to reduce cost, real-time Linux is another option for the operating system on the IOC.

There are two kinds of task on the IOC: the information exchange task and the control task. Their requirements differ. The control task must execute at exactly the scheduled time, whereas the information exchange task should execute as frequently as it can. The control task may be disturbed by a frequently executing information exchange task. In general, to balance these two tasks, we use CPU time planning: an upper limit on network access is set to guarantee the control task performance. Through CPU tests, network tests and application tests, the performance of the embedded CPU is studied in detail and the limit can be obtained. The performance and limit are useful for system design. After these tests, different embedded CPUs are selected for different applications.

The tests can be standardized for other applications in our system. The test tools are free and are useful for other systems.

# PS4-22: Development of an ATCA Based Data Acquisition System for High Speed, Direct Detection X-Ray Pixel Sensors

<u>J. Joseph</u><sup>1</sup>, D. Contarato<sup>1</sup>, P. Denes<sup>1</sup>, D. Doering<sup>1</sup>, P. McVittie<sup>1</sup>, J. Weizeorick<sup>2</sup> <sup>1</sup>*Lawrence Berkeley National Laboratory, Berkeley, CA, United States* <sup>2</sup>Argonne National Laboratory, Argonne, IL, United States

Large format X-ray pixel sensors operating at frame rates higher than 100 frames per second have driven the need to develop data acquisition systems capable of handling large volumes of acquired data using ultra-fast communication links operating at 10 Gigabit rates. The new generation 1 Megapixel X-Ray cameras, operating at readout speeds of up to 200 frames per second, are capable of producing greater than 400 Megabytes of image data per second. Because these sensors are used in continuous source applications with long acquisition periods (e.g., synchrotron radiation), the acquired data must be reliably processed and stored in real-time to minimize exposure dead time or data loss that could compromise the integrity of the data and thus limit the scientific reach of the experiment. This work describes the development and performance of an Advanced Telecom Computing Architecture (ATCA) based data acquisition system used for high speed, direct detection X-Ray pixel sensors, focusing on the technical challenges and solutions of moving large data volumes through digital signal processing algorithms and to storage arrays in real-time.
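
The data rate quoted above follows from simple arithmetic, sketched below. The abstract gives the pixel count and frame rate but not the pixel depth, so the 2 bytes per pixel used here is an assumption, as is the 1024 x 1024 sensor format in the second example.

```python
def data_rate_mbytes_per_s(n_pixels, frames_per_s, bytes_per_pixel):
    """Raw image data rate in megabytes (10^6 bytes) per second."""
    return n_pixels * frames_per_s * bytes_per_pixel / 1e6


def link_utilisation(rate_mbytes_per_s, link_gbit_per_s=10.0):
    """Fraction of a link's raw bit rate consumed by the given data rate."""
    return rate_mbytes_per_s * 8.0 / (link_gbit_per_s * 1000.0)


# Assumed pixel depth: 2 bytes per pixel (not stated in the abstract)
nominal = data_rate_mbytes_per_s(1_000_000, 200, 2)
square = data_rate_mbytes_per_s(1024 * 1024, 200, 2)  # assumed 1024 x 1024 format
print(f"1.00 Mpixel at 200 fps: {nominal:.0f} MB/s")
print(f"1024 x 1024 at 200 fps: {square:.1f} MB/s")
print(f"share of one 10 Gbit/s link: {link_utilisation(nominal):.0%}")
```

Under these assumptions a single camera uses about a third of one 10 Gigabit link before protocol overhead, which leaves headroom for sustained streaming to storage.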

# PS4-23: Data Acquisition System Based on Time-Interleaved Analog-to-Digital Conversion for Time-of-Flight Mass Spectrometer

<u>X. Hu</u><sup>1,2</sup>, L. Zhao<sup>1,2</sup>, W. Zheng<sup>1,2</sup>, S. Liu<sup>1,2</sup>, Q. An<sup>1,2</sup>

<sup>1</sup>Modern Physics Department, University of Science and Technology of China, He Fei, An Hui, China

<sup>2</sup>State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, He Fei, An Hui, China

The Time-of-Flight Mass Spectrometer (TOF-MS) is widely used in many domains, such as chemical analysis, drug research and air quality monitoring. In recent years, the resolution of TOF-MSs has improved markedly thanks to space focusing, ion reflectron and orthogonal acceleration techniques. To guarantee good resolution of the whole system, a high-performance data acquisition system is required. According to the requirements of the TOF-MS at the National Institute of Metrology P.R. China (NIM), a 2-Gsps, 8-bit data acquisition system has been designed, based on the Time-Interleaved Analog-to-Digital Conversion (TIADC) technique. With the TIADC method, the system performance is easily degraded by three main mismatch errors: gain error, offset error and time-skew error. Therefore, these errors should be corrected. In this system, the values of these errors are first calculated using the four-parameter sine wave fitting method; the gain and offset errors are then corrected by multiplication and addition, and the time-skew error is corrected using hybrid filter banks. Since the sampling rate of this A/D system is 2 Gsps, the Nyquist band extends up to 1 GHz. Considering the analog input signal bandwidth of 1 MHz to 500 MHz, a digital Low Pass Filter (LPF, pass-band frequency: 500 MHz; stop-band frequency: 600 MHz; stop-band attenuation: 55 dB) is designed to suppress the noise above 500 MHz. Tests have been conducted to evaluate the effects of the mismatch error correction and the digital low pass filtering algorithms. As for the data transfer, a Peripheral Component Interconnect (PCI) bus interface is integrated in this system to communicate with the Personal Computer (PC), and the Direct Memory Access (DMA) transfer method is employed to achieve a data transfer rate of 20 MByte/s.
According to the test results, the system performance is improved when the mismatch error correction algorithm is applied, and further enhanced by the digital LPF. With the mismatch correction, the improvement in ENOB grows with increasing input frequency (around 1.5 bit better with a 500 MHz input). With the digital LPF, the ENOB is further improved by 0.4 bit in the frequency range below 500 MHz. The final ENOB is better than 7 bit in the input range below 180 MHz, and better than 6.4 bit in the frequency range from 180 MHz to 500 MHz, which is sufficient for the application. In addition, the initial commissioning tests have shown that the system functions well.
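
The gain and offset part of the mismatch correction can be sketched as follows. This is not the authors' algorithm: for brevity it uses a three-parameter least-squares sine fit with a known input frequency instead of the four-parameter fit named in the abstract, and it omits the time-skew correction entirely. The two-channel interleaving, the 97 MHz test tone and the mismatch values are hypothetical.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - s) / m[r][r]
    return x


def sine_fit(times, samples, freq_hz):
    """Fit y = A cos(wt) + B sin(wt) + C; return (amplitude, offset)."""
    w = 2.0 * math.pi * freq_hz
    basis = [(math.cos(w * t), math.sin(w * t), 1.0) for t in times]
    ata = [[sum(r[i] * r[j] for r in basis) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(basis, samples)) for i in range(3)]
    a, b, c = solve3(ata, aty)
    return math.hypot(a, b), c


def correct(samples, amplitude, offset, ref_amplitude):
    """Remove the offset (addition), then equalise the gain (multiplication)."""
    return [(y - offset) * ref_amplitude / amplitude for y in samples]


# Hypothetical two-channel interleaved ADC with gain and offset mismatch
FS_HZ, F_IN_HZ, N_CH = 2e9, 97e6, 2
TRUE_GAIN, TRUE_OFFSET = [1.0, 1.05], [0.0, 0.02]


def capture(channel, n_samples=500):
    """Simulate the samples taken by one sub-ADC of the interleaved pair."""
    times = [(channel + N_CH * i) / FS_HZ for i in range(n_samples)]
    samples = [TRUE_GAIN[channel] * math.sin(2.0 * math.pi * F_IN_HZ * t)
               + TRUE_OFFSET[channel] for t in times]
    return times, samples


ref_amp, _ = sine_fit(*capture(0), F_IN_HZ)
t1, y1 = capture(1)
amp1, off1 = sine_fit(t1, y1, F_IN_HZ)
y1_fixed = correct(y1, amp1, off1, ref_amp)
print(f"channel 1 before: gain {amp1 / ref_amp:.4f}, offset {off1:.4f}")
```

After correction, refitting channel 1 should return the reference amplitude and a near-zero offset, so the interleaved stream no longer carries the spurs caused by these two mismatches.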

#### FERT3: FPGA and Electronics Applied to Realtime Systems 3

#### Friday, June 15 11:35-12:15 Crystal Ballroom

# FERT3-1: Real-Time Clustering for Pixel Detectors: the DCE3 ASIC for the PXD Detector in the Belle II Experiment @KEK

A. Wassatsch, R. Richter

HLL, Max-Planck-Institut fuer Physik, Munich, Germany

Grouping of data elements based on characteristic relations is known as clustering. It can be used either for data compression in a DAQ chain or to calculate characteristic trigger input values based on event data. Clustering can be done, for instance, by the well-known k-means algorithms. There are also hardware architectures that implement these algorithms in order to address the well-known performance issues. However, these implementations are of limited use in hard real-time measurement systems due to their variable, data-dependent processing time. For the innermost collision point detection system of the Belle II experiment at KEK, Japan, the so-called PXD detector, an appropriate clustering solution was requested. This system is built in a rotated ladder configuration with two layers, two-sided readout and an active region thinned down to 75 $\mu$m. Each ladder consists of two 768x250 DEPFET detector pixel arrays with position-dependent pixel sizes. Driven by the requirements of our PXD detector, a new real-time engine for direct neighborhood clustering was therefore developed. This software-inspired hardware architecture, synthesized in a TSMC 65nm technology, uses a pipelined structure to perform, up to 50k times per second, the full 2D clustering of the zero-suppressed data from a single detector pixel array with a fill rate of up to 3% and only one frame of latency. To achieve this performance, clustering nodes operating fully in parallel were combined with binary tree based control structures. Each pixel in the zero-suppressed data stream is dynamically assigned to an individual clustering node. A two-step approach is used to reach the single-frame latency. In the first step, the cluster networks are built up during the pixel data read-in. The internal cluster networks are represented by pointer-like hardware structures. In the following step, these cluster networks are read out by a stack-based traversal algorithm.
Due to the scalable architecture of the clustering core, the engine can be easily adapted to the specific needs of other target applications, even to 3D or higher dimensional operation.
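
For readers unfamiliar with direct neighborhood clustering, the following software sketch groups zero-suppressed pixel hits that touch each other. It is a plain union-find implementation in Python and says nothing about how the ASIC realizes the operation in hardware. Treating diagonal neighbors as connected (8-connectivity) is an assumption, and the hit list is invented.

```python
def cluster_hits(hits):
    """Group (row, col) hits into clusters of directly neighbouring pixels."""
    parent = {h: h for h in hits}

    def find(h):
        # Follow parent pointers to the cluster root, shortening the path
        while parent[h] != h:
            parent[h] = parent[parent[h]]
            h = parent[h]
        return h

    for (r, c) in parent:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                neighbour = (r + dr, c + dc)
                if neighbour != (r, c) and neighbour in parent:
                    root_a, root_b = find((r, c)), find(neighbour)
                    if root_a != root_b:
                        parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for h in parent:
        clusters.setdefault(find(h), []).append(h)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical zero-suppressed frame: only the pixels above threshold
hits = [(0, 0), (0, 1), (1, 1), (5, 5), (10, 10), (11, 9)]
for members in cluster_hits(hits):
    print(members)
```

The parent pointers play a role loosely comparable to the pointer-like structures mentioned in the abstract, although the hardware builds its networks while the pixel data are read in rather than in a separate pass.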

Simulated experiment data with physics events, together with the estimated corresponding background, were used for the HDL simulation-based verification of the correct behavior of the clustering core. After the successful verification, a first test chip with a limited number of clustering nodes was submitted in a low-power version of the target process technology. This test chip is back from production and is now entering initial tests. After the successful verification of the clustering nodes, the design has to be submitted at the end of this year to keep to the tight schedule of the experiment preparation. Furthermore, the corresponding clustering carrier board (CCB) for integrating the clustering ASIC into the PXD data acquisition system is under development in parallel.

### FERT3-2: Quantization Analysis of the Infrared Interferometer of the TJ-II for Its Optimized FPGA-Based Implementation

L. Esteban<sup>1</sup>, J. A. Lopez<sup>2</sup>, E. Sedano<sup>2</sup>, M. Sanchez<sup>1</sup>

<sup>1</sup>Fusión por Confinamiento Magnético, Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas, Madrid, Spain <sup>2</sup>Ingeniería Electrónica, Universidad Politécnica de Madrid, Madrid, Spain

Infrared interferometers are used in magnetic fusion devices for measuring the line-integrated electron density of the plasmas. An FPGA-based processing system is currently being used in the TJ-II infrared interferometer to compute the line-integrated electron density. In high-performance Digital Signal Processing (DSP) applications, the computations carried out in the FPGAs are usually performed in fixed point. The floating-point values of the algorithm description must be quantized to their fixed-point counterparts, introducing some deviations with respect to the unquantized case. These deviations are modelled as Round-Off Noise (RON) sources, whose effects are propagated through the different parts of the system. Thus, these quantization operations significantly affect the maximum attainable system performance. In the TJ-II system, the amount of RON is a limiting factor [1]. Therefore, its analysis and reduction are essential to perform the control operations in real time.

The effects of the RON have been traditionally estimated using profiling-based methods. These methods are capable of providing accurate results, but at the expense of extremely long simulation times. Hence, some analytical frameworks have been developed to provide fast and accurate estimates of the quantization effects of the digital implementations. However, existing approaches are not accurate in the general case of non-linear systems with feedback loops.
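
The difference between a profiling-based estimate and an analytical one can be seen on the simplest possible case, a single rounding operation. The sketch below measures the round-off noise power by simulation and compares it with the textbook uniform-noise model, in which a quantization step q gives a noise power of q^2 / 12. It does not use the MAA or PCE technique of the paper, and the word length and sample count are arbitrary.

```python
import random


def quantize(x, frac_bits):
    """Round x to a fixed-point grid with the given number of fractional bits."""
    step = 2.0 ** -frac_bits
    return round(x / step) * step


def measured_noise_power(frac_bits, n=20_000, seed=1):
    """Profiling-style estimate: simulate many inputs and average the error."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        err = quantize(x, frac_bits) - x
        total += err * err
    return total / n


def model_noise_power(frac_bits):
    """Analytical estimate: uniform rounding error with variance q^2 / 12."""
    step = 2.0 ** -frac_bits
    return step * step / 12.0


bits = 8  # arbitrary word length chosen for the example
print(f"measured: {measured_noise_power(bits):.3e}")
print(f"model:    {model_noise_power(bits):.3e}")
```

For one rounding operation the two agree closely and the simulation is cheap. The cost of profiling grows quickly once many quantizers, non-linear operations and feedback loops interact, which is the situation the analytical frameworks are meant to address.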

To cope with this limitation, a novel technique based on Modified Affine Arithmetic (MAA) and Polynomial Chaos Expansion (PCE) has been recently developed to provide fast and accurate analysis of the RON [2]. This technique is based on decomposing the random variables into weighted sums of Legendre orthogonal polynomials. Using this method, the contributions of the random signals are propagated through the non-linear system, while the correlations among them are preserved.

This paper will present and discuss a variety of quantization configurations for the TJ-II infrared interferometer using Legendre PCE. The theoretical background of the PCE will be introduced first. Next, our approach will be applied to the algorithms of the TJ-II to provide the set of optimal word lengths of the variables and the functional units. Finally, the results of the implementation of the algorithms in FPGA devices will be presented. In addition, it will be shown that our approach obtains speedups of between 8 and 400 times with respect to profiling for several testbenches [2].

[1] M. Sánchez, L. Esteban, P. Kornejew, M. Hirsch. Admissible Crosstalk Limits in a Two Color Interferometer for Plasma Diagnostics. AIP Conference Proceedings, 993(1):187-190, 2008.

[2] L. Esteban, J. A. López, A. Fernández, C. Carreras, G. Caffarena, O. Nieto-Taladriz, M. Sánchez. Round-off-Noise Estimation of Fixed Point Architectures using Polynomial Chaos Expansion. IEEE Transactions on Circuits and Systems I, 2012 (under review).

# **Closing: Session**

Friday, June 15 12:15-12:45 Crystal Ballroom

Closing-1: RT2014 <u>M. Nomachi</u> Osaka University, Osaka, Japan

Closing-2: Closing Talk <u>S. Zimmermann</u>

LBNL, Berkeley, USA