The pilot release of eMarketplace, a new online procurement system for buying off-the-shelf commodity items, will occur on Monday, June 20, and will include members of Computing, TD and Finance. The second release, for the rest of the lab, will occur in late summer. Through eMarketplace, users will be able to order items from suppliers such as Grainger, McMaster-Carr, Newark, Epic (for office supplies) and others. eMarketplace will help users quickly and easily find products, compare prices across suppliers, take advantage of pre-negotiated supplier prices and reduce costs for the lab. Learn more on the eMarketplace website, http://emarketplace.fnal.gov. Individuals designated by senior management to use the eMarketplace system directly have been contacted by email regarding upcoming required training sessions.
The Computing Strategic Plan is now posted in DocDB for all Computing employees to read through and reference.
As part of the ESH&Q reorganization, Amy Pavnica will have lab-wide responsibilities in addition to her role as the Computing ESH&Q division manager. For now, her office will still be located in FCC. As part of this change, there is no longer a general safety task budget. Safety equipment purchases will be made through the department budgets.
Come check out the digital sign in the FCC lobby area (on the lower right monitor). Do you have cool demos or graphs that you want to showcase on our digital sign or have suggestions for things you'd like to see? If so, please send them to Fang Wang at firstname.lastname@example.org.
Free posters of FCC, as pictured to the right, are available in FCC 1 West office 128. Stop by if you would like one!
Retired Fermilab historian and CCD employee Adrienne Kolb will be the colloquium speaker on Wednesday, June 1 at 4 p.m. in One West. Her presentation is titled "The Mysterious Death of the SSC: They All Did It." Read the abstract here.
(5, 10, 15 & 20+ years)
Adam Walters – 36 years
Jean Reising – 32 years
Marc Mengel – 25 years
Eric Neilsen Jr. – 25 years
Roger Slisz Jr. – 25 years
Matt Crawford – 24 years
Quinton Healy – 20 years
John Hendry – 15 years
Dave Dykstra – 10 years
Computing employees attended and presented at the National Laboratories Information Technology (NLIT) summit from May 1-4 in Albuquerque, New Mexico. Their presentations are posted under the NLIT 2016 event page on DocDB.
SCD's Liz Sexton-Kennedy, Patrick Gartung and Marc Paterno attended the HEP Software Foundation Workshop from May 2 through May 4 in Orsay, France.
The HEP Software Foundation group aims to be relevant to both the Intensity Frontier (IF) and LHC experiments. Paterno gave a presentation about a performance analysis case study. Sexton-Kennedy and Benedikt Hegner from CERN lead the packaging working group. Gartung's presentation on Spack, a software package build manager developed for supercomputers and also used in HEP, was very well received. Jim Amundson has a project within SCD to make Spack a useful tool for the IF.
Now playing in the FCC lobby:
"Driving IT Value- Working on the Right Priorities," Tammy Whited, Pink16, Las Vegas, Nevada, Feb. 15.
"Bringing Federated Identity to Grid Computing," Dave Dykstra, CISRC 16, Oak Ridge National Laboratory, April 6.
"Deploying a CMDB," Krysia Jacobs, NLIT 2016, Albuquerque, New Mexico, May 2.
Photos of the month: Guess who?
Can you guess who the following caricatures are? Click on each photo to see the answer. The caricatures were done at NLIT.
Lifting is a common task for many. But if performed improperly, lifting even light objects can lead to injury. According to the National Safety Council, 25 percent of all occupational injuries are related to improper lifting. Most of the injuries are sprains and strains to the back.
To help eliminate these injuries, the National Safety Council recommends the following:
- Get fit and stay fit.
- Make sure that you have a good grip on the item.
- Test the load before lifting. If it is too heavy, get help from a co-worker.
- Keep items close to your body. Keep your feet close to the load and point your feet in the direction of the load.
- Lift the load using the muscles in your legs, not your back.
The council also cautions against the following:
- Do not twist your back while lifting.
- Do not lift while in an awkward position.
- Do not extend your arms out while lifting.
- Do not lift an object that is too heavy – ask for help!
From the CIO: Under the operations microscope
Chief Information Officer Rob Roser
While relatively few were aware, last month the lab went through the most extensive operations review in more than a decade. Eighteen experts from around the world, plus eight from DOE's Office of High Energy Physics (OHEP), came together for three-and-a-half days to evaluate how well we operate our facilities. The review team not only looked at the obvious operations areas―accelerator, detector and scientific computing operations―but also evaluated us in terms of detector and accelerator test facilities, computing facilities, IT and finance. In short, they looked at the lab broadly. Their charge boiled down to making sure we are efficient, ensuring we are not supporting facilities duplicated at another Office of Science lab and evaluating how well we execute our mission.
For Computing, I gave a 45-minute talk on IT that was focused on governance and how the OCIO supports both CCD and SCD efforts with the same people and services. However, the review focused on scientific computing rather than IT. Barb Helland, deputy head of ASCR and thus well positioned to understand what we are trying to do, led the computing review team. She was joined by Ian Bird, head of scientific computing at CERN, and Chip Watson, who leads the high performance lattice QCD effort at JLab. Panagiotis gave an overview talk on computing on day one, and then he and his team spent day two in parallel sessions, where we delivered 10 separate talks. These breakout talks covered cyber security, operating our computing facilities, HEP Cloud, scientific workflows, FIFE, software tools and applications, artDAQ, software strategy and the evolution of our workforce and workforce skills. By the end of the second day, the review team had an excellent picture of what SCD is all about and what our strategy is to operate in a productive and efficient manner.
For those unfamiliar with these reviews, next come the questions. While there were inquiries throughout the breakout talks, the review team decided to skip the pre-arranged tours and instead spend a few hours in open conversation with Panagiotis and his team. This was very interesting because it gave us insight into how outside experts in our field view the challenges we are facing and, from the types of questions they asked, a sense of their own approaches and opinions. It was a lively two hours that went by very fast.
In the end, the lab did very well in this review, and I think Computing in particular came through exceptionally well. Here are some highlights for Computing:
- The review team feels that we are lean in staffing to deliver the kind of support that we want to provide to the community.
- Barb complimented our team on our ability to deliver a common software stack. She said that the level of success we have had is unprecedented within the Office of Science. Also, while we are making the right decision to prioritize services over replacing aging hardware, she told OHEP that Computing needs roughly $500,000 to $1 million in funding to ensure that our computing facility will meet steady-state future demand.
- Finally, the review team supported the concept of HEP Cloud and the advantages that it will afford both Fermilab and HEP.
I would like to thank Panagiotis and his team for their excellent preparation for this review. And kudos to everyone in the Scientific Computing Division, whose outstanding work earned this recognition!
Heroes of NOνA: SCD increases NOνA uptime to 100%
This graph shows the detector uptime before and after the Real-Time Software Infrastructure group resolved the DAQ software issues. The light blue line represents the uptime percentage, and the dark blue line represents the percentage of time the detector is on. Since the two applications were fixed, the light blue line has held consistently at 100 percent, whereas previously it fluctuated greatly.
The ‘Hero of NOνA’ Medals may not be real medals, but the work done to earn them had a very real impact. Ron Rechenmacher, Eric Flumerfelt and Kurt Biery from SCD’s Real-Time Software Infrastructure (RSI) group solved a challenging data acquisition software problem that was preventing the NOνA detector from reaching maximum uptime and were therefore “awarded” the honorary virtual medals.
Since the detector is very large, NOνA scientists started running it in phases while finishing construction. As more of the detector started running, DAQ crashes began occurring. After working on the problem for a year, NOνA asked RSI, which supports the DAQ software for many experiments, to work on it.
NOνA, as Fermilab’s flagship experiment, is the primary consumer of the neutrino beam. The energy and manpower required to generate and watch the beam necessitates maximum uptime, which is the percentage of time that data can be taken while the beam is being generated. Until a few months ago, the uptime hovered between 85 and 90 percent.
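As a simple illustration of that definition (a generic sketch, not NOνA's actual accounting code; the function name and figures are made up for the example), uptime can be computed from beam-on time and data-taking time:

```python
def uptime_percent(hours_taking_data: float, hours_beam_on: float) -> float:
    """Uptime: the percentage of beam-on time during which data was taken."""
    if hours_beam_on <= 0:
        raise ValueError("beam-on time must be positive")
    return 100.0 * hours_taking_data / hours_beam_on

# e.g. data taken during 85.5 of 95 beam-on hours
print(uptime_percent(85.5, 95.0))  # 90.0
```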
The issues that caused the crashes originated in two software applications: the Data Logger, which collects data and writes it to disk when a trigger occurs, and the Application Manager, which monitors the experiment’s computer farm and DAQ system. After a certain amount of time running, the Data Logger would crash and stop writing to disk. Additionally, the Application Manager could not manage more than about one-third of the computers in the DAQ cluster without also crashing. As a result, the length of time the data could be stored in buffers waiting for a trigger was reduced.
The challenge was identifying what was causing the problems. The standard tools didn’t point to the cause, so the team recreated the whole system in a test environment.
This was not easy: the NOνA test stand consists of three computers, while the NOνA Far Detector system consists of 300. To make the test area accurate and usable, the team identified the essential pieces of the 300-node system and brought those into the test environment. They then worked to recreate the issues by stressing the systems and randomizing the interactions between them, drastically condensing the timeframe in which issues might occur. The team weeded out false leads until they could definitively determine the real root cause of the issues.
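The general technique can be sketched as follows (a hypothetical illustration with invented names, not RSI's actual code): repeatedly drive simulated interactions in randomized order with randomized timing, so rare failure modes such as races surface in minutes rather than weeks.

```python
import random
import time

def stress_test(actions, iterations=1000, max_delay=0.001, seed=None):
    """Repeatedly exercise a list of zero-argument callables in randomized
    order with randomized delays, to provoke rare failures (e.g. races)."""
    rng = random.Random(seed)
    for i in range(iterations):
        rng.shuffle(actions)                       # randomize interaction order
        for act in actions:
            time.sleep(rng.uniform(0, max_delay))  # randomize timing
            try:
                act()
            except Exception as exc:
                return f"failure reproduced at iteration {i}: {exc!r}"
    return "no failure reproduced"

# Toy component that crashes intermittently, like a rare DAQ race.
flaky_rng = random.Random(42)
def flaky_action():
    if flaky_rng.random() < 0.05:
        raise RuntimeError("simulated crash")

print(stress_test([flaky_action], iterations=500, max_delay=0, seed=1))
```

The key design point mirrors the story above: determinism hides rare bugs, so the harness deliberately injects randomness, then reports the first iteration at which a failure reproduces.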
In the end, the team fixed the bugs in the code that were causing the issues. With help from RSI, NOνA now operates at 100 percent uptime on a regular basis.
Network and Communication Services/Telecommunications
I was hired into the Telecommunications Department as a clerk back in 1999 and haven't left! At the time, I had been living in Batavia for six years and knew nothing about Fermilab or the work done here, other than that they often had fires, since I could see the smoke from my front yard.
Almost 17 years later, as the program administrator for Telecom, I keep track of the cellular models and plans available to us from AT&T and Verizon, and I provide employees with the necessary paperwork for purchasing these devices. I track and monitor cellular usage and make plan and feature changes whenever I can to keep the cellular costs as low as possible for the lab. I purchase all the telephony equipment, process the monthly chargebacks, record all Telecom equipment into the Sunflower inventory database, program features onto landline extensions, create voice mailboxes, modify the Telecom webpage and the online Yellow Pages, apply screen protectors to cell phones, program pagers and fix their broken clips. I like to do a little of everything, and for that, my job is great!
I will celebrate 30 years of marriage this summer, have two beautiful grown children, have lived all my life in the Chicagoland area, and am a proud member of Team Stark!
Scientific Computing Services/Scientific Distributed Computing Solutions/User Support for Distributed Computing
I came to Fermilab just over a year ago from the nuclear power industry, for which I performed computational engineering analyses. My true passion, however, is software development and data visualization. This brought me to the User Support for Distributed Computing group. In addition to directly supporting users of the FIFE batch system, I develop and maintain tools for monitoring grid jobs and resources. Over the past year, I have driven the development of the Fifemon application, which provides users near-real-time views into the status of their batch jobs and provides grid administrators views into the state and health of grid resources. This comprehensive monitoring has helped increase the scientific computing throughput for all FIFE experiments.
Under the umbrella of the Landscape program, I am involved in several projects beyond the FIFE ecosystem. One of these projects is to develop new monitoring for the cutting-edge HEP Cloud project, which is expanding HEP computing beyond traditional computing facilities and into the cloud. I support the Gratia grid accounting software used by the Open Science Grid, and I am developing the next generation accounting platform, GRACC.
Outside of work, I enjoy woodworking, gardening and shooting sports, and in July, I will be getting married to my wonderful fiancée, Leah.