
Revolutionizing Database Technology with GSFRS


Written by Gitika Gorthi, Chantilly High School


Has there ever been a day when you went without using any electronic devices? Wouldn't the day feel dry without some form of digital juice? Much of the data we deal with today is classified as big data: collections of large and diverse data that are bigger than traditional databases can handle. Have you ever had a moment when a large program took a long time to load or open a particular file, and the computer seemed stuck on that one page? If so, you are not the only one. Whether for students, researchers, or even technologies such as rovers, fast data processing and analytics are always desirable. What if we told you that there is a novel, portable, and highly efficient data access tool that allows near-real-time access to any part of a large data file? You might be wondering whether that is possible. The answer is yes. A tool named Giant Signal File Random Sampler (GSFRS), software designed to be used by other software, addresses the slow processing we face today. Unlike present-day practices, GSFRS loads only small parts of the data into computer memory instead of the entire file, using two fundamental steps: indexing and parsing.


You might wonder how GSFRS helps the technology we have now. How can it be applied in the real world, and how does it affect our planet? NASA's Perseverance rover recently made history by landing on Mars, and many new discoveries are hoped for on the red planet. But how is a rover powered? Earlier rovers such as Spirit and Opportunity relied on solar arrays, while Perseverance draws on a radioisotope power system and two rechargeable batteries, which eventually degrade. Either way, a rover's energy budget is limited. Faster data input and output using GSFRS as a software processor can lower the energy footprint and preserve that limited power for other functions, which is one way to build a longer-lasting rover. If that did not convince you of GSFRS's potential, let's talk about submarines. In the United States alone, 71 submarines are currently active, and keeping them running requires abundant energy. Submarines carry batteries to power their electrical equipment when needed, so the less battery power is consumed, the more energy is saved. Because GSFRS reads only the parts of a file that are actually needed, it reduces the number of computations onboard electronics must perform, and its portability makes it easy to deploy on such equipment, reducing battery drain.



Figure 1: NASA Mars Perseverance
Figure 2: Submarine electricity production systems




Now you may be asking: how much difference can saving a few seconds or minutes of energy really make? Using GSFRS once, for a single appliance, may save only a small, almost insignificant amount of energy. However, if GSFRS is used long-term in most appliances and current software, both on Earth and in space, the energy saved could add up enough to help reduce global warming. Current data suggests that 5,000 hours are spent on electronic devices annually; imagine the positive change if that number could be reduced through increased productivity.


With Earth Day recently observed on April 22nd, protecting our planet and slowing climate change have been on many of our minds. Climate change is a long-term shift in the average weather patterns of a region or of the planet as a whole, and a major driver of it is human emissions of greenhouse gases. Reducing daily energy use reduces the electricity that has to be produced, which in turn shrinks one's carbon footprint and greenhouse gas emissions. Our footprint on this planet is larger than we realize, and using software such as GSFRS to handle big data processing inside other software is one way to save energy and shrink that footprint.



Figure 3: Simple diagram of how widespread use of GSFRS could positively impact the climate in the long run.



Data centers account for a large share of energy costs and usage, and organizations such as the Ernest Orlando Lawrence Berkeley National Laboratory, the U.S. Energy Information Administration, and Resources for the Future have studied their consumption. Figure 4, from a Berkeley National Laboratory report on data center energy consumption, shows how consumption grew from 2000 to 2014 and how it may continue to grow through 2020, depending on how efficiently data centers are run. It is up to us to stay below the projected curves through proactive measures such as using less energy and reducing screen time.



Figure 4: Estimates include energy used for servers, storage, network equipment, and infrastructure in all U.S. data centers. The solid line represents historical estimates from 2000 to 2014, and the dashed lines represent five projection scenarios through 2020: Current Trends, Improved Management (IM), Best Practices (BP), Hyperscale Shift (HS), and a static 2010 Energy Efficiency scenario.



Because GSFRS samples data without loading the entire file into memory, unlike present-day practices, it offers an efficient way to process only the necessary information. Random access makes different parts of a data source available for parallel processing in a multi-threaded environment, which helps make the best use of the available hardware. Beyond the algorithm itself, GSFRS is carefully built on the modern C++20 standard, using features such as move semantics, the filesystem library, lambda functions, and multithreading.


GSFRS is revolutionary in its own way, and hopefully this article has given you a glimpse of its importance.
