Multi-threading in Node.js.
Originally, JavaScript was designed for simple tasks on the web, such as form validation or creating mouse trails. It wasn’t until 2009 when Ryan Dahl, the creator of Node.js, made it possible for developers to use JavaScript for backend development.
Backend languages typically support multithreading and offer mechanisms for synchronizing values between threads, along with other thread-related features. Adding these capabilities to JavaScript would have required changing the entire language, which was not Dahl’s intention. Node.js therefore needed a workaround to make multithreading possible in plain JavaScript. Let’s delve into the details…
Understanding the Inner Workings of Node.js
Node.js is built around a single-threaded event loop. To understand how it works, we need to look at how Node.js uses threads, at the event loop that forms its core, and at whether its overall architecture is single-threaded or multi-threaded.
Threads in Node.js
In Node.js, a thread refers to an independent execution context within a single process. It is a lightweight unit of processing that can operate concurrently with other threads within the same process. Each thread has its own execution pointer and stack, but shares the process heap.
Node.js employs two types of threads: a main thread, which runs the event loop, and several auxiliary threads in the worker pool. In this context, the terms “auxiliary thread” and “worker thread” are used interchangeably.
The main thread in Node.js is the initial execution thread that is launched when Node.js starts. It is responsible for executing JavaScript code and handling incoming requests. On the other hand, a worker thread is a separate execution thread that runs in parallel with the main thread.
Does Node.js Operate in a Multithreaded or Single-Threaded Manner?
The term “single-threaded” refers to a program that has only one thread of execution, allowing it to perform tasks sequentially. On the other hand, “multi-threaded” implies a program with multiple threads of execution that can perform tasks concurrently.
Each thread operates independently and task allocation is managed by the operating system. However, both approaches have their challenges. In single-threaded processes, tasks are executed in sequence, and a blocking operation can delay the execution of other tasks. In contrast, in multi-threaded processes, synchronization and coordination between multiple threads can be challenging.
Node.js is considered single-threaded because it has a single main event loop that processes JavaScript operations and handles I/O. However, Node.js provides additional features that, when used correctly, can offer advantages similar to multithreading. To understand how Node.js achieves this, and how to handle the challenges that come with it, we need to look at the event loop and the worker pool.
The main element in Node.js’ single-threaded architecture is the event loop, which makes Node.js powerful despite being a single-threaded runtime. As mentioned before, there are two types of threads in Node.js, with the main thread utilizing the event loop.
The event loop is a mechanism that registers callbacks (functions) to be executed in the future and operates in the same thread as the JavaScript code. When a JavaScript operation blocks the thread, the event loop is also blocked.
The worker pool is an execution model that spawns and manages separate threads, which perform tasks synchronously and return the results to the event loop. The event loop then executes the provided callback with the result. The worker pool is implemented in libuv and is used mainly for asynchronous operations that have no non-blocking equivalent at the operating-system level, such as file system access, DNS lookups, and some CPU-heavy built-ins like crypto and zlib. Network sockets, by contrast, are handled through the operating system’s own non-blocking mechanisms rather than the pool. Communicating internally between JavaScript and C++ adds a slight delay, but it is hardly noticeable.
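As a rough illustration of work being handed to the libuv pool, the sketch below uses the asynchronous form of crypto.pbkdf2, one of the built-in APIs that runs on a pool thread. The helper name hashInPool, the salt, and the iteration count are assumptions chosen only for the demonstration.

```javascript
const crypto = require('crypto');

// Hypothetical helper: derives a key on a libuv pool thread and resolves with it.
function hashInPool(password) {
  return new Promise((resolve, reject) => {
    // The asynchronous form of pbkdf2 does its hashing off the main thread.
    crypto.pbkdf2(password, 'demo-salt', 50000, 32, 'sha256', (err, key) => {
      if (err) return reject(err);
      resolve(key.toString('hex'));
    });
  });
}

// Start several derivations at once; the pool can work on them in parallel.
const pending = Promise.all(['a', 'b', 'c'].map(hashInPool));
console.log('main thread is free while the pool works');
pending.then((keys) => console.log(`derived ${keys.length} keys`));
```

The first log line appears before any key is ready, because the main thread only schedules the work and then carries on.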
With both the event loop and the worker pool mechanisms, we are able to write code that handles asynchronous operations effectively in Node.js.
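const fs = require('fs');
const path = require('path');

// Read the file asynchronously; the callback runs once the read has finished.
fs.readFile(path.join(__dirname, './package.json'), (err, content) => {
  if (err) {
    return null;
  }
  console.log(content.toString());
});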
The fs.readFile call above instructs the worker pool to use one of its threads to read the contents of a file and notify the event loop when it’s done. The event loop then executes the provided callback function with the file contents. This example illustrates non-blocking code, where we don’t have to wait synchronously for a task to complete. Instead, we tell the worker pool to read the file and call the provided function with the result, allowing the event loop to continue executing other tasks while the file is being read.
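To make the ordering concrete, here is a minimal sketch that records which step happens first. The helper readAndTraceOrder and the temporary file name are hypothetical and exist only for this illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

let runs = 0; // keeps temp file names unique across calls

// Hypothetical helper: writes a small temp file, reads it back asynchronously,
// and records the order in which the two steps below happen.
function readAndTraceOrder() {
  const file = path.join(os.tmpdir(), `read-order-${process.pid}-${runs++}.txt`);
  fs.writeFileSync(file, 'hello');
  return new Promise((resolve, reject) => {
    const order = [];
    fs.readFile(file, 'utf8', (err, content) => {
      if (err) return reject(err);
      fs.unlinkSync(file); // clean up the temp file
      order.push(`callback ran with "${content}"`);
      resolve(order);
    });
    // Reached before the callback: readFile only schedules the read.
    order.push('readFile returned');
  });
}

readAndTraceOrder().then((order) => console.log(order));
```

The entry 'readFile returned' is always recorded first, since the callback can only run after the current synchronous code has finished.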
However, there may be situations where synchronous execution of complex operations is needed. Functions that take a long time to run can block the thread, potentially decreasing the throughput of the server or even causing it to freeze. In such cases, the work cannot simply be delegated to the libuv worker pool, because that pool serves Node.js’s built-in asynchronous APIs and does not run arbitrary JavaScript code.
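The effect of a long-running function on the event loop can be measured with a small sketch like the one below. Both busyWait and measureTimerDelay are hypothetical helpers, and the busy loop stands in for any CPU-bound computation.

```javascript
// Hypothetical CPU-bound function: spins for roughly `ms` milliseconds.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // burn CPU on the main thread
  }
}

// Resolves with how late a 0 ms timer fired while the thread was blocked.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), 0);
    busyWait(blockMs); // the timer cannot fire until this returns
  });
}

measureTimerDelay(100).then((delay) => {
  console.log(`timer asked for 0 ms, fired after ${delay} ms`);
});
```

In a server, every pending request would be delayed in the same way as the timer in this sketch.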
Fields that require complex calculations, such as AI, machine learning, or big data, have traditionally faced challenges in using Node.js efficiently, because such operations block the main (and only) thread and leave the server unresponsive. This changed with the introduction of worker threads in Node.js v10.5.0, initially as an experimental feature.
Introducing "worker_threads"
The worker_threads module is a built-in Node.js module that enables the creation of fully functional multi-threaded Node.js applications.
A thread worker is a section of code typically extracted from a file and executed in a separate thread.
It’s worth noting that the terms thread worker, worker, and thread are often used interchangeably and refer to the same concept.
To begin using thread workers, we need to import the worker_threads module. We can then create a function that assists us in spawning these thread workers, and further discuss their properties.
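import { Worker } from 'worker_threads';

type WorkerCallback = (err: any, result?: any) => any;

export function runWorker(path: string, cb: WorkerCallback, workerData: object | null = null) {
  const worker = new Worker(path, { workerData });

  // Forward every message to the callback as (null, result).
  worker.on('message', cb.bind(null, null));
  // Forward uncaught worker exceptions to the callback as (err).
  worker.on('error', cb);
  // Report a non-zero exit code as an error; a clean exit needs no callback.
  worker.on('exit', (exitCode) => {
    if (exitCode === 0) {
      return null;
    }
    return cb(new Error(`Worker has stopped with code ${exitCode}`));
  });

  return worker;
}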
To instantiate a worker, we need to create an instance of the Worker class. The first argument should be a file path that contains the code for the worker, and the second argument should be an object that includes a property called workerData. This workerData is the data that we want the thread to have access to when it starts executing.
It’s important to note that regardless of whether we are using JavaScript directly or a language that transpiles to JavaScript (such as TypeScript), the file path should always point to files with .js or .mjs extensions.
I would also like to highlight why we are using a callback approach instead of returning a promise that would be resolved when the message event is triggered. This is because workers can emit multiple message events, not just one.
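The point about multiple message events can be sketched as follows. The worker source is passed inline with the eval option so the example fits in one file; runWithProgress and the shape of the progress messages are assumptions made for the illustration.

```javascript
const { Worker } = require('worker_threads');

// Worker source kept inline (eval: true) so the sketch is a single file.
const progressSource = `
  const { parentPort, workerData } = require('worker_threads');
  for (let step = 1; step <= workerData.steps; step++) {
    parentPort.postMessage({ step, of: workerData.steps });
  }
`;

// Hypothetical helper: calls onMessage for every message and resolves
// with the exit code once the worker has finished.
function runWithProgress(steps, onMessage) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(progressSource, { eval: true, workerData: { steps } });
    worker.on('message', onMessage);
    worker.on('error', reject);
    worker.on('exit', (exitCode) => resolve(exitCode));
  });
}

runWithProgress(3, (msg) => console.log(`progress ${msg.step}/${msg.of}`))
  .then((exitCode) => console.log(`worker exited with code ${exitCode}`));
```

A promise that resolved on the first message would miss the remaining progress updates, which is why the callback is invoked per message here.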
As demonstrated in the example above, communication between threads is event-driven, meaning we are setting up listeners to be invoked once a specific event is sent by the worker.
Here are some of the most common events:
worker.on('error', (error) => {});
The error event is triggered whenever an unhandled exception occurs within the worker. As a result, the worker is terminated, and the error can be accessed as the first argument in the provided callback function.
worker.on('exit', (exitCode) => {});
The exit event is emitted when a worker exits. If the worker called process.exit(), the exitCode will be provided to the callback function. If the worker was terminated using worker.terminate(), the exit code is 1.
worker.on('online', () => {});
The online event is emitted when the worker has started executing its JavaScript code.
worker.on('message', (data) => {});
The message event is emitted when a worker sends data to the parent thread. Now, let’s examine how data is shared between threads.
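To see these events in action, the following sketch records every event a worker emits. The helper traceWorker and the two inline worker sources are hypothetical and serve only as a demonstration.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical helper: runs inline worker source and records each event seen.
function traceWorker(source) {
  return new Promise((resolve) => {
    const events = [];
    const worker = new Worker(source, { eval: true });
    worker.on('online', () => events.push('online'));
    worker.on('message', (data) => events.push(`message: ${data}`));
    worker.on('error', (error) => events.push(`error: ${error.message}`));
    // 'exit' is the final event of a worker, so the trace is complete here.
    worker.on('exit', (exitCode) => resolve({ events, exitCode }));
  });
}

const okSource = `require('worker_threads').parentPort.postMessage('done');`;
const failingSource = `throw new Error('boom');`;

traceWorker(okSource).then((trace) => console.log(trace));
traceWorker(failingSource).then((trace) => console.log(trace));
```

The first worker reports a message and a clean exit, while the second reports an error followed by a non-zero exit code.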
Two approaches to utilizing workers
There are two approaches to using worker threads and harnessing the benefits they provide.
The first approach is to spawn a worker, execute its code, and send the result back to the parent. However, this approach has significant overhead costs, including the creation of a new worker thread, the memory overhead of managing each thread, and the resources required to start and manage threads. While tasks can be accomplished using this approach, it may not be efficient, especially in large-scale Node-based systems. To address the challenges associated with this approach, a second, more commonly used industry practice is often employed.
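A minimal sketch of this first approach might look like the code below, assuming an inline worker that sums the integers from 1 to n. The helper sumInNewWorker is hypothetical; note that every call pays the cost of starting a new thread.

```javascript
const { Worker } = require('worker_threads');

// Inline worker source: sums the integers 1..n received through workerData.
const sumSource = `
  const { parentPort, workerData } = require('worker_threads');
  let total = 0;
  for (let i = 1; i <= workerData.n; i++) total += i;
  parentPort.postMessage(total);
`;

// Hypothetical helper: spawns a fresh worker for a single task.
function sumInNewWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(sumSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

sumInNewWorker(100).then((total) => console.log(total)); // prints 5050
```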
The second approach is to implement a worker pool, which mitigates the drawbacks of the first approach by creating a pool of worker threads that can be reused for multiple tasks. Instead of creating a new worker thread for each task, a pool of workers is created, and tasks are assigned to them.
In technical terms, a worker pool can be considered an abstract data type that manages a pool of worker threads. Each worker thread in the pool is assigned a task, and the thread executes the task in parallel with other threads.
There are multiple ways of assigning tasks within a worker pool, and the pool acts as a manager by distributing tasks to the worker threads, collecting results from them, and facilitating communication among the threads within the pool.
Implementing a worker pool can involve using different data structures and algorithms, such as task queues and message passing systems. The choice of a specific data structure depends on various factors, including the number of worker threads required, the nature of the tasks, and the level of communication needed among the threads.
Implementing the worker pool
In Node, a worker pool can be implemented using built-in features or third-party tools. Node’s built-in worker_threads module provides support for worker threads, which can be used to create a worker pool. Additionally, there are several libraries available that can complement the worker pool by providing high-level APIs for worker threads and additional support for task scheduling and thread management.
These libraries automate the process of scheduling tasks and managing threads, making it easier to implement a worker pool. To illustrate, here is example code that uses Node’s built-in worker_threads module:
const { Worker, isMainThread, parentPort, threadId } = require('worker_threads');

if (isMainThread) {
  // Main thread code
  // Create an array to store worker threads
  const workerThreads = [];
  // Create a number of worker threads and add them to the array
  for (let i = 0; i < 4; i++) {
    workerThreads.push(new Worker(__filename));
  }
  // Send a message to each worker thread with a task to perform
  workerThreads.forEach((worker, index) => {
    worker.postMessage({ task: index });
  });
} else {
  // Worker thread code
  // Listen for messages from the main thread
  parentPort.on('message', message => {
    // threadId identifies this worker; process.pid is the same for every thread
    console.log(`Worker ${threadId}: Received task ${message.task}`);
    // Perform the task
    performTask(message.task);
  });
  function performTask(task) {
    // … operations to be performed to execute the task
  }
}
The code above consists of two parts: one for the main thread and the other for the worker thread. In the main thread portion, the necessary members are imported from the module and, if the current execution context is the main thread, an array is created to store four workers. A message with a task to perform is then sent to each worker thread.
In the worker thread portion, messages from the main thread are listened for using the on method of the parentPort property. Once a message is received, the thread ID is logged along with the task, and the task is then passed to a function that performs it. Because each worker keeps listening for messages, the process stays alive until the workers are terminated.
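The example above sends one task to each worker but neither collects results nor reuses idle workers. A minimal sketch of a pool that does both, using a first-in, first-out task queue, could look like this. The WorkerPool class, its method names, and the squaring task are all assumptions made for illustration, not an established API.

```javascript
const { Worker } = require('worker_threads');

// Inline worker source: squares each number it receives and replies.
const squareSource = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', (n) => parentPort.postMessage(n * n));
`;

// Hypothetical minimal pool: fixed workers, FIFO queue, one task per worker at a time.
class WorkerPool {
  constructor(size) {
    this.queue = [];
    this.idle = [];
    this.workers = [];
    for (let i = 0; i < size; i++) {
      const worker = new Worker(squareSource, { eval: true });
      this.workers.push(worker);
      this.idle.push(worker);
    }
  }

  // Queues a task and resolves with the worker's reply.
  run(payload) {
    return new Promise((resolve, reject) => {
      this.queue.push({ payload, resolve, reject });
      this.dispatch();
    });
  }

  // Hands queued tasks to idle workers until one of the two runs out.
  dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const task = this.queue.shift();
      const onMessage = (result) => {
        worker.off('error', onError);
        task.resolve(result);
        this.idle.push(worker); // reuse the worker for the next task
        this.dispatch();
      };
      const onError = (error) => {
        worker.off('message', onMessage);
        task.reject(error); // a failed worker is not returned to the pool
      };
      worker.once('message', onMessage);
      worker.once('error', onError);
      worker.postMessage(task.payload);
    }
  }

  // Terminates every worker; resolves once all of them have stopped.
  close() {
    return Promise.all(this.workers.map((worker) => worker.terminate()));
  }
}

const pool = new WorkerPool(2);
Promise.all([1, 2, 3, 4, 5].map((n) => pool.run(n)))
  .then((squares) => console.log(squares)) // prints [ 1, 4, 9, 16, 25 ]
  .finally(() => pool.close());
```

Five tasks are served by only two threads here. A production pool would also need to replace crashed workers and support timeouts, which this sketch leaves out.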
What are the key advantages of utilizing threads?
Threading is a powerful tool that can greatly affect a program’s performance, responsiveness, and overall efficiency.
In Node.js, threading lets developers split the work of a process into multiple independent execution streams, so that CPU-heavy work no longer has to compete with the event loop for the main thread.
Some of the main benefits of using threads are:
- Improved performance: Threads allow CPU-bound tasks to run in parallel on multiple cores, which can make overall execution faster than running the same tasks sequentially.
- Responsiveness: Threads can prevent compute-heavy tasks from blocking or delaying the execution of other operations, ensuring the program remains responsive to user input and other tasks.
- Resource sharing: Worker threads run in the same process, so they can share memory through SharedArrayBuffer and transfer ArrayBuffer objects without copying. Ordinary variables are not shared; values sent between threads are copied.
- Ease of programming: The worker_threads API lets CPU-heavy work be moved off the main thread with a few messages and event listeners, without restructuring the rest of the application.
- Improved scalability: The number of worker threads can be sized to the available CPU cores, making it simpler to build Node.js applications that cope with increased load.
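As an illustration of the resource sharing point, the sketch below lets several workers increment one counter stored in a SharedArrayBuffer, using Atomics to keep the updates safe. The helper countTogether and the inline worker source are hypothetical.

```javascript
const { Worker } = require('worker_threads');

// Inline worker source: increments a shared counter atomically.
const incrementSource = `
  const { workerData } = require('worker_threads');
  const counter = new Int32Array(workerData.shared);
  for (let i = 0; i < workerData.times; i++) Atomics.add(counter, 0, 1);
`;

// Hypothetical helper: runs `workers` threads against one shared counter
// and resolves with the final count once all of them have exited.
function countTogether(workers, times) {
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const exits = [];
  for (let i = 0; i < workers; i++) {
    exits.push(new Promise((resolve, reject) => {
      const worker = new Worker(incrementSource, {
        eval: true,
        workerData: { shared, times },
      });
      worker.on('error', reject);
      worker.on('exit', resolve);
    }));
  }
  // The buffer is shared, not copied, so the main thread sees the final count.
  return Promise.all(exits).then(() => Atomics.load(new Int32Array(shared), 0));
}

countTogether(4, 1000).then((count) => console.log(count)); // prints 4000
```

Without Atomics, concurrent increments on the same slot could overwrite each other, which is the kind of synchronization challenge mentioned earlier for multi-threaded programs.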
Conclusion
The worker_threads module offers a straightforward approach to incorporating multi-threading support into our applications. By offloading intensive CPU computations to separate threads, we can greatly enhance our server’s throughput. The availability of official threads support is likely to attract more developers and engineers from fields such as AI, machine learning, and big data to start leveraging the power of Node.js in their projects.