Brain-computer interfaces (BCIs) can help restore communication for people with mobility and/or speech impairments by providing neural control over computer typing applications. A click command detector provides basic but powerful functionality.
We aimed to test the performance and long-term stability of click decoding using a chronically implanted high-density electrocorticographic (ECoG) BCI covering the sensorimotor cortex of a clinical-trial participant with ALS (ClinicalTrials.gov, NCT03567213). We trained the participant’s click detector on a small amount of data collected during the first 21 days of BCI use (less than 44 minutes over 4 days) and then tested it for 90 days without any retraining or recalibration.
Using the click detector to navigate a switch-scanning speller interface, the participant maintained an average typing speed of 10.2 characters per minute. Although an eventual decrease in signal power modulation disrupted use of the fixed model, a new click detector achieved comparable performance even when trained on less data (<15 min, over 1 day).
These results show that click detectors can be trained on small ECoG datasets and maintain reliable performance over long periods, enabling functional text communication for BCI users.
Amyotrophic lateral sclerosis (ALS) is a progressive neurological disease that causes muscle weakness and eventually paralysis. As a result, people with ALS can have difficulty communicating with family and caregivers. We investigated whether the brain signals of a person with ALS could be used to control a spelling application. Specifically, when the participant attempted a grasping movement, a computer algorithm detected an increase in brain signals recorded from electrodes implanted on the surface of his brain and converted it into a mouse click. The participant typed sentences by clicking on letters or words in the spelling application. Our algorithm was built using 44 minutes of brain signals and worked reliably for three months without any retraining. This approach could be used to restore communication to other severely paralyzed people over long periods while requiring only a short training period.
Brain–computer interfaces (BCIs) enable individuals with a variety of motor impairments to control assistive devices using neural signals1,2,3,4,5,6,7,8,9,10. Control signals are derived from the activity of individual neurons or from population activity recorded with implantable microelectrode arrays (MEAs) or macroelectrodes (typically electrocorticography (ECoG) arrays on the cortical surface)11, as well as with non-invasive recording methods such as electroencephalography (EEG). Although sophisticated functionality has been reported for MEA-based BCIs, signal loss12,13,14 may impact long-term performance, and day-to-day signal instability may require frequent decoder recalibration15. Encouraging progress has nevertheless been made in continuous online recalibration (in the background and on a trial-by-trial basis) after correcting text output with a language model16. EEG-based BCIs can efficiently perform single-command decoding, which has been used in various paradigms17; however, external EEG sensors must often be applied and maintained by caregivers or technicians. By contrast, owing to the stability of population-level cortical activity, ECoG-based BCIs can provide robust, long-term, accessible functionality without frequent model retraining1,18. However, the effectiveness of ECoG for long-term (>30 days) BCI use has been tested in only a small number of participants1,3,19.
Recent studies have demonstrated that ECoG-based BCI control is possible in people with amyotrophic lateral sclerosis (ALS) by detecting brain clicks1,3,19, i.e., event-related changes in spectral signals caused by discrete actions such as attempted arm movements. In a recent clinical trial, participants with ALS (or primary lateral sclerosis) were implanted with an electrode array mounted on an endovascular stent to detect such brain clicks3,19. These brain clicks were generated by attempted leg movements and were used to select an icon or letter after navigating to it on a computer screen using eye tracking (ET)19. Participants achieved high spelling performance after training with the brain-click BCI for 1–12 sessions before its long-term use. Despite these impressive results, when the system was tested using the BCI alone (without ET), accuracy was 97.4% with a detection latency of 2.5 seconds, dropping to approximately 82% at a detection latency of 0.9 seconds3. Thus, the effectiveness of spelling with BCI-only systems remains unclear.
The switch-scanning paradigm is a method of augmentative and alternative communication (AAC) that allows users to navigate to and select icons or letters by clicking when the desired row or column is highlighted1,20,21,22,23,24,25,26. Importantly, users are not required to control a cursor with ET, which can be tiring27,28,29 and may become ineffective as eye movements deteriorate in people with ALS30,31,32,33. In an early clinical trial of a chronic ECoG BCI, a participant with ALS controlled a switch-scanning application by generating brain clicks through attempted hand movements. These brain clicks were detected using a pair of electrodes over the hand region of the contralateral motor cortex1. Although the participant used these brain clicks to communicate in daily life for over 3 years18, several months of data collection were needed to optimize the detector parameters. The reported click accuracy (including correct detections and correct rejections) was 87–91%, with a latency of 1 second and a maximum scanning rate of 0.5 scans per second.
These previous studies show that click detectors can serve a variety of BCI applications and can meaningfully support users across different communication modalities. Despite these encouraging results, the performance limits of such click detectors remain relatively underexplored. In particular, long-term high-performance use without model retraining is key to enabling independent use at home, where BCI users need 24/7 access to a fully functional click detector with minimal caregiver intervention. By leveraging the stability of ECoG signals, we trained a model on a limited dataset and tested it for three months without retraining or daily model adjustments. In particular, we demonstrate a switch-scanning BCI speller that achieves a substantial improvement in spelling performance over previous switch-scanning BCI work1.
This study was part of the CortiCom clinical trial (ClinicalTrials.gov identifier: NCT03567213), an early Phase I feasibility study investigating the safety and preliminary efficacy of an implantable ECoG BCI. The trial is currently recruiting, with plans to enroll and implant up to five participants. Given the exploratory nature of the study and the limited number of participants, the primary outcome measures were designed to collect preliminary data on: (1) the safety of the implanted device, (2) the documented feasibility of the implanted device, and (3) the BCI functionality achieved with the implanted device using different control strategies. Because of this exploratory design, no statistical analysis methods or plans were pre-specified to evaluate these outcomes. Nevertheless, results from each participant were assessed with statistical rigor consistent with comparable studies that were likewise limited to individual participants1,2,5,7,8,34. Results related to the first two primary outcomes (obtained from only one participant) are presented in Results and Supplementary Note 1, respectively, but are necessarily preliminary. Results related to BCI functionality are likewise preliminary and exploratory (Supplementary Note 2), as discussed in the Methods and Results below, though rigorous analysis and statistics were still applied. Secondary outcomes of the CortiCom study are only partially presented here, as click detection was only one of multiple BCI control strategies investigated in the study; specifically, we report click detection accuracy and latency (the time from the onset of a movement attempt to the click). The study protocol can be found in the Supplementary File. Inclusion and exclusion criteria for the clinical trial can be found in Supplementary Methods section 1.
The study protocol was reviewed and approved by the Johns Hopkins University Institutional Review Board and the US Food and Drug Administration (FDA) under an investigational device exemption (IDE).
All results presented here are based on data from the first participant in the CortiCom study. After reviewing the nature of the study and the risks associated with implantation, the participant provided written consent to participate and was enrolled on July 5, 2022. He also provided written consent for clinical data and study-related results to be published, and later provided written consent for audio and video recordings to be used in publications of the study results. The experimental team met with the participant approximately three times per week to collect data during training or BCI use. Experimental sessions were conducted in a Johns Hopkins laboratory setting and varied from week to week depending on progress on specific tasks. To date, this participant has experienced no serious or device-related adverse events, satisfying the primary (safety) outcome of the CortiCom study so far. The participant has consented to continue the study. The device has now been implanted for over 2 years and continues to be used for research purposes.
The participant was a right-handed male who was 61 years old at the time of implantation in July 2022 and had been diagnosed with ALS approximately 8 years earlier. The participant suffered from severe dysphagia and progressive dysarthria due to bulbar dysfunction, accompanied by progressive shortness of breath. He could still produce discernible speech, but it was slow and of limited intelligibility; nevertheless, he did not rely heavily on assistive communication devices (Supplementary Note 3). He had developed progressive upper limb weakness to the point that he was unable to perform daily activities without assistance. He could partially close his fingers to attempt to grasp a cup, but did not have enough strength to hold it with one hand. He had good lower limb strength, allowing him to ambulate, although his arm swing was impaired, causing occasional balance problems. The participant’s ability to perform daily activities was assessed using the ALSFRS-R35, on which he scored 26 out of 48 points (Supplementary Note 4).
Before entering the study, the participant underwent cognitive testing, which revealed no signs of dementia. At the monthly safety assessments, he underwent brief cognitive testing, which revealed no significant decline in cognitive function since joining the study.
The CortiCom study device consisted of two 8 × 8 subdural ECoG grids manufactured by PMT Corporation (Chanhassen, MN) connected to a percutaneous 128-channel NeuroPort pedestal manufactured by Blackrock Neurotech (Salt Lake City, UT). Final assembly and sterilization of the study device were performed by Blackrock Neurotech. Both subdural grids consisted of a soft silicone sheet with embedded platinum-iridium disc electrodes (0.76 mm thick, 2 mm exposed diameter), spaced 4 mm apart center-to-center, with a total surface area of 12.11 cm2 (36.6 mm × 33.1 mm). The device included two subdural reference wires whose tips were de-insulated to match the recording surface area of the ECoG electrodes. Due to their small diameter (0.07 mm), the reference wires could not be detected on the postoperative CT scan. For all recordings with the study device, the NeuroPort pedestal was connected to a small (24.9 mm × 17.7 mm × 17.9 mm) external headstage (NeuroPlex E; Blackrock Neurotech) for amplification, digitization, and digital transmission of the signal via a micro-HDMI cable to the NeuroPort Biopotential Signal Processing System (Blackrock Neurotech) (Fig. 1a). During all recordings, signals were referenced to the same reference wire; no re-referencing was applied.
a The participant sat upright with his forearms on the armrests of a chair, in front of a computer monitor displaying the switch-scan speller application. b Layout of the two 64-electrode grids superimposed on the left cortical surface of a virtual reconstruction of the participant’s brain. The dorsal and ventral grids primarily covered the cortical upper limb and face regions, respectively. Electrodes are numbered in ascending order from left to right and bottom to top. Purple: precentral gyrus; orange: postcentral gyrus. c ECoG voltage signals were transmitted in 100 ms packets to update a 256 ms running buffer for online spectral preprocessing. Sample signals from 20 channels are shown. d Spectral power was calculated by applying a fast Fourier transform filter to the 256 ms buffer, from which the high gamma (HG, 110–170 Hz) log power was placed into a 1 s buffer (10 feature vectors). e The current buffer was then used as the temporal history input to a recurrent neural network (RNN). f The RNN’s fully connected (FC) layers then predicted rest or grasp every 100 ms based on the higher output probability. g Each classification was stored as a vote in a 7-vote buffer, and the number of grasp votes had to reach a predefined voting threshold (shown here as 4 votes) to trigger a click. A 1 s lockout was applied immediately after each detected click to prevent repeated clicks during the same movement attempt. A transparent hand image indicates a grasp attempt before a click; otherwise the hand is in a relaxed configuration. h When a click was detected, the speller selected the currently highlighted row or item within that row. Two clicks were needed to enter a letter or autocomplete a word. Example sentence: “The birch canoe slid along the smooth boards.”
The two electrode grids of the study device were surgically implanted subdurally over the left-hemisphere sensorimotor cortical representations of speech and upper limb movements. Implantation was performed via craniotomy under monitored anesthesia care, with local anesthesia and sedation tailored to the intraoperative task. No surgical complications or procedure-related adverse events were observed. Prior to implantation, the location of the target cortical representations was assessed using anatomical landmarks from preoperative structural MRI, functional MRI (sequential finger tapping, tongue movement, and humming), intraoperative somatosensory evoked potentials, and intraoperative high-gamma responses to vibrotactile stimulation of individual digits. Following implantation, the position of the subdural grids relative to the gyral anatomy of the cortical surface was confirmed by co-registering postoperative high-resolution CT with preoperative high-resolution MRI using FreeSurfer36 (Fig. 1b).
At the beginning of each session, a 60-second calibration period was recorded during which participants were instructed to sit quietly, with their eyes open, and look at a computer monitor. For each channel, we then calculated the mean and standard deviation of the spectral-temporal log power at each frequency point. These statistics on the baseline cortical activity at rest were subsequently used to normalize power estimates during model training and BCI operation.
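As an illustration, the baseline statistics and normalization described here could be computed as follows. This is a minimal NumPy sketch; the array shapes and the lognormal stand-in data are illustrative assumptions, not the study's actual recordings.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a 60 s calibration recording: spectral power organized as
# (channels, frequency_bins, time_frames).
baseline = rng.lognormal(mean=0.0, sigma=0.5, size=(128, 64, 600))

log_base = np.log(baseline)
mu = log_base.mean(axis=-1)   # mean log power per channel and frequency bin
sd = log_base.std(axis=-1)    # standard deviation per channel and frequency bin

def normalize(power_frame):
    """Z-score one (channels, freq_bins) power frame against baseline stats."""
    return (np.log(power_frame) - mu) / sd

frame = rng.lognormal(size=(128, 64))  # one new spectral frame during BCI use
z = normalize(frame)
```

By construction, normalizing the baseline itself yields zero-mean, unit-variance features per channel and frequency bin, which is the property the online pipeline relies on.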
Training data were collected over four sessions (six training blocks in total) spanning 16 days (Figure 2a). We define day 0 as the last day of training data collection. The participant was instructed to attempt brief grasping movements with his right hand (i.e., contralateral to the implanted grids) in response to a visual cue in each block (Supplementary Figure 1). Due to his severe upper limb impairment, the attempted movements primarily involved flexion of the middle and ring fingers. After each attempt, the participant released the grasp and passively returned his hand to a resting position in which the wrist hung over the end of the chair armrest.
a The training data were collected over 4 sessions spanning 16 days. For each day, each subdivision represents a separate training data collection block (6 training blocks in total). b Using the fixed detector, a communication board block (purple) was performed on day +21 after training data collection. From day +46 to day +81, the fixed detector was used for switch-scan spelling with a 7-vote threshold (blue). From day +81 to day +111, the fixed detector was used for switch-scan spelling with a 4-vote threshold (light blue). For each day, each subdivision represents a separate spelling block consisting of 3–4 sentences. The horizontal axis spanning (a) and (b) represents the number of days relative to the last day of training data collection (day 0).
Each trial of the training task consisted of a 100-ms Go stimulus (hereafter the Go cue), prompting the participant to attempt a grasp, followed by an interstimulus interval (ISI) during which the participant remained motionless and fixed his gaze on a crosshair at the center of the display. Pilot experiments with longer cues had produced more variable response latencies and movement durations. The length of each ISI was drawn randomly from a uniform distribution between a lower and upper limit (Supplementary Table 1) to reduce anticipatory behavior. The experimental parameters for all training sessions are listed in Supplementary Table 1. In total, approximately 44 min of data (480 trials) were collected for model training.
All data collection and testing took place in a laboratory setting. The NeuroPort system recorded neural signals at a sampling rate of 1 kHz. BCI2000 was used to present stimuli during training blocks and to store data from training and from online click-detector BCI use for offline analysis37. Video of the participant’s right hand (showing his overt attempted grasping movements) and of the monitor displaying the speller application was recorded at 30 frames per second (FPS) during all spelling sessions except the last two (60 FPS). A synchronized 150 ms audio tone was played at the beginning of each spelling block (see Online switch scanning) so that the audio recorded by the analog input of the NeuroPort biopotential system could be used offline to synchronize video frames with the neural data. Click detection timestamps were recorded by BCI2000 and synchronized with the neural data. A pose estimation algorithm38 was applied offline to the video recordings of the participant’s hand to determine the horizontal and vertical positions of 21 hand and finger landmarks in each video frame. The horizontal coordinates of the metacarpophalangeal (MCP) joint landmarks of the first and fifth fingers were used to normalize the horizontal positions of all landmarks, while the MCP and fingertip coordinates of the same fingers were used to normalize the vertical positions.
For each of the 128 recorded channels, we used a fast Fourier transform (FFT) filter to calculate the spectral power of a 256 ms window shifted in 100 ms steps. The spectral power in each frequency bin was log-transformed and normalized using the corresponding calibration statistics. We summed the spectral power over the 110–170 Hz band to calculate the high gamma (HG) power; 110 Hz was chosen as the lower limit of this HG range because low-frequency activity sometimes reached or even slightly exceeded 100 Hz in several channels (Supplementary Fig. 2). For each 100 ms step, a 128-channel feature vector was generated for subsequent model training. We chose this frequency range because HG modulation during grasp trials occurs on a short time scale, and we excluded features in lower frequency ranges because low-frequency event-related synchronization (ERS) occurs immediately after event-related desynchronization (ERD) (Supplementary Fig. 2). This pattern of low-frequency ERD followed by ERS unfolds over a longer period than HG activity, making it difficult to clearly identify onset and offset times in trial-averaged neural activity and thus to assign rest and grasp labels for model training (see Label assignment). Moreover, some channels showed low-frequency ERD before the cue, possibly reflecting anticipatory activity (Supplementary Fig. 2), which could introduce large variability into the feature space of samples labeled as rest.
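The windowed HG feature computation described above can be sketched in NumPy. The window and hop sizes follow the text; the random stand-in data and the omission of calibration normalization are simplifications of this sketch, not the study's pipeline.

```python
import numpy as np

FS = 1000          # 1 kHz sampling rate
WIN = 256          # 256 ms FFT window (256 samples at 1 kHz)
HOP = 100          # 100 ms step
BAND = (110, 170)  # high-gamma band in Hz

def hg_feature(buffer):
    """buffer: (n_channels, WIN) most recent samples -> per-channel HG log power."""
    spec = np.abs(np.fft.rfft(buffer, axis=-1)) ** 2       # power spectrum
    freqs = np.fft.rfftfreq(WIN, d=1 / FS)
    band = (freqs >= BAND[0]) & (freqs <= BAND[1])
    return np.log(spec[:, band].sum(axis=-1))              # summed band power, logged

rng = np.random.default_rng(2)
ecog = rng.normal(size=(128, 1000))                        # 1 s of 128-channel ECoG
feats = [hg_feature(ecog[:, s:s + WIN]) for s in range(0, 1000 - WIN + 1, HOP)]
feats = np.stack(feats)                                    # (n_steps, 128) features
```

In the online system each new 100 ms packet would update a running 256 ms buffer and yield one such 128-channel feature vector, which would then be z-scored against the calibration statistics.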
We assigned rest and grasp labels to each sample in the training dataset using the following steps. First, for each channel, we concatenated HG power segments across trials, where each trial segment spanned −1 s to 2.5 s relative to the visual Go cue (Supplementary Fig. 3, cue alignment). To account for trial-to-trial differences in the participant’s response latency to the visual Go cue, we temporally realigned the HG power across trial segments using a shift-only time warping model39 (Supplementary Fig. 3, realignment). The model was fit only on a subset of highly modulated channels (qualitatively defined; Supplementary Fig. 3 caption) to reduce the potential influence of artifactual patterns in weakly modulated channels when realigning trial segments. Note that for each trial, the final temporal shift was applied identically to all 128 channels. This realignment produced an overall increase in HG power correlations across trials (Supplementary Figures 4, 5). We then calculated trial-averaged HG power traces using only the realigned trial segments of these highly modulated channels and visually determined the modulation onset and offset relative to the visual Go cue (Supplementary Figure 6). The onset and offset times were estimated at 0.3 s and 1.1 s, respectively, relative to Go cue onset. We therefore assigned grasp labels to the ECoG feature vectors of each trial in the interval from 0.3 s + tshift (inclusive) to 1.1 s + tshift relative to the Go cue; the tshift term accounts for the shift applied to that trial during realignment. Feature vectors at all other time points were labeled as rest. We adopted this labeling strategy because it relies only on visual inspection of neural signals, simulating the absence of ground-truth movement data when BCI users with locked-in syndrome (LIS) attempt actions.
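The per-trial labeling rule can be sketched as follows. Working in integer 100 ms feature steps (3 steps = 0.3 s, 11 steps = 1.1 s) avoids floating-point boundary issues; the particular shift value and trial span are illustrative assumptions.

```python
import numpy as np

STEP_MS = 100                 # one feature vector every 100 ms
ON_STEP, OFF_STEP = 3, 11     # 0.3 s and 1.1 s after the Go cue, in steps

def trial_labels(step_idx, shift_steps):
    """step_idx: integer feature-vector indices relative to the Go cue for one trial.
    Returns 1 (grasp) inside [ON_STEP + shift, OFF_STEP + shift), else 0 (rest)."""
    lo, hi = ON_STEP + shift_steps, OFF_STEP + shift_steps
    return ((step_idx >= lo) & (step_idx < hi)).astype(int)

idx = np.arange(-10, 25)                    # -1.0 s to +2.4 s around the cue
labels = trial_labels(idx, shift_steps=2)   # e.g. a +200 ms realignment shift
```

Each trial thus contributes 800 ms (8 feature vectors) of grasp labels, with the labeled interval sliding by that trial's realignment shift.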
We used a recurrent neural network (RNN) to classify rest and grasp. Since this study is part of a larger clinical trial, we aimed to structure the model training process so that, in the future, we could train more complex models for tasks in which temporal dynamics contribute substantially to decoder performance. Additionally, our goal was to enable the participant to use a high-performance BCI as quickly as possible; to this end, we expected the nonlinear classifier to outperform a linear model because of its advantage in identifying temporal patterns of neural activity.
We designed a many-to-one RNN to learn the temporal dynamics of HG power over 1-second sequences, where each sequence is associated only with the label at its leading edge. Each 128-channel HG power vector is input to a long short-term memory (LSTM) layer with 25 hidden units to model sequence dependence. From there, 2 consecutive fully connected (FC) layers with 10 and 2 hidden units, respectively, determine the probability of the rest or grasp class (Supplementary Fig. 7). The former uses an ELU activation function, while the latter uses softmax to output normalized probabilities. In total, the architecture comprises 17,932 trainable parameters.
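A minimal NumPy sketch of the forward pass of this many-to-one architecture is shown below: a single LSTM layer over 10 time steps of 128-channel features, followed by an ELU FC layer and a softmax output. The random weights are stand-ins for trained parameters, and the weight shapes are assumptions matching the unit counts in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CH, T_STEPS, N_HID = 128, 10, 25   # channels, 1 s of history, LSTM hidden units

def lstm_last_hidden(x, W, U, b):
    """Run one LSTM layer over a (T, N_CH) sequence; return the final hidden state."""
    h = np.zeros(N_HID)
    c = np.zeros(N_HID)
    sigm = lambda v: 1.0 / (1.0 + np.exp(-v))
    for t in range(x.shape[0]):
        z = W @ x[t] + U @ h + b                 # (4*N_HID,) gate pre-activations
        i, f, g, o = np.split(z, 4)
        c = sigm(f) * c + sigm(i) * np.tanh(g)   # cell-state update
        h = sigm(o) * np.tanh(c)                 # hidden-state update
    return h

# Random weights stand in for trained parameters (illustration only).
W = rng.normal(0, 0.05, (4 * N_HID, N_CH))
U = rng.normal(0, 0.05, (4 * N_HID, N_HID))
b = np.zeros(4 * N_HID)
W1, b1 = rng.normal(0, 0.05, (10, N_HID)), np.zeros(10)
W2, b2 = rng.normal(0, 0.05, (2, 10)), np.zeros(2)

x = rng.normal(size=(T_STEPS, N_CH))             # 1 s of HG feature vectors
h = lstm_last_hidden(x, W, U, b)
z1 = W1 @ h + b1
a1 = np.where(z1 > 0, z1, np.exp(z1) - 1)        # ELU activation (per the text)
logits = W2 @ a1 + b2
p = np.exp(logits) / np.exp(logits).sum()        # softmax over {rest, grasp}
```

The sequence label is tied only to the leading edge of the 1 s window, which is why only the final hidden state feeds the FC layers.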
Since 800 ms of data from each trial were labeled as grasp (see Label assignment) and the rest of the trial (trest = tmin ISI + tGo − 800 ms ≥ 2300 ms) was labeled as rest, the rest class was overrepresented and was therefore randomly downsampled so that the classification model was trained on a balanced dataset of rest and grasp samples. Note that the rest period is at least 2300 ms, since the minimum ISI is 3 s and the Go visual cue lasts 0.1 s.
We determined the hyperparameters of the classification model by evaluating accuracy offline using 10-fold cross-validation on the data collected for training (see Cross-validation). For each cross-validation model, we limited training to 75 epochs, by which point the classification accuracy on the validation set had stabilized. We used categorical cross-entropy to calculate the error between true and predicted labels for each batch of 45 samples and updated the weights using adaptive moment estimation (the Adam optimizer)40. To prevent overfitting to the training data, we applied 30% dropout in the LSTM and FC layers. All weights were initialized from the He normal distribution41. The model was implemented in Python 3.8 using Keras with the TensorFlow backend (v2.8.0).
We split the training data into 10 folds, each containing the same number of rest and grasp samples of HG power feature vectors (rest samples were randomly downsampled to match the number of grasp samples). To minimize leakage of temporally correlated data into the validation folds, all samples in a fold are consecutive, and each sample belongs to only one fold. Each fold is used for validation once, and the corresponding cross-validation model is trained on the remaining 9 folds.
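The contiguous-fold construction can be sketched as follows. For brevity this sketch splits a range of sample indices into consecutive folds and omits the per-fold rest/grasp balancing; the sample count is an arbitrary example.

```python
import numpy as np

def contiguous_folds(n_samples, k=10):
    """Split sample indices into k contiguous folds (no shuffling), so that
    temporally adjacent feature vectors stay within the same fold."""
    return np.array_split(np.arange(n_samples), k)

folds = contiguous_folds(4800, k=10)

# One cross-validation split: fold 0 validates, the other 9 folds train.
val_idx = folds[0]
train_idx = np.concatenate([f for i, f in enumerate(folds) if i != 0])
```

Keeping folds contiguous (rather than shuffling samples) matters because overlapping 1 s feature histories make neighboring samples highly correlated; random splits would leak training information into validation.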
We used the Python-based message passing framework ezmsg (https://github.com/iscoe/ezmsg)42 to create a directed acyclic graph of processing units in which all pre-processing, classification and post-processing steps were separated.
Neural data were transferred from BCI2000 at 100 ms intervals over a ZeroMQ connection to an online pipeline hosted on a separate machine dedicated to online inference. The incoming data updated a running 256 ms buffer, from which a 128-channel HG power feature vector was computed as described above (Figure 1c, d). This feature vector was stored in a working buffer of 10 feature vectors (Figure 1e), corresponding to 1 second of LSTM input feature history (Figure 1f).
The FC layers generate a rest or grasp classification every 100 ms, which is fed into a working classification buffer updated with each new classification. This buffer is a voting window holding a predefined number of classifications (10 and 7 for the medical communication board and speller interface, respectively), in which a given number of grasp votes (the voting threshold) must be reached to trigger a click (Figure 1g). The voting window and threshold prevent sporadic grasp classifications from being interpreted as click intent. A click allowed the participant to select the desired row or column in the switch-scanning application (Figure 1h).
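The voting-window and lockout logic can be sketched as a small state machine. The window, threshold, and 1 s lockout (10 classification steps at 100 ms) follow the text; clearing the vote buffer after a triggered click is a simplifying assumption of this sketch.

```python
from collections import deque

class ClickDetector:
    """Voting-window click trigger: a click fires when the grasp votes in the
    most recent window reach the threshold, followed by a refractory lockout."""

    def __init__(self, window=7, threshold=4, lockout_steps=10):
        self.votes = deque(maxlen=window)   # rolling rest/grasp classifications
        self.threshold = threshold
        self.lockout_steps = lockout_steps  # 1 s at one classification per 100 ms
        self.lockout = 0

    def update(self, is_grasp):
        """Feed one 100 ms classification; return True if a click is triggered."""
        self.votes.append(1 if is_grasp else 0)
        if self.lockout > 0:
            self.lockout -= 1               # refractory: suppress further clicks
            return False
        if sum(self.votes) >= self.threshold:
            self.votes.clear()              # assumption: reset votes after a click
            self.lockout = self.lockout_steps
            return True
        return False

det = ClickDetector()
stream = [0, 0, 1, 1, 1, 1, 0, 0, 0]        # per-100 ms rest(0)/grasp(1) outputs
clicks = [det.update(v) for v in stream]    # a single click fires at the 4th grasp
```

With a 4-of-7 threshold, a sustained run of grasp classifications triggers exactly one click, and the lockout prevents the tail of the same attempted movement from producing a second one.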
The switch-scanning application is an augmentative and alternative communication (AAC) technology that enables users with severe motor or cognitive impairments to navigate to and select icons or letters by clicking while the desired row or column is highlighted during sequential scanning20,21,22,23,24,25,26. The participant generated clicks by attempting the brief grasping movements described in the training task.
As an initial assessment of the click detector’s sensitivity and false alarms, we first asked the participant to navigate to and select keys with graphic symbols on a medical communication board (Figure 3a, Supplementary Figure 8). The graphic symbols are from https://communicationboard.io/43 and https://arasaac.org/44. We used a 10-vote voting window with a 10-vote threshold (all 10 classifications in the current voting window had to be grasp to trigger a click) and set the row and column scan rate to 0.67 scans per second. Finally, after a click on a row or on a key within a row, we applied a 1-second lockout period during which no further clicks could be made (Figure 1g). This prevented repeated clicks from the same grasp attempt.
The participant was instructed either to select a graphic key named by the experimenter (a) or to spell a cued sentence (light gray text) by clicking when the corresponding row or column was highlighted during the switch-scanning cycle (b). For detailed descriptions of (a) and (b), see Supplementary Figures 8 and 9, respectively.
We then developed a switch-scanning speller application in which the participant was asked to spell sentences letter by letter (Supplementary Figure 9). The keys of the speller interface are arranged in a grid comprising a central keyboard plus letter and word autocomplete keys. The letter and word autocompletions are generated by the DistilBERT45 language model hosted on a separate server, with its output available through an API. DistilBERT was chosen over larger language models because of its faster inference. We added three preselection rows at the beginning of each switch-scan cycle and one preselection column at the beginning of each column-scan cycle, giving the participant some preparation time when selecting the first row, or the first column of the selected row. We chose a 7-vote voting window with a 7-vote threshold, which reduced the latency from grasp attempt to click (see Click latency) compared to the medical communication board, which used a 10-vote threshold. However, after several spelling blocks and participant feedback, on day +81 we lowered the voting threshold to 4 votes (any 4 of the 7 classifications in the voting window had to be grasp to trigger a click), because the participant reported preferring the increased sensitivity despite a potential increase in false positives. We again used a 1-second lockout period.
The participant was instructed to navigate to one of the keys verbally named by the experimenter on the communication board and to select it. If the participant selected an incorrect row, the experimenter changed the target key to one within that row. After a key was selected, the switch-scanning cycle resumed (Supplementary Movie 1, Figure 3a, Supplementary Figure 8). We recorded the communication board session on day +21 after completing training data collection (Figure 2b).
To test online spelling with the fixed click detector, the participant was asked to type sentences using the switch-scan speller application. The sentences were taken from the Harvard sentence corpus46 and were displayed in light gray text at the top of the speller interface. If the participant accidentally selected a wrong key, producing an incorrect letter or autocompleted word, the corresponding output text was highlighted in red; the participant then had to delete it using the DEL or A-DEL (auto-delete) key, respectively. After completing a sentence, the participant moved to the next one by selecting the ENTER key (Supplementary Movie 2, Figure 3b, Supplementary Figure 9). One spelling block consisted of 3–4 sentences, and the participant completed 1–5 spelling blocks per session (Figure 2b). We recorded blocks with the switch-scan speller application over 17 sessions.
Here, for a given session, \({N}_{{correct\; clicks}}\) is the total number of correct clicks, \({N}_{{scan\; attempts}}\) is the total number of scan attempts, and \({N}_{{correct\; clicks}}\le {N}_{{scan\; attempts}}\). For a detected click to be counted as correct (i.e., a true positive), it had to appear in the user interface (as visual feedback to the participant) within 1.5 seconds of the start of the grasp attempt. Scan attempts during which no click occurred within this window were counted as false negatives. Clicks occurring outside this window were considered unrelated to any scan attempt and were therefore counted as false positives. True positive and false positive frequencies (TPF and FPF, respectively) were measured per unit time and defined for each session as

$${TPF}=\frac{{N}_{{TP}}}{T},\qquad {FPF}=\frac{{N}_{{FP}}}{T}$$
where \({N}_{{TP}}\) and \({N}_{{FP}}\) are the number of true positives and false positives per session, respectively, and \(T\) is the total writing time for that session. Whether participants pressed the correct or incorrect key did not matter for sensitivity, TPF, or FPF, since these measures depend only on whether the click actually occurred after a grasp attempt.
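The scoring rules above (match each detected click to a scan attempt within the 1.5 s window, count unmatched clicks as false positives, and normalize by session time) can be sketched as follows. Function and parameter names are illustrative assumptions, not the authors' code.

```python
def click_metrics(scan_attempt_times, click_times, total_time_s, match_window_s=1.5):
    """Score detected clicks against scan attempts: a click within
    `match_window_s` of an attempt's start is a true positive; attempts
    with no matching click are false negatives; remaining clicks are
    false positives. Sketch of the definitions above, not the authors' code."""
    clicks = sorted(click_times)
    matched = set()
    n_tp = 0
    for start in scan_attempt_times:
        for j, c in enumerate(clicks):
            if j not in matched and start <= c <= start + match_window_s:
                matched.add(j)
                n_tp += 1
                break
    n_fp = len(clicks) - len(matched)
    minutes = total_time_s / 60.0
    return {
        "sensitivity": n_tp / len(scan_attempt_times),
        "TPF_per_min": n_tp / minutes,   # true positives per unit time
        "FPF_per_min": n_fp / minutes,   # false positives per unit time
    }
```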
Movement onset and offset were determined from normalized trajectories computed from the recorded hand position. Specifically, only the trajectories of fingers that made large movements during grasp attempts were considered. Movement onset and offset times were then visually estimated for each grasp attempt.
For each correctly detected click, we computed (a) the time between movement onset and the algorithmic detection of the click, and (b) the time between movement onset and the appearance of the click in the speller application’s user interface. The algorithmic detection latency consisted mainly of the time required to reach the voting threshold (e.g., a 4-vote threshold implies a latency of at least 400 ms for four consecutive grasp classifications). The latency until the click appeared on screen in the speller interface depends on the algorithmic detection latency plus the additional network and computational overhead required to display the click; this overhead amounted to approximately 200 ms (see Switch Scan Performance).
Typing speed was measured in correct characters per minute (CCPM) and correct words per minute (CWPM). Characters and words were counted as correct only if they exactly matched those at the same positions in the prompted sentence. For example, if a participant wrote a 30-character (5-word) sentence with 1 misspelled character, only 29 characters (4 words) counted toward CCPM (CWPM). The spelling error rate was measured in wrong characters per minute (WCPM). Participants were instructed to correct any errors before continuing to type the rest of the sentence. Note that all spelling was performed with the language model’s autocomplete features enabled, so all subsequent typing-performance analyses include the effect of this assistive feature.
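The positional matching rule for CCPM, CWPM, and WCPM can be sketched as follows; the function is an illustrative reading of the definitions above (exact position-wise comparison against the prompted sentence), not the authors' scoring code.

```python
def typing_metrics(prompt, typed, minutes):
    """Per-minute typing metrics: characters/words count as correct only
    when they match the prompt at the same position. Illustrative sketch
    of CCPM / CWPM / WCPM as defined above."""
    correct_chars = sum(p == t for p, t in zip(prompt, typed))
    wrong_chars = len(typed) - correct_chars
    correct_words = sum(
        p == t for p, t in zip(prompt.split(" "), typed.split(" "))
    )
    return {
        "CCPM": correct_chars / minutes,  # correct characters per minute
        "CWPM": correct_words / minutes,  # correct words per minute
        "WCPM": wrong_chars / minutes,    # wrong characters per minute
    }
```

For the 30-character example in the text, one misspelled character reduces the CCPM numerator to 29 and removes the containing word from the CWPM numerator.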
We examined the relationship between the number of trials used to train a classification model and the simulated performance (sensitivity and FPF) of the click detector. This was done to determine whether performance similar to that of the online speller could be achieved with a click detector model trained on fewer trials. To this end, we trained classification models using varying numbers of training trials and tested them offline on data collected during the online spelling sessions. Using the training procedure described above, we trained six models, each on a larger portion of the data than the previous model (Supplementary Table 1). The six models were thus trained on 50, 100, 150, 300, 390, or 480 trials (3.77, 7.56, 11.34, 25.43, 34.68, or 43.92 minutes, respectively). Note that the click detector model used for the online speller was trained on the same 480 trials (i.e., the full original training dataset). The models were tested on data from every online spelling block during which the click detector was operated with a 4-vote threshold. The models were not tested on spelling blocks that used a 7-vote threshold because most of those sessions (days 46–56) lacked the audio synchronization signals needed to align the neural data recorded by the NeuroPort system (and hence the click detections obtained from the simulation analysis) with the recorded video frames; without them, it is not possible to determine precisely when a simulated click detection occurred relative to the start of a scan trial. Sensitivity and FPF were computed as described in the Sensitivity and Click Rate section. We then computed the median sensitivity and FPF across sessions for each number of training trials.
To assess whether the spelling task itself could serve as a means of collecting additional training data, we trained updated classification models using data from spelling blocks in earlier sessions and then simulated their performance on spelling blocks from later sessions. For example, a click detector trained on data from all spelling blocks recorded up to and including day d was evaluated on all spelling blocks recorded after day d. We trained each updated model using essentially the same procedure as the original fixed model, with two differences. First, we determined trial onsets and offsets from trial-averaged HG power traces re-aligned to attempted-movement onset rather than to the onset and offset of the Go cue, which was absent during the spelling blocks. Second, because two grasp attempts could occur within a very short time of each other (e.g., clicking on a row and then on the first column), we excluded from training all grasp attempts that occurred less than 3 s after the previous grasp attempt (minimum ISI, Supplementary Table 1). As described in the Sensitivity and Click Rate section, sensitivity with the original click detector was computed by counting clicks that occurred (as visual feedback to the participant) within 1.5 s after a grasp attempt. Because the offline analysis simulated algorithmic detections (rather than on-screen clicks), we offset all click detections by 200 ms to account for the constant delay between algorithmic detections and on-screen clicks described above. TPF and FPF were computed as described above. Again, we used only data from online spelling sessions during which the click detector was operated with a 4-vote threshold. The last updated model was trained through the third block of the online spelling session on day 81 (Figure 2b).
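The minimum inter-attempt interval exclusion described above (dropping any grasp attempt that starts less than 3 s after the previous one, regardless of whether that previous attempt was itself kept) can be sketched as a small helper. The function name is an assumption for illustration.

```python
def usable_grasp_attempts(attempt_times, min_isi_s=3.0):
    """Drop grasp attempts starting less than `min_isi_s` after the
    previous attempt, mirroring the exclusion described above.
    Illustrative sketch, not the authors' code."""
    kept = []
    last = None
    for t in sorted(attempt_times):
        if last is None or t - last >= min_isi_s:
            kept.append(t)
        last = t  # compare against the previous attempt, kept or not
    return kept
```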
Using a subset of samples labeled as grasp in the training data, we computed the importance of each channel for generating grasp classifications given the model architecture. Specifically, we computed the integrated gradients47 of the 10 cross-validated classification models (see Cross-validation) with respect to the input features for each sample labeled as grasp in the corresponding validation fold. This produces an attribution map for each sample, from which we computed the L2 norm across the 10 time-history feature vectors, resulting in a 1×128 saliency vector. Because of the random initialization of the RNN-FC network weights, models trained on features from the same set of folds are not guaranteed to converge to the same final weights. We therefore retrained the 10 cross-validation models 20 times and recomputed the saliency vector for each sample in the same manner. The final saliency map was computed by averaging the attribution maps across all repetitions and normalizing the resulting mean to values between 0 and 1. We repeated this procedure using the HG features of all channels except channel 112, which is located over sensory cortex and showed relatively high activation compared to other channels during movement trials. We then repeated the procedure using the HG features from a subset of 12 electrodes over the cortical hand knob (anatomically identified as channels 92, 93, 94, 100, 101, 102, 108, 109, 110, 116, 117, 118; Fig. 5e, Supplementary Figs. 2, 3). Neither of these two reduced-channel models was deployed for online BCI use.
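The saliency aggregation described above (L2 norm over the time-history axis of each attribution map, averaging over samples and repetitions, then min-max normalization to [0, 1]) can be sketched with NumPy. The array shape and axis ordering here are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def channel_saliency(attributions):
    """Collapse per-sample integrated-gradient attributions of assumed
    shape (n_repeats, n_samples, n_history, n_channels) into a single
    normalized saliency vector of length n_channels: L2 norm over the
    time-history axis, mean over samples and repeats, then min-max
    scaling to [0, 1]. Illustrative sketch only."""
    per_sample = np.linalg.norm(attributions, axis=2)  # (repeats, samples, channels)
    mean_map = per_sample.mean(axis=(0, 1))            # (channels,)
    lo, hi = mean_map.min(), mean_map.max()
    return (mean_map - lo) / (hi - lo)
```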
To determine whether models trained using HG features from these smaller channel subsets could maintain robust click detection performance, we computed offline classification accuracies using 10-fold cross-validation on the training data (see Cross-validation). We repeated the cross-validation 20 times to obtain a set of 20 accuracy values for each of the 10 validation folds, then averaged these 20 values to obtain the final accuracy for each fold. For each channel subset, confusion matrices and accuracy values were generated using the true and predicted labels across all validation folds and all repetitions. We compared these results with those obtained using features from all channels.
Finally, we computed confusion matrices and classification accuracies over all spelling blocks in which the click detector was run with the original fixed model (trained on features from all channels), at both the 7-vote and the 4-vote threshold. As described in the Model Updating Using Previous Spelling Blocks section, a true grasp label was assigned to every sample between the onset and offset of the re-aligned trial-averaged HG power trace relative to attempted-movement onset. As before, data corresponding to grasp attempts that occurred less than 3 seconds after the previous grasp attempt were excluded from labeling. All other samples were labeled as rest. Equal numbers of rest and grasp samples were used to compute the confusion matrices and corresponding accuracies.
To assess participants’ experience using the switch-scan speller application, we asked them to complete the NASA Task Load Index (NASA-TLX)48,49, a widely used questionnaire that assesses the mental, physical, and temporal demands of a task, as well as perceived performance, effort, and frustration, using the NASA-TLX iOS app. Scores in each category range from 0 to 100, with lower and higher scores corresponding to less and more of each of the six characteristics, respectively.
Spelling blocks with a given voting threshold were collected over at most nine sessions. Given this small sample size, we could not assume that the sample means were normally distributed for any of the performance measures (sensitivity, TPF, FPF, latency, CCPM, WCPM, and CWPM). We therefore computed 95% confidence intervals (95% CIs) of the means using 10,000-iteration bootstrap analyses (bias-corrected and accelerated) for each sample of performance measures. In addition, we used the nonparametric two-tailed Wilcoxon rank-sum test to determine whether spelling-block performance measures differed significantly between the two voting thresholds; P values < .05 were considered significant. Similarly, we used two-tailed Wilcoxon rank-sum tests to determine whether offline classification accuracies differed significantly across the channel configurations used for model training and validation, as well as across all simulated performance metrics. We used the Holm-Bonferroni correction to adjust for multiple comparisons.
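A bias-corrected and accelerated (BCa) bootstrap CI of the mean, as described above, is available off the shelf in SciPy. This sketch assumes `scipy.stats.bootstrap` (SciPy ≥ 1.7); the fixed seed is only for reproducibility and is not part of the authors' procedure.

```python
import numpy as np
from scipy.stats import bootstrap

def mean_ci_bca(values, n_resamples=10_000, seed=0):
    """95% BCa bootstrap confidence interval of the mean for one sample
    of per-session performance measures. Illustrative sketch using
    SciPy's `bootstrap`."""
    res = bootstrap(
        (np.asarray(values, dtype=float),),  # data passed as a 1-tuple of samples
        np.mean,
        confidence_level=0.95,
        n_resamples=n_resamples,
        method="BCa",                        # bias-corrected and accelerated
        random_state=seed,
    )
    return res.confidence_interval.low, res.confidence_interval.high
```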