Arya · Harrison · Palli Lab  ·  University of Kentucky

Single-Cell RNA-seq in Insects: Unlocking Cellular Diversity

How scRNA-seq enables cell-type discovery in non-model insects—from the fall armyworm midgut to the principles driving a new era of insect genomics.

Surjeet Kumar Arya · Douglas A. Harrison · Subba Reddy Palli
Dept. of Entomology, University of Kentucky, Lexington KY

scRNA-seq · Spodoptera frugiperda · Insect Genomics · Pest Management · 10x Genomics · Cell-Type Atlas
Introduction

One Pest. Millions of Cells. Countless Secrets.

The fall armyworm, Spodoptera frugiperda, devastates crops across sub-Saharan Africa, South Asia, and the Americas, causing annual yield losses exceeding 400 million US dollars. Its midgut—the primary site of pesticide exposure, nutrient absorption, and Bacillus thuringiensis (Bt) toxin action—has long been studied as a bulk tissue. But bulk studies average across wildly different cell types, masking the true molecular complexity driving digestion, detoxification, and insecticide resistance.

Single-cell RNA sequencing (scRNA-seq) changes this completely. By capturing the transcriptome of each individual cell, it creates a molecular identity card for every cell in a tissue. Recent pioneering work from the Palli Lab at the University of Kentucky has applied this technology directly to the FAW midgut—both from native larval tissue and from an established midgut cell line—generating the first single-cell transcriptomic atlases of a lepidopteran pest midgut.

18,794 cells sequenced (SfMG-0617 cell line)
~13,000 cells sequenced (FAW midgut tissue)
12 cell-type clusters identified in larval midgut
"Understanding cellular heterogeneity in non-model insects is essential for unraveling the complexities of their biology — with practical applications in disease control, agriculture, and biotechnology."
🔬
Foundations

What Is Single-Cell RNA Sequencing?

In conventional bulk RNA-seq, you grind up thousands of cells, extract RNA from the homogenate, and sequence it — getting a population-average transcriptome. Valuable, but deaf to cellular heterogeneity. The Arya et al. studies demonstrate exactly why this matters: even a cell line assumed to be homogeneous after 50+ passages harbors ten distinct transcriptional states.

scRNA-seq isolates individual cells and attaches a unique DNA barcode to each cell's mRNA before amplification and sequencing. Every read can be traced to its cell of origin, generating a cells × genes count matrix — in the FAW studies, matrices of ~13,000–18,000 cells × ~15,000 genes.
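
To make the barcode-to-matrix idea concrete, here is a minimal sketch in plain Python. It is an illustration only, not the Cell Ranger implementation: the read tuples, barcodes, and UMIs are invented, and the function name `count_matrix` is hypothetical. Reads that share a cell barcode, UMI, and gene are collapsed into one molecule, and molecules are then tallied per cell and gene.

```python
from collections import defaultdict


def count_matrix(reads):
    """Collapse (cell_barcode, umi, gene) reads into per-cell gene counts.

    Reads sharing barcode, UMI, and gene are treated as PCR copies of
    one molecule and are counted once.
    """
    molecules = set(reads)  # UMI de-duplication
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, _umi, gene in molecules:
        counts[barcode][gene] += 1
    return {bc: dict(genes) for bc, genes in counts.items()}


# Hypothetical reads: (cell barcode, UMI, gene)
reads = [
    ("AAAC", "U1", "mucin-3A"),
    ("AAAC", "U1", "mucin-3A"),       # PCR duplicate of the read above
    ("AAAC", "U2", "mucin-3A"),
    ("AAAC", "U3", "peritrophin-1"),
    ("TTTG", "U1", "peritrophin-1"),
]
matrix = count_matrix(reads)
for barcode in sorted(matrix):
    print(barcode, matrix[barcode])
```

Each key of `matrix` is one cell and each inner dictionary is that cell's row of the cells × genes matrix. Real pipelines store the same information as a sparse matrix, because most gene counts in any one cell are zero.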

Feature | Bulk RNA-seq | scRNA-seq
Resolution | Population average | Single-cell
Cell-type detection | Inferred by deconvolution | Direct, from clustering
Rare cells | Masked | Detectable (e.g. EE at 3%)
Trajectory/pseudotime | Not possible | Yes (Monocle 3.0)
Insecticide target mapping | Tissue-level only | Cell-type resolution
Cost per sample | ~$100–300 | ~$500–2,000

Applied by Arya et al.: 10x Chromium + Seurat + Monocle
🐛
Motivation

Why Apply scRNA-seq to Insects?

The Arya et al. studies illustrate four compelling reasons to apply scRNA-seq in insects, especially pest species with no prior single-cell data:

1. Cell-Type Discovery Without a Reference

For non-model insects like FAW, there are no validated antibodies for cell sorting and no single-cell atlases to reference. Arya et al. solved this by leveraging Drosophila melanogaster and Aedes aegypti marker gene homologs as starting anchors for cluster annotation, then extended the analysis to identify novel marker genes unique to FAW — including goblet cell markers (mucin-3A, peritrophin-1) with no equivalent in the fly.

2. Understanding Insecticide Resistance at the Cellular Level

The FAW has developed resistance to nearly every class of insecticide deployed against it, including Bt toxins used in transgenic maize. The Arya et al. studies reveal that the primary resistance machinery (cytochrome P450s, GSTs, ABC transporters) is concentrated in enterocytes and EC-like cells rather than evenly distributed across the midgut. This cell-type specificity has direct implications for designing resistance-breaking insecticides that exploit cell-type vulnerabilities.

3. Validating Cell Lines as Research Models

Over 1,270 lepidopteran cell lines are used globally in bioassays and recombinant protein production. Paper 1 (the SfMG-0617 cell line study) showed that such a line can retain unexpected heterogeneity that affects experimental interpretation, particularly for Bt bioassays, where the receptor expression profile depends entirely on which cell type you are testing. scRNA-seq is now a necessary QC tool for insect cell line characterization.

4. Lineage Biology in Pests

The midgut is the insect's primary interface with its food plant and with ingested pesticides. Understanding how intestinal stem cells maintain the epithelium, how they respond to damage, and how they give rise to specialized secretory and absorptive cells is fundamental to understanding gut homeostasis, host-plant adaptation, and pathogen susceptibility.

⚗️
Step-by-Step Protocol

The scRNA-seq Workflow for Insects

The two Arya et al. studies collectively describe two distinct but complementary preparation strategies. Below is the consolidated workflow drawing from both protocols.

01

Tissue Dissection / Cell Harvesting

Cell line (Paper 1): SfMG-0617 cells maintained in TNM-FH + 10% FBS, harvested at ~75–90% confluency, resuspended in 0.5% BSA-PBS, filtered through 40 µm strainer, checked with trypan blue.
Tissue (Paper 2): Day 1 sixth-instar larvae; alimentary canal pulled and midgut isolated in cold HBSS + 1% BSA (5 larvae/well). Midgut minced with scissors/forceps and pooled from 15 midguts per replicate.

02

Enzymatic Dissociation

Minced tissue incubated with 3 mg/mL collagenase + 2 mg/mL elastase at 27°C, 60 min on rotary shaker (50–80 rpm). Cells triturated by pipetting ~20–30× with wide-bore tip. Filtered through 100 µm then 40 µm strainers. Cell viability assessed with acridine orange / ethidium homodimer — consistently 85–90%.

03

OptiPrep™ Density Gradient (Paper 2 Innovation)

Cell suspension layered onto 2 mL of 60% iodixanol solution; centrifuged at 600×g, 25 min, 4°C. Top + middle layer (viable cells) collected, diluted in HBSS, washed with RNase inhibitors. This step removes much of the debris and dead-cell fragments, which is critical for insect midgut tissue because it contains dense gut contents.

04

10x Genomics Chromium Library Preparation

Cells loaded into 10x Chromium Controller targeting ~10,000 cells per sample. GEM (Gel Bead-in-Emulsion) partitioning barcodes each cell. 3' Gene Expression Kit v3.1 used for cDNA synthesis, amplification, and library construction. Quality assessed by Agilent Bioanalyzer.

05

Sequencing

Illumina NovaSeq 6000. Paper 1: >350M reads per replicate, >36–42K read pairs/cell. Paper 2: 519M and 214M reads across two replicates; 71K and 35K reads/cell respectively. Mapping rates: 90–96% to FAW genome.
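
The Paper 2 depth figures can be cross-checked with simple arithmetic. The sketch below assumes that "reads/cell" is a mean, meaning total reads divided by the number of cells called, which is how Cell Ranger reports mean reads per cell. The inputs are the rounded values quoted above, so the outputs are approximate, and the function name `estimated_cells` is hypothetical.

```python
def estimated_cells(total_reads, mean_reads_per_cell):
    """Number of cells implied by a read total and a mean depth per cell."""
    return total_reads / mean_reads_per_cell


# Paper 2 figures as quoted in the text (rounded)
rep1 = estimated_cells(519e6, 71e3)
rep2 = estimated_cells(214e6, 35e3)
total = rep1 + rep2

print(f"replicate 1: ~{rep1:,.0f} cells")
print(f"replicate 2: ~{rep2:,.0f} cells")
print(f"combined:    ~{total:,.0f} cells")
```

The combined estimate lands in the low 13,000s, which agrees with the ~13,000 midgut cells reported for the tissue dataset.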

06

Cell Ranger Alignment & UMI Counting

Cell Ranger v6.1.1 used for demultiplexing, alignment to FAW reference genome, barcode/UMI filtering, and production of cell-by-gene count matrices. Both studies achieved >97% valid barcodes and 100% valid UMIs.

07

Seurat Analysis Pipeline

Count matrices imported into Seurat. QC filtering by nFeature_RNA, nCount_RNA, and percent.mt. 2,000 variable features selected by VST. PCA → top 20 PCs → Louvain–Jaccard clustering (resolution 0.5) → UMAP visualization. DoubletFinder removes multiplets. Paper 1 used Harmony for cross-replicate batch correction.

08

Cluster Annotation & Marker Discovery

FindAllMarkers (negative binomial distribution, min fold enrichment 0.5, Bonferroni-adjusted p < 0.05). Marker genes compared to known Drosophila/Aedes midgut markers. KOBAS 3.0 + clusterProfiler for KEGG/GO enrichment. Cell types validated by promoter-RFP reporter constructs and FACS.
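
The marker criteria above amount to two filters, sketched here in plain Python. Two assumptions are made: "min fold enrichment 0.5" is read as a cut-off on average log2 fold change, and the Bonferroni correction multiplies each raw p-value by the number of genes tested and caps it at 1. Gene names, p-values, and the function names are hypothetical.

```python
def bonferroni(p_values, n_tests):
    """Bonferroni-adjusted p-values, capped at 1."""
    return [min(1.0, p * n_tests) for p in p_values]


def select_markers(rows, n_tests, min_log2fc=0.5, alpha=0.05):
    """rows: list of (gene, avg_log2fc, raw_p). Return genes passing both filters."""
    adjusted = bonferroni([p for _, _, p in rows], n_tests)
    return [
        gene
        for (gene, log2fc, _), p_adj in zip(rows, adjusted)
        if log2fc >= min_log2fc and p_adj < alpha
    ]


# Hypothetical test results for one cluster
rows = [
    ("geneA", 1.8, 1e-9),   # enriched and significant after correction
    ("geneB", 0.3, 1e-12),  # significant, but below the enrichment cut-off
    ("geneC", 2.1, 1e-4),   # enriched, but not significant after correction
]
markers = select_markers(rows, n_tests=15000)
print(markers)
```

With roughly 15,000 genes tested, a raw p-value of 1e-4 is no longer significant after correction, which is why only `geneA` survives.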

💻
Computational Analysis

Reproducing the Arya et al. Analysis

The following code follows the pipeline described in the two FAW studies, adapted for clarity. Both papers used Seurat (R), Monocle 3.0 for trajectories, and DoubletFinder for doublet removal.

Step 1: Quality Control (as in Paper 1 & 2)

    # Arya et al. 2024: scRNA-seq pipeline (SfMG-0617 / FAW midgut)
    library(Seurat); library(dplyr); library(harmony)

    # Load Cell Ranger output
    counts <- Read10X("./cellranger_output/filtered_feature_bc_matrix/")
    faw <- CreateSeuratObject(counts = counts, project = "FAW_midgut",
                              min.cells = 3, min.features = 200)

    # Mitochondrial genes in S. frugiperda: check your annotation
    faw[["percent.mt"]] <- PercentageFeatureSet(faw, pattern = "^mt-|^MT-|^ND")

    # QC violin plot (Paper 1: nFeature 200–5000, percent.mt <20)
    VlnPlot(faw, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)

    # Filter (Paper 2 threshold: see Table S1)
    faw <- subset(faw, subset = nFeature_RNA > 200 & nFeature_RNA < 5000 &
                                percent.mt < 20)
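
As a language-neutral illustration of what the three QC metrics measure, the following plain-Python sketch computes them for a single cell. It is not Seurat code. The toy cells, the gene names, and the small thresholds passed to `passes_qc` are made up so the example fits in a few lines. The defaults mirror the 200–5000 feature and <20% mitochondrial limits used in the step above.

```python
def qc_metrics(cell_counts, mito_genes):
    """Per-cell QC metrics from a {gene: UMI count} dictionary."""
    n_count = sum(cell_counts.values())
    n_feature = sum(1 for c in cell_counts.values() if c > 0)
    mito = sum(c for g, c in cell_counts.items() if g in mito_genes)
    percent_mt = 100.0 * mito / n_count if n_count else 0.0
    return {"nFeature_RNA": n_feature, "nCount_RNA": n_count, "percent.mt": percent_mt}


def passes_qc(metrics, min_features=200, max_features=5000, max_percent_mt=20):
    """Same logic as the subset() filter: strict bounds on features and percent.mt."""
    return (min_features < metrics["nFeature_RNA"] < max_features
            and metrics["percent.mt"] < max_percent_mt)


mito_genes = {"mt-ND1"}                                      # hypothetical annotation
stressed = {"g1": 5, "g2": 0, "g3": 3, "mt-ND1": 2}          # 20% mitochondrial
healthy = {"g1": 5, "g2": 1, "g3": 3, "mt-ND1": 1}           # 10% mitochondrial

m_stressed = qc_metrics(stressed, mito_genes)
m_healthy = qc_metrics(healthy, mito_genes)
print(m_stressed, passes_qc(m_stressed, min_features=2, max_features=10))
print(m_healthy, passes_qc(m_healthy, min_features=2, max_features=10))
```

A high mitochondrial fraction is commonly taken as a sign of a damaged or dying cell, which is why the stressed toy cell is rejected even though its feature count is acceptable.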

Step 2: Normalization, Scaling & Variable Features

    # LogNormalize (scale factor 10,000, both papers)
    faw <- NormalizeData(faw, normalization.method = "LogNormalize",
                         scale.factor = 10000)

    # 2,000 variable features using VST (both papers)
    faw <- FindVariableFeatures(faw, selection.method = "vst", nfeatures = 2000)

    # Scale + regress out mitochondrial reads (Paper 1)
    faw <- ScaleData(faw, vars.to.regress = "percent.mt")
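
For readers who want to see what LogNormalize does to the numbers, here is a small plain-Python sketch of the transform: each count is divided by the cell's total, multiplied by the scale factor, and passed through a natural-log `log1p`. The two toy cells are hypothetical. They have the same composition but a tenfold difference in sequencing depth.

```python
import math


def log_normalize(cell_counts, scale_factor=10_000):
    """log1p(count / cell_total * scale_factor) for every gene in one cell."""
    total = sum(cell_counts.values())
    if total == 0:
        return {gene: 0.0 for gene in cell_counts}
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in cell_counts.items()
    }


shallow = {"geneA": 1, "geneB": 9}     # 10 UMIs in total
deep = {"geneA": 10, "geneB": 90}      # same proportions, 100 UMIs in total

norm_shallow = log_normalize(shallow)
norm_deep = log_normalize(deep)
print(norm_shallow)
print(norm_deep)
```

Both cells end up with the same normalized values, which shows the transform removing depth differences so that downstream steps compare composition rather than library size.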

Step 3: PCA, Clustering & UMAP

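    # PCA: top 20 PCs selected by JackStraw / ElbowPlot (both papers)
    faw <- RunPCA(faw); ElbowPlot(faw, ndims = 40)

    # DoubletFinder: critical for insect data (both papers used v2.0.4)
    library(DoubletFinder); library(ggplot2)
    sweep.res <- paramSweep(faw, PCs = 1:20)
    bcmvn <- find.pK(summarizeSweep(sweep.res, GT = FALSE))
    # pK with the highest BCmvn score (a common choice); pN = 0.25 is the default
    pK <- as.numeric(as.character(bcmvn$pK[which.max(bcmvn$BCmetric)]))
    faw <- doubletFinder(faw, PCs = 1:20, pN = 0.25, pK = pK,
                         nExp = round(0.06 * ncol(faw)))
    # The result column is named DF.classifications_<pN>_<pK>_<nExp>
    df.col <- grep("^DF.classifications", colnames(faw@meta.data), value = TRUE)[1]
    faw <- faw[, faw@meta.data[[df.col]] == "Singlet"]

    # Batch correction with Harmony (Paper 1)
    faw <- faw %>% RunHarmony("orig.ident")

    # Louvain-Jaccard clustering (resolution 0.5, both papers),
    # run on the Harmony embedding rather than the uncorrected PCs
    faw <- FindNeighbors(faw, reduction = "harmony", dims = 1:20)
    faw <- FindClusters(faw, resolution = 0.5)
    faw <- RunUMAP(faw, reduction = "harmony", dims = 1:20)
    DimPlot(faw, label = TRUE) + ggtitle("FAW Midgut: 12 Clusters")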
[Figure: interactive UMAP of FAW midgut cell types]
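
The "Jaccard" half of Louvain–Jaccard clustering can be sketched in a few lines of plain Python. The idea is that two cells get a strong edge when their k-nearest-neighbour sets overlap heavily. In this sketch the neighbour lists are invented and each list includes the cell itself. The pruning cut-off of 1/15 matches the default `prune.SNN` in Seurat's `FindNeighbors`. The Louvain step, which then searches for communities on this weighted graph, is not implemented here.

```python
def jaccard(a, b):
    """Jaccard index of two neighbour sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def snn_edges(knn, prune=1 / 15):
    """Weighted shared-nearest-neighbour edges from a {cell: neighbours} dict."""
    cells = sorted(knn)
    edges = {}
    for i, u in enumerate(cells):
        for v in cells[i + 1:]:
            weight = jaccard(knn[u], knn[v])
            if weight > prune:          # drop weak edges
                edges[(u, v)] = weight
    return edges


# Hypothetical k = 3 neighbour lists for six cells
knn = {
    "c1": ["c1", "c2", "c3"],
    "c2": ["c2", "c1", "c3"],
    "c3": ["c3", "c2", "c4"],
    "c4": ["c4", "c5", "c3"],
    "c5": ["c5", "c4", "c6"],
    "c6": ["c6", "c5", "c4"],
}
edges = snn_edges(knn)
for pair, weight in sorted(edges.items()):
    print(pair, round(weight, 2))
```

Cells c1 and c2 share all neighbours and get the maximum weight, while c1 and c5 share none and are left unconnected. Community detection then groups cells joined by many heavy edges.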

Step 4: Marker Identification (FindAllMarkers)

    # Both papers: negative binomial, min.pct=0.25, logfc.threshold=0.25
    # Final markers require min fold enrichment 0.5 + Bonferroni p<0.05
    markers <- FindAllMarkers(faw, only.pos = TRUE, min.pct = 0.25,
                              logfc.threshold = 0.25,
                              test.use = "negbinom")  # negative binomial, as used in both papers

    # Top 5 per cluster for heatmap
    top5 <- markers %>% group_by(cluster) %>% slice_max(avg_log2FC, n = 5)
    DoHeatmap(faw, features = top5$gene)

    # Annotate with FAW-specific names (Paper 2, Table 2)
    faw <- RenameIdents(faw,
      "0" = "EB1", "1" = "EB2",                      # 45% enteroblasts
      "2" = "EC1", "3" = "EC2", "6" = "EC3",
      "4" = "EC-like1", "5" = "EC-like2",
      "9" = "EC-like3", "11" = "EC-like4",
      "8" = "EE", "7" = "SC", "10" = "GC"
    )

Step 5: Pseudotime Trajectories (Monocle 3.0)

    # Monocle 3.0: used in both papers for lineage reconstruction
    library(monocle3); library(SeuratWrappers)

    # Convert Seurat → CellDataSet
    cds <- as.cell_data_set(faw)
    cds <- cluster_cells(cds, resolution = 1e-3)

    # Learn principal graph
    cds <- learn_graph(cds)

    # Root at stem cells. Paper 1: SC cluster; Paper 2: Notch+ SC cluster
    cds <- order_cells(cds)  # interactive; select SC as root

    # Paper 1 trajectories: SC→EB, EB→EE, EB→EC-like
    # Paper 2 trajectories: SC→EB1/2→EC-like2→{EC2/EC1/GC | EC-like3/1/4}
    plot_cells(cds, color_cells_by = "pseudotime",
               label_cell_groups = FALSE, label_leaves = TRUE)

    # Identify genes changing along pseudotime
    # (graph_test scores each gene with Moran's I along the principal graph)
    # Both papers used graph_test: p<0.05 for significant pseudotime genes
    pr_test <- graph_test(cds, neighbor_graph = "principal_graph", cores = 4)
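
Conceptually, pseudotime is a distance from the chosen root measured along the learned graph. The plain-Python sketch below illustrates that idea with Dijkstra's shortest-path algorithm on a toy graph. It is a simplification, not Monocle's implementation: the nodes reuse the Paper 1 branch names (SC→EB, EB→EE, EB→EC-like), and the edge lengths are invented.

```python
import heapq


def pseudotime(graph, root):
    """Shortest-path distance from root to every reachable node.

    graph: {node: [(neighbour, edge_length), ...]}
    """
    dist = {root: 0.0}
    heap = [(0.0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, length in graph.get(node, []):
            candidate = d + length
            if candidate < dist.get(nxt, float("inf")):
                dist[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    return dist


# Toy principal graph with hypothetical edge lengths
graph = {
    "SC": [("EB", 1.0)],
    "EB": [("SC", 1.0), ("EE", 2.0), ("EC-like", 1.5)],
    "EE": [("EB", 2.0)],
    "EC-like": [("EB", 1.5)],
}
pt = pseudotime(graph, "SC")
for node, value in sorted(pt.items(), key=lambda kv: kv[1]):
    print(f"{node:8s} pseudotime = {value}")
```

Changing the root changes every value, which is why the choice of the stem cell cluster as the starting point matters for interpreting the trajectories.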
⚠️
Pitfalls & Solutions

Challenges Encountered & Solved

1. Insect Tissue Dissociation

The larval FAW midgut contains a tough peritrophic matrix, digestive enzymes, and gut contents that degrade RNA rapidly. Arya et al. solved this with cold HBSS dissection, rapid mincing, and a precisely timed enzymatic incubation (60 min at 27°C — the insect physiological temperature, not 37°C). The OptiPrep™ density gradient step was the key innovation that boosted viability to 85–90%.

2. Non-Model Genome Annotation

Marker gene annotation was anchored to Drosophila melanogaster and Aedes aegypti published midgut atlases, but many FAW genes lack functional annotation. The solution: use the LOC gene identifiers from the FAW genome annotation directly, cross-reference with KOBAS 3.0 KEGG pathway enrichment, and validate functionally with promoter-reporter constructs in live cells.

3. Doublet Detection in Heterogeneous Tissues

Both papers used DoubletFinder (v2.0.4) with carefully tuned pN, pK, and nExp parameters. Given the wide variation in cell size across midgut cell types (small stem cells vs. large enterocytes), doublet rates can be elevated. Post-filtering statistics are reported in Table S1 of each paper.
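
One part of tuning nExp is discounting doublets formed by two cells of the same type, which DoubletFinder cannot detect well. Its documentation estimates that homotypic fraction as the sum of squared cluster proportions. The plain-Python sketch below reproduces that arithmetic. The 6% doublet rate is the value used in the Step 3 code in this article, while the cluster sizes and function names are hypothetical.

```python
def homotypic_proportion(cluster_sizes):
    """Chance that two randomly paired cells come from the same cluster."""
    total = sum(cluster_sizes)
    return sum((n / total) ** 2 for n in cluster_sizes)


def expected_doublets(n_cells, doublet_rate, cluster_sizes):
    """Return (raw nExp, nExp adjusted for homotypic doublets)."""
    n_exp = round(n_cells * doublet_rate)
    n_exp_adj = round(n_exp * (1 - homotypic_proportion(cluster_sizes)))
    return n_exp, n_exp_adj


cluster_sizes = [4000, 3000, 2000, 1000]   # hypothetical preliminary clusters
n_exp, n_exp_adj = expected_doublets(10000, 0.06, cluster_sizes)
print("raw nExp:", n_exp)
print("homotypic fraction:", round(homotypic_proportion(cluster_sizes), 2))
print("adjusted nExp:", n_exp_adj)
```

The more a sample is dominated by one or two clusters, the larger the homotypic fraction and the fewer detectable doublets should be requested.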

🌿 Practical Note

For FAW larval tissue, the authors recommend: dissect <30 min from CO₂ anesthesia; keep all solutions ice-cold; include RNase inhibitors (40 U/mL) at every wash step; use the iodixanol gradient before proceeding to 10x loading. For cell lines: passage 3 days prior, harvest at 75–90% confluency — not at full confluency where cells begin to die.

4. Batch Effects Between Replicates

Paper 1 employed Harmony (v1.2.0) for batch correction between the two SfMG-0617 replicates, demonstrating high reproducibility — the same 10 cell types were recovered in both replicates independently. Paper 2 used Seurat v3 integration (FindIntegrationAnchors + IntegrateData) to merge tissue replicates.

5. Cell Line Heterogeneity as a Source of Variability

Perhaps the most important message from Paper 1: researchers using SfMG-0617 (or any insect cell line) for bioassays must be aware that their "uniform" culture contains multiple cell types with radically different insecticide receptor profiles. A Bt bioassay on a culture that is 7% stem cells and 16% enteroblasts will give different results than one that is 40% stem cells. The paper provides the tools to sort and enrich specific populations.
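
The effect of composition on a bulk readout can be illustrated with a weighted average. In the plain-Python sketch below, the 7% and 16% stem cell and enteroblast fractions and the 40% stem cell alternative come from the paragraph above. Everything else is hypothetical: the remaining cells are lumped into a single "other" group and the per-cell-type mortality values are invented purely for illustration.

```python
def culture_response(composition, response_by_type):
    """Composition-weighted mean response of a mixed culture."""
    if abs(sum(composition.values()) - 1.0) > 1e-9:
        raise ValueError("composition fractions must sum to 1")
    return sum(frac * response_by_type[cell_type]
               for cell_type, frac in composition.items())


# Hypothetical mortality at one fixed toxin dose, per cell type
mortality = {"SC": 0.10, "EB": 0.20, "other": 0.80}

culture_a = {"SC": 0.07, "EB": 0.16, "other": 0.77}
culture_b = {"SC": 0.40, "EB": 0.16, "other": 0.44}

resp_a = culture_response(culture_a, mortality)
resp_b = culture_response(culture_b, mortality)
print(f"culture A mortality: {resp_a:.3f}")
print(f"culture B mortality: {resp_b:.3f}")
```

Under these made-up sensitivities the same dose gives a clearly different apparent mortality in the two cultures, even though no individual cell type has changed its response.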

⚠️ Critical Pitfall

Never interpret insect cell line bioassay data without first characterizing the cellular composition of your culture. As Arya et al. (2024, Genomics) demonstrated, even after >50 passages the SfMG-0617 line retains 10 cell types. Assuming homogeneity introduces systematic bias into dose-response calculations.

🧰
Resources & Toolbox

Tools Used & Recommended

Tool | Language | Role in Arya et al.
10x Genomics Cell Ranger v6.1.1 | CLI | Primary alignment, barcode/UMI counting, count matrix generation (both papers)
Seurat v5.0.3 / v3.2.1 | R | QC, normalization, PCA, clustering, UMAP, marker identification (both papers)
Harmony v1.2.0 | R | Batch correction between replicates (Paper 1)
DoubletFinder v2.0.4 | R | Doublet detection and removal (both papers)
Monocle 3.0 | R | Pseudotime trajectory analysis and lineage reconstruction (both papers)
SeuratWrappers | R | Monocle3 integration / data processing for trajectory analysis (Paper 2)
KOBAS 3.0 | Web/CLI | KEGG pathway enrichment analysis (gene ontology annotation) (both papers)
clusterProfiler v3/4 | R | GO enrichment visualization, redundancy reduction (both papers)
Seaborn | Python | Heatmap generation for KEGG/GO enrichment scores (Paper 1)
Custom Shiny App | R/Shiny | User-friendly scRNA-seq pipeline (Seurat + Monocle3 + DoubletFinder) (Paper 2); available at iucrc-camtech.org/research
FlyBase | Database | Drosophila marker gene reference for cluster annotation
IRAC Classification | Database | Insecticide target gene classification (Paper 1)
UK Morgan Compute Cluster | HPC | Cell Ranger alignment for large FAW libraries (Paper 2)
"scRNA-seq applied to insect cell lines and tissues is not merely a technical upgrade — it is a paradigm shift in how we understand the cellular machines that eat our crops and transmit our diseases."

What Comes Next?

The Arya et al. studies open multiple immediate research directions. The isolated stem cell populations from SfMG-0617 (Paper 1) are now being treated with 20-hydroxyecdysone to drive differentiation — creating a tractable in vitro system for studying midgut cell fate commitment. On the tissue side, extending scRNA-seq to earlier instar stages, comparing susceptible vs. resistant FAW strains, or profiling midguts challenged with Bt toxins would provide cell-type-resolved views of resistance mechanisms. Spatially resolved transcriptomics would add a third dimension — mapping not just what each cell expresses, but where along the anterior-posterior midgut axis it lives.