Problem Identification & Formulation
Session at a Glance
Topics: Problem identification; from topic to RQ; hypotheses; conceptual frameworks; variables; FINER criteria
Lab: RQ formulation workshop; one-page problem statement draft; peer critique
Hours: 2 hrs Lecture + 12 hrs Lab/Project
Deliverable: Problem statement draft due
Learning Objectives
By the end of this session, you will be able to:
- Identify viable sources of research problems across literature, industry, and personal observation
- Narrow a broad topic into a focused, answerable research question using systematic scoping techniques
- Evaluate a research problem against the FINER criteria (Feasible, Interesting, Novel, Ethical, Relevant)
- Formulate well-structured research questions and, where appropriate, testable hypotheses
- Construct a conceptual framework identifying independent, dependent, moderating, mediating, and control variables
- Produce a one-page problem statement suitable for supervisor feedback
Session Planner
Suggested breakdown of the 4-hour contact session.
| Time | Segment | Activity | Mode |
|---|---|---|---|
| 0:00–0:08 | Opening | Recap Week 2 paradigms; connect paradigm choice to problem formulation | Whole class |
| 0:08–0:30 | Lecture 1 | Sources of research problems; from topic → problem → RQ; the FINER criteria | Lecture |
| 0:30–0:55 | Lecture 2 | Formulating RQs and hypotheses; conceptual frameworks; variable types; BBA + BCA examples | Lecture |
| 0:55–1:10 | Activity | RQ diagnosis: evaluate 6 RQs against FINER criteria; classify as good or problematic | Pairs |
| 1:10–1:25 | Discussion | Share evaluations; discuss what makes an RQ "researchable" | Whole class |
| 1:25–1:40 | Break | — | — |
| 1:40–2:00 | Lab Briefing | RQ formulation workshop instructions; problem statement template walkthrough; FINER self-check | Demo |
| 2:00–3:30 | Lab Work | Individual RQ refinement; one-page problem statement drafting; peer critique in pairs | Individual/Pairs |
| 3:30–3:50 | Discussion | Volunteers share problem statements; facilitator highlights strong examples | Whole class |
| 3:50–4:00 | Exit Ticket | Submit problem statement draft; self-assess against FINER | Individual |
1. Where Do Research Problems Come From?
Students often say: "I don't know what to research." But research problems are everywhere — you just need to know where to look and how to recognize them. The best research problems sit at the intersection of what interests you, what matters to a community, and what hasn't been answered yet.
Published papers often end with a "future research directions" or "limitations of this study" section. These sections are gold. A gap exists when: (a) a relationship hasn't been tested in a new context, (b) a method hasn't been applied to a problem, (c) findings conflict across studies, or (d) a population or technology is under-studied.
BBA: "Prior research studied CSR in large firms — what about SMEs?"
BCA: "Federated learning has been studied for image data — what about tabular healthcare data?"
Real organizations face real problems that lack evidence-based answers. Industry conferences, practitioner journals, and your own internship experience are rich sources. The question: "Is this just one company's problem, or does it point to a broader gap in knowledge?"
BBA: "The company I interned at struggled with employee retention after going remote — is this a generalizable phenomenon?"
BCA: "Our team kept introducing bugs during CI/CD — what practices actually reduce defect injection rates?"
Something you've noticed, experienced, or wondered about. These problems have the advantage of genuine motivation — you actually care about the answer. The challenge is moving from personal curiosity to a researchable question with broader significance.
BBA: "Why do my friends trust UPI for payments but not for investments?"
BCA: "Why does my app's battery drain spike on certain devices but not others?"
Your supervisor has spent years in a research area. They know which questions are timely, which methods are appropriate, and — crucially — which problems are too big, too small, or already solved. A supervisor-suggested topic is not "cheating" — it's leveraging expertise to avoid dead ends.
Ask your supervisor: "What are the open questions in your area that would be appropriate for an undergraduate capstone?"
The best capstone problems are personally motivating (you'll sustain interest for 30 weeks), intellectually significant (it contributes to a literature), and practically feasible (data exists or can be collected; you have or can acquire the necessary skills). If a problem lacks any of these three, it will cause trouble later.
2. From Topic to Problem to Research Question — Narrowing the Scope
The most common mistake in capstone proposals is starting too broad. "I want to study digital marketing" is not a research problem — it's a domain. "I want to study artificial intelligence" is not a research problem — it's a field. You must progressively narrow until you reach a question that can be answered within 30 weeks with the resources available to an undergraduate.
2.1 The Narrowing Funnel
Domain: "Digital Marketing" / "Machine Learning" / "Consumer Behaviour" / "Cybersecurity"
Focused Area (BBA): "Social media influencer marketing effectiveness among Gen Z consumers in Indian metros"
Focused Area (BCA): "Transformer model compression for low-resource Indian languages"
Research Problem: "While influencer marketing spend is growing rapidly in India, we don't know whether micro-influencers (10k–100k followers) or macro-influencers (100k+) generate higher engagement-to-conversion ratios for D2C brands; existing studies report conflicting findings and focus on Western markets."
Research Question: "How does influencer tier (micro vs. macro) affect engagement rate and purchase conversion among Gen Z consumers of Indian D2C beauty brands on Instagram?"
2.2 BCA Narrowing Example
Domain: "Natural Language Processing for Indian languages"
Focused Area: "Sentiment analysis for Hindi-English code-mixed social media text"
Research Problem: "Existing multilingual models (mBERT, XLM-R) perform poorly on Hindi-English code-mixed text because their pre-training data is largely monolingual text in each language, with little code-mixed text. There is no systematic comparison of fine-tuning strategies specifically for code-mixed sentiment analysis, and no publicly available benchmark for Hindi-English code-mixed sentiment."
Research Questions:
RQ1: "How does fine-tuning strategy (full model vs. adapter-based vs. prefix-tuning) affect sentiment classification accuracy on Hindi-English code-mixed text?"
RQ2: "Does synthetic code-mixed data augmentation improve performance over fine-tuning on naturally occurring code-mixed data alone?"
3. The FINER Criteria — Is Your Problem Worth Pursuing?
Hulley et al. (2007) proposed the FINER framework for evaluating research problems. Before committing 30 weeks to a problem, run it through these five filters:
| Criterion | Meaning | Questions to Ask | Red Flags |
|---|---|---|---|
| F — Feasible | Can you actually do it within 30 weeks with undergraduate resources? | Do you have access to data/participants? Do you have the necessary skills (or can you learn them in time)? Is the scope manageable? Can you get ethical clearance if needed? | Data requires corporate partnerships you don't have; needs 2 years of fieldwork; requires a ₹50 lakh lab; needs 10,000 survey respondents with no budget |
| I — Interesting | Will you sustain motivation for 30 weeks? Will your supervisor, examiners, and peers find it engaging? | Do you genuinely care about the answer? Is the question intellectually stimulating — not just fact-gathering? Will you still want to work on this in Week 25? | "I'm doing this because it's easy" / "My supervisor told me to" (with no personal investment) / the question has an obvious answer |
| N — Novel | Does it contribute something new — even modestly — to existing knowledge? | Has someone already answered this exact question? If yes, what's different about your approach (context, method, population, time period)? Can you articulate the gap in one sentence? | A pure replication with no new context; a question that a Google search answers; "I'm going to prove what everyone already knows" |
| E — Ethical | Can the research be conducted without harming participants, violating privacy, or breaching integrity? | Does your study involve vulnerable populations? Will you need informed consent? Are there data privacy concerns (GDPR, DPDP Act 2023)? Could your findings be misused? | Deceptive experiments without debriefing; collecting personal data without consent; studying children/minors without special protocols; security research that could enable attacks |
| R — Relevant | Does the answer matter — to the literature, to practice, to society? | Who will care about your findings? What decision could your research inform? What problem does it help solve? Is the relevance beyond your personal interest? | "I'm curious" (with no broader relevance); studying a phenomenon that no longer exists; research that can't inform any decision or practice |
Most capstone problems fail on Feasibility (too ambitious) or Novelty (not a real gap). The art is finding a problem that is novel enough to contribute but feasible enough to complete. Your supervisor is your best guide on this trade-off — they've seen what works and what doesn't.
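The FINER filter can be made concrete as a quick self-rating. The sketch below assumes the 1–5 scale used in the lab activity's self-check; the class name `FinerRating` and the example scores are hypothetical and only illustrate how to spot the weakest criterion.

```python
from dataclasses import dataclass

CRITERIA = ("Feasible", "Interesting", "Novel", "Ethical", "Relevant")


@dataclass
class FinerRating:
    """Self-ratings on a 1-5 scale, one per FINER criterion."""

    scores: dict[str, int]

    def __post_init__(self) -> None:
        # Every criterion must be rated, and every rating must be in range.
        missing = set(CRITERIA) - set(self.scores)
        if missing:
            raise ValueError(f"missing criteria: {sorted(missing)}")
        for name, value in self.scores.items():
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    def weakest(self) -> list[str]:
        """Return the criterion (or tied criteria) with the lowest score."""
        low = min(self.scores[c] for c in CRITERIA)
        return [c for c in CRITERIA if self.scores[c] == low]


# Hypothetical ratings for an illustrative RQ (not real data).
rating = FinerRating(
    {"Feasible": 2, "Interesting": 5, "Novel": 3, "Ethical": 4, "Relevant": 4}
)
print(rating.weakest())
```

A low score is a prompt for a conversation with your supervisor about scope, method, or data source, not an automatic verdict on the topic.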
4. Formulating Research Questions and Hypotheses
4.1 Research Questions — The Engine of Your Capstone
A well-formed research question (RQ) is clear, focused, complex, and answerable. It is not a topic, not a yes/no question, not a statement of intent. It is the question your entire dissertation will answer. Most capstones have 1–3 RQs (not 10 — focus is a virtue).
| RQ Type | Purpose | Starter Phrase | BBA Example | BCA Example |
|---|---|---|---|---|
| Descriptive | Describes a phenomenon | "What are the..." "How prevalent is..." | "What are the primary barriers to digital payment adoption among street vendors in Mumbai?" | "What types of technical debt are most commonly reported in open-source microservice projects?" |
| Relational / Comparative | Examines relationships or differences | "How does X relate to Y?" "To what extent does A differ from B?" | "How does brand activism perception relate to purchase intention among Gen Z consumers in India?" | "How does container runtime (Docker vs. containerd vs. gVisor) affect cold-start latency in serverless functions?" |
| Causal | Tests cause-effect | "What is the effect of X on Y?" "Does X cause Y?" | "What is the effect of flexible work arrangements on employee retention in Indian IT SMEs?" | "What is the effect of code review checklist usage on defect detection rate in agile teams?" |
| Design / Construction | Creates and evaluates an artefact | "How can we design a... that...?" "Can a... improve...?" | "How can we design a predictive model for early-stage startup failure in the Indian fintech sector, and how accurately does it perform?" | "Can a lightweight transformer-based model achieve comparable accuracy to large models for code-mixed sentiment analysis while reducing inference latency by 50%?" |
4.2 Hypotheses — When and How to Use Them
Hypotheses are formal, testable predictions derived from theory — they belong primarily to the positivist paradigm. If you're doing interpretivist, pragmatist, or design science research, you may not need hypotheses at all (propositions or design goals may be more appropriate).
A hypothesis is a specific, testable prediction about the relationship between two or more variables. It is stated in a form that allows it to be rejected (falsified) by empirical evidence. A hypothesis is never "proved" — it is supported or not supported by the data.
| Element | Explanation | Example |
|---|---|---|
| Null Hypothesis (H₀) | States there is NO relationship/effect/difference. This is what you test statistically. | "There is no difference in purchase conversion between micro-influencer and macro-influencer campaigns." |
| Alternative Hypothesis (H₁) | States there IS a relationship/effect/difference. This is what your theory leads you to expect. | "Micro-influencer campaigns generate higher purchase conversion than macro-influencer campaigns." |
| Directional | Specifies the direction of the relationship (higher/lower/more/less). | "Micro-influencers generate higher engagement than macro-influencers." |
| Non-directional | Predicts a difference but not its direction. Used when the literature is mixed or the study is exploratory. | "There is a difference in engagement between micro- and macro-influencers." |
Whether you need hypotheses depends on your paradigm:
- Positivist, with strong theory predicting a relationship: use hypotheses.
- Interpretivist: use RQs only (hypotheses make no sense when you're exploring meanings).
- Pragmatist: use RQs for the qualitative strand and, optionally, hypotheses for the quantitative strand.
- DSR: use design goals or evaluation questions, not hypotheses in the statistical sense.
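To see what "testing H₀" can look like in practice, here is a minimal sketch of a two-proportion z-test for the micro- vs. macro-influencer conversion example. It assumes large, independent samples and a non-directional H₁ (hence a two-sided p-value). The function name, the conversion counts, and the 0.05 threshold are all hypothetical choices for illustration, not data or a procedure from any study.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both conversion rates are equal."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under H0 the two groups share one rate, so pool them to estimate it.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 60 of 1000 micro-influencer viewers converted, 40 of 1000 macro.
z, p = two_proportion_z_test(60, 1000, 40, 1000)
alpha = 0.05  # conventional threshold, assumed here
verdict = "H0 rejected; H1 supported" if p < alpha else "H0 not rejected; H1 not supported"
print(f"z = {z:.2f}, p = {p:.3f}: {verdict}")
```

Note the wording of the verdict: the data support or fail to support H₁; they never prove it.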
5. Conceptual Frameworks and Theoretical Grounding
A conceptual framework is a visual or narrative representation of the key concepts in your study and the relationships you expect to find among them. It shows what you're studying, what you think is connected to what, and — crucially — why (grounded in theory). It is the bridge between your literature review and your methodology.
Definition: A conceptual framework is a system of concepts, assumptions, expectations, beliefs, and theories that supports and informs your research. It explains, graphically or in narrative form, the main things to be studied: the key factors, constructs, or variables, and the presumed relationships among them.
5.1 Variables — The Building Blocks of Frameworks
| Variable Type | Role | BBA Example | BCA Example |
|---|---|---|---|
| Independent (IV) | The cause / predictor / what you manipulate | Influencer tier (micro vs. macro) | Quantization method (dynamic vs. static vs. QAT) |
| Dependent (DV) | The effect / outcome / what you measure | Purchase conversion rate | Model accuracy (F1-score) |
| Moderating (MV) | Changes the strength or direction of the IV→DV relationship | Product category (beauty vs. fashion vs. electronics) — influencer effect may differ by category | Dataset size — quantization effect may differ for small vs. large datasets |
| Mediating (MedV) | Explains the mechanism — WHY the IV affects the DV | Perceived authenticity — micro-influencers → higher authenticity → higher conversion | Representation capacity loss — quantization → loss of representational precision → lower accuracy |
| Control (CV) | Held constant, or adjusted for statistically, to isolate the IV→DV relationship | Brand familiarity, post frequency, time of posting | Hardware platform, batch size, input sequence length, random seed |
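For quantitative studies, the roles in the table are often written as regression equations. The block below is a simplified sketch that assumes linear relationships. The symbols are notation introduced here for illustration: X is the IV, Y the DV, W the moderator, M the mediator, and C a control.

```latex
\begin{align}
Y &= \beta_0 + \beta_1 X + \beta_2 W + \beta_3 (X \cdot W) + \beta_4 C + \varepsilon
  && \text{(moderation: } \beta_3 \neq 0\text{)} \\
M &= \alpha_0 + a X + \varepsilon_1
  && \text{(mediation, step 1)} \\
Y &= \gamma_0 + c' X + b M + \varepsilon_2
  && \text{(mediation, step 2; indirect effect } a \cdot b\text{)}
\end{align}
```

A moderator appears as an interaction term: the effect of X on Y depends on the level of W. A mediator sits on the path: X changes M, and M in turn changes Y, so the mechanism is carried by the product a·b, while c' is the direct effect that remains.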
5.2 Example Conceptual Framework (BBA)
Influencer Tier (Micro vs. Macro) [IV] → Perceived Authenticity [mediator] → Purchase Conversion [DV]
5.3 Example Conceptual Framework (BCA)
Fine-tuning Strategy (Full / Adapter / Prefix) → Parameter Efficiency (trainable params / total) → Sentiment F1-Score
A conceptual framework must be grounded in theory. Don't just draw boxes and arrows because they look good — each arrow should represent a relationship that prior literature suggests exists. For every path in your framework, you should be able to answer: "Which theory or prior study supports this relationship?" If you can't, the arrow doesn't belong.
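One way to apply this check is to list every arrow alongside the source that supports it and flag any arrow left empty. The sketch below is a hypothetical bookkeeping aid: the `Path` class, the `unsupported_paths` helper, and the placeholder citation string are illustrative names, not part of any tool or template used in this course.

```python
from dataclasses import dataclass, field


@dataclass
class Path:
    """One arrow in a conceptual framework."""

    source: str
    target: str
    # Theories or prior studies backing this arrow; empty means unsupported.
    support: list[str] = field(default_factory=list)


def unsupported_paths(paths: list[Path]) -> list[Path]:
    """Return every arrow that has no theory or prior study attached."""
    return [p for p in paths if not p.support]


# The BBA framework's two arrows; the citation is a placeholder, not a real reference.
framework = [
    Path("Influencer tier", "Perceived authenticity", ["<placeholder: a study you have read>"]),
    Path("Perceived authenticity", "Purchase conversion"),
]

for p in unsupported_paths(framework):
    print(f"No support recorded for: {p.source} -> {p.target}")
```

Any arrow the check prints is one to either back with literature or remove from the framework.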
6. Same Structure, Different Discipline — A Dual Illustration
The structure of a good research problem is discipline-independent. Whether you study brand loyalty (BBA) or software adoption (BCA), the logical architecture is the same. What changes is the phenomenon, the data, and the method — not the underlying logic of inquiry.
| Element | BBA — Brand Loyalty | BCA — Software Adoption |
|---|---|---|
| Broad domain | Consumer behaviour | Software engineering / HCI |
| Problem | Indian D2C brands invest heavily in loyalty programs, but it's unclear which program features actually drive repeat purchase behaviour among Gen Z consumers. | Organizations invest in static analysis tools, but adoption by developers remains low despite evidence that these tools find real bugs — it's unclear which factors predict sustained tool usage. |
| RQ | "What factors influence brand loyalty program engagement and repeat purchase behaviour among Gen Z consumers of Indian D2C brands?" | "What factors influence sustained adoption of static analysis tools among professional software developers?" |
| Paradigm | Positivist (survey + statistical modelling) | Pragmatist (survey + follow-up interviews with developers who adopted AND abandoned tools) |
| IV(s) | Reward type, personalization, community features, redemption ease | Tool usability, result actionability, integration with IDE, organizational mandate |
| DV | Repeat purchase frequency | Tool usage frequency and duration |
| Moderator | Product category involvement (high vs. low) | Developer experience level (junior vs. senior) |
| Theory base | Expectation-Confirmation Theory (Oliver); Technology Acceptance Model (Davis) | Technology Acceptance Model (Davis); Unified Theory of Acceptance and Use of Technology (Venkatesh et al.) |
The ability to take a vague topic and craft it into a clear, researchable problem with defined variables, a theoretical base, and a justified paradigm is the most valuable skill this course teaches. It transfers to consulting (diagnosing client problems), product management (defining what to measure), data science (framing analysis questions), and academic research. The phenomenon changes; the thinking doesn't.
Think Deeper — Cross Questions
Discuss in pairs before sharing with the class.
A student says: "My research question is: 'How can Indian startups succeed?'" What is wrong with this as an RQ? Apply the narrowing funnel and rewrite it into a focused, answerable research question.
Apply the FINER criteria to a research problem you're considering. Which criterion is most at risk for your topic? What could you change — scope, method, data source — to strengthen the weakest criterion?
A BCA student's conceptual framework has 8 independent variables, 3 mediating variables, and 4 dependent variables. Is this a problem? What principle of good research design is being violated, and how would you advise this student?
"My research doesn't need theory — I'm just describing what's happening." Do you agree with this statement? When might descriptive research still require a conceptual framework? What does theory add even when you're not testing causal hypotheses?
Quick Check — RQ Diagnosis
Diagnose each item below: decide whether it is well-formed or problematic, and for each problematic item describe the issue as precisely as you can.
1. "This research aims to study the impact of digital transformation on business performance."
2. "Do Instagram ads work better than Facebook ads for Indian D2C brands?"
3. "What is the effect of 4-bit weight-only quantization versus 8-bit weight-and-activation quantization on BERT-base's GLUE benchmark scores, inference latency on NVIDIA T4 GPUs, and model size in megabytes?"
4. "How do first-generation women entrepreneurs in Tier-2 Indian cities experience and navigate gender-based challenges while building ventures, and what strategies do they develop to sustain their businesses?"
5. "To investigate the relationship between employee engagement and productivity in the manufacturing sector."
6. "H₁: Micro-influencers generate significantly higher Instagram engagement rates than macro-influencers for Indian D2C beauty brands. H₂: This effect is moderated by content type (tutorial vs. testimonial vs. unboxing), such that the micro-influencer advantage is strongest for testimonial content."
Knowledge Check — Interactive Quiz
Test your understanding of problem formulation concepts.
Q1. In the FINER criteria, what does the "F" stand for, and why is it the most common reason capstone proposals are rejected?
Q2. A researcher expects that "training data size will influence the effect of model architecture on classification accuracy — specifically, larger training sets will reduce the accuracy gap between simple and complex architectures." In this scenario, training data size is a:
Q3. Which of the following is the BEST example of a well-formulated research question (as opposed to a topic, statement of intent, or unanswerable question)?
Q4. A conceptual framework differs from a theoretical framework because:
Q5. A BCA student wants to study: "Can a novel caching strategy reduce database query latency?" What critical element is missing from this research question?
Lab Activity — RQ Formulation Workshop & Problem Statement Draft
Part A: RQ Formulation Workshop (60 min)
- Start with your tentative topic from Week 2's Topic Negotiation Worksheet.
- Apply the narrowing funnel: Write your topic at each level — Domain → Focused Area → Research Problem → Research Question. You should have four written statements of increasing specificity.
- FINER self-check: Rate your RQ on each FINER criterion (1–5). Identify the weakest criterion and write one sentence about how you'll strengthen it.
- Peer critique: Exchange RQs with a partner. Each partner gives feedback: Is the RQ clear? Is it answerable in 30 weeks? Are the key concepts defined? What's missing?
- Revise: Incorporate peer feedback and produce a revised RQ.
Part B: One-Page Problem Statement Draft
Using the template below, draft your one-page problem statement. This will be submitted to your supervisor and forms the foundation of your proposal (Week 8).
1. Working Title (descriptive, not clever — max 15 words)
2. Background (3–5 sentences: What is the broader domain? Why does it matter? What do we already know? What don't we know? Cite at least 2–3 key references that establish the context.)
3. Problem Statement (3–4 sentences: What specific gap, tension, or unanswered question exists in the literature or practice? Why is this gap significant? What is the consequence of NOT addressing it?)
4. Research Question(s) (1–3 focused, answerable RQs. For positivist studies, include hypotheses below the RQs.)
5. Proposed Paradigm & Method (2–3 sentences: Which paradigm? Which research strategy — survey, experiment, case study, DSR, etc.? Why is this appropriate for your RQ?)
6. Expected Contribution (2–3 sentences: Who will benefit from your findings? What decision, practice, or understanding could your research inform?)
7. Key References (5–8 references that establish the gap and ground your study. Use APA 7th for BBA, IEEE for BCA.)
Exit Ticket
Submit with your problem statement draft.
- State your refined RQ in one sentence.
- FINER self-rating: Score your RQ on each criterion (F __ / I __ / N __ / E __ / R __). Which is weakest?
- Identify your IV and DV (if applicable). If your study is not variable-based (interpretivist / DSR), identify your phenomenon of interest and how you'll study it.
- One concern you have about the feasibility of your topic:
- One question you'll ask your supervisor at your first meeting:
Key Takeaways — Week 3
The #1 problem in capstone proposals is excessive breadth. Use the narrowing funnel: Domain → Focused Area → Research Problem → RQ. If your RQ can't be answered in 30 weeks, it's still too broad.
Run every RQ through FINER. Feasibility and Novelty are where most proposals fail. A feasible but modestly novel contribution beats an ambitious but impossible one every time.
IV = what you manipulate/predict with. DV = what you measure. Moderator = what changes the relationship. Mediator = why the relationship exists. Controls = what you hold constant. If your study is variable-based and you can't identify these, your RQ isn't ready.
A conceptual framework without theory is just boxes and arrows. Every relationship in your framework must be supported by prior literature. "I think these are related" is not a justification.
Facilitator Notes
Preparation Checklist
- Print or share the Problem Statement Template for each student.
- Prepare 3 worked examples of narrowing funnels — one BBA, one BCA, one cross-disciplinary — to walk through in Lecture 1.
- Prepare 3 examples of good vs. problematic RQs for the RQ diagnosis exercise (in addition to the 6 in the quick check).
- Have the FINER self-check rubric ready as a handout or digital form.
- Coordinate with supervisors: they should expect problem statement drafts from their allocated students by end of this week.
- For BCA students: prepare additional examples of DSR problem statements (which look different from positivist problem statements — problem-motivation rather than gap-hypothesis).
Common Student Difficulties
- Confusing a topic with an RQ: "I want to study fintech" or "My RQ is about blockchain." Keep pushing: "What about fintech? What question about blockchain?" The narrowing funnel exercise is the antidote.
- Making RQs that are yes/no questions: "Does CSR affect profitability?" can be answered with one word. Encourage RQs that ask "how," "to what extent," "under what conditions" — questions that require analysis, not just a verdict.
- Too many variables: Students often propose frameworks with 6+ IVs, 3 mediators, and 4 DVs. Teach the principle of parsimony: model the most important relationships, not every possible one.
- Theory anxiety: Students may not know which theory to use. This is normal at Week 3. Direct them to their supervisor and to review papers in their area (look at which theories those papers use).
- BCA students not identifying variables: In DSR, the "variables" may be design parameters rather than IVs/DVs in the traditional sense. Help them map DSR concepts (design goals, evaluation criteria) to the variable framework.
Pacing Tips
- The narrowing funnel is the most important concept this week — give it sufficient time. Walk through both the BBA and BCA examples slowly.
- The RQ diagnosis exercise works best when pairs discuss before the whole-class debrief. Give 10–12 min for pair discussion.
- If lab time is tight, prioritize the Problem Statement Draft over the full RQ formulation workshop. The draft is the milestone deliverable.
- For mixed BBA/BCA cohorts: pair cross-discipline students for peer critique — it forces them to explain their RQs to someone outside their domain, which reveals vagueness.