What is the therapeutic area you worked in earlier?
There are many different therapeutic areas a pharmaceutical company can work on.
A few of them are
antiviral (HIV), Alzheimer’s, Respiratory, Oncology, Metabolic Disorders
(Anti-Diabetic), Neurological, and Cardiovascular. A few more include:
Central nervous system
Neurology
Gastroenterology
Ophthalmology
Orthopedics and pain control
Pulmonary
Vaccines
Dermatology
Gene therapy
Immunology etc
What are your responsibilities?
Some of them include the following (not necessarily all of them):
· Extracting data from various internal and external databases (Oracle, DB2, Excel spreadsheets) using SAS/ACCESS and the INPUT statement.
· Developing programs in Base SAS to convert the Oracle data for a Phase II study into SAS datasets using the SQL pass-through facility and the LIBNAME facility.
· Creating and deriving the datasets, listings and summary tables for Phase I and Phase II clinical trials.
· Developing SAS programs for listings and tables for data review and presentation, including ad hoc reports, CRTs as per CDISC, patient listings, mapping of the safety database and safety tables.
· Mapping, pooling and analyzing clinical study data for safety.
· Using Base SAS procedures (MEANS, FREQ, SUMMARY, TABULATE, REPORT, UNIVARIATE, etc.) and SAS/STAT procedures (REG, GLM, ANOVA, etc.) for summarization, cross-tabulation and statistical analysis.
· Developing macros to automate listings and graphs of clinical data for analysis.
· Validating and performing QC of the efficacy and safety tables.
· Creating ad hoc reports using SAS procedures, with ODS statements and PROC TEMPLATE to generate output formats such as HTML, PDF and Excel that can be viewed in a web browser.
· Extracting data from various repositories and pre-processing it when applicable.
· Creating statistical reports using PROC REPORT, DATA _NULL_ and SAS macros.
· Analyzing the data according to the Statistical Analysis Plan (SAP).
· Generating demographic tables and adverse event and serious adverse event reports.
Can you tell me something about your last project study design?
If the interviewer asks you this question, say which phase your current project
is in (Phase 1, Phase 2 or Phase 3), and give the name of the drug and the
therapeutic area. Here are some more details you should be ready to give:
a) Is it a single-blinded or double-blinded study?
b) Is it a randomized or non-randomized study?
c) How many patients are enrolled?
d) Safety parameters only (if it is a Phase 1 study).
e) Safety and efficacy parameters if the study is Phase 2, 3 or 4.
How many subjects were there?
Subjects are the patients involved in the clinical study.
The answer to this question depends on the type of study you were involved in.
If the study is Phase 1, the answer should be approximately 30-100.
If the study is Phase 2, the answer should be approximately 100-1000.
If the study is Phase 3, the answer should be approximately 1000-5000.
Note: These are just typical and not exact numbers.
How many analysis datasets did you create?
Again, it depends on the study and on the safety and efficacy parameters that need to be determined from the study. Approximately 20-30 datasets are required to analyze a study for its safety and efficacy parameters. Here are some examples of the datasets:
DM (Demographics),
MH (Medical History),
AE (Adverse Events),
PE (Physical Examination),
EG (ECG),
VS (Vital Signs),
CM (Concomitant Medication),
LB (Laboratory),
QS (Questionnaire),
IE (Inclusion and Exclusion),
DS (Disposition),
DT (Death),
SV (Subject Visits),
SC (Subject Characteristics),
CO (Comments),
EX (Exposure),
PC (Pharmacokinetic Concentrations),
PP (Pharmacokinetic Parameters),
TI (Trial Inclusion/Exclusion Criteria),
and other Supplementary datasets like
SUPPCM, SUPPEX, SUPPLB, SUPPMH, SUPPXT, SUPPEG etc.
How did you create analysis datasets?
Analysis datasets are the datasets used for the statistical analysis of the
data. They contain the raw data and the variables derived from the raw data.
The derived variables are used to produce the TLGs (tables, listings and graphs)
of the clinical study. The safety and efficacy endpoints (parameters) dictate
which datasets the clinical study requires for generating the statistical
reports. Sometimes the analysis datasets contain variables that are not
required for the planned statistical reports but may be needed for ad hoc
reports.
See also http://www2.sas.com/proceedings/forum2008/207-2008.pdf for complete information about the creation of
analysis datasets.
How many tables, listings and graphs?
It can be anywhere between 30 and 100
(tables, listings and graphs combined).
What do you mean by treatment emergent adverse events and treatment emergent
serious adverse events?
Treatment emergent adverse events (TEAEs) and treatment emergent serious
adverse events are adverse events and serious adverse events that started
after drug administration, or that were already present before drug
administration and worsened after it.
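As a rough illustration of the onset-date part of that definition, a treatment emergent flag could be derived as sketched below. The dataset and variable names (DM, AE, USUBJID, TRTSDT, AESTDT, TRTEMFL) and the sample records are assumptions made for this example, and the worsening rule and the handling of missing or partial dates are left out because they depend on the SAP.

```sas
/* Hypothetical first-dose dates, one record per subject */
data dm;
  input usubjid $ trtsdt :date9.;
  format trtsdt date9.;
  datalines;
101 05JAN2020
102 10JAN2020
;
run;

/* Hypothetical adverse events with numeric start dates */
data ae;
  input usubjid $ aeterm $ aestdt :date9.;
  format aestdt date9.;
  datalines;
101 HEADACHE 03JAN2020
101 NAUSEA 07JAN2020
102 RASH 10JAN2020
;
run;

/* Both datasets are already in USUBJID order, so they can be merged directly */
data ae_te;
  merge ae (in=inae) dm (keep=usubjid trtsdt);
  by usubjid;
  if inae;
  length trtemfl $1;
  /* Flag events that start on or after the first dose date */
  if not missing(aestdt) and not missing(trtsdt) and aestdt >= trtsdt
    then trtemfl = 'Y';
  else trtemfl = 'N';
run;

proc print data=ae_te noobs;
run;
```

Whether an event starting on the day of first dose counts as treatment emergent is a study-level decision; the `>=` comparison here is only one possible choice.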
Can you explain a little bit about the datasets?
DEMOGRAPHIC analysis dataset contains all subjects’ demographic data (i.e.,
Age, Race, and Gender), disposition data (i.e., Date patient withdrew from the
study), treatment groups and key dates such as date of first dose, date of last
collected Case Report Form (CRF) and duration on treatment. The dataset has the
format of one observation per subject.
LABORATORY analysis dataset contains all subjects’ laboratory data, in the
format of one observation per subject per test code per visit per accession
number. Here, we derive the study visits according to the study window defined
in the SAP, as well as re-grade the laboratory toxicity per protocol. For a
crossover study, the visit is derived both relative to the initial period and
relative to the beginning of the new study period. If the
laboratory data are collected from multiple local lab centers, this analysis
dataset will also centralize the laboratory data and standardize measurement units by using conversion factors.
EFFICACY analysis
dataset contains derived primary and secondary endpoint variables as defined in
the SAP. In addition, this dataset can contain other efficacy parameters of
interest, such as censor variables pertaining to the time to an efficacy event.
This dataset has the format of one record per subject per analysis period.
SAFETY can
be categorized into four analysis datasets:
VITAL SIGN analysis dataset captures all subjects’ vital signs collected
during the trial. This dataset has the format of one observation per subject
per vital sign per visit, similar to the structure for the laboratory analysis dataset.
ADVERSE EVENT analysis dataset contains all adverse events (AEs) reported
including serious adverse events (SAEs) for all subjects. A treatment emergent
flag, as well as a flag to indicate if an event is reported within 30 days
after the subject permanently discontinued from the study, will be calculated.
This dataset has a format of one record per subject per adverse event per start
date. Partial dates and missing AE start and/or stop dates will be imputed
using logic defined in the SAP.
MEDICATION analysis dataset contains the subjects’ medication records
including concomitant medications and other medications taken either prior to
the beginning of study or during the study. This dataset has a format of one
record per subject per medication taken per start date. Incomplete and missing
medication start or stop dates will be imputed using instructions defined in
the SAP.
SAFETY analysis
dataset contains other safety variables, whether they are defined in the SAP or
not. The Safety analysis dataset, similar to Efficacy analysis dataset in structure, consists of data with one record per
subject per analysis period to capture safety parameters for all subjects.
It is crucial to generate analysis datasets in a specific order, as some variables derived from one particular analysis dataset may be used as the inputs to generate other variables in other analysis datasets. For example, the time to event variables in the efficacy and safety analysis datasets are calculated based on the date of the first dose derived in the demographic analysis dataset.
Analysis datasets are generated in sequence:
Demographic -> Laboratory -> Efficacy
Safety datasets: Vital Sign, Adverse Event, Concomitant Medications
Source:www.thotwave.com/Document/.../GlobalArch/SUGI117-30_GlobalArchitecture.pdf
What is your involvement with CDISC standards? What is meant
by CDISC and where do you use it?
CDISC (Clinical Data Interchange Standards Consortium) is an organization that
develops industry standards for pharmaceutical companies to use when
submitting clinical data to the FDA.
There are many advantages of using CDISC standards: reduced time for
regulatory submissions, more efficient regulatory review of submissions, and
savings in time and money on data transfers among businesses.
CDISC standards are used in the following activities:
Developing CRTs for submission to the FDA as part of an NDA.
Mapping, pooling and analysis of clinical study data for safety.
Creating the annotated case report form (aCRF) using CDISC SDTM mapping.
Creating the analysis datasets in CDISC and non-CDISC standards for further
SAS programming.
What do you mean when you say you created tables, listings and
graphs for ISS and ISE?
How do you do data cleaning?
It is always important to check the data we are using, especially the
variables we use in the analysis. Data cleaning is critical for the data we are
using and preparing.
I use PROC FREQ, PROC SQL, PROC MEANS, PROC UNIVARIATE, etc. to clean the data.
I use PROC PRINT with a WHERE statement to list the invalid date values.
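The checks mentioned above might look like the following sketch. The VS dataset, its variables and the 31DEC2020 cut-off date are made up for the example; real range limits would come from the protocol or the data management plan.

```sas
/* Hypothetical vital signs data with a few deliberate problems */
data vs;
  input subjid $ sex $ visitdt :date9. sysbp;
  format visitdt date9.;
  datalines;
001 M 12JAN2020 120
002 F . 135
003 X 15JAN2021 400
;
run;

/* Unexpected category values show up in a frequency table */
proc freq data=vs;
  tables sex / missing;
run;

/* Out-of-range numeric values show up in MIN and MAX */
proc means data=vs n nmiss min max;
  var sysbp;
run;

/* List records with missing dates or dates after the assumed cut-off */
proc print data=vs;
  where visitdt = . or visitdt > '31DEC2020'd;
run;
```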
Can you tell me about CRTs?
Creating Case Report Tabulations (CRTs) for an NDA Electronic
Submission to the FDA
ABSTRACT: The Food and
Drug Administration (FDA) now strongly encourages all new drug applications
(NDAs) be submitted electronically. Electronic submissions could help FDA
application reviewers scan documents efficiently and check analyses by
manipulating the very datasets and code used to generate them. The potential
saving in reviewer time and cost is enormous while improving the quality of
oversight. In January 1999, the FDA released the Guidance for Industry:
Providing Regulatory Submissions in Electronic Format – NDAs. As described, one
important part of the application package is the case report tabulations
(CRTs), now serving as the instrument for submitting datasets. CRTs are made up
of two parts: first, datasets in SAS® transport file format and second, the
accompanying documentation for the datasets. Herein, we briefly review the
content and conversion of datasets to SAS transport file format, and then
elaborate on the code that makes easy work of the accompanying dataset
documentation (in the form of data definition tables) using the SAS Output
Delivery System (ODS). The intended audience is SAS programmers with an
intermediate knowledge of the BASE product used under any operating system and
who are involved in the biotechnology industries.
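For the first part of a CRT, the conversion of a dataset to SAS transport format is commonly done with the XPORT engine. The sketch below assumes a hypothetical output path and uses SASHELP.CLASS as a stand-in for a real submission dataset.

```sas
/* Stand-in dataset; a real CRT would use the study's own datasets */
data dm;
  set sashelp.class;
run;

/* Hypothetical output location for the transport file */
libname xptout xport '/path/to/dm.xpt';

/* Copy the dataset into the transport file */
proc copy in=work out=xptout memtype=data;
  select dm;
run;

libname xptout clear;
```

The XPORT engine writes version 5 transport files, so dataset and variable names must fit within 8 characters.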
Where do you use MedDRA and WHODrug? Can you write the code? How do you
use them?
What is MedDRA?
The Medical Dictionary
for Regulatory Activities (MedDRA) has been developed as a pragmatic,
clinically validated medical terminology with an emphasis on ease-of-use data
entry, retrieval, analysis, and display, with a suitable balance between
sensitivity and specificity, within the regulatory environment. MedDRA is
applicable to all phases of drug development and the health effects of devices.
By providing one source of medical terminology, MedDRA improves the
effectiveness and transparency of medical product regulation worldwide.
MedDRA is used to report
adverse event data from clinical trials, as well as post-marketing and
pharmacovigilance.
What are the structural elements of the terminology in MedDRA?
The structural elements
of the MedDRA terminology are as follows:
SOC (System Organ Class) - Highest level of the terminology, and
distinguished by anatomical or physiological system, etiology, or purpose
HLGT (High Level Group Term) – Subordinate to SOC, superordinate descriptor for
one or more HLTs
HLT (High Level Term) – Subordinate to HLGT, superordinate descriptor for one or more PTs
PT (Preferred Term) – Represents a single medical concept
LLT (Lower Level Term) – Lowest level of the terminology, related to a
single PT as a synonym, lexical variant, or quasi-synonym (Note: All PTs have
an identical LLT).
In what format is MedDRA distributed?
MedDRA is distributed in
sets of flat ASCII delimited files. There is a different set of files for each
available language. The Czech translation is distributed in UTF-8 format. For
detailed information on file names, data record scheme, and record layout,
see the MedDRA ASCII and Consecutive Files Documentation document, which can
be downloaded from the MedDRA MSSO Web site. MedDRA is delivered in text file
format. As of MedDRA Version 11.1, the total size of all ASCII files for the
English version is 12,459KB.
THE WHODRUG DICTIONARY:
The WHODrug dictionary
was started in 1968. The dictionary contains information on both single and
multiple ingredient medications. Drugs are classified according to the type of
drug name being entered, (i.e. proprietary/trade name, nonproprietary name,
chemical name, etc.). At present, 52 countries submit medication data to the
WHO Collaborating Center, which is responsible for the maintenance and
distribution of the drug dictionary. Updates to the dictionary are offered four
times per year.
What do you mean by saying you used the macro facility to produce weekly and
monthly reports?
The SAS macro facility can do a lot of things; in particular, it is used to:
• reduce code repetition
• increase control over
program execution
• minimize manual
intervention
• create modular code.
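A small sketch of how a macro can drive a recurring report is shown here. The macro name AE_LISTING, its START= and END= parameters and the AE records are all hypothetical; the point is only that the same code is rerun with a different date window each week or month.

```sas
/* Hypothetical adverse event data */
data ae;
  input usubjid $ aeterm $ aestdt :date9.;
  format aestdt date9.;
  datalines;
101 HEADACHE 03JAN2020
101 NAUSEA 07JAN2020
102 RASH 10JAN2020
;
run;

/* One macro, reused for any reporting window */
%macro ae_listing(start=, end=);
  title "Adverse events with onset from &start to &end";
  proc print data=ae noobs;
    where "&start"d <= aestdt <= "&end"d;
  run;
  title;
%mend ae_listing;

/* Weekly run */
%ae_listing(start=01JAN2020, end=07JAN2020)

/* Monthly run */
%ae_listing(start=01JAN2020, end=31JAN2020)
```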
How did you validate tables and listings, and what other
things did you validate?
First, the output from the listing needs to be read into a SAS data set. Next,
the validation results need to be calculated (you need to do this anyway) and
then turned into a SAS data set with the same layout and properties as the one
created from the original output. Last, the original and validation data sets
are compared using PROC COMPARE. The results are concise, quick, accurate and
complete.
The same procedure is used to validate the tables.
We also validate graphs made in SAS. To do that, we use the SAS/GRAPH Network
Visualization Workshop, with which graphs made with SAS can be validated
automatically as well as manually.
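The comparison step can be sketched as follows. PROD and QC are placeholder names for the production and validation datasets; here both are copies of SASHELP.CLASS so that the example runs on its own.

```sas
/* Placeholders for the production and independently programmed QC datasets */
data prod;
  set sashelp.class;
run;

data qc;
  set sashelp.class;
run;

proc sort data=prod; by name; run;
proc sort data=qc;   by name; run;

/* Compare record by record, matching on the key variable */
proc compare base=prod compare=qc listall;
  id name;
run;

/* SYSINFO holds the PROC COMPARE return code; 0 means no differences were found */
%put NOTE: PROC COMPARE return code = &sysinfo;
```

Checking `&sysinfo` immediately after the step makes it possible to automate the pass/fail decision instead of reading the printed output.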
Did you ever see a case where a patient was randomized to one drug but was
given another drug? If so, in which population would you put
that patient?
I would put that patient in the group of the drug that was actually given.
Before doing anything, I would check whether it is a data entry error or
whether the patient was actually given the other drug.
What would you do if you had to pool the data from one
parallel study and one crossover study?
Say you have the same subject in two groups taking two different
drugs. If you had to pool these two groups, how would you do it?
This situation arises when the study has a crossover design. I would treat the
same patient as two different patients, one in each treatment group.
What are the phases you are good at?
Phase I, II and III.
How would you transpose a dataset using a DATA step?
The simplest way is PROC TRANSPOSE (the input dataset must be sorted by the BY
variable):
proc transpose data=old out=new prefix=DATE;
var date;
by name;
run;
The PREFIX= option controls the names of the transposed variables (DATE1,
DATE2, etc.). Without it, the names of the new variables would be COL1, COL2,
etc.
PROC TRANSPOSE also creates an extra variable, _NAME_, indicating the name of
the transposed variable. _NAME_ has a value of DATE on every output
observation. To eliminate the extra variable, modify a portion of the PROC
statement:
out=new (drop=_name_);
The equivalent DATA step code using arrays could be:
data new (keep=name date1-date3);
set old;
by name;
array dates {3} date1-date3;
retain date1-date3;
/* start each name with an empty array so values do not carry over */
if first.name then do;
i = 1;
call missing(of date1-date3);
end;
else i + 1;
dates{i} = date;
if last.name;
run;
This program assumes that each name has at most three observations. If a name
had more, the program would generate an error message when hitting the fourth
observation for that name. When i=4, this statement encounters an array
subscript out of range:
dates{i} = date;
If some patient misses one lab how would you assign values for
that missing values?? Can you write the code?
See the answer to the next question.
How do you deal with missing values?
Whenever SAS encounters
an invalid or blank value in the file being read, the value is defined as
missing. In all subsequent processes and output, the value is represented as a
period (if the variable is numeric-valued) or is left blank (if the variable is
character-valued).
In DATA step programming, use a period to refer to missing numeric values.
For example, to recode
missing values in the variable A to the value 99, use the following statement:
if a=. then a=99;
Use the MISSING
statement to define certain characters to represent special missing values for
all numeric variables. The special missing values can be any of the 26 letters
of the alphabet, or an underscore. In the example below, the values 'a' and 'b'
will be interpreted as special missing values for every numeric variable.
MISSING a b ;
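For a missed lab assessment, one commonly used approach is last observation carried forward (LOCF). The sketch below is only an illustration with invented data and variable names (LAB, SUBJID, VISIT, AVAL, DTYPE); whether LOCF or any other imputation is appropriate is decided by the SAP, not by the programmer.

```sas
/* Hypothetical lab results sorted by subject and visit; a period marks a missed lab */
data lab;
  input subjid $ visit aval;
  datalines;
001 1 5.1
001 2 .
001 3 4.8
002 1 .
002 2 6.0
;
run;

data lab_locf;
  set lab;
  by subjid;
  length dtype $4;
  retain last_aval;
  /* Do not carry values from one subject to the next */
  if first.subjid then last_aval = .;
  if not missing(aval) then last_aval = aval;
  else if not missing(last_aval) then do;
    aval  = last_aval;   /* fill the gap with the previous non-missing value */
    dtype = 'LOCF';      /* mark the record as imputed */
  end;
  drop last_aval;
run;

proc print data=lab_locf noobs;
run;
```

A missing value at a subject's first visit stays missing, because there is nothing earlier to carry forward.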
Did you ever create efficacy tables?
Yes, I have created efficacy tables. Efficacy tables are developed to present
information about the primary objectives/parameters of the study.
What are the primary and secondary endpoints in your last project?
The primary and secondary endpoints of a clinical trial are given in the SAP. You can download a sample protocol as well as a trial SAP from my blog (http://www.studysas.blogspot.com/ ), or go to http://www.clinicaltrials.gov/ and type the name of a pharmaceutical company. It will give you the list of clinical trials conducted by that company; click on any study to see its primary and secondary objectives and all other details.
What are the stat procedures you used?
ANOVA, CATMOD, FREQ,
GLM, LIFEREG, LIFETEST, LOGISTIC, NPAR1WAY, REG, TTEST, UNIVARIATE, MEANS,
SUMMARY etc
Tell me something about PROC MIXED. (Sometimes they may ask you to
write the syntax.)
Syntax: http://ftp.sas.com/samples/A55235
PROC MIXED is a
generalization of the GLM procedure in the sense that PROC GLM fits standard
linear models, and PROC MIXED fits the wider class of mixed linear models. Both
procedures have similar CLASS, MODEL, CONTRAST, ESTIMATE, and LSMEANS
statements, but their RANDOM and REPEATED statements differ (see the following
paragraphs). Both procedures use the nonfull-rank model parameterization,
although the sorting of classification levels can differ between the two. PROC
MIXED computes only Type I -Type III tests of fixed effects, while PROC GLM
offers Types I - IV. The RANDOM statement in PROC MIXED incorporates random
effects constituting the γ vector in the mixed model. However, in PROC GLM,
effects specified in the RANDOM statement are still treated as fixed as far as
the model fit is concerned, and they serve only to produce corresponding
expected mean squares.
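If asked to write the syntax, something along these lines is a reasonable starting point. The data are simulated, and the names (ADEFF, SUBJID, TRT, VISIT, CHG) and the choice of a random intercept per subject are assumptions for the example rather than a recommended model.

```sas
/* Simulated repeated-measures data: 40 subjects, 2 treatments, 3 visits */
data adeff;
  call streaminit(2024);
  length trt $1;
  do subjid = 1 to 40;
    trt = ifc(mod(subjid, 2) = 1, 'A', 'B');
    u = rand('normal', 0, 2);            /* subject-level random effect */
    do visit = 1 to 3;
      chg = 1.5 * (trt = 'B') * visit + u + rand('normal', 0, 1);
      output;
    end;
  end;
  drop u;
run;

proc mixed data=adeff;
  class subjid trt visit;
  model chg = trt visit trt*visit / solution;   /* fixed effects */
  random intercept / subject=subjid;            /* random effect per subject */
  lsmeans trt*visit / diff cl;
run;
```

A REPEATED statement with a SUBJECT= and TYPE= option could be used instead of, or together with, RANDOM to model the within-subject covariance directly.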
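What would you do if you had to use DATA step functions in a macro
definition? Can you use all the DATA step functions in a macro definition?
Most of them, yes, through %SYSFUNC (or %QSYSFUNC). A few DATA step functions cannot be used with %SYSFUNC, including DIF, LAG, INPUT, PUT, RESOLVE and SYMGET; use INPUTN/INPUTC and PUTN/PUTC in place of INPUT and PUT.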
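In macro code, DATA step functions are normally reached through %SYSFUNC. A few illustrative calls follow; the macro variable names are arbitrary.

```sas
/* Current date, formatted by the optional second argument of %SYSFUNC */
%let today = %sysfunc(today(), date9.);

/* Numeric function */
%let rounded = %sysfunc(round(3.14159, 0.01));

/* PUTN stands in for PUT, which is not available through %SYSFUNC */
%let pct = %sysfunc(putn(0.1234, percent8.1));

/* Check that a dataset exists before using it (1 = exists, 0 = does not) */
%let has_class = %sysfunc(exist(sashelp.class));

%put today=&today rounded=&rounded pct=&pct has_class=&has_class;
```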
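If I have a dataset with different subjids and each subjid has
many records, how can I obtain the last-but-one record for each patient?
Read each BY group twice: the first loop counts the records for the subject,
and the second loop outputs record number n-1. Subjects with only one record
produce no output.
proc sort data=old;
by subjid;
run;
data new (drop=n i);
/* first pass: count the records in this BY group */
do n = 1 by 1 until (last.subjid);
set old;
by subjid;
end;
/* second pass: re-read the same group and keep record n-1 */
do i = 1 by 1 until (last.subjid);
set old;
by subjid;
if i = n - 1 then output;
end;
run;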
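Can you get the value of a DATA step variable to use in another
program later in the same SAS session? How do you do that?
Store the value in a macro variable with CALL SYMPUT (or CALL SYMPUTX) in the
DATA step. The macro variable can then be referenced as &name in any later step
of the session; %PUT can be used to check its value.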
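One common way to hand a DATA step value to later steps is CALL SYMPUTX, sketched below with SASHELP.CLASS as example data. The macro variable name NSUBJ is arbitrary.

```sas
/* Store the record count in a global macro variable */
data _null_;
  set sashelp.class end=last;
  if last then call symputx('nsubj', _n_, 'G');
run;

/* The value is now available to any later step in the session */
%put Number of records: &nsubj;

title "Listing of &nsubj subjects";
proc print data=sashelp.class noobs;
run;
title;
```

CALL SYMPUTX trims leading and trailing blanks from the value, which CALL SYMPUT does not do.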
What would you do if you had to access the previous record's values in
the current record?
Use the LAG function, or a RETAIN statement to carry the value forward from
one record to the next.
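Values from the previous record are usually reached with the LAG function (or with RETAIN). A minimal sketch with invented data (VS, SUBJID, VISIT, WEIGHT):

```sas
/* Hypothetical weights, sorted by subject and visit */
data vs;
  input subjid $ visit weight;
  datalines;
001 1 70.0
001 2 71.5
001 3 69.8
002 1 82.3
002 2 81.0
;
run;

data vs_chg;
  set vs;
  by subjid;
  /* Call LAG on every record; calling it conditionally gives wrong results */
  prev_weight = lag(weight);
  /* The first record of a subject has no previous value of its own */
  if first.subjid then prev_weight = .;
  if not missing(prev_weight) then change = weight - prev_weight;
run;

proc print data=vs_chg noobs;
run;
```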
What is a p-value? Why should you calculate it? What
procedures can you use for that?
The p-value is the probability of obtaining a result at least as extreme as the
one observed, assuming the null hypothesis is true. It is calculated to decide
whether an observed effect is statistically significant at a chosen level
(commonly 0.05). For example, in the overall test of a regression model, if the
p-value were greater than 0.05, you would say that the group of independent
variables does not show a statistically significant relationship with the
dependent variable, or that the group of independent variables does not
reliably predict the dependent variable. Note that this is an overall
significance test assessing whether the group of independent variables, when
used together, reliably predicts the dependent variable; it does not address
the ability of any particular independent variable to predict the dependent
variable. We can calculate p-values using PROC FREQ, PROC ANOVA, PROC GLM and
PROC TTEST.
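As an example of obtaining a p-value, the sketch below runs a two-sample t-test on SASHELP.CLASS and captures the test results in a dataset through ODS OUTPUT. The output dataset name PVALS is arbitrary.

```sas
/* Capture the t-test results table as a dataset */
ods output ttests=pvals;

proc ttest data=sashelp.class;
  class sex;      /* two groups to compare */
  var height;     /* analysis variable */
run;

/* Probt holds the p-value for each version of the test */
proc print data=pvals noobs;
  var variable method variances tvalue df probt;
run;
```

Capturing the p-value in a dataset rather than reading it off the listing makes it easy to format and place into a summary table.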
What do you usually do with PROC LIFETEST?
PROC LIFETEST is used to obtain Kaplan-Meier and life-table survival estimates
(and plots). A STRATA statement in PROC LIFETEST compares the survival
estimates for different groups.
Can you get survival estimates with any other procedures?
PROC LIFEREG and PROC
PHREG can also be used to get the survival estimates along with PROC
LIFETEST.
Can you write a code to get the survival estimates?
proc lifetest data=data method=km outsurv=newdata;
time survival*status(0);
strata study;
run;
What is the difference between the STRATA and BY statements in PROC
LIFETEST?
You can specify a BY
statement with PROC LIFETEST to obtain separate analyses on observations in
groups defined by the BY variables.
The BY statement is more
efficient than the STRATA statement for defining strata in large data sets.
However, if you use the BY statement to define strata, PROC LIFETEST does not
pool over strata for testing the association of survival time with covariates
nor does it test for homogeneity across the BY groups.
The STRATA statement
indicates which variables determine strata levels for the computations.
The strata are formed
according to the non-missing values of the designated strata variables. The
MISSING option can be used to allow missing values as a valid stratum level.
Which procedure do you usually use to create reports?
Proc Report, proc
Tabulate and Data _null_.
What do you do if you have to get the column names and a title
on every page of your report when you create it using DATA _NULL_?
Give your data _null_ titles the "proc print" and
"proc report" feel
The more you can make
your "data _null_" behave like "proc print" or "proc
report", when it comes to titles, the better. If the "byline"
option is set then put out a dashed "byline". If not, then don't.
Does your "by" variable have a label? If so, then your dashed byline
should have the text of your variable label in it on the left of the equals
sign. If the variable has no label then it should just be the variable name. If
that's the way "proc report" or "proc print" does it then
do it that way with your "data _null_". Get it to interface with
#byval and #byvar entries if they exist. Give people the feel that "data
_null_" reporting is no different to using "proc print" or
"proc report" and you will have less opposition to your "data
_null_" reports. How you do this is already in those two pages. You are
going to find yourself in a situation whereby you really must do the report
using data _null_ but other people are not comfortable with it because they
feel it is "too different" than using "proc report". The
more you can give it the same feel, the more easily you can dip into "data
_null_" when you have to without people worrying.
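On the mechanics of repeating column headers, one standard technique is the HEADER= option of the FILE statement, which runs a labelled block of statements each time a new page starts. The sketch below uses SASHELP.CLASS; the column positions and the label name NEWPAGE are arbitrary choices.

```sas
title "Listing of Subjects";

data _null_;
  set sashelp.class;
  /* HEADER= names the label to execute at the top of every page */
  file print header=newpage;
  put @1 name $10. @15 sex $1. @20 age 3.;
  return;

newpage:
  /* Column headers, written once per page */
  put @1 'Name' @15 'Sex' @20 'Age';
  put @1 30*'-';
  return;
run;

title;
```

With FILE PRINT, the current TITLE lines are printed at the top of each page by default, so the title and the column headers both repeat.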
How do you use a macro that was created by other people and
is stored in a folder outside SAS?
Through the SAS autocall facility: point the SASAUTOS system option at the
folder (with the MAUTOSOURCE option in effect) and call the macro by name.
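Setting up an autocall library might look like the sketch below. The folder /shared/study/macros and the macro %demog_table are hypothetical; for autocall to find a macro, the file name must match the macro name (for example demog_table.sas).

```sas
/* Add a hypothetical shared folder in front of the default autocall library */
options mautosource sasautos=('/shared/study/macros' sasautos);

/* Once the folder exists and contains demog_table.sas, the macro can be
   called by name without any %INCLUDE (hypothetical macro and parameter): */
/* %demog_table(pop=SAFETY) */

/* Show the current setting in the log */
%put %sysfunc(getoption(sasautos));
```

Keeping `sasautos` as the last entry in the list preserves access to the macros that ship with SAS.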
Can you tell me something about macro libraries?
Macro libraries are libraries that store all the macros required for
developing the TLGs of the clinical trial. They are necessary for controlling
and managing the macros. Stored macros can be brought into a program with a
%INCLUDE statement, or called automatically through an autocall library.
Can you show me how the efficacy table looks like?
Can you show me how the safety table looks like?
Did you use ODS?
Yes, I have used ODS (Output Delivery System), which is normally used to make
the output from the tables, listings and graphs look presentable. ODS creates
output in HTML, PDF and RTF formats, among others.
General syntax:
Open the output destination with:
ods <destination> file='<output file>';
SAS statements...
Then close it with:
ods <destination> close;
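A concrete version of that pattern, with hypothetical output file names and SASHELP.CLASS as example data, could be:

```sas
/* Open three destinations at once (file names are placeholders) */
ods rtf  file='demog.rtf';
ods pdf  file='demog.pdf';
ods html file='demog.html';

title "Summary of Age and Height by Sex";
proc means data=sashelp.class n mean std min max;
  class sex;
  var age height;
run;
title;

/* Close every destination that was opened */
ods html close;
ods pdf  close;
ods rtf  close;
```

The same procedure output is written to all open destinations, so one run produces the RTF, PDF and HTML versions together.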
Your resume says you created HTML, RTF and PDF output. Why did you have to
create all three? Can you tell me specifically why each format is used?
SAS output can be created in several formats.
To publish the output on the Internet, we create it in HTML format. We
generally create SAS output in RTF, because RTF can be opened in Word or other
word processors. If we need to send printable reports through email, we create
the output in PDF. PDF output is also needed when we send the documents
required to file an NDA to the FDA.
What are the graphs you created?
Survival estimate
graphs.
What are the procedures you used to create them?
PROC LIFETEST, PROC
GCHART, PROC GPLOT, PROC GREPLAY etc.
Can you generate statistics using PROC SQL?
Yes, we can generate statistics like N, Mean, Median, Max, Min, STD and SUM
using PROC SQL. But PROC SQL does not calculate these statistics by default,
as PROC MEANS does; each one must be requested explicitly.
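For instance, several of these statistics can be requested in a single query. The sketch uses SASHELP.CLASS, and the column aliases are arbitrary.

```sas
proc sql;
  select sex,
         count(height) as n_height,
         mean(height)  as mean_height format=8.2,
         std(height)   as std_height  format=8.2,
         min(height)   as min_height,
         max(height)   as max_height,
         sum(height)   as sum_height
    from sashelp.class
    group by sex;
quit;
```

Each statistic has to be written out as a summary function, whereas PROC MEANS prints N, mean, standard deviation, minimum and maximum without being asked.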
When do you prefer PROC SQL? Give me a situation.
The SQL procedure supports almost all the functions available in the DATA step for the creation of data as well as the manipulation of the data.
When the same result is obtained with PROC SQL and with a DATA step, PROC SQL
often requires less code and, in some cases, less time to execute.
How do you delete a macro variable?
Use the %SYMDEL statement, which deletes macro variables from the global
symbol table. Multiple variables may be deleted by listing their names in the
same statement, for example %SYMDEL var1 var2;
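Global macro variables are deleted with the %SYMDEL statement. A short sketch, with arbitrary variable names and values:

```sas
/* Create two global macro variables */
%let study  = ABC123;
%let cutoff = 31DEC2020;

/* List the user-defined macro variables before deletion */
%put _user_;

/* Delete both in one statement; names are given without the ampersand */
%symdel study cutoff;

/* Neither variable appears in the list any more */
%put _user_;
```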
Why do you have to use proc import and proc export wizards? Give me the situation?
Safety Datasets Examples:
The following 16 datasets are examples of safety datasets:
· Adverse Events,
· (Prior and) Concomitant Medications,
· Comments,
· Demographics,
· Disposition/End of Study,
· Drug Accountability,
· ECG,
· Exposure,
· Inclusion and Exclusion Criteria,
· Lab,
· Medical History,
· Physical Examination,
· Protocol Violations,
· Subject Characteristics,
· Substance Use, and
· Vital Signs.