My views for SAS Programming in Pharmaceutical Industry

Featured

SUPPQUAL and SDTM merge for ADAM

For an ADAM data set it is necessary to use data in SDTM domain along with its respective SUPPQUAL domain. The programmer needs to follow necessary steps to transpose, rename and transform the IDVARs mentioned in the SUPPQUAL. This is a repetitive task and based on the power bestowed by CDISC through the SDTM metadata this process can be converted to a generic and reusable macro. I have tried to do the same here. This macro merges the SUPPQUAL domain with the SDTM domain for each different idvars and idvarvals combinations present in the SUPPQUAL

/Macro to merge SDTM data set with respective SUPP dataset for all distinct IDVAR in SUPP dataset INPUTS = DS1 is the SDTM dataset DS2 is the SUPP dataset ASSUMPTIONS: Both datasets should be in work library Look For a dataset with fin suffix as the final output WARNING Not to be run where IDVAR is blank example DM SDTM dataset/This enhancement can be added within the macro

%macro mergesupp(ds1=,ds2=);
/Creates a macro variable with distinct IDVARs from the SUPP dataset in the macro variable/
proc sql ;
select distinct idvar into :idvar separated by ” “
from &ds2;
quit;
/iterations for the number of words in the macro i.e. number of distinct IDVARs in the SUPP dataset/
%do i=1 %to %sysfunc(countw(&idvar));
%let seq=%scan(&idvar,&i);
proc sort data=&ds2 out=&ds2.2 ;
by usubjid idvar idvarval;
where idvar =”&seq”;
run;
data &ds2.2&i(rename =(idvarval2=&seq));
set &ds2.2;
idvarval2= input(idvarval,best.);
drop idvarval ;
run;
proc sort data=&ds2.2&i;
by usubjid &seq;
run;
/* Transposing for each IDVAR and creates a separate dataset for each distinct idvar*/
proc transpose data=&ds2.2&i out=&ds2.tr&i(drop =name label) ;
by usubjid &seq;
var qval;
id qnam;
idlabel qlabel;
run;

proc sort data=&ds1;
by usubjid &seq;
run;
proc sort data=&ds1 out=&ds1.&i;
by usubjid &seq;
run;

proc sort data=&ds2.tr&i;
by usubjid &seq;
run;
%end;
/* for first iteration the a dataset is created that is a product of merging of first
IDVAR transposed dataset and SDTM dataset*/
%do i=1 %to %sysfunc(countw(&idvar));
%let seq=%scan(&idvar,&i);

%if &i=1 %then %do;
data &ds1.1;
merge &ds1 &ds2.tr&i;
by usubjid &seq;
run;
%end;
/* for the second and greater iterations datasets are created for each iterations based on
number of distinct idvars greater than 1*/
%else %if &i ne 1 %then %do;
data &ds1.&i;
merge &ds1.%eval(&i-1) &ds2.tr&i;
by usubjid &seq;
run;

%end;
%end;
%let boom=%sysfunc(countw(&idvar));
/* final data set is with suffix fin*/
%if &boom = 1 %then %do;
data &ds1.2fin;
set &ds1.1;
run;
%end;
%else %if &boom ne 1 %then %do;
data &ds1.&boom.fin;
set &ds1.&boom;
run;
%end;
run;
%mend;
%mergesupp(ds1=lb,ds2=supplb);

Excel Sheet to Demog Table

Demographics and baseline characteristics table is common for any Clinical Summary Report (CSR). Without this table a CSR is incomplete at many levels for many customers (internal & external). Each Demographics summary table has some common elements for which the same code is needed to copied and modified according to analysis plan.

I got two suggestions from two different people. One was to add features in the macro to catch if user enters something wrong as input. Second one was to try and explore excel sheets as input for macros and metadata. I have tried to combine both and restricted the user inputs through the excel sheet. I am still at a beginner level for realizing full power of excel. Have to go a long way though.

This macro uses Excel sheet’ s data as inputted by the user to create a standardized Demographics table. The user has to input names of variable needed, the labels, formats, type (categorical or numeric) and which output delivery system to be activated (Default, PDF, RTF). For categorical variables it performs a simple frequency count and for continuous variable it does a proc means to get N, Mean, Median, StdDev, Min and Max. For sake of standardized inputs excel sheet has certain restricted data values. It also creates an “All Subjects” column along with the by values.

For any number of parameters mentioned in the excel sheet for each value in the by variable as well as over this macro calculates counts or summary statistics. Optimal labels as mentioned by the user are assigned to the parameters the final report. If any formats are needed, they need to be created before the macro executes. The formats can be mentioned in another column. The macro takes in each parameter, performs proc means or freq for each by value as well as overall. It stacks up all datasets created and creates a report. If any ODS is specified it directs the report to concerned destination.

A lot of enhancements can be added to this macro. These are some that I could think of

0s can be created instead of missing in proc report
Mean and StdDev can be appended with each other to have one less row
Percentages could be added to counts instead of just counts. Example 10(30.0%)
Cannot handle numeric inputs as by variables for all subjects but can be added.
Have to add capital “N=”

All these changes can be made based on the organizational requirements and standards. I am unable to attach Excel sheet due to WordPress restrictions

****Program: Reporter.sas;
****Programmer: Shrey Virmani;
****Date: 9th October 2019;
****Description: Program to create Demographics summary table from parameters passed in excel(Book2)
sheet;
****Documentation: Reporter.docx;
**LIBNAME statement to connect to the dataset which is in XPT format;
libname xptfile2 xport “/folders/myfolders/analysis/adSL.xpt” access=readonly;
**Copy the dataset from the library reference;
proc copy inlib=xptfile2 outlib=work;
run;
**creates a table using proc sql which has a structre that can be used in proc format to create a
format from the dataset;

proc sql noprint;
create table disagefrmt as
select distinct “$agegrpn” as fmtname,strip(put(agegrpn,best.)) as start,agegrp as label
from adsl;
quit;
** additional formats to be created for reporting by user. cntlin option is used to create
format from dataset;
proc format cntlin=disagefrmt;
value $yesno “Y”=”Yes”
“N”=”No”;
value $sex “M”=”Male”
“F”=”Female”;
value $allsub “ZZZZZZ”=”All subjects”;
run;
**Importing the metadata needed by the macro to create the summary table;
proc import datafile=”/folders/myfolders/BOOK2″
out=macro_meta DBMS=xlsx replace;
getnames=yes;
run;
**deleting missing records that can be created while SAS imports excel sheet;
data macro_meta;
set macro_meta;
if missing(parameter) then delete;
run;
**taking all parameters needed for summary table along with by variable needed;
proc sql noprint;
select distinct parameter, byvar
into: parameters separated by ” “,:byvar
from macro_meta;
quit;
**keeping the variables that we need for reporting;
data adsl2;
set adsl;
keep &parameters &byvar;
run;
***for each variable specified in parameter column in excel sheet the macro does the summary statistics;
**based on input whether it is a Catogorical or Continuous specified in Type in excel sheet; %macro reporter(); %do i=1 %to %sysfunc(countw(¶meters)); %let tempvar=%scan(¶meters,&i); /using symputx to create a macro variable named type with value for type so that
freq or means are activated accordingly/ data null;
set macro_meta;
if upcase(parameter)=strip(upcase(“&tempvar”));
call symputx (“TYPE”,TYPE);
run;
/for continiuous variables proc means with default statistics is done. type represents statistics
for all values in a by variable (type=1) and overall (type=0);/ %if &type=CONTINUOUS %THEN %do; proc means data = adsl2 (keep=&byvar &tempvar) noprint; class &byvar; var &tempvar; output out=adsl_temp_&i(where=(type in (1,0)) )n= std= mean= median= min= max=/autoname autolabel;
run;
/ ZZZZZZ value is updated for type=0 so that we can have it in last column;/ data adsl_temp_&i; set adsl_temp_&i; if &byvar=”” then &byvar=”ZZZZZZ”; run; proc sort data=adsl_temp_&i; by &byvar; run; / unneeded rows are subsetted;/ proc transpose data=adsl_temp_&i out=adsl_temp_&i (where=(upcase(name) not in (“TYPE”,”FREQ”)));
by &byvar;
run;
%end;
/Proc freq is used to make counts or frequencies for each by value as well as overall;/ %else %if &type=CATEGORICAL %THEN %do; proc sort data=adsl2; by &byvar; run; proc freq data=adsl2 noprint; by &byvar; tables &tempvar/out=adsl_temp_&i (rename=( &tempvar =name count=col1 )); run; proc freq data=adsl2 noprint; tables &tempvar/out=adsl_temp_all_&i (rename=( &tempvar =name count=col1 )); run; data adsl_temp_&i; set adsl_temp_&i adsl_temp_all_&i; if missing(&byvar) then &byvar=”ZZZZZZ”; run; data adsl_temp_&i; set adsl_temp_&i; &tempvar.2=”&tempvar”; run; /putc or putn activated at run time based on type of variable (Character or numeric);/ data adsl_temp_&i.2; set adsl_temp_&i; temp_cn=vtype(name);
if temp_cn=”N” then do;
c=”best”;
name2=strip(putn(name,c));
put “Numeric here”;
end;
else if temp_cn=”C” then do;
c=”32767″;
name2=putc(name,c);
put “character here”;
end;
drop name temp_cn;
run;
/* variables renamed to use the same variable and not create new variables;/ data data adsl_temp_&i; set adsl_temp_&i.2; rename &tempvar.2=&tempvar name2=name;
run;
/* temporary datasets matching a pattern are deleted so that won’t cause problems with reporting
dataset;/ proc datasets lib=work nolist ; delete adsl_temp_&i.2 adsl_temp_all_&i; quit; %end; %end; /* using dictionary tables to get our needed datasets and appending them under each other for
reporting;*/
proc sql noprint;
select memname
into :members separated by ” ”
from dictionary.tables
where memname contains (“ADSL_TEMP_”);
quit;

data adsl_final;
    length _name_ $32767;
    set &members;
run;

/** performing table look up and getting parameters with names of categorical variables;/ proc sql noprint; select distinct parameter into:parameter2 separated by “,” from macro_meta where strip(upcase(type))=”CATEGORICAL”; quit; /* coalesce function to create a single variable with are needed data as the structure of the
dataset is not reporting friendly;/ /* concatenating the variable created with coalesce to sub categories(freq) or statistics (means)
within our parameter;/ data adsl_final; set adsl_final; catgrp=COALESCEc (¶meter2); if not missing(catgrp) then name=strip(catgrp)||””||strip(_name);
keep name &byvar label col1;
run;
/* rounding the statistics, getting associated formats and labels using proc sql inner join;/ proc sql; create table adsl_final2 as select a.,b.label,scan(name,2,”“) as stat,left(put(round(col1,0.01),best.)) as col2,b.format from adsl_final as a , macro_meta as b where scan(a._name,1,”_”)=upcase(b.parameter);
quit;
/** using putc for runtime execution of formats;/ data adsl_final3; length stat2 $200; set adsl_final2; stat2=putc(strip(stat),strip(format)); run; /* getting ODS needed (None, PDF, RTF);/ proc sql noprint; select distinct ods into :ods from macro_meta; quit; /* conditionally executing if ODS not equal to none;/ /* if PDF or RTF then finds the path of current library creates output with same name but
different extension in the same folder;/ %if &ODS ne NONE %then %do; %let ext=%scan(%sysfunc(pathname(xptfile2)),2,.); %let path=%sysfunc(tranwrd(%sysfunc(pathname(xptfile2)),&ext,&ods)); ods &ods file=”&path”; %end; /* using across in proc report so that data does not need to transposed at dataset level;*/
proc report data=adsl_final3 missing;
format &byvar $allsub.;
columns label stat2 &byvar,col2 ;
define label /group ” “;
define stat2/group order=data ” “;
define &byvar/across order=data;
define col2/group ” “;
run;
%if &ODS ne NONE %then %do;
ods &ods close;
%end;
%mend;
%reporter;

To Transpose or not!

For any type of data, transposition is quite an important and frequently needed utility. In SAS this can be achieved in many ways including but not limited to Proc transpose and arrays. Proc transpose has a wonderful ID statement by which you can create varialbles that can house the transposed data. In some cases there can be repeats of the ID variable in a particular by-group. For example in clinical data there can be multiple ID values within a visit for a subject for a parameter with unsceduled assessments. Or there can be a data issue with SUPPQUAL dataset which we realize while creating ADAM and things get out of time. If we want to include a proc transpose with ID functionality in macro it becomes necessary to put a check before the transposition takes place. If the data passes the check then transposition occurs. Look at the examples belowFor any type of data, transposition is quite an important and frequently needed utility. In SAS this can be achieved in many ways including but not limited to Proc transpose and arrays. Proc transpose has a wonderful ID statement by which you can create variables that can house the transposed data. In some cases, there can be repeats of the ID variable in a particular by-group. For example, in clinical data there can be multiple ID values within a visit for a subject for a parameter with unscheduled assessments. Or there can be a data issue with SUPPQUAL dataset which we realize while creating ADAM and things get out of time. If we want to include a proc transpose with ID functionality in macro it becomes necessary to put a check before the transposition takes place. If the data passes the check then transposition occurs. Look at the examples belowFor any type of data, transposition is quite an important and frequently needed utility. In SAS this can be achieved in many ways including but not limited to Proc transpose and arrays. Proc transpose has a wonderful ID statement by which you can create varialbles that can house the transposed data. In some cases there can be repeats of the ID variable in a particular by-group. For example in clinical data there can be multiple ID values within a visit for a subject for a parameter with unsceduled assessments. Or there can be a data issue with SUPPQUAL dataset which we realize while creating ADAM and things get out of time. If we want to include a proc transpose with ID functionality in macro it becomes necessary to put a check before the transposition takes place. If the data passes the check then transposition occurs. Look at the examples below

%macro t_no_t(by=,idvar=,var=,ds=);
%let tranwrd=%sysfunc(tranwrd(&by,%nrstr( ),%nrstr(||)));
%put &tranwrd;
proc sort data=&ds.;
by &by;
run;
data &ds.;
set &ds.;
ab=compress(&tranwrd);
run;

proc sql noprint;
create table trancheck as
select distinct ab as ab,count(ab)as abcn,count( &idvar) as a2, count( distinct &idvar) as a1
from &ds
group by ab;
quit;

data trancheck2;
set trancheck;
if abcn=a2 then do;
if a2=a1 then x=1;
else x=2;
end;
else x=2;
run;

data trancheck3;
merge trancheck2 &ds;
by ab;
if x=2;
run;

proc sort data=trancheck3;
by ab &idvar;
run;

data trancheck4;
set trancheck3;
by ab &idvar;
if first.&idvar ne last.&idvar then output;
run;
%let dsnid = %sysfunc(open(trancheck4));

%if &dsnid %then %do;
%let nobs=%sysfunc(attrn(&dsnid,nlobs));
%let rc =%sysfunc(close(&dsnid));
%end;
%else %do;
%put Unable to open &dsn – %sysfunc(sysmsg());
%end;
%if %symexist(nobs) and &nobs >0 %then %do;
data null;
set trancheck4;
put “Repeats for combination=” ab &idvar= ;
run;
%end;
%else %do;
%put No issues found proceed with transpose;
%end;
%mend;
%t_no_t(by=a b C,idvar=cparm,ds=trans);

Mean (SD) Plots for ADAM BDS structure

Mean (SD) plots are a very important part of a clinical SAS programmer’s work. With ADAM in place for standardization an organization can develop a generic common code to create common types of summaries and graphs using the ADAM BDS structure. The below macro creates Mean (SD) plots for all parameters distinct (PARAMCD and PARAM) for a visit (or rather AVISIT 😉 using AVAL. The macro uses PROC SGPLOT to create this graph. Some considerations to follow before using the macro.

This macro works for planned analysis day and considers AWTARGET for x-axis in the ADAM data set. ANL01FL is used here which flags visits based on time window and difference from the expected day. In case user wishes to change it ANL02FL then macro has to changed. Or modifications can be made to number of input parameters to add this. The macro is not applicable to time point assessments less than a day. Any how input data set can be tweaked to incorporate this variable too. example AWTARGET can contain expected study with time point on that day. Refline can be added for ADLB with ANRLO and ANRHI. Or any other criteria in CRIT or MCRIT.

The macro creates formats from the data set inputted by the user. I have used trtpn to start from 0. Modification in code is necessary if trtpn starts from 1. The macro creates gaps between two treatments for which scale is decided from within the macro. The user does not need to provide any info for the gap. It is expected that missing aval are not supplied and visits are also not unscheduled in the data set. Population should be flagged accordingly as part of pre-processing.

%macro meansd(ds=);
proc sql noprint;
select distinct paramcd,param
into:params separated by ” “,:param separated by “{}”
from &ds.;
quit;
proc sql;
create table trtfmt as
select distinct “trt” as fmtname,trtpn as start, trtp as label
from &ds.;
quit;
proc format cntlin=trtfmt;
run;

%do i=1 %to %sysfunc(countw(&params));
%put check %scan(&params,&i) %scan(&param,&i,{});
data &ds.2;
set &ds;
where paramcd=”%scan(&params,&i)”;
run;
proc sort data=&ds.2;
by trtpn paramcd param avisitn avisit awtarget;
run;
proc means data=&ds.2 noprint;
by trtpn paramcd param avisitn avisit awtarget;
var aval;
output out=meansd mean= std= /autoname;
run;
proc sql noprint;
select max(awtarget) as max, 0.01*calculated max as factor
into :max,:factor from meansd;
select count(distinct trtpn) into :trtcnt from &ds;
quit;

data meansd2;
set meansd;
std1=aval_mean+aval_stddev;
std2=aval_mean-aval_stddev;
if trtpn ne 0 then do;
%do j = 1 %to &trtcnt;
if trtpn=&j then awtarget=awtarget + (&j*&factor);
%end;
end;
run;
proc sql noprint;
create table disvis as
select distinct “vis” as fmtname,awtarget as start, avisit as label
from meansd;
quit;

data disvis;
set disvis end=last;
output;
if last then do;
start=start + (&trtcnt*&factor) ;
label=””;
output;
end;
run;

proc sql noprint;
select distinct label, start
into: visitn,:visitc separated by ” “
from disvis;
quit;
proc format cntlin=disvis;
run;
proc sgplot data=meansd2 ;
format awtarget vis. trtpn trt.;
label trtpn =”ARM” awtarget=”Visit”;
%let label=”%scan(&param,&i,{})”;
label aval_mean=&label;
xaxis values=(&visitc ) display=(noticks) valueattrs=(size=8) TICKSTYLE=outside;
YAXIS valueattrs=(size=8) labelattrs=(size=8);
scatter x=awtarget y=aval_mean / yerrorlower=std2
yerrorupper=std1 group =trtpn;
series x=awtarget y=aval_mean/group=trtpn;
title1 ‘Mean (SD) Plot’;
run;
%end;
%mend;
%meansd(ds=adlbc);

SDTM codelist->XML mapper-> SAS formats

I was just experimenting with XML mapper. The CDISC website has codelists for CDASH, SDTM and ADAM in different formats including XML. Once I downloaded this file and loaded it into XML mapper I got .map file. This XML and .map files were read in SAS. It created SAS datasets with all the codelists provided be CDISC. Note: I tried it only for SDTM. The below macro creates SAS format based on the names of the codelist provided by user. For example if you want a SAS format for all values of LBTESTCD with its label then you can get it through this macro. Benefit: The codelist gets updated regularly. You can download a fresh copy with every update of the codelist in form of XML, run the mapper, read it in SAS and run the macro to get your needed formats. Mention name of any number of codelist separated by a space

libname xx xmlv2 ‘/folders/myfolders/tabulations/SDTM Terminology.xml’ automap=replace
xmlmap=’/folders/myfolders/tabulations/SDTM’;

proc copy in=xx out=work;
run;
%macro cod2form(clname=);
%do i=1 %to %sysfunc(countw(&clname));
data codelist2;
set codelist;
if strip(cdiscsubmissionvalue)=”%scan(&clname,&i)”;
drop preferredterm;
run;

data code(rename=(enumerateditem_codedvalue=start preferredterm=label));
merge codelist2(in=a) enumerateditem(in=b);
by codelist_ordinal;
if a;
fmtname=”$”||strip(cdiscsubmissionvalue);
keep fmtname enumerateditem_codedvalue preferredterm;
run;

proc format cntlin = code;
run;
%end;
%mend;
%cod2form(clname=LBTESTCD VSTESTCD EGTESTCD);

Significant Digits

With ADAM implementation we can be sure that BDS structure is same for each sponsor. And that is a blessing I must tell you. We can be sure about the structure of data variables and values within it. We are sure that paramcd contains name of parameter not exceeding 8 characters and aval will contain numeric value of result. Using both these I have created a macro that creates and rounds of each parameter of summary statistics based on input. The user needs to input name of dataset, summary stats needed (n mean median stddev min max) and significant digits associated with respective summary stats in that exact order. The proc means dataset should be named “New” if you want to use this macro as is. This can be changed if the user wants it to be changed. I will also show the proc means code so that the variables generated by you remains same when you use this macro along with proc means so that little or no modification needs to be done in macro

proc means data=adlbc noprint;
class trtpn visit paramcd;
where upcase(visit) ne “UNSCHEDULED”;
var aval;
output out=new(where =(type in (3,7))) n= mean= median= std= min= max= / autoname;
run;

%macro sigdig(ds=,sumpam=,sigdig=);
data &ds.dec &ds.nodec;
set &ds;
if find(put(aval,best.),”.”) then output &ds.dec;
else output &ds.nodec;
run;

proc sql noprint;
select distinct paramcd
into: decmvr
separated by ‘” “‘
from &ds.dec;
quit;

data &ds.nodec;
set &ds.nodec;
where paramcd not in (“&decmvr”);
run;

proc sql ;
create table parmdec as
select paramcd, max(length(strip(put(aval,best.)))) as tot,
max(length(scan(strip(put(aval,best.)),2,”.”)) ) as afterdec

from &ds.dec
group by paramcd;
create table parmnodec as
select paramcd, max(length(strip(put(aval,best.)))) as tot,
0 as afterdec
from &ds.nodec
group by paramcd;

quit;

data parm;
set parmdec parmnodec;
%do j=1 %to %sysfunc(countw(&sigdig));
tot%sysfunc(strip(%scan(&sumpam,&j)))=tot+(%scan(&sigdig,&j))+1;
tot%sysfunc(strip(%scan(&sumpam,&j)))d=afterdec+(%scan(&sigdig,&j));
ft%sysfunc(strip(%scan(&sumpam,&j)))=strip(put(tot%sysfunc(strip(%scan(&sumpam,&j))),best.))||”.”||strip(put(tot%sysfunc(strip(%scan(&sumpam,&j)))d,best.));
call symputx(strip(paramcd)||”%sysfunc(strip(%scan(&sumpam,&j)))”,ft%sysfunc(strip(%scan(&sumpam,&j))));
%put check here strip(paramcd)||”%sysfunc(strip(%scan(&sumpam,&j)))”,ft%sysfunc(strip(%scan(&sumpam,&j)));
putlog ft%sysfunc(strip(%scan(&sumpam,&j))) =;
%end;
run;

proc sql;
create table new2 as
select a.*, b.tot,b.afterdec
from new as a,parm as b
where a.paramcd=b.paramcd;
quit;

proc sql noprint;
select distinct paramcd
into: params
separated by ” “
from &ds.;
quit;
data new3;
%do a=1 %to %sysfunc(countw(&sumpam)); length avalr%sysfunc(strip(%scan(&sumpam,&a))) $200; %end;

set new2;
%do i=1 %to %sysfunc(countw(&params));

if paramcd= “%scan(&params,&i)” then do;
*putlog “%scan(&params,&i)” “=” “%superq(%sysfunc(strip(%scan(&params,&i)))mn)”;
%do l=1 %to %sysfunc(countw(&sumpam));
avalr%sysfunc(strip(%scan(&sumpam,&l)))=put(aval_%sysfunc(strip(%scan(&sumpam,&l))),%superq(%sysfunc(strip(%scan(&params,&i)))%sysfunc(strip(%scan(&sumpam,&l)))));
%end;

end;

%end;
run;

%mend;
%sigdig(ds=adlbc,sumpam=mean stddev median min max,sigdig=1 2 1 0 0);

Maximum length of character variable

In the CDISC guidelines it can be read that the authority frequently gets character variables in data which are 200 in length and handling this large variable size has become an issue in XPT files. Such issues won’t arise if the data is in XML format. Till the industry completely shifts to Dataset.xml from XPT files something needs to be done regarding the length of character variables. The following macro assigns length to each character variable in the data set based on the maximum number of characters in an observation within that variable. Example if there are 5 distinct AEBODSYS with lengths of 20,25,40,67,40 then the macro will assign 67 length to AEBODSYS. If you wish to accommodate uni-code characters sufficient space may be added in the macro. This macro changes the order of variables so it is preferred to run SDTM order macro after this.

%macro maxlen(ds);
libname xptfile xport “/folders/myfolders/tabulations/&ds..xpt” access=readonly;

proc copy inlib=xptfile outlib=work;
run;

proc contents data=&ds out=&ds.cont(where =(type=2));
run;

proc sql noprint;
select distinct name into :&ds.vars separated by ” ” from &ds.cont ;
run;
%put &&&ds.vars;

%do i=1 %to %sysfunc(countw(&&&ds.vars));
%put word %sysfunc(countw(&&&ds.vars));
proc sql;
select “length %scan(&&&ds.vars,&i) $ “||strip(put(max(length(strip(%scan(&&&ds.vars,&i)))),best.) )
into :lengthvar separated by ” ” from &ds;
quit;
%put &lengthvar;
data &ds;
&lengthvar;
set &ds;
run;
%end;

%mend;
%maxlen(ds=AE);

Dataset.xml (Not Define.xml)

From CDISC Webiste – “CDISC developed Dataset-XML v1.0 as a drop-in replacement for SAS V5 XPORT to enable testing using existing processes. Dataset-XML is a truly non-proprietary, global standard, removing many SAS V5 Transport file restrictions (the current file format required by the FDA and PMDA), such as 8-character variable names and 200-character text fields. Dataset-XML and Define-XML are complementary standards; Define-XML metadata describes the Dataset-XML dataset content “

Instead of using XPT files CDISC suggests use of xml files for data. This makes it convenient to check if the dataset adheres to define.xml and also removes the restrictions of XPT files (length of variable name and variable label) and XML is open source. An XSL file can be created and linked to the dataset.xml file so that it can be viewed and printed as a listing. First you need to run SDTM order macro mentioned in the earlier post so that variables can be ordered as per required SDTM standard otherwise the xml file will follow alphabetical order or variables.

The macro creates an output such of the same name as dataset with XML file extension.

Note: There can be unicode characters in the data for example in labs or free text field. Those need to be pre or post processed. example < or> signs can be replaced with &#60 or &#62 followed by a semi-colon of-course . This dataset in XML format can be read in SAS using XML mapper and libname statement.

I am placing an image for the code as wordpress interprets our needed tags as html and omits them. if you need the code drop an email to me

Ordering, labeling and sorting according to SDTM submission

In order for your submission to pass many checks need to be taken care of. Some of those are ordering the variables according to SDTM 1.4 (as emulated by Pinnacle 21), label, codelists, sort variables etc. I have tried to create a generic macro that makes that data compliant on three levels (labels, sort and order). Other functionality can be added example codelists.

The macro takes in your dataset and metadata (excel) and makes the dataset compliant for aforementioned 3 parameters. Orders as per requirement of SDTM submission, Labels each variable in your dataset with label in SDTM metadata and creates sort variables based on variables mentioned in SDTM metadata. For ordering, SDTM certain Role types to precede others in the dataset eg. Identifier->topic->Qualifier-> timing. Within the roles too there is an order that has to be followed (SDTM 1.4).

First let us look at metadata.

/* IMport the excel sheet from the Excel file based on the specified domain in the macro
as input
Output= Orders the variables based on SDTM requirement and creates data set
Sorts the dataset based on the identifier variables as requried by SDTM
Labels each variable based on excel sheet*/
%macro order_sdtm(ds=);
proc import datafile=”/folders/myfolders/SDTM_META”
out=&ds._meta DBMS=xlsx replace;
sheet=”&ds”;
getnames=yes;
run;
libname xptfile xport “/folders/myfolders/tabulations/&ds..xpt” access=readonly;

proc copy inlib=xptfile outlib=work;
run;

proc contents data=&ds out=&ds._cont noprint;
run;

Rules from SDTM documentation 1.4;
data &ds._meta;
set &ds._meta;
if upcase(role)=”IDENTIFIER” then do;
order=1;
if find(upcase(var_nam),”STUDYID”) then order=1.1;
if find(upcase(var_nam),”DOMAIN”) then order=1.2;
if find(upcase(var_nam),”USUBJID”) then order=1.3;
if find(upcase(var_nam),”APID”) then order=1.31;
if find(upcase(var_nam),”POOLID”) then order=1.4;
if find(upcase(var_nam),”IDVAR”) then order=1.41;
if find(upcase(var_nam),”IDVARVAL”) then order=1.42;
if find(upcase(var_nam),”RELTYPE”) then order=1.43;
if find(upcase(var_nam),”RELID”) then order=1.44;
if find(upcase(var_nam),”SPDEVID”) then order=1.5;
if find(upcase(var_nam),”SEQ”) then order=1.6;
if find(upcase(var_nam),”GRPID”) then order=1.7;
if find(upcase(var_nam),”REFID”) then order=1.8;
if find(upcase(var_nam),”SPID”) then order=1.9;
if find(upcase(var_nam),”LNKID”) then order=1.91;
if find(upcase(var_nam),”LNKGRP”) then order=1.92;
end;
else if upcase(role)=”TOPIC” then order=2;
else if find(upcase(role),”QUALIFIER”) then do;
order=3;
if upcase(class)=”FINDING” then do;
if find(upcase(var_nam),”TEST”) then order=3.1;
if find(upcase(var_nam),”OBJ”) then order=3.11;
if find(upcase(var_nam),”MODIFY”) then order=3.2;
if find(upcase(var_nam),”TSTDTL”) then order=3.3;
if find(upcase(var_nam),”CAT”) then order=3.4;
if find(upcase(var_nam),”SCAT”) then order=3.5;
if find(upcase(var_nam),”POS”) then order=3.6;
if find(upcase(var_nam),”BODSYS”) then order=3.7;
if find(upcase(var_nam),”ORRES”) then order=3.8;
if find(upcase(var_nam),”ORRESU”) then order=3.9;
if find(upcase(var_nam),”ORNRLO”) then order=3.91;
if find(upcase(var_nam),”ORNRHI”) then order=3.92;
if find(upcase(var_nam),”STRESC”) then order=3.93;
if find(upcase(var_nam),”STRESN”) then order=3.94;
if find(upcase(var_nam),”STRESU”) then order=3.95;
if find(upcase(var_nam),”STNRLO”) then order=3.96;
if find(upcase(var_nam),”STNRHI”) then order=3.97;
if find(upcase(var_nam),”STNRC”) then order=3.98;
if find(upcase(var_nam),”NRIND”) then order=3.99;
if find(upcase(var_nam),”RESCAT”) then order=3.991;
if find(upcase(var_nam),”STAT”) then order=3.992;
if find(upcase(var_nam),”REASND”) then order=3.993;
if find(upcase(var_nam),”XFN”) then order=3.994;
if find(upcase(var_nam),”NAM”) then order=3.995;
if find(upcase(var_nam),”LOINC”) then order=3.996;
if find(upcase(var_nam),”SPEC”) then order=3.997;
if find(upcase(var_nam),”ANTREG”) then order=3.998;
if find(upcase(var_nam),”SPCCND”) then order=3.999;
if find(upcase(var_nam),”SPCUFL”) then order=3.9991;
if find(upcase(var_nam),”LOC”) then order=3.9992;
if find(upcase(var_nam),”LAT”) then order=3.9993;
if find(upcase(var_nam),”DIR”) then order=3.9994;
if find(upcase(var_nam),”PORTOT”) then order=3.9995;
if find(upcase(var_nam),”METHOD”) then order=3.9996;
if find(upcase(var_nam),”RUNID”) then order=3.9997;
if find(upcase(var_nam),”ANMETH”) then order=3.9998;
if find(upcase(var_nam),”LEAD”) then order=3.9999;
if find(upcase(var_nam),”CSTATE”) then order=3.99991;
if find(upcase(var_nam),”BLFL”) then order=3.99992;
if find(upcase(var_nam),”FAST”) then order=3.99993;
if find(upcase(var_nam),”DRVFL”) then order=3.99994;
if find(upcase(var_nam),”EVAL”) then order=3.99995;
if find(upcase(var_nam),”EVALID”) then order=3.99996;
if find(upcase(var_nam),”ACPTFL”) then order=3.99997;
if find(upcase(var_nam),”TOX”) then order=3.99998;
if find(upcase(var_nam),”TOXGR”) then order=3.99999;
if find(upcase(var_nam),”SEV”) then order=3.999991;
if find(upcase(var_nam),”DTHREL”) then order=3.999992;
if find(upcase(var_nam),”LLOQ”) then order=3.999993;
if find(upcase(var_nam),”ULOQ”) then order=3.999994;
if find(upcase(var_nam),”EXCLFL”) then order=3.999995;
if find(upcase(var_nam),”REASEX”) then order=3.999996;
end;
if upcase(class)=”INTERVENTION” then do;
if find(upcase(var_nam),”MODIFY”) then order=3.2;
if find(upcase(var_nam),”DECOD”) then order=3.3;
if find(upcase(var_nam),”MOOD”) then order=3.31;
if find(upcase(var_nam),”CAT”) then order=3.4;
if find(upcase(var_nam),”SCAT”) then order=3.5;
if find(upcase(var_nam),”PRESP”) then order=3.6;
if find(upcase(var_nam),”OCCUR”) then order=3.7;
if find(upcase(var_nam),”STAT”) then order=3.8;
if find(upcase(var_nam),”REASND”) then order=3.9;
if find(upcase(var_nam),”INDC”) then order=3.91;
if find(upcase(var_nam),”CLAS”) then order=3.92;
if find(upcase(var_nam),”CLASCD”) then order=3.93;
if find(upcase(var_nam),”DOSE”) then order=3.94;
if find(upcase(var_nam),”DOSTXT”) then order=3.95;
if find(upcase(var_nam),”DOSU”) then order=3.96;
if find(upcase(var_nam),”DOSFRM”) then order=3.97;
if find(upcase(var_nam),”DOSFRQ”) then order=3.98;
if find(upcase(var_nam),”DOSTOT”) then order=3.99;
if find(upcase(var_nam),”DOSRGM”) then order=3.991;
if find(upcase(var_nam),”ROUTE”) then order=3.993;
if find(upcase(var_nam),”LOT”) then order=3.994;
if find(upcase(var_nam),”LOC”) then order=3.995;
if find(upcase(var_nam),”LAT”) then order=3.996;
if find(upcase(var_nam),”DIR”) then order=3.997;
if find(upcase(var_nam),”PORTOT”) then order=3.998;
if find(upcase(var_nam),”FAST”) then order=3.999;
if find(upcase(var_nam),”PSTRG”) then order=3.9991;
if find(upcase(var_nam),”PSTRGU”) then order=3.9992;
if find(upcase(var_nam),”TRTV”) then order=3.9993;
if find(upcase(var_nam),”VAMT”) then order=3.9994;
if find(upcase(var_nam),”VAMTU”) then order=3.9995;
if find(upcase(var_nam),”ADJ”) then order=3.9996;
end;
if upcase(class)=”EVENT” then do;
if find(upcase(var_nam),”MODIFY”) then order=3.2;
if find(upcase(var_nam),”LLT”) then order=3.3;
if find(upcase(var_nam),”LLTCD”) then order=3.31;
if find(upcase(var_nam),”DECOD”) then order=3.4;
if find(upcase(var_nam),”PTCD”) then order=3.5;
if find(upcase(var_nam),”HLT”) then order=3.6;
if find(upcase(var_nam),”HLTCD”) then order=3.7;
if find(upcase(var_nam),”HLGT”) then order=3.8;
if find(upcase(var_nam),”HLGTCD”) then order=3.9;
if find(upcase(var_nam),”CAT”) then order=3.91;
if find(upcase(var_nam),”SCAT”) then order=3.92;
if find(upcase(var_nam),”PRESP”) then order=3.93;
if find(upcase(var_nam),”OCCUR”) then order=3.94;
if find(upcase(var_nam),”STAT”) then order=3.95;
if find(upcase(var_nam),”REASND”) then order=3.96;
if find(upcase(var_nam),”BODSYS”) then order=3.97;
if find(upcase(var_nam),”BODSYSCD”) then order=3.98;
if find(upcase(var_nam),”SOC”) then order=3.99;
if find(upcase(var_nam),”SOCCD”) then order=3.991;
if find(upcase(var_nam),”LOC”) then order=3.993;
if find(upcase(var_nam),”LAT”) then order=3.994;
if find(upcase(var_nam),”DIR”) then order=3.995;
if find(upcase(var_nam),”PORTOT”) then order=3.996;
if find(upcase(var_nam),”PARTY”) then order=3.997;
if find(upcase(var_nam),”PRTYID”) then order=3.998;
if find(upcase(var_nam),”SEV”) then order=3.999;
if find(upcase(var_nam),”SER”) then order=3.9991;
if find(upcase(var_nam),”ACN”) then order=3.9992;
if find(upcase(var_nam),”ACNOTH”) then order=3.9993;
if find(upcase(var_nam),”ACNDEV”) then order=3.9994;
if find(upcase(var_nam),”REL”) then order=3.9995;
if find(upcase(var_nam),”RELNST”) then order=3.9996;
if find(upcase(var_nam),”PATT”) then order=3.9997;
if find(upcase(var_nam),”OUT”) then order=3.9998;
if find(upcase(var_nam),”SCAN”) then order=3.9999;
if find(upcase(var_nam),”SCONG”) then order=3.99991;
if find(upcase(var_nam),”SDISAB”) then order=3.99992;
if find(upcase(var_nam),”SDTH”) then order=3.99993;
if find(upcase(var_nam),”SHOSP”) then order=3.99994;
if find(upcase(var_nam),”SLIFE”) then order=3.99995;
if find(upcase(var_nam),”SOD”) then order=3.99996;
if find(upcase(var_nam),”SMIE”) then order=3.99997;
if find(upcase(var_nam),”CONTRT”) then order=3.99998;
if find(upcase(var_nam),”TOX”) then order=3.99999;
if find(upcase(var_nam),”TOXGR”) then order=3.999991;
end; end;
else if upcase(role)=”TIMING” then do;
order=4;
if find(upcase(var_nam),”VISITNUM”) then order=4.1;
if find(upcase(var_nam),”VISIT”) and not find(upcase(var_nam),”VISITNUM”) then order=4.2;
if find(upcase(var_nam),”VISITDY”) then order=4.3;
if find(upcase(var_nam),”TAETORD”) then order=4.4;
if find(upcase(var_nam),”EPOCH”) then order=4.5;
if find(upcase(var_nam),”DTC”) then order=4.6;
if find(upcase(var_nam),”STDTC”) then order=4.7;
if find(upcase(var_nam),”ENDTC”) then order=4.8;
if find(upcase(var_nam),”DY”) then order=4.9;
if find(upcase(var_nam),”STDY”) then order=4.91;
if find(upcase(var_nam),”ENDY”) then order=4.92;
if find(upcase(var_nam),”DUR”) then order=4.93;
if find(upcase(var_nam),”TPT”) then order=4.94;
if find(upcase(var_nam),”TPTNUM”) then order=4.95;
if find(upcase(var_nam),”ELTM”) then order=4.96;
if find(upcase(var_nam),”TPTREF”) then order=4.97;
if find(upcase(var_nam),”RFTDTC”) then order=4.98;
if find(upcase(var_nam),”STRF”) then order=4.99;
if find(upcase(var_nam),”ENRF”) then order=4.991;
if find(upcase(var_nam),”EVLINT”) then order=4.992;
if find(upcase(var_nam),”EVINTX”) then order=4.993;
if find(upcase(var_nam),”STRTPT”) then order=4.994;
if find(upcase(var_nam),”STTPT”) then order=4.995;
if find(upcase(var_nam),”ENRTPT”) then order=4.996;
if find(upcase(var_nam),”ENTPT”) then order=4.997;
if find(upcase(var_nam),”STINT”) then order=4.998;
if find(upcase(var_nam),”ENINT”) then order=4.999;
if find(upcase(var_nam),”DETECT”) then order=4.9991;
end;
run;
proc sort data=&ds._meta;
by order var_nam;
run;
*selecting only those variables that are collected from the metadata(excel);
proc sql noprint;
create table &ds._m_c as
select * from &ds._meta
where var_nam in (
select name from
&ds._cont)
order by order;
*macro varaible creation to order the variables;
select distinct order,var_nam
into :order separated by ” “,:vars separated by ” “
from &ds._m_c
order by order;
macro variable creation to create sort variable;
select order,var_nam into :order2 separated by ” “,:varsort separated by ” ”
from &ds._m_c
where upcase(role)=”IDENTIFIER”
order by order;
quit;
%put
run;
options symbolgen=yes;
*ordering;
data &ds.;
retain &vars;
set &ds;
run;
sorting;
proc sort data=&ds.;
by &varsort;
run;
*contents to check;
proc contents data=&ds. out=&ds._cont;
run;
*labeling;
proc sql noprint;
select left(trim(var_nam))||” ='”||left(trim(label))||”‘”,order
into :varlab separated by ” “,:order3
from &ds._m_c
order by order
;
data &ds;
set &ds;
label &varlab;;
run;
quit;

%mend ;
%order_sdtm(ds=relrec);

Mapping to SDTM

This macro requires metadata in form of excel sheet. example

And SDTM meta data example

The program performs three types of mapping. Rename, straight and transpose. Please have a look at the input and output LB datset

%macro mapper(ds=);
proc datasets lib=work
nolist kill;
quit;
run;
proc import datafile=”/folders/myfolders/SDTM_META”
out=&ds._meta DBMS=xlsx replace;
sheet=”&ds”;
getnames=yes;
run;
proc import datafile=”/folders/myfolders/MAPPING”
out=&ds._map DBMS=xlsx replace;
sheet=”&ds._LEG”;
getnames=yes;
run;

data one;
set &ds._map;
where upcase(DERIVATION)=”ONE”
and upcase(left(trim(SDTM))) ne upcase(left(trim(OLD)));
run;

proc contents data=&ds out=&ds._CONT;
run;

proc sql noprint;
select left(trim(a.old)) ||”=”|| left(trim(a.SDTM))
into :rename separated by ” “
from one as a
where a.old in (select distinct name from &ds._cont) ;
quit;

%put check &rename;
data &ds._new;
set &ds;
rename &rename;
run;
data &ds._many;
set &ds._map;
where upcase(DERIVATION)=”MANY”;
run;
%let dsid=%sysfunc(open(&ds._many));
%let nobs=%sysfunc(attrn(&dsid,nlobs));
%let dsid=%sysfunc(close(&dsid));

%put nobs= &nobs;

%if &nobs ne 0 %then %do;

proc sql noprint;
select count(distinct SDTM) into :transcn from &ds._MANY;
quit;
run;

proc sql noprint;
select count(distinct SDTM) into :transcn from &ds._MANY;
quit;
proc sort data=&ds._many;
by SDTM;
run;
data &ds._many;
retain id;
set &ds._many;
by SDTM;
if first.sdtm then ID+1;
run;

%do i=1 %to &transcn;
data tran_where_&ds.;
set &ds.many; where id=&i; call symputx(“byvar”,key); call symputx(“sdtm”,sdtm); run; proc sql noprint; select distinct old into:vartotr separated by ” ” from tran_where&ds.
where id=&i;
quit;
proc transpose data=&ds out=trans_&ds.&i(rename=(col1=&sdtm)) ; by &byvar; var &vartotr; run; proc sql noprint; select distinct id_var into :ID_VARs from tran_where&ds;
quit;
%if &id_VARS ne %then %do;
data trans_&ds.&i(rename=( _name=&id_vars));
set trans_&ds.&i; run; %end; %else %if &id_vars eq %nrstr() %then %do; data trans&ds.&i(drop=_name);
set trans_&ds.&i; run; %end; %end; options mprint; proc sql noprint; select distinct memname into :mergevars separated by ” ” from dictionary.columns where memname contains (“TRANS&ds.“); quit; proc sql noprint; select distinct key_SDTM into: byvar2 separated by ” ” from &ds._MAP ; quit; %do j=1 %to &transcn; data trans&ds.&j (rename=(%do k=1 %to &transcn; %scan(&byvar,&k)=%scan(&byvar2,&k)%end ;)); set trans&ds._&j;
run;
%end;
data &ds._new;
merge &ds._new &mergevars;
by &byvar2;
run;
%end;
proc sql noprint;
select distinct OLD into:DROPS separated by ” ”
from &ds._MAP
where upcase(derivation)=”MANY”;
quit;
data &ds._new;
set &ds._new;
%if %symexist(drops) %then %do;
drop &drops;
%end;
run;
quit;
%mend;
%mapper(ds=LB);

Log Check

As a SAS programmer in a regulated environment we all know the importance of Log checks. I have written a utility macro to do that. In the code you need to mentioned what kind of text string needs to be checked. Input name of log and path. Internal macro modification necessary.

%let pgmnane=Program1.sas;
%put &pgmnane;
%let pgmpath=/folders/myfolders/;

%macro logprint(logname=,path=);
data log;
length onevar $32767;
infile “&pgmpath&pgmnane log” dlm=”[][][][][][][][]” truncover;
input onevar $;
run;
DATA LOG2 ;
set log;
if not find (upcase(onevar),”WHERE UPCASE(ONEVAR)”) ;
if find(upcase(onevar),”ERROR”) or
find(upcase(onevar),”NOTE”) or
find(upcase(onevar),”WARNING”);
run;
proc print data =log2;
label onevar=” “;
run;
%mend;

%logprint(logname=&pgmnane,path=&pgmpath);