Roper Center for Public Opinion Research

University of Connecticut

IDEAS Tutorial

I.D.E.A.S. the Roper Center Data Analysis Tool

The Roper Center provides this online tool to help you analyze certain datasets. With it, you can run your own basic analyses without expensive statistical software packages. You will be able to analyze the opinions, attitudes and experiences of the respondents, as well as the relationships between them.

 

What does it mean to analyze a dataset?

Whenever a survey is conducted, question responses are recorded in numerical form.  For instance, if a survey asks, “How often would you say you vote—always, nearly always, part of the time, or seldom?” the response “always” might be recorded as 1, “nearly always” as 2, and so forth.  Once the interviewing is completed, the results are entered into a computer-readable collection of these coded responses called a dataset.
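
To make the coding step concrete, here is a minimal Python sketch. The codes 1 and 2 come from the example above; the codes 3 and 4 for the last two categories, the name `VOTE_CODES`, and the sample answers are assumptions made up for illustration, not part of any Roper Center dataset.

```python
# Hypothetical code frame for the voting-frequency question.
# Codes 1 and 2 follow the text; 3 and 4 are assumed to continue the sequence.
VOTE_CODES = {
    "always": 1,
    "nearly always": 2,
    "part of the time": 3,
    "seldom": 4,
}


def code_responses(answers):
    """Translate verbatim answers into their numeric codes."""
    return [VOTE_CODES[answer] for answer in answers]


# Made-up answers from four respondents; the coded list is a tiny "dataset".
answers = ["always", "seldom", "nearly always", "always"]
dataset = code_responses(answers)
print(dataset)  # [1, 4, 2, 1]
```

A codebook plays the role of `VOTE_CODES` here: it tells you which number stands for which answer.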

 

Codebooks

You can get more detailed information about the survey by clicking on “View codebook in separate window.”  The codebook contains the actual questionnaire, or instrument, giving the complete wording for each variable, the labels and codes for all response categories, and other pertinent information about the survey.

 

A Quick Tutorial:

Let’s say, for example, you want to analyze the dataset “CBS News Poll: Kennedy Assassination [May 1998],” which is located under Elections, Political Parties/Figures in the Topics at a Glance section of the Roper Center website. (Click on the survey listed under Data Analysis Tool.)

 

Frequencies

The pull-down menus used to set up the analysis provide descriptive labels for all the questions in the survey.  In the analysis of a poll, each question in the survey is referred to as a variable.  The most basic form of dataset analysis produces a frequency distribution or topline for each variable. Toplines tell us what percentage of all the respondents gave each response to that particular question.

In the Kennedy survey, let’s look at the results of the question: “Do you think one man--Lee Harvey Oswald--was responsible for the assassination of President (John) Kennedy or do you think there were others involved?”

To run the frequency distribution using IDEAS:

  1. First, choose the variable you want to analyze. Use the pull-down menu next to Row to select its descriptive label (“Think Oswald was responsible for JFK assassination”).
  2. Below the variables, you also have the option of viewing different percentage totals (in this case, click on “Include Column Percents”).
  3. Click on “Weight tables” (weighting adjusts the results for possible sample biases; see the Weighting section below for more details).
  4. Then, click on “Run the table.” The frequency distribution, along with the complete wording for the question and citation information for the survey, will appear on your screen as shown below.

IDEAS 1.2: Tables

CBS News Poll, May, 1998 - Kennedy Assassination
Nov 23, 2005 (Wed 11:11 AM Eastern Standard Time)

Variables

Role    Name    Label                                               Range       MD
Row     R29     Think Oswald was responsible for JFK assassination  1-9
Weight  CasWgt  Weight                                              .168-5.962

Frequency Distribution

Cells contain:
-N of cases

R29                       Distribution
1: One man, Oswald                  82
2: Others involved                 608
9: Don't know/No answer            133
COL TOTAL                          823

Text for 'R29'

Do you think one man--Lee Harvey Oswald--was responsible for the assassination of President (John) Kennedy or do you think there were others involved?

Allocation of cases (unweighted)

Valid cases   823
Total cases   823
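
As a rough illustration of how a topline is derived from such counts, the following Python sketch converts the weighted N of cases above into percentages. The function name `topline` is hypothetical. Because the counts shown by IDEAS are rounded weighted values, a percentage computed this way can differ from the IDEAS display by about a tenth of a point (the column-percent table later in this tutorial shows 73.8 for “Others involved”).

```python
# Weighted counts copied from the frequency distribution above.
counts = {
    "One man, Oswald": 82,
    "Others involved": 608,
    "Don't know/No answer": 133,
}


def topline(counts):
    """Return each response's share of all respondents, in percent."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


percents = topline(counts)
for label, pct in percents.items():
    print(f"{label}: {pct}%")
```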

 

Bivariate analysis: “Crosstabulations”

A second level of dataset analysis available in IDEAS is bivariate analysis. In this case, two variables are crosstabulated against one another. Crosstabs can be run for any variables in the survey you’d like to correlate. Independent variables are placed in columns; dependent variables are placed in rows. A typical two-way crosstab looks at how responses to a survey question differ by demographic group. For instance, to use IDEAS to find out how “gender” might have influenced opinions on the Kennedy assassination question, do the following:

  1. First, select the descriptive label for that question in the Row pull-down menu, just as you did when running the frequency distribution. 
  2. Then, use the pull-down menu next to Column to select the “Gender” variable.
  3. Click on “Weight tables.”
  4. Then, click on “Run the table.” The results for the bivariate crosstab, along with the complete wording for both questions, will appear on your screen as in the table below.

IDEAS 1.2: Tables

CBS News Poll, May, 1998 - Kennedy Assassination

Nov 23, 2005 (Wed 11:13 AM Eastern Standard Time)

Variables

Role    Name    Label                                               Range       MD
Row     R29     Think Oswald was responsible for JFK assassination  1-9
Column  Gender  Gender                                              1-2
Weight  CasWgt  Weight                                              .168-5.962

Frequency Distribution

Cells contain:
-N of cases

                            Gender
R29                         1 Male  2 Female  ROW TOTAL
1: One man, Oswald              42        40         82
2: Others involved             289       318        608
9: Don't know/No answer         57        76        133
COL TOTAL                      388       435        823

Color coding:    <-2.0  <-1.0  <0.0  >0.0  >1.0  >2.0  T
N in each cell:  Smaller than expected  Larger than expected

Text for 'R29'

Do you think one man--Lee Harvey Oswald--was responsible for the assassination of President (John) Kennedy or do you think there were others involved?

Text for 'Gender'

Respondent gender

Allocation of cases (unweighted)

Valid cases   823
Total cases   823

 

CSM, UC Berkeley

To understand the color coding system, read below.
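
To compare men and women directly, it helps to turn the cell counts of the crosstab above into column percentages. The Python sketch below does this with the weighted counts shown; the function name `column_percents` is hypothetical. Since IDEAS works from unrounded weighted counts, its own column percents may differ from these by a few tenths of a point.

```python
ROWS = ["One man, Oswald", "Others involved", "Don't know/No answer"]
COLS = ["Male", "Female"]

# Weighted cell counts copied from the crosstab above (rows x columns).
cells = [
    [42, 40],
    [289, 318],
    [57, 76],
]


def column_percents(cells):
    """Percent of each column's total that falls in each row."""
    col_totals = [sum(col) for col in zip(*cells)]
    return [
        [round(100 * n / col_totals[j], 1) for j, n in enumerate(row)]
        for row in cells
    ]


pcts = column_percents(cells)
for label, row in zip(ROWS, pcts):
    print(label, dict(zip(COLS, row)))
```

Reading down a column answers the question “of all men (or women), what share gave each response?”, which is why the independent variable is placed in the columns.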

 

Control variables – Three-way Crosstabs

Provided a survey sample is large enough to yield a significant result, it is also possible in IDEAS to run a three-way cross tabulation by including a control variable in your analysis. Let’s say, for example, you want to find out how the results of the previous bivariate analysis differ by “Education.”

  1. First, select the descriptive label for the assassination question in the Row pull-down menu, just as you did when running the frequency distribution.
  2. Then, use the pull-down menu next to Column to select the “Gender” variable.
  3. Click on “Weight tables.”
  4. To control for “Education,” consult the codebook (Click on “View Codebook” at the top of the screen) to get the mnemonic variable name from the Standard Variable List. The mnemonic name is a short, often non-descriptive identifying label that was assigned to each question in the course of creating the dataset. In this case, it’s “Educ.” Type “Educ” in the Control box.
  5. Then, click on “Run the table.”  The results for the three-way cross tab will appear on your screen just like the table below.

IDEAS 1.2: Tables

CBS News Poll, May, 1998 - Kennedy Assassination

Nov 23, 2005 (Wed 11:20 AM Eastern Standard Time)

Variables

Role     Name    Label                                               Range       MD
Row      R29     Think Oswald was responsible for JFK assassination  1-9
Column   Gender  Gender                                              1-2
Control  Educ    Education                                           1-9
Weight   CasWgt  Weight                                              .168-5.962

 

 

Statistics for Educ = 1(LT HS grad)

Cells contain:
-N of cases

                            Gender
R29                         1 Male  2 Female  ROW TOTAL
1: One man, Oswald               1         4          5
2: Others involved              28        36         65
9: Don't know/No answer          5        13         18
COL TOTAL                       34        53         88

Color coding:    <-2.0  <-1.0  <0.0  >0.0  >1.0  >2.0  T
N in each cell:  Smaller than expected  Larger than expected

 

Statistics for Educ = 2(HS graduate)

Cells contain:
-N of cases

                            Gender
R29                         1 Male  2 Female  ROW TOTAL
1: One man, Oswald              13        16         29
2: Others involved             133       137        270
9: Don't know/No answer         18        34         52
COL TOTAL                      164       187        351

Color coding:    <-2.0  <-1.0  <0.0  >0.0  >1.0  >2.0  T
N in each cell:  Smaller than expected  Larger than expected

 

 

Statistics for Educ = 3(Some college/trade/business)

Cells contain:
-N of cases

                            Gender
R29                         1 Male  2 Female  ROW TOTAL
1: One man, Oswald              11         9         21
2: Others involved              67        89        156
9: Don't know/No answer         17        17         34
COL TOTAL                       96       115        210

Color coding:    <-2.0  <-1.0  <0.0  >0.0  >1.0  >2.0  T
N in each cell:  Smaller than expected  Larger than expected

 

 

Statistics for Educ = 4(College graduate)

Cells contain:
-N of cases

                            Gender
R29                         1 Male  2 Female  ROW TOTAL
1: One man, Oswald              10         7         17
2: Others involved              37        35         72
9: Don't know/No answer         10         8         18
COL TOTAL                       56        51        107

Color coding:    <-2.0  <-1.0  <0.0  >0.0  >1.0  >2.0  T
N in each cell:  Smaller than expected  Larger than expected

 

 

Statistics for Educ = 5(Post-graduate)

Cells contain:
-N of cases

                            Gender
R29                         1 Male  2 Female  ROW TOTAL
1: One man, Oswald               7         4         11
2: Others involved              20        20         40
9: Don't know/No answer          6         4         10
COL TOTAL                       32        28         60

Color coding:    <-2.0  <-1.0  <0.0  >0.0  >1.0  >2.0  T
N in each cell:  Smaller than expected  Larger than expected

 

 

Statistics for Educ = 9(Refused)

Cells contain:
-N of cases

                            Gender
R29                         1 Male  2 Female  ROW TOTAL
2: Others involved               4         1          5
9: Don't know/No answer          1         0          1
COL TOTAL                        6         1          7

Color coding:    <-2.0  <-1.0  <0.0  >0.0  >1.0  >2.0  T
N in each cell:  Smaller than expected  Larger than expected

 

 

Statistics for all valid cases

Cells contain:
-N of cases

                            Gender
R29                         1 Male  2 Female  ROW TOTAL
1: One man, Oswald              42        40         82
2: Others involved             289       318        608
9: Don't know/No answer         57        76        133
COL TOTAL                      388       435        823

Color coding:    <-2.0  <-1.0  <0.0  >0.0  >1.0  >2.0  T
N in each cell:  Smaller than expected  Larger than expected

 
 

Text for 'R29'

Do you think one man--Lee Harvey Oswald--was responsible for the assassination of President (John) Kennedy or do you think there were others involved?

Text for 'Gender'

Respondent gender

Text for 'Educ'

What was the last grade in school you completed?

 

Allocation of cases (unweighted)

Valid cases   823
Total cases   823

CSM, UC Berkeley
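
Conceptually, a control variable splits the sample into one two-way table per control category, as in the output above. The Python sketch below shows that grouping on a handful of made-up records; the records, the tuple layout, and the function name `three_way` are assumptions for illustration and are not taken from the survey data or from how IDEAS is implemented.

```python
from collections import Counter

# Made-up respondent records as (educ, gender, r29) code triples; not survey data.
records = [
    (1, 1, 2), (1, 2, 2), (1, 2, 9),
    (2, 1, 1), (2, 1, 2), (2, 2, 2), (2, 2, 2),
]


def three_way(records):
    """Count cases per (row, column) cell, separately for each control value."""
    tables = {}
    for educ, gender, r29 in records:
        tables.setdefault(educ, Counter())[(r29, gender)] += 1
    return tables


tables = three_way(records)
for educ in sorted(tables):
    print(f"Educ = {educ}")
    for (r29, gender), n in sorted(tables[educ].items()):
        print(f"  R29={r29}, Gender={gender}: {n}")
```

Each extra level of splitting leaves fewer cases per cell, which is why the sample must be large enough for a three-way table to be meaningful.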

 

Filter(s)

Sometimes, a researcher might need to study a specific group of respondents in the survey.  IDEAS’s filter function allows you to specify which respondents you wish to include in your analysis.  The subsample, or subset, you select might be respondents in a particular demographic group, such as women or African Americans. A subsample could also consist of individuals who responded in a particular way to one of the questions in the survey. 

For instance, let’s say you’re interested in knowing more about the respondents in the Kennedy survey. You’re interested in finding out about the men in the survey, where they live, and their political ideology.

  1. First, choose Political ideology [R50] in the Row pull-down menu.
  2. In the Column box, select Census Region.
  3. In the Selection Filter box, type the mnemonic “Gender (1).” Why? Because Men were coded as “1” in the survey. You’ll be filtering out the Women (2) in this case.
  4. Check the box “Include Column Percents.”
  5. Check the “Weight tables” box.
  6. Run the table.

Your table should look like this:

IDEAS 1.2: Tables

CBS News Poll, May, 1998 - Kennedy Assassination

Nov 23, 2005 (Wed 11:53 AM Eastern Standard Time)

Variables

Role    Name       Label               Range       MD
Row     R50        Political ideology  1-9
Column  Region     Census Region       1-4
Weight  CasWgt     Weight              .168-5.962
Filter  Gender(1)  Gender(=Male)       1-2

Frequency Distribution

Cells contain:
-Column percent
-N of cases

                            Region
R50                         1 East   2 North Central   3 South   4 West   ROW TOTAL
1: Liberal                    25.1              11.5      26.6     21.6        21.8
                                24                10        34       17          85
2: Moderate                   35.1              51.1      41.7     35.4        41.0
                                33                45        54       28         159
3: Conservative               31.7              28.6      30.7     38.6        32.1
                                30                25        39       30         125
9: Don't know/No answer        8.1               8.8        .9      4.4         5.1
                                 8                 8         1        3          20
COL TOTAL                    100.0             100.0     100.0    100.0       100.0
                                95                87       128       78         388

Color coding:    <-2.0  <-1.0  <0.0  >0.0  >1.0  >2.0  T
N in each cell:  Smaller than expected  Larger than expected

Text for 'R50'

How would you describe your views on most political matters? Generally, do you think of yourself as liberal, moderate, or conservative?

Text for 'Gender'

Respondent gender

Allocation of cases (unweighted)

Valid cases                          353
Cases excluded by filter or weight   470
Total cases                          823

CSM, UC Berkeley
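
The filter step can be pictured as dropping every case that does not match the selected code before the table is built. The following Python sketch applies a Gender(1)-style filter to a few made-up records; the records and the function name `apply_filter` are hypothetical and only mirror the mnemonic names used above.

```python
from collections import Counter

# Made-up records with the mnemonic names used above; values are numeric codes.
records = [
    {"Gender": 1, "Region": 1, "R50": 2},
    {"Gender": 2, "Region": 1, "R50": 1},
    {"Gender": 1, "Region": 3, "R50": 3},
    {"Gender": 1, "Region": 3, "R50": 2},
    {"Gender": 2, "Region": 4, "R50": 2},
]


def apply_filter(records, variable, code):
    """Keep only the cases whose value on `variable` equals `code`."""
    return [r for r in records if r[variable] == code]


men = apply_filter(records, "Gender", 1)  # analogous to typing Gender(1)
excluded = len(records) - len(men)
crosstab = Counter((r["R50"], r["Region"]) for r in men)

print(len(men), "valid cases,", excluded, "excluded by filter")
print(dict(crosstab))
```

The two printed counts correspond to the “Valid cases” and “Cases excluded by filter or weight” lines in the allocation summary.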

Weighting

Survey firms apply a technique called weighting to adjust the poll results to account for possible sample biases caused by specific groups of individuals not responding to a survey.  The weighting mechanism uses known estimates of the total population provided by the US Census Bureau to adjust the final results.  It's not uncommon to weight data by, for instance, age, gender, education, or race in order to achieve the correct demographic proportions.  In IDEAS, you can look at and analyze both weighted and unweighted survey results by checking or unchecking the box labeled Weight Tables.  The default setting is for weighted results.
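
The effect of weighting can be seen in the gender figures printed elsewhere in this tutorial: 353 men and 470 women were actually interviewed (unweighted), while the weighted tables show 388 men and 435 women. The Python sketch below compares the two and derives an average weight per group. This is only an illustration: the actual CasWgt value varies by respondent (from .168 to 5.962), the tutorial does not say how CBS News constructed it, and the function names are hypothetical.

```python
# Figures taken from the tables in this tutorial.
unweighted_n = {"Male": 353, "Female": 470}  # interviews actually completed
weighted_n = {"Male": 388, "Female": 435}    # weighted column totals


def shares(counts):
    """Each group's percentage of the total."""
    total = sum(counts.values())
    return {g: round(100 * n / total, 1) for g, n in counts.items()}


def average_weights(weighted, unweighted):
    """Mean weight per group: weighted cases divided by actual interviews."""
    return {g: round(weighted[g] / unweighted[g], 3) for g in weighted}


unweighted_shares = shares(unweighted_n)
weighted_shares = shares(weighted_n)
avg_weights = average_weights(weighted_n, unweighted_n)

print(unweighted_shares)
print(weighted_shares)
print(avg_weights)
```

An average weight above 1 means the group was underrepresented among the interviews and is counted up; a weight below 1 means it is counted down.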

 

Color coding

In the analysis of survey data, the smaller a sample is, the less likely it is to be representative of the population as a whole, and therefore the less reliable any results based on that sample will be. A crosstabulation in effect divides a survey sample into a number of smaller samples, each of which occupies a cell in the table produced by the crosstab. The number of cases in a cell can become quite low if the questions included in the crosstab have many response categories, or if the demographic variables include more than two or three groups. In IDEAS, color coding is used to indicate the relative reliability of the data appearing in each cell of a table. The legend printed beneath each table runs from <-2.0 to >2.0 and marks whether the N in a cell is smaller or larger than expected.
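
The legend under each table speaks of an N that is “smaller than expected” or “larger than expected.” One common way to quantify that idea is a standardized residual: the observed count minus the count expected if rows and columns were unrelated, divided by the square root of the expected count. Whether IDEAS computes exactly this statistic is an assumption; the Python sketch below is offered only to illustrate the concept, using the weighted counts from the gender crosstab.

```python
import math

# Weighted cell counts from the gender crosstab
# (rows: One man, Others involved, Don't know; columns: Male, Female).
cells = [[42, 40], [289, 318], [57, 76]]


def standardized_residuals(cells):
    """(observed - expected) / sqrt(expected) for every cell."""
    row_totals = [sum(row) for row in cells]
    col_totals = [sum(col) for col in zip(*cells)]
    grand = sum(row_totals)
    result = []
    for i, row in enumerate(cells):
        out = []
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand
            out.append(round((observed - expected) / math.sqrt(expected), 2))
        result.append(out)
    return result


residuals = standardized_residuals(cells)
for row in residuals:
    print(row)
```

Under this reading, a positive value corresponds to a cell with more cases than expected and a negative value to one with fewer.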

 

Descriptive statistics

Some users of IDEAS will want more detailed information for their frequencies and crosstabs than the program’s default setting provides. To obtain a summary of descriptive statistics for an analysis, simply check the Descriptive Statistics box before running the table. Descriptive statistics include the mean and standard deviation. Summary statistics report measures of association and the statistical significance of the relationship. Let’s find the descriptive statistics for the relationship between “Gender” and “Think Oswald was responsible for JFK’s assassination”:

  1. In the Row box, choose “Think Oswald was responsible for JFK’s assassination.”
  2. In the Column box, choose “Gender.”
  3. In Options, click on “Include Column Percents” and “Include Descriptive Statistics.”
  4. Check “Weight Tables.”
  5. Click on “Run the table.”

The table should look like this:

IDEAS 1.2: Tables

CBS News Poll, May, 1998 - Kennedy Assassination

Nov 23, 2005 (Wed 12:18 PM Eastern Standard Time)

Variables

Role    Name    Label                                               Range       MD
Row     R29     Think Oswald was responsible for JFK assassination  1-9
Column  Gender  Gender                                              1-2
Weight  CasWgt  Weight                                              .168-5.962

Frequency Distribution

Cells contain:
-Column percent
-N of cases

                            Gender
R29                         1 Male  2 Female  ROW TOTAL
1: One man, Oswald            10.8       9.3       10.0
                                42        40         82
2: Others involved            74.4      73.3       73.8
                               289       318        608
9: Don't know/No answer       14.7      17.4       16.2
                                57        76        133
COL TOTAL                    100.0     100.0      100.0
                               388       435        823
Means                         2.92      3.13       3.03
Std Devs                      2.55      2.72       2.64
Unweighted N                   353       470        823

Color coding:    <-2.0  <-1.0  <0.0  >0.0  >1.0  >2.0  T
N in each cell:  Smaller than expected  Larger than expected

Summary Statistics

Eta* =        .04    Gamma =  .09    Chisq(P) =   1.47  (p= 0.48)
R =           .04    Tau-b =  .04    Chisq(LR) =  1.47  (p= 0.48)
Somers' d* =  .04    Tau-c =  .04    df =         2

*Row variable treated as the dependent variable.

Text for 'R29'

Do you think one man--Lee Harvey Oswald--was responsible for the assassination of President (John) Kennedy or do you think there were others involved?

Text for 'Gender'

Respondent gender

Allocation of cases (unweighted)

Valid cases   823
Total cases   823

CSM, UC Berkeley
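
To show where figures like these can come from, the Python sketch below recomputes a mean, a standard deviation, and a Pearson chi-square from the weighted counts in the table above. Two caveats apply. First, the means appear to average the numeric response codes themselves, code 9 (“Don't know/No answer”) included, so they say little about opinion on this particular question. Second, the sketch uses the rounded counts as displayed and a population-style standard deviation, both of which are assumptions; it reproduces the means but gives a chi-square of roughly 1.58 where IDEAS reports 1.47. The function names are hypothetical.

```python
import math

CODES = [1, 2, 9]                         # response codes for R29
cells = [[42, 40], [289, 318], [57, 76]]  # weighted counts, Male / Female


def mean_and_sd(codes, counts):
    """Mean and population standard deviation of the numeric codes."""
    n = sum(counts)
    mean = sum(c * k for c, k in zip(codes, counts)) / n
    var = sum(k * (c - mean) ** 2 for c, k in zip(codes, counts)) / n
    return mean, math.sqrt(var)


def pearson_chi_square(cells):
    """Pearson chi-square statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in cells]
    col_totals = [sum(col) for col in zip(*cells)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(cells):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    df = (len(cells) - 1) * (len(cells[0]) - 1)
    return chi2, df


male_mean, male_sd = mean_and_sd(CODES, [row[0] for row in cells])
female_mean, female_sd = mean_and_sd(CODES, [row[1] for row in cells])
chi2, df = pearson_chi_square(cells)

print(round(male_mean, 2), round(female_mean, 2))
print(round(chi2, 2), df)
```

With 2 degrees of freedom, a chi-square this small is consistent with the reported p-value of 0.48: the table gives no evidence that men and women answered the question differently.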
