The Roper Center provides this online tool to help you analyze selected
datasets. With it, you can run your own basic analyses without the aid of
expensive statistical software packages. You will be able to analyze the
opinions, attitudes, and experiences of the respondents, as well as the
relationships among them.
What does it mean to analyze a dataset?
Whenever a survey is conducted, question responses are recorded in numerical form. For
instance, if a survey asks, “How often would you say you vote—always, nearly
always, part of the time, or seldom?” the response “always” might be recorded
as 1, “nearly always” as 2, and so forth. Once the interviewing is completed, the
results are entered into a computer-readable collection of these coded
responses called a dataset.
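To make the idea of coded responses concrete, here is a minimal sketch in Python. The text only states that “always” might be recorded as 1 and “nearly always” as 2; the codes 3 and 4 and the three sample answers are assumptions made up for illustration.

```python
# Hypothetical codes for the voting-frequency question. Only 1 and 2 are
# given in the text; 3 and 4 are assumed to continue the same pattern.
VOTE_CODES = {
    "always": 1,
    "nearly always": 2,
    "part of the time": 3,
    "seldom": 4,
}


def encode(answers):
    """Turn a list of verbal answers into their numeric codes."""
    return [VOTE_CODES[answer] for answer in answers]


# Three made-up respondents, one answer each.
answers = ["always", "seldom", "nearly always"]
dataset = encode(answers)
print(dataset)  # [1, 4, 2]
```

A real dataset holds one such code per respondent for every question in the survey.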
Codebooks
You can get more detailed information about the survey by clicking on “View
codebook in separate window.” The codebook contains the actual questionnaire,
or instrument, giving the complete wording for each variable, the labels
and codes for all response categories, and other pertinent information about
the survey.
A Quick Tutorial:
Let’s say, for example, you want to analyze the dataset “CBS News Poll: Kennedy
Assassination [May 1998],” which is located under Elections, Political
Parties/Figures in the Topics at a Glance section of the Roper Center website.
(Click on the survey listed under Data Analysis Tool.)
Frequencies
The pull-down menus used to set up the analysis provide descriptive labels for all the questions in the survey. In the analysis of a poll, each question in the survey is referred to as a variable. The most basic form of dataset analysis produces a frequency distribution or topline for each variable. Toplines tell us what percentage of all the respondents gave each response to that particular question.
In the Kennedy survey, let’s look at the results of the question: “Do you think one man--Lee Harvey Oswald--was responsible for the assassination of President (John) Kennedy or do you think there were others involved?”
To run the frequency distribution using IDEAS:
- First, choose the variable you want to analyze. Use the pull-down menu across from the Row variable to select its descriptive label (Think Oswald was responsible for JFK assassination).
- Below the variables, you also have the option of viewing different percentage totals (in this case, click on Include Column Percents).
- Click on “Weight tables” (weighting adjusts the results to account for possible sample biases; see the Weighting section below for more details).
- Then, click on “Run the table.” The frequency distribution, along with the complete wording for the question and citation information for the survey, will appear on your screen as shown below.
IDEAS 1.2: Tables
CBS News Poll, May, 1998 - Kennedy Assassination
Nov 23, 2005 (Wed 11:11 AM Eastern Standard Time)

Variables
Role | Name | Label | Range | MD
Row | R29 | Think Oswald was responsible for JFK assassination | 1-9 |
Weight | CasWgt | Weight | .168-5.962 |

Frequency Distribution
Cells contain: N of cases
R29 | Distribution
1: One man, Oswald | 82
2: Others involved | 608
9: Don't know/No answer | 133
COL TOTAL | 823

Text for 'R29'
Do you think one man--Lee Harvey Oswald--was responsible for the assassination of President (John) Kennedy or do you think there were others involved?

Allocation of cases (unweighted)
Valid cases | 823
Total cases | 823
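The topline described above is simply each response's share of all respondents. A minimal sketch in Python, using the N of cases shown for R29; the function name `topline` is our own, and because the displayed counts appear to be rounded weighted values, the results may differ from what IDEAS reports by a tenth of a point or so.

```python
# N of cases per response, copied from the R29 frequency distribution.
counts = {
    "1: One man, Oswald": 82,
    "2: Others involved": 608,
    "9: Don't know/No answer": 133,
}


def topline(counts):
    """Return each category's share of all respondents, in percent."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


percents = topline(counts)
for label, pct in percents.items():
    print(f"{label:<26} {pct:5.1f}%")
```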
Bivariate analysis – “Crosstabulations”
A second level of dataset analysis available in IDEAS is bivariate analysis,
in which two variables are crosstabulated against one another. Crosstabs can
be run for any pair of variables in the survey whose relationship you would
like to examine. Independent variables are placed in columns; dependent
variables are placed in rows. A typical two-way crosstab looks at how
responses to a survey question differ by demographic group. For instance, to
use IDEAS to find out how “gender” might have influenced opinions on the
Kennedy assassination question, do the following:
- First, select the descriptive label for that question
in the Row pull-down menu, just as you did when running the frequency
distribution.
- Then, use the pull-down menu next to Column to
select the “Gender” variable.
- Click on “Weight tables.”
- Then, click on “Run the table.” The results for
the bivariate crosstab, along with the complete wording for both questions
will appear on your screen just like the table below.
IDEAS 1.2: Tables
CBS News Poll, May, 1998 - Kennedy Assassination
Nov 23, 2005 (Wed 11:13 AM Eastern Standard Time)

Variables
Role | Name | Label | Range | MD
Row | R29 | Think Oswald was responsible for JFK assassination | 1-9 |
Column | Gender | Gender | 1-2 |
Weight | CasWgt | Weight | .168-5.962 |

Frequency Distribution
Cells contain: N of cases
R29 | Gender: 1 Male | Gender: 2 Female | ROW TOTAL
1: One man, Oswald | 42 | 40 | 82
2: Others involved | 289 | 318 | 608
9: Don't know/No answer | 57 | 76 | 133
COL TOTAL | 388 | 435 | 823
Color coding: <-2.0 | <-1.0 | <0.0 | >0.0 | >1.0 | >2.0 | T
N in each cell: Smaller than expected | Larger than expected

Text for 'R29'
Do you think one man--Lee Harvey Oswald--was responsible for the assassination of President (John) Kennedy or do you think there were others involved?
Text for 'Gender'
Respondent gender

Allocation of cases (unweighted)
Valid cases | 823
Total cases | 823

CSM, UC Berkeley
To understand the color coding system, see the Color coding section below.
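Behind the scenes, a crosstab is a tally of how many respondents fall into each combination of row and column codes. The sketch below shows that tally in Python; the seven records are invented for illustration and are not taken from the CBS dataset, and `crosstab` and `print_table` are hypothetical helper names.

```python
from collections import Counter

# Made-up respondent records as (R29 code, Gender code) pairs.
records = [(1, 1), (2, 1), (2, 2), (2, 2), (9, 2), (2, 1), (1, 2)]


def crosstab(records):
    """Count how many records fall into each (row, column) cell."""
    return Counter(records)


def print_table(cells):
    """Print the cells with row and column totals, tab separated."""
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    print("row\\col", *cols, "ROW TOTAL", sep="\t")
    for r in rows:
        values = [cells[(r, c)] for c in cols]  # missing cells count as 0
        print(r, *values, sum(values), sep="\t")
    col_totals = [sum(cells[(r, c)] for r in rows) for c in cols]
    print("COL TOTAL", *col_totals, sum(col_totals), sep="\t")


cells = crosstab(records)
print_table(cells)
```

This sketch counts every record as one case; with “Weight tables” checked, IDEAS reports weighted counts instead.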
Control variables – Three-way Crosstabs
Provided a survey sample is large enough to yield a significant result, it is
also possible in IDEAS to run a three-way crosstab by including a control
variable in your analysis. Let’s say, for example, you want to find out how
the results of the previous bivariate analysis differ by “Education.”
- First, select the descriptive label for the assassination question in the Row pull-down menu, just as you did when running the frequency distribution.
- Then, use the pull-down menu next to Column to select the “Gender” variable.
- Click on “Weight tables.”
- To control for “Education,” consult the codebook (click on “View Codebook” at the top of the screen) to get the mnemonic variable name from the Standard Variable List. The mnemonic name is a short, often non-descriptive identifying label that was assigned to each question in the course of creating the dataset. In this case, it’s “Educ.” Type “Educ” in the Control box.
- Then, click on “Run the table.” The results for the three-way crosstab will appear on your screen as shown in the tables below.
IDEAS 1.2: Tables
CBS News Poll, May, 1998 - Kennedy Assassination
Nov 23, 2005 (Wed 11:20 AM Eastern Standard Time)

Variables
Role | Name | Label | Range | MD
Row | R29 | Think Oswald was responsible for JFK assassination | 1-9 |
Column | Gender | Gender | 1-2 |
Control | Educ | Education | 1-9 |
Weight | CasWgt | Weight | .168-5.962 |

Statistics for Educ = 1 (LT HS grad)
Cells contain: N of cases
R29 | Gender: 1 Male | Gender: 2 Female | ROW TOTAL
1: One man, Oswald | 1 | 4 | 5
2: Others involved | 28 | 36 | 65
9: Don't know/No answer | 5 | 13 | 18
COL TOTAL | 34 | 53 | 88
Color coding: <-2.0 | <-1.0 | <0.0 | >0.0 | >1.0 | >2.0 | T
N in each cell: Smaller than expected | Larger than expected

Statistics for Educ = 2 (HS graduate)
Cells contain: N of cases
R29 | Gender: 1 Male | Gender: 2 Female | ROW TOTAL
1: One man, Oswald | 13 | 16 | 29
2: Others involved | 133 | 137 | 270
9: Don't know/No answer | 18 | 34 | 52
COL TOTAL | 164 | 187 | 351
Color coding: <-2.0 | <-1.0 | <0.0 | >0.0 | >1.0 | >2.0 | T
N in each cell: Smaller than expected | Larger than expected

Statistics for Educ = 3 (Some college/trade/business)
Cells contain: N of cases
R29 | Gender: 1 Male | Gender: 2 Female | ROW TOTAL
1: One man, Oswald | 11 | 9 | 21
2: Others involved | 67 | 89 | 156
9: Don't know/No answer | 17 | 17 | 34
COL TOTAL | 96 | 115 | 210
Color coding: <-2.0 | <-1.0 | <0.0 | >0.0 | >1.0 | >2.0 | T
N in each cell: Smaller than expected | Larger than expected

Statistics for Educ = 4 (College graduate)
Cells contain: N of cases
R29 | Gender: 1 Male | Gender: 2 Female | ROW TOTAL
1: One man, Oswald | 10 | 7 | 17
2: Others involved | 37 | 35 | 72
9: Don't know/No answer | 10 | 8 | 18
COL TOTAL | 56 | 51 | 107
Color coding: <-2.0 | <-1.0 | <0.0 | >0.0 | >1.0 | >2.0 | T
N in each cell: Smaller than expected | Larger than expected

Statistics for Educ = 5 (Post-graduate)
Cells contain: N of cases
R29 | Gender: 1 Male | Gender: 2 Female | ROW TOTAL
1: One man, Oswald | 7 | 4 | 11
2: Others involved | 20 | 20 | 40
9: Don't know/No answer | 6 | 4 | 10
COL TOTAL | 32 | 28 | 60
Color coding: <-2.0 | <-1.0 | <0.0 | >0.0 | >1.0 | >2.0 | T
N in each cell: Smaller than expected | Larger than expected

Statistics for Educ = 9 (Refused)
Cells contain: N of cases
R29 | Gender: 1 Male | Gender: 2 Female | ROW TOTAL
2: Others involved | 4 | 1 | 5
9: Don't know/No answer | 1 | 0 | 1
COL TOTAL | 6 | 1 | 7
Color coding: <-2.0 | <-1.0 | <0.0 | >0.0 | >1.0 | >2.0 | T
N in each cell: Smaller than expected | Larger than expected

Statistics for all valid cases
Cells contain: N of cases
R29 | Gender: 1 Male | Gender: 2 Female | ROW TOTAL
1: One man, Oswald | 42 | 40 | 82
2: Others involved | 289 | 318 | 608
9: Don't know/No answer | 57 | 76 | 133
COL TOTAL | 388 | 435 | 823
Color coding: <-2.0 | <-1.0 | <0.0 | >0.0 | >1.0 | >2.0 | T
N in each cell: Smaller than expected | Larger than expected

Text for 'R29'
Do you think one man--Lee Harvey Oswald--was responsible for the assassination of President (John) Kennedy or do you think there were others involved?
Text for 'Gender'
Respondent gender
Text for 'Educ'
What was the last grade in school you completed?

Allocation of cases (unweighted)
Valid cases | 823
Total cases | 823

CSM, UC Berkeley
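A control variable splits one crosstab into a separate table for each category of the control. One way to check your reading of the output is to add the per-category tables back together, as in this Python sketch. The cell counts are copied from the Educ tables above; the short row labels and the function name `collapse` are our own. For this output the sums happen to match the "all valid cases" table exactly, but since the displayed counts appear to be rounded weighted values, small discrepancies would not be surprising in other tables.

```python
# (Male, Female) counts per R29 category, keyed by Educ code.
# Educ = 9 shows no "One man, Oswald" row, so it is entered as (0, 0).
by_educ = {
    1: {"Oswald": (1, 4), "Others": (28, 36), "DK/NA": (5, 13)},
    2: {"Oswald": (13, 16), "Others": (133, 137), "DK/NA": (18, 34)},
    3: {"Oswald": (11, 9), "Others": (67, 89), "DK/NA": (17, 17)},
    4: {"Oswald": (10, 7), "Others": (37, 35), "DK/NA": (10, 8)},
    5: {"Oswald": (7, 4), "Others": (20, 20), "DK/NA": (6, 4)},
    9: {"Oswald": (0, 0), "Others": (4, 1), "DK/NA": (1, 0)},
}


def collapse(tables):
    """Add the per-category tables cell by cell, dropping the control."""
    total = {}
    for table in tables.values():
        for row, (male, female) in table.items():
            m, f = total.get(row, (0, 0))
            total[row] = (m + male, f + female)
    return total


overall = collapse(by_educ)
for row, (male, female) in overall.items():
    print(f"{row:<8} male={male:>3} female={female:>3}")
```

Notice how small some of the cells become once the sample is split three ways (several hold fewer than ten cases), which is why a large enough sample matters for this kind of analysis.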
Filter(s)
Sometimes a researcher needs to study a specific group of respondents in the
survey. The IDEAS filter function allows you to specify which respondents to
include in your analysis. The subsample, or subset, you select might be
respondents in a particular demographic group, such as women or African
Americans. A subsample could also consist of individuals who responded in a
particular way to one of the questions in the survey.
For instance, let’s say you want to know more about the men in the Kennedy
survey: where they live and their political ideology.
- First, choose Political ideology [R50] in the Row pull-down menu.
- In the Column box, select Census Region.
- In the Selection Filter box, type the mnemonic “Gender(1).” Why? Because men were coded as “1” in the survey, so this filters out the women (2).
- Check the box “Include Column Percents.”
- Check the “Weight tables” box.
- Run the table.
Your table should look like this:
IDEAS 1.2: Tables
CBS News Poll, May, 1998 - Kennedy Assassination
Nov 23, 2005 (Wed 11:53 AM Eastern Standard Time)

Variables
Role | Name | Label | Range | MD
Row | R50 | Political ideology | 1-9 |
Column | Region | Census Region | 1-4 |
Weight | CasWgt | Weight | .168-5.962 |
Filter | Gender(1) | Gender(=Male) | 1-2 |

Frequency Distribution
Cells contain: Column percent (N of cases)
R50 | Region: 1 East | 2 North Central | 3 South | 4 West | ROW TOTAL
1: Liberal | 25.1 (24) | 11.5 (10) | 26.6 (34) | 21.6 (17) | 21.8 (85)
2: Moderate | 35.1 (33) | 51.1 (45) | 41.7 (54) | 35.4 (28) | 41.0 (159)
3: Conservative | 31.7 (30) | 28.6 (25) | 30.7 (39) | 38.6 (30) | 32.1 (125)
9: Don't know/No answer | 8.1 (8) | 8.8 (8) | .9 (1) | 4.4 (3) | 5.1 (20)
COL TOTAL | 100.0 (95) | 100.0 (87) | 100.0 (128) | 100.0 (78) | 100.0 (388)
Color coding: <-2.0 | <-1.0 | <0.0 | >0.0 | >1.0 | >2.0 | T
N in each cell: Smaller than expected | Larger than expected

Text for 'R50'
How would you describe your views on most political matters? Generally, do you think of yourself as liberal, moderate, or conservative?
Text for 'Gender'
Respondent gender

Allocation of cases (unweighted)
Valid cases | 353
Cases excluded by filter or weight | 470
Total cases | 823

CSM, UC Berkeley
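Conceptually, a filter such as Gender(1) drops every respondent who does not match before the table is built, and column percents are then computed within each column of what remains. The Python sketch below illustrates both steps on six invented records; the records, `apply_filter`, and `column_percents` are all hypothetical and unweighted, and are not how IDEAS itself is implemented.

```python
from collections import Counter, defaultdict

# Made-up records carrying the three variables used in the example above.
respondents = [
    {"Gender": 1, "Region": 1, "R50": 1},
    {"Gender": 1, "Region": 1, "R50": 2},
    {"Gender": 2, "Region": 1, "R50": 3},
    {"Gender": 1, "Region": 3, "R50": 2},
    {"Gender": 2, "Region": 3, "R50": 1},
    {"Gender": 1, "Region": 3, "R50": 2},
]


def apply_filter(rows, variable, code):
    """Keep only the rows whose variable equals code, e.g. Gender == 1."""
    return [r for r in rows if r[variable] == code]


def column_percents(rows, row_var, col_var):
    """Return {column code: {row code: percent of that column}}."""
    counts = defaultdict(Counter)
    for r in rows:
        counts[r[col_var]][r[row_var]] += 1
    return {
        col: {row: 100.0 * n / sum(cell.values()) for row, n in cell.items()}
        for col, cell in counts.items()
    }


men = apply_filter(respondents, "Gender", 1)
table = column_percents(men, "R50", "Region")
print(len(men), "of", len(respondents), "records pass the filter")
print(table)
```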
Weighting
Survey
firms apply a technique called weighting to adjust the poll results to
account for possible sample biases caused by specific groups of individuals not
responding to a survey. The weighting mechanism uses known estimates of the
total population provided by the US Census Bureau to adjust the final results.
It's not uncommon to weight data by, for instance, age, gender, education, or
race in order to achieve the correct demographic proportions. In IDEAS, you
can look at and analyze both weighted and unweighted survey results by checking
or unchecking the box labeled Weight Tables. The default setting is for
weighted results.
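As a rough illustration of how a weight changes a result, the Python sketch below adjusts a sample in which one group is underrepresented. Every number in it is invented, and the single-variable scheme shown (population share divided by sample share) is a simplification we assume for illustration; the weights that survey firms actually supply, such as CasWgt in this dataset, may be built quite differently.

```python
# Hypothetical shares: suppose women are 52% of the population but only
# 40% of the completed interviews.
population_share = {"female": 0.52, "male": 0.48}
sample_share = {"female": 0.40, "male": 0.60}

# Weight for each group: population share divided by sample share.
weights = {g: population_share[g] / sample_share[g] for g in population_share}

# Invented answers for ten respondents: (group, said_yes).
sample = (
    [("female", True)] * 3
    + [("female", False)] * 1
    + [("male", True)] * 2
    + [("male", False)] * 4
)


def percent_yes(sample, weights=None):
    """Percent answering yes, optionally weighting each respondent."""
    def weight_of(group):
        return 1.0 if weights is None else weights[group]

    yes = sum(weight_of(g) for g, said_yes in sample if said_yes)
    total = sum(weight_of(g) for g, _ in sample)
    return 100.0 * yes / total


print(round(percent_yes(sample), 1))           # unweighted
print(round(percent_yes(sample, weights), 1))  # weighted
```

In this made-up case the weighted figure is higher than the unweighted one because the underrepresented group said yes more often.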
Color coding
In the analysis of survey data, the smaller a sample is, the less likely it
is to be representative of the population as a whole and, therefore, the less
reliable the results based on that sample will be. A crosstabulation
divides a survey sample, in effect, into a number of smaller samples, each of
which occupies a cell in the table produced by the crosstab. The number of
cases in a cell can become quite low if the questions included in the crosstab
have a lot of response categories, or if the demographic variables include
more than two or three groups. In IDEAS, color coding is used to indicate the
relative reliability of the data appearing in each cell of a table. The legend
printed under each table gives the scale: cells are shaded according to
whether the N in the cell is smaller or larger than expected, in bands at 0.0,
1.0, and 2.0 on either side.
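The legend's “smaller than expected” and “larger than expected” can be made concrete with a little arithmetic. The Python sketch below computes, for the Gender by R29 crosstab shown earlier, the count each cell would hold if the two variables were unrelated, and a standardized residual measuring how far the actual count departs from it. Treating the legend's statistic as (observed - expected) / sqrt(expected) is our assumption; this page does not say exactly which statistic IDEAS uses, and the function names are hypothetical.

```python
import math

# N of cases from the Gender by R29 crosstab: (Male, Female).
observed = {
    "1: One man, Oswald": (42, 40),
    "2: Others involved": (289, 318),
    "9: Don't know/No answer": (57, 76),
}


def standardized_residuals(observed):
    """(observed - expected) / sqrt(expected) for every cell."""
    col_totals = [sum(col) for col in zip(*observed.values())]
    grand_total = sum(col_totals)
    result = {}
    for label, row in observed.items():
        row_total = sum(row)
        expected = [row_total * c / grand_total for c in col_totals]
        result[label] = tuple(
            (o - e) / math.sqrt(e) for o, e in zip(row, expected)
        )
    return result


def band(z):
    """Map a residual onto the bands named in the legend (assumed cutoffs)."""
    if z < -2.0:
        return "<-2.0"
    if z < -1.0:
        return "<-1.0"
    if z < 0.0:
        return "<0.0"
    if z <= 1.0:
        return ">0.0"
    if z <= 2.0:
        return ">1.0"
    return ">2.0"


residuals = standardized_residuals(observed)
for label, row in residuals.items():
    print(label, [f"{z:+.2f} ({band(z)})" for z in row])
```

Under this assumption every cell in the Gender table falls within one unit of its expected count, which fits the weak relationship reported in the summary statistics further down.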
Descriptive statistics
Some users of IDEAS will wish to have more detailed information for their
frequencies and crosstabs than the program’s default setting provides. To
obtain a summary of descriptive statistics for an analysis, simply check
the Descriptive Statistics box before running the table. Descriptive
statistics include the mean and standard deviation. Summary statistics, such
as the chi-square test, indicate the statistical significance of the
relationship. Let’s find out the descriptive statistics for the relationship between “Gender” and “Think Oswald was responsible for JFK’s assassination”:
- In the Row box, choose “Think Oswald was responsible for JFK’s assassination.”
- In the Column box, choose “Gender.”
- In Options, click on “Include Column Percents” and “Include Descriptive Statistics.”
- Check “Weight Tables.”
- Click on “Run the table.”
The table should look like this:
IDEAS 1.2: Tables
CBS News Poll, May, 1998 - Kennedy Assassination
Nov 23, 2005 (Wed 12:18 PM Eastern Standard Time)

Variables
Role | Name | Label | Range | MD
Row | R29 | Think Oswald was responsible for JFK assassination | 1-9 |
Column | Gender | Gender | 1-2 |
Weight | CasWgt | Weight | .168-5.962 |

Frequency Distribution
Cells contain: Column percent (N of cases)
R29 | Gender: 1 Male | Gender: 2 Female | ROW TOTAL
1: One man, Oswald | 10.8 (42) | 9.3 (40) | 10.0 (82)
2: Others involved | 74.4 (289) | 73.3 (318) | 73.8 (608)
9: Don't know/No answer | 14.7 (57) | 17.4 (76) | 16.2 (133)
COL TOTAL | 100.0 (388) | 100.0 (435) | 100.0 (823)
Means | 2.92 | 3.13 | 3.03
Std Devs | 2.55 | 2.72 | 2.64
Unweighted N | 353 | 470 | 823
Color coding: <-2.0 | <-1.0 | <0.0 | >0.0 | >1.0 | >2.0 | T
N in each cell: Smaller than expected | Larger than expected

Summary Statistics
Eta* = .04 | Gamma = .09 | Chisq(P) = 1.47 (p= 0.48)
R = .04 | Tau-b = .04 | Chisq(LR) = 1.47 (p= 0.48)
Somers' d* = .04 | Tau-c = .04 | df = 2
*Row variable treated as the dependent variable.

Text for 'R29'
Do you think one man--Lee Harvey Oswald--was responsible for the assassination of President (John) Kennedy or do you think there were others involved?
Text for 'Gender'
Respondent gender

Allocation of cases (unweighted)
Valid cases | 823
Total cases | 823

CSM, UC Berkeley
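Two of the figures in this output can be reproduced by hand, which helps in reading them. The Python sketch below recomputes the ROW TOTAL mean and standard deviation from the numeric codes and their counts, and the p-value from the reported chi-square. The function names are our own, whether IDEAS divides by N or N - 1 for the standard deviation is not stated here (both round to the same value in this case), and the p-value shortcut `exp(-x / 2)` is valid only when df = 2, as it is in this table.

```python
import math

# N of cases per R29 code, from the ROW TOTAL column of the table above.
counts = {1: 82, 2: 608, 9: 133}


def mean_and_sd(counts):
    """Mean and (population) standard deviation of the numeric codes."""
    n = sum(counts.values())
    mean = sum(code * k for code, k in counts.items()) / n
    variance = sum(k * (code - mean) ** 2 for code, k in counts.items()) / n
    return mean, math.sqrt(variance)


def chi2_p_value_df2(chi2):
    """Upper-tail p-value of a chi-square statistic with exactly 2 df."""
    return math.exp(-chi2 / 2.0)


mean, sd = mean_and_sd(counts)
p_value = chi2_p_value_df2(1.47)
print(round(mean, 2), round(sd, 2))  # 3.03 2.64
print(round(p_value, 2))             # 0.48
```

Because the mean is taken over the codes themselves, the code 9 for “Don't know/No answer” pulls it upward; for a question like this one, the percentages are usually easier to interpret than the mean.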