Managing weights within the WVS

The World Values Survey dataset grows over time as new waves are added with new or repeating countries. As a result, researchers gain not only an increasingly valuable source of information for comparing societies, but also a sophisticated tool that takes some effort to master.

The WVS archive receives questions from researchers around the world every day, and most of them concern weighting. Which variable should be used? Why are there so many? How are they calculated within countries?

The purpose of this paper is to answer these questions and to introduce readers to some new concepts, such as the population balanced weight. We will use syntax examples to illustrate the procedures.

Working with weighted samples of the Values Studies

The longitudinal aggregate of the World Values Survey, which can be downloaded from the WVS site, includes six different weighting variables. With so many, one immediately wonders which one to use.

The existing variables are:

  • S017 – S017A: Original country weight

  • S018 – S018A: 1000-equilibrated weight

  • S019 – S019A: 1500-equilibrated weight

Original country weight

S017 is the original weight provided by each participating country. Its purpose is to compensate for small deviations in the resulting sample with respect to one or several dimensions considered important for reliable results. These dimensions can be the sex-age distribution, the rural-urban distribution, or even the distribution of respondents’ education. Each participating country decides on its own weighting method. In general, it is not good practice to use weighting to compensate for large deviations from the target figures.

Whatever criteria are chosen, the procedure for computing the weighting factors is similar. Usually, a matrix is defined with the estimated proportion that each combination of categories should have in the sample. This estimate may come from a census, national statistics, and so on. This is the target distribution matrix. Then the actual distribution of each combination of categories is calculated for the fielded sample. The weight is, by definition, the matrix of factors by which the fielded sample matrix must be multiplied, cell by cell, to obtain the target distribution matrix.

An example illustrates this for a simple Sex-Age weighting.

  • First, we write down the sex-age distribution according to census (for example):

Target sample    Male     Female
18-29            11.8%    11.3%
30-49            18.5%    18.3%
50-64             9.5%     9.9%
65 and more       8.8%    12.0%

  • Second, we write down the sex-age distribution of the actual sample, after field-work:

Actual sample    Male     Female
18-29            10.1%     9.8%
30-49            20.7%    19.3%
50-64             9.9%    10.4%
65 and more       8.8%    11.0%

  • And, third, we calculate the weighting matrix to be applied in order to correct the sample:

Weight factors   Male     Female
18-29            1.1710   1.1491
30-49            0.8902   0.9482
50-64            0.9559   0.9570
65 and more      0.9940   1.0927

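In this example, the value 1.1710 obtained for (18-29, Male) is the result of dividing 11.8% (the target percentage of males aged 18 to 29) by 10.1%, the actual percentage as calculated on the data file. The percentages shown in the tables are rounded, so dividing the displayed figures gives a slightly different result (about 1.168); the factors were presumably computed from unrounded percentages. Each respondent record corresponding to a male aged 18 to 29 should therefore count as 1.1710 cases instead of 1.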

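We will not prove it formally here because it is outside the scope of this paper, but the above method preserves the sample size. The intuition is that each cell contributes its actual count multiplied by the ratio of its target share to its actual share, which is N times its target share, and the target shares add up to 100%. This means that if the file had an N of 1200 cases, then using the weighting factors calculated above will still give an N of 1200 cases.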
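The weighting matrix above could be applied in SPSS roughly as follows. This is only a minimal sketch: SEX, AGEGRP and WSEXAGE are hypothetical variable names chosen for illustration, and the category codes are assumptions that must be adapted to the actual data file.

```spss
* SEX and AGEGRP are hypothetical variable names, not WVS variables.
* Assumed codes are SEX 1=Male 2=Female and AGEGRP 1=18-29 2=30-49 3=50-64 4=65 and more.
COMPUTE WSEXAGE=1.
IF (SEX=1 AND AGEGRP=1) WSEXAGE=1.1710.
IF (SEX=2 AND AGEGRP=1) WSEXAGE=1.1491.
IF (SEX=1 AND AGEGRP=2) WSEXAGE=0.8902.
IF (SEX=2 AND AGEGRP=2) WSEXAGE=0.9482.
IF (SEX=1 AND AGEGRP=3) WSEXAGE=0.9559.
IF (SEX=2 AND AGEGRP=3) WSEXAGE=0.9570.
IF (SEX=1 AND AGEGRP=4) WSEXAGE=0.9940.
IF (SEX=2 AND AGEGRP=4) WSEXAGE=1.0927.
EXECUTE.
* Switch the weight on, inspect the weighted distribution, then switch it off.
WEIGHT BY WSEXAGE.
CROSSTABS /TABLES=AGEGRP BY SEX /CELLS=COUNT TOTAL.
WEIGHT OFF.
```

With the weight switched on, the total percentages in the crosstab should be close to the target distribution, and the weighted total N should stay close to the unweighted N, apart from rounding.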

The WVS site (www.worldvaluessurvey.org) includes the special sections Technical Information and Documentation of Data, with methodology information at the country sample level. There you may find the specific criteria followed for calculating the weight, if any.

Original country weight 1000-balanced and 1500-balanced

S018 and S019 are both weighting factors derived from S017 whose goal is to transform the sample’s N to 1000 or 1500. Making every sample’s N equal may serve several purposes.

The first is to make all samples count the same in a combined analysis. In WVS2005, India’s sample has 2001 cases while the US sample has 1249. If S017 were used as the weight in a combined analysis, India would count more than the US because its total N is bigger. With S018, which is proportional to S017, both samples would show 1000 cases (or 1500 with S019).
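To illustrate the idea, a 1000-equilibrated weight can be sketched by rescaling S017 so that it sums to 1000 within each sample. This is an assumption about the construction, based only on the statement that S018 is proportional to S017 and yields an N of 1000; it is not the archive's documented procedure. S021, described later in this paper, has a different value for each sample. SUMW17 and W1000 are hypothetical names.

```spss
* Make sure no weight is active while the sums are computed.
WEIGHT OFF.
* SUMW17 holds the sum of S017 within each sample and is added to every case.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=S021
  /SUMW17=SUM(S017).
* W1000 is proportional to S017 and sums to 1000 within each sample.
COMPUTE W1000=S017*1000/SUMW17.
EXECUTE.
* Check that every sample now shows about 1000 weighted cases.
WEIGHT BY W1000.
FREQUENCIES VARIABLES=S021.
WEIGHT OFF.
```

Replacing 1000 with 1500 in the COMPUTE line would give the analogue of S019.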

The second reason for creating S018 is that it helps in building a population-scaled weight. A population-scaled weight is one that gives each country an N equal to the population size of the region covered by the sample. With such a weight, India’s N would be 1,165,720,000 and the US’s N would be 306,790,000 (using 2009 projections of their respective populations). In this way, the US and India would count differently in a combined analysis, but at least there would be a criterion for this difference in weight.

In general, S018 can be used to compute any special weight that ranks countries in any quantified dimension (economic, cultural, and so on).

The formula for such a weight is simple:

W_country = (S018 / 1000) × Population_country

Note that for most statistical analyses the country Ns will not matter much. Also, when relative figures are compared by country, the country N has no impact at all, since relative figures do not change at the country level.

Building a population based weight for the five-wave dataset

The five-wave aggregate that can be downloaded from the World Values Survey site doesn’t include any population-oriented weight. It’s up to the researcher to build it.

To help users, here is SPSS-style syntax for building such a variable.

COMPUTE POPWEIGHT=0.

FORMAT POPWEIGHT (F15.5).

IF (S003=8) POPWEIGHT=S018/1000*3639459. /*Albania*/

IF (S003=12) POPWEIGHT=S018/1000*33769669. /*Algeria*/

IF (S003=20) POPWEIGHT=S018/1000*887000. /*Andorra; figure looks about ten times too high, verify before use*/

IF (S003=31) POPWEIGHT=S018/1000*8238672. /*Azerbaijan*/

IF (S003=32) POPWEIGHT=S018/1000*40135000. /*Argentina*/

IF (S003=36) POPWEIGHT=S018/1000*21831000. /*Australia*/

IF (S003=40) POPWEIGHT=S018/1000*8356707. /*Austria*/

IF (S003=50) POPWEIGHT=S018/1000*150448340. /*Bangladesh*/

IF (S003=51) POPWEIGHT=S018/1000*3230100. /*Armenia*/

IF (S003=56) POPWEIGHT=S018/1000*10741000. /*Belgium*/

IF (S003=70) POPWEIGHT=S018/1000*3981239. /*Bosnia and Herzegovina*/

IF (S003=76) POPWEIGHT=S018/1000*191403000. /* Brazil*/

IF (S003=100) POPWEIGHT=S018/1000*7602100. /* Bulgaria*/

IF (S003=112) POPWEIGHT=S018/1000*9671900. /*Belarus*/

IF (S003=124) POPWEIGHT=S018/1000*33698000. /*Canada*/

IF (S003=152) POPWEIGHT=S018/1000*16929000. /*Chile*/

IF (S003=156) POPWEIGHT=S018/1000*1331540000. /*China*/

IF (S003=158) POPWEIGHT=S018/1000*23027672. /*Taiwan*/

IF (S003=170) POPWEIGHT=S018/1000*44981000. /*Colombia*/

IF (S003=191) POPWEIGHT=S018/1000*4432000. /*Croatia*/

IF (S003=196) POPWEIGHT=S018/1000*8016. /*Cyprus; figure looks about a hundred times too low, verify before use*/

IF (S003=203) POPWEIGHT=S018/1000*10474600. /*Czech Republic*/

IF (S003=208) POPWEIGHT=S018/1000*5515287. /*Denmark*/

IF (S003=214) POPWEIGHT=S018/1000*9365818. /*Dominican Republic*/

IF (S003=222) POPWEIGHT=S018/1000*7185218. /*El Salvador*/

IF (S003=231) POPWEIGHT=S018/1000*79221000. /*Ethiopia*/

IF (S003=233) POPWEIGHT=S018/1000*1340341. /*Estonia*/

IF (S003=246) POPWEIGHT=S018/1000*5337719. /*Finland*/

IF (S003=250) POPWEIGHT=S018/1000*65073482. /*France*/

IF (S003=268) POPWEIGHT=S018/1000*4382100. /*Georgia*/

IF (S003=276) POPWEIGHT=S018/1000*82062200. /*Germany*/

IF (S003=288) POPWEIGHT=S018/1000*23416500. /*Ghana*/

IF (S003=300) POPWEIGHT=S018/1000*11262500. /*Greece*/

IF (S003=320) POPWEIGHT=S018/1000*13000000. /*Guatemala*/

IF (S003=344) POPWEIGHT=S018/1000*7008900. /*Hong Kong*/

IF (S003=348) POPWEIGHT=S018/1000*10029900. /*Hungary*/

IF (S003=352) POPWEIGHT=S018/1000*319326. /*Iceland*/

IF (S003=356) POPWEIGHT=S018/1000*1165720000. /*India*/

IF (S003=360) POPWEIGHT=S018/1000*230512000. /*Indonesia*/

IF (S003=364) POPWEIGHT=S018/1000*70495782. /*Iran*/

IF (S003=368) POPWEIGHT=S018/1000*31234000. /*Iraq*/

IF (S003=372) POPWEIGHT=S018/1000*4517800. /*Ireland*/

IF (S003=376) POPWEIGHT=S018/1000*7411500. /*Israel*/

IF (S003=380) POPWEIGHT=S018/1000*60090400. /*Italy*/

IF (S003=392) POPWEIGHT=S018/1000*127580000. /*Japan*/

IF (S003=400) POPWEIGHT=S018/1000*6198677. /*Jordan*/

IF (S003=410) POPWEIGHT=S018/1000*48379392. /*South Korea*/

IF (S003=417) POPWEIGHT=S018/1000*5356869. /*Kyrgyzstan*/

IF (S003=428) POPWEIGHT=S018/1000*2256400. /*Latvia*/

IF (S003=440) POPWEIGHT=S018/1000*3350400. /*Lithuania*/

IF (S003=442) POPWEIGHT=S018/1000*4917000. /*Luxembourg; figure looks about ten times too high, verify before use*/

IF (S003=458) POPWEIGHT=S018/1000*27730000. /*Malaysia*/

IF (S003=466) POPWEIGHT=S018/1000*12000000. /*Mali*/

IF (S003=470) POPWEIGHT=S018/1000*4126000. /*Malta; figure looks about ten times too high, verify before use*/

IF (S003=484) POPWEIGHT=S018/1000*109955400. /*Mexico*/

IF (S003=498) POPWEIGHT=S018/1000*3572700. /*Moldova*/

IF (S003=504) POPWEIGHT=S018/1000*31491578. /*Morocco*/

IF (S003=528) POPWEIGHT=S018/1000*16517532. /*Netherlands*/

IF (S003=554) POPWEIGHT=S018/1000*4314100. /*New Zealand*/

IF (S003=566) POPWEIGHT=S018/1000*140003542. /*Nigeria*/

IF (S003=578) POPWEIGHT=S018/1000*4825300. /*Norway*/

IF (S003=586) POPWEIGHT=S018/1000*166783500. /*Pakistan*/

IF (S003=604) POPWEIGHT=S018/1000*29180900. /*Peru*/

IF (S003=608) POPWEIGHT=S018/1000*90500000. /*Philippines*/

IF (S003=616) POPWEIGHT=S018/1000*38130300. /*Poland*/

IF (S003=620) POPWEIGHT=S018/1000*10631800. /*Portugal*/

IF (S003=630) POPWEIGHT=S018/1000*3916632. /*Puerto Rico*/

IF (S003=642) POPWEIGHT=S018/1000*21496700. /*Romania*/

IF (S003=643) POPWEIGHT=S018/1000*141837000. /*Russian Federation*/

IF (S003=646) POPWEIGHT=S018/1000*8648248. /*Rwanda*/

IF (S003=682) POPWEIGHT=S018/1000*27601038. /*Saudi Arabia*/

IF (S003=702) POPWEIGHT=S018/1000*4839400. /*Singapore*/

IF (S003=703) POPWEIGHT=S018/1000*5413548. /*Slovakia*/

IF (S003=704) POPWEIGHT=S018/1000*86116559. /*Vietnam*/

IF (S003=705) POPWEIGHT=S018/1000*2053355. /*Slovenia*/

IF (S003=710) POPWEIGHT=S018/1000*48697000. /*South Africa*/

IF (S003=716) POPWEIGHT=S018/1000*13349000. /*Zimbabwe*/

IF (S003=724) POPWEIGHT=S018/1000*45828172. /*Spain*/

IF (S003=752) POPWEIGHT=S018/1000*9276509. /*Sweden*/

IF (S003=756) POPWEIGHT=S018/1000*7725200. /*Switzerland*/

IF (S003=764) POPWEIGHT=S018/1000*63389730. /*Thailand*/

IF (S003=780) POPWEIGHT=S018/1000*1047366. /*Trinidad and Tobago*/

IF (S003=792) POPWEIGHT=S018/1000*71517100. /*Turkey*/

IF (S003=800) POPWEIGHT=S018/1000*29592700. /*Uganda*/

IF (S003=804) POPWEIGHT=S018/1000*46143700. /*Ukraine*/

IF (S003=807) POPWEIGHT=S018/1000*2048900. /*Macedonia*/

IF (S003=818) POPWEIGHT=S018/1000*76797512. /*Egypt*/

IF (S003=826) POPWEIGHT=S018/1000*59175000. /*Great Britain*/

IF (S003=834) POPWEIGHT=S018/1000*40213160. /*Tanzania*/

IF (S003=840) POPWEIGHT=S018/1000*306790000. /*United States*/

IF (S003=854) POPWEIGHT=S018/1000*13228000. /*Burkina Faso*/

IF (S003=858) POPWEIGHT=S018/1000*3477778. /*Uruguay*/

IF (S003=862) POPWEIGHT=S018/1000*28359313. /*Venezuela*/

IF (S003=891) POPWEIGHT=S018/1000*10832545. /*Serbia and Montenegro*/

IF (S003=894) POPWEIGHT=S018/1000*11862740. /*Zambia*/

IF (S003=900) POPWEIGHT=S018/1000*61131000. /*Germany West*/

IF (S003=909) POPWEIGHT=S018/1000*1800000. /*Northern Ireland*/

EXECUTE.

If you want to check the results, run frequencies for variable S021 with the new weight switched on. S021 is the (Country - wave - study - set – year) variable, which has a different value for each sample.
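A minimal sketch of this check, assuming POPWEIGHT has been built with the syntax above:

```spss
* Weight the file by the population weight and list the samples.
WEIGHT BY POPWEIGHT.
FREQUENCIES VARIABLES=S021.
* Switch the weight off again before any unweighted work.
WEIGHT OFF.
```

Each sample's weighted frequency should be close to the population figure entered for its country, provided S018 adds up to 1000 within that sample.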

Not surprisingly, the above frequencies show that several samples exist for most countries. This is normal because most countries have fielded surveys in several waves, and some even more than once in the same wave. This is the case for Spain and Turkey, which have data for both WVS and EVS, and for Morocco, where two surveys were fielded in 2001 to measure the impact of the September 11 terrorist attacks in New York.

When doing combined analyses that use more than one sample per country, it is important to consider the impact of such a weight, because a country with several samples would count several times its population. The syntax above may be corrected by dividing the weight by the number of times the country appears in the combined set to be analyzed. In general, this is left to the researcher’s judgment.
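One possible way to apply this correction is sketched below. It assumes that POPWEIGHT already exists and that the active file contains only the samples to be analyzed. FIRSTSMP, NSAMPLES and POPWADJ are hypothetical names introduced for illustration.

```spss
* Make sure no weight is active while the samples are counted.
WEIGHT OFF.
SORT CASES BY S003 S021.
* FIRSTSMP is 1 on the first case of each sample and 0 on all other cases.
COMPUTE FIRSTSMP=1.
IF (S021=LAG(S021)) FIRSTSMP=0.
* NSAMPLES is the number of different samples found for each country.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=S003
  /NSAMPLES=SUM(FIRSTSMP).
* Divide the population weight by the number of samples of the country.
COMPUTE POPWADJ=POPWEIGHT/NSAMPLES.
EXECUTE.
WEIGHT BY POPWADJ.
```

With this adjusted weight, the samples of a country together add up to roughly its population, instead of each sample doing so on its own.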