marcusmarcusrc ([personal profile] marcusmarcusrc) wrote 2006-10-26 04:25 pm

Citing Sources Properly


I am slowly growing to loathe people who do not properly cite their sources. Well, not really loathe. I do not really loathe anyone, not even certain political figures. But I decided that, for a paragraph of the chapter I'm working on, it would be good to state where the pollution in the city I was discussing fell relative to other cities throughout the world. I thought that the "Most polluted cities in the world" list would be easy to find. Well, I found 20 sources stating that China had 16 of the top 20 most polluted cities in the world, including the Economist, CNN, the BBC website, and even Congressional testimony. They all cited a World Bank study, but gave no specifics. Well, after significant searching, I found two databases at the World Bank:

Database 1 was a group of 106 "selected cities", of which 13 out of 20 of the most polluted were in China (note: 13, not 16, but close enough. And the top 4 matched, too). Database 1 cites Database 2 as the source of its data.

Database 2 is a much more complete list, with some 3000 cities. No matter what population cutoff I used, China had nowhere near the majority of the top 20. At a cutoff of 1 million, which seemed a reasonable number, China had 3 out of the top 20.
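For anyone who wants to repeat this kind of check, here is a minimal sketch of the cutoff test described above: filter by population, rank by PM concentration, and count countries in the top N. The field names (`population`, `pm`) and the rows are placeholders I made up for illustration; they are not World Bank figures, and the real database layout may differ.

```python
from collections import Counter


def top_polluted_by_country(cities, min_population, top_n=20):
    """Count how many of the top_n most polluted cities each country has,
    considering only cities at or above min_population."""
    eligible = [c for c in cities if c["population"] >= min_population]
    ranked = sorted(eligible, key=lambda c: c["pm"], reverse=True)
    return Counter(c["country"] for c in ranked[:top_n])


# Placeholder rows only: invented values, NOT real data.
cities = [
    {"name": "City A", "country": "X", "population": 5_000_000, "pm": 150.0},
    {"name": "City B", "country": "Y", "population": 200_000, "pm": 180.0},
    {"name": "City C", "country": "X", "population": 1_200_000, "pm": 120.0},
    {"name": "City D", "country": "Z", "population": 3_000_000, "pm": 90.0},
]

# Try several cutoffs to see whether the ranking depends on the choice.
for cutoff in (100_000, 1_000_000):
    print(cutoff, dict(top_polluted_by_country(cities, cutoff, top_n=2)))
```

Running the same count over several cutoffs makes it obvious whether a "16 of the top 20" style claim survives the choice of threshold or depends on it.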

Of course, Database 1 and Database 2 don't exactly agree: for one thing, Database 1 seems to use metropolitan area population where Database 2 uses city population. And the PM concentration in Database 1 for a given city was close to, but not exactly equal to, the value in Database 2. And Database 2 has an attached paper which is "forthcoming in 2006" but not yet available.

So frustration all around. But really, I'm sort of miffed at the Economist and BBC for not looking up their original source properly. (In fact, all but the BBC used the exact same wording, leading me to believe that one person made up a quotation based on this subsample database, and then everyone else copied him)

(For the curious: the most polluted cities in Database 1 are Delhi, Cairo, Calcutta, and Tianjin. In Database 2, several cities are more polluted than Delhi, including Baghdad - pre-Iraq war - and Karachi)


The moral of the story: whenever you write something, please give a proper citation to the data source you use, and you will make graduate students the world over happier people!

[identity profile] miraclaire.livejournal.com 2006-10-26 10:16 pm (UTC)(link)
in a class I'm taking next semester, there is a "follow the footnote" project where you pick your favorite footnote in one of the readings and follow it as far as you can and talk in the paper about how the information changed.

[identity profile] marcusmarcusrc.livejournal.com 2006-10-26 11:53 pm (UTC)(link)
Telephone footnote-ary!

[identity profile] rifmeister.livejournal.com 2006-10-26 11:45 pm (UTC)(link)
Oh, don't get me started. In machine learning, you can give a proper pointer to the data, you can release your source code, and people still can't reproduce your experiments. Who knows why? Maybe you just made it all up in the first place. Maybe extra dimensions and demons are involved.

[identity profile] marcusmarcusrc.livejournal.com 2006-10-26 11:52 pm (UTC)(link)
Heh. One of the advantages I saw in moving from benchtop chemistry to computational modeling was reproducibility. Because everyone knows that reproducing a published chemistry reaction always takes tweaking and work... but I thought that with code, it should be straightforward, right?

Hah! I should have known better. (Though I think that for the most part, computer stuff _is_ much more reproducible than chemistry was, just... not as 100% reproducible as I would really like it to be)

[identity profile] nuclearpolymer.livejournal.com 2006-10-27 12:40 pm (UTC)(link)
I've been trying to track down the second Easter Island moai statue in America. I keep seeing references to how the one in the Smithsonian is "one of the only two" in America, but can't seem to find any reference to where the other one is. I wonder if the whole thing is just a strange rumor that got institutionalized.

[identity profile] astra-nomer.livejournal.com 2006-10-27 02:03 pm (UTC)(link)
There's apparently one in the lobby of the building here at work. I walk by it nearly every day, but never looked at it really closely...

[identity profile] astra-nomer.livejournal.com 2006-10-27 02:08 pm (UTC)(link)
Journalists are too lazy to use primary sources. No doubt somebody wrote up a press release, and that formed the basis for all those articles you're looking at.

Whenever I read a news article about some astronomical discovery, half the time I can't even figure out what the big news is because it's so poorly written and under-researched. Better to look up the paper myself, which is never cited in the article, of course - gotta go look for it on my own.

[identity profile] marcusmarcusrc.livejournal.com 2006-10-27 02:50 pm (UTC)(link)
Yeah. It worries me when I find fundamental journalistic errors in my field, because then I wonder about all the other fields where I _don't_ have the basic knowledge to tell if a statement is reasonable or not.

[identity profile] kirisutogomen.livejournal.com 2006-10-27 07:01 pm (UTC)(link)
You would be absolutely right to wonder. You might think that a magazine called The Economist would have accurate reporting on economics. Well, it's better than the reporting in a lot of other major news outlets, but it's still woefully poor.

Reproducibility...

[identity profile] mjperson.livejournal.com 2006-10-27 02:34 pm (UTC)(link)
I had a fun experience trying to reproduce someone else's solar system integrations. Every time I did it, I got exactly the opposite of what they predicted. So I started iteratively converging on their solution by being a pest.

First I asked for their initial conditions. Then I asked for their actual initial conditions, as in the actual file. Then I asked for their integration parameters, followed by asking for their actual integration parameters, as in their actual file. Then I gave up and asked for their code. Then I asked for their actual executable, as in the exact file they executed...

At EVERY single step of the path, the stuff they said and the stuff in the files were different. In the end, yeah, I reproduced their results, but there's little tying what was done to the stuff they talked about in the paper.

Re: Reproducibility...

[identity profile] marcusmarcusrc.livejournal.com 2006-10-27 02:38 pm (UTC)(link)
Ouch! That's not good...