VII Jornadas de Usuarios de R

The VII Jornadas de Usuarios de R (7th Spanish R Users Conference) will take place at the University of Salamanca on 5 and 6 November 2015. The conference is organized by the Comunidad R Hispano, the University of Salamanca and the Science Park of the same University. Attendance is free, but prior registration is required. More information here.



Obtaining macroclimate data with R to model species’ distributions

In this post I show how to extract climate data from WorldClim to analyse species’ distributions along environmental gradients. I also show how to get «background» data, that is, a set of values of the predictor variable(s) for a random sample of locations taken from the full extent of a given region. Finally, I show how to combine all the data (those from locations where a species is present and those from the background locations) into a single dataframe ready for modelling. As in previous posts, the target species is the dwarf palm (Chamaerops humilis) and the study area is the Iberian Peninsula. I therefore also provide the R code and the data values needed to create a mask that excludes all areas outside the peninsula.

♣ ♣ ♣


WorldClim is a set of global climate layers (climate grids) covering all land areas except Antarctica. It was developed by Robert J. Hijmans and others (1) and provides data for three periods:

- current climate conditions (representative of the period 1950–2000);
- future conditions, that is, climate projections from global circulation models (in particular, those of IPCC5/CMIP5);
- some past climates (for example, the Last Glacial Maximum, about 21,000 years ago).

For current conditions, on which I focus in this post, the available data include monthly precipitation and mean, minimum and maximum monthly temperature. They also include 19 derived bioclimatic variables, which represent annual trends (e.g., mean annual temperature, annual precipitation), seasonality (e.g., annual range in temperature and precipitation) and extreme environmental factors that often limit the distribution of organisms (for instance, minimum temperature of the coldest month, maximum temperature of the warmest month). In all cases, the data can be downloaded at several spatial resolutions: 30 arc-seconds (about 1 km), 2.5 arc-minutes, 5 arc-minutes and 10 arc-minutes. They are in the latitude/longitude coordinate reference system (not projected), with the WGS84 datum.
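To see what these resolutions mean on the ground, the sketch below converts a WorldClim cell size from arc-minutes to kilometres. It assumes a spherical Earth on which one degree of latitude spans about 111.32 km, so the results are approximations. The function name cell_size_km() is only illustrative. Because meridians converge towards the poles, the east-west width of a cell shrinks with the cosine of latitude.

```r
## Approximate size (km) of a latitude/longitude grid cell.
## Assumption: spherical Earth, 1 degree of latitude ~ 111.32 km.
cell_size_km <- function(res_arcmin, lat_deg) {
  res_deg <- res_arcmin / 60                 # arc-minutes -> degrees
  ns <- res_deg * 111.32                     # north-south extent (km)
  ew <- ns * cos(lat_deg * pi / 180)         # east-west extent shrinks with latitude
  c(ns_km = ns, ew_km = ew)
}

## WorldClim resolutions: 30 arc-seconds (0.5 arc-minutes), 2.5, 5 and 10 arc-minutes
res <- c(0.5, 2.5, 5, 10)
sizes <- t(sapply(res, cell_size_km, lat_deg = 40))  # around 40 degrees N (central Iberia)
rownames(sizes) <- paste(res, "arc-min")
round(sizes, 2)
```

At 40° N this gives roughly 0.93 km × 0.71 km for the 30 arc-second cells, which is why they are described as "about 1 km".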

Getting these data and reading them into R is easy. Suppose, for example, that we download the bioclimatic variables at a resolution of 2.5 arc-minutes and are interested in the minimum temperature of the coldest month (BIO6). We may first create a RasterLayer object and plot the data:

library(raster)
tminCM <- raster("C:/.../WorldClim/Bio_2-5m/bio6.bil")
plot(tminCM)


Minimum temperature of the coldest month


We may then use the function drawExtent() to visually determine the bounding box of our study area (by clicking on two opposite corners on the plot), and crop the raster to it:

newext <- drawExtent()
tminCM.WM <- crop(tminCM, newext)


Alternatively, we may focus on a given region by fixing the coordinates of the bounding box (xmin, xmax, ymin, ymax), for example, around the Iberian Peninsula:

newext <- extent(-10, 4, 35, 45)
tminCM.IP <- crop(tminCM, newext)
plot(tminCM.IP)


Minimum temperature of the coldest month in the Iberian Peninsula



If we have species occurrence data, such as those displayed here for the dwarf palm (Chamaerops humilis) in the Iberian Peninsula (see also here), we may plot them over the climate data:

chamaerops <- read.table("C:/.../Chamaerops_Anthos.txt", header=TRUE, row.names=1)
points(chamaerops$lon, chamaerops$lat, col='blue', pch=20)


Dwarf palm presences in blue, over the values of minimum temperature of the coldest month in the Iberian Peninsula


We may also extract the values of the climatic variable(s) at the locations of the “points” where the species is found. This is not difficult with the function extract() from the «raster» package, but the extracted values may vary depending on the optional arguments of the function. In this case, the coordinates of the points (longitude, latitude) correspond to the centroids of 100 km² (10 km × 10 km) grid cells (see here). Rather than taking the raw value of the raster (climate) cell in which each point falls, it therefore seems preferable to take the mean of all the raster cells within a radius of 5 km around each point, set with the buffer argument. If the data are not projected (i.e. if they are given as latitude/longitude), the unit of the buffer should be metres.

coordinates(chamaerops) <- c("lon", "lat") ## turns the data frame into a SpatialPointsDataFrame («sp» package)
tminCM.IP.sim <- extract(tminCM.IP, chamaerops, method='simple', buffer=5000, fun=mean, df=TRUE) ## If df=TRUE, the results are returned as a 'dataframe'
head(tminCM.IP.sim) ## Temperature data are in °C × 10 to reduce the file sizes
  ID     bio6
1  1 57.40000
2  2 50.16667
3  3 57.60000
4  4 43.80000
5  5 47.75000
6  6 46.50000
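According to the «raster» documentation, a cell is included in the buffer when the distance from the point to the cell centre is less than or equal to the buffer. The base-R sketch below mimics that rule on a toy grid. It uses a haversine great-circle distance on a sphere, which is an approximation; «raster» computes distances its own way. The sketch only illustrates which cells end up in the mean. The functions haversine_m() and buffer_mean() and the grid values are hypothetical.

```r
## Great-circle distance in metres between lon/lat points (haversine, spherical Earth)
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(pmin(1, a)))
}

## Mean of the cell values whose centres lie within 'radius_m' of one point
buffer_mean <- function(vals, cell_lon, cell_lat, pt_lon, pt_lat, radius_m) {
  inside <- haversine_m(pt_lon, pt_lat, cell_lon, cell_lat) <= radius_m
  mean(vals[inside], na.rm = TRUE)
}

## Toy 5 x 5 grid of 2.5 arc-minute cell centres (hypothetical values, degC x 10)
res  <- 2.5 / 60
grid <- expand.grid(lon = seq(-1.1, by = res, length.out = 5),
                    lat = seq(37.5, by = res, length.out = 5))
grid$bio6 <- 40:64

## Point placed on the centre of the middle cell (row 13 of the grid)
pt <- grid[13, c("lon", "lat")]
d  <- haversine_m(pt$lon, pt$lat, grid$lon, grid$lat)
sum(d <= 5000)   # cells within 5 km: the centre plus its 4 direct neighbours
m <- buffer_mean(grid$bio6, grid$lon, grid$lat, pt$lon, pt$lat, 5000)
m / 10           # back to degrees Celsius
```

At this latitude the diagonal neighbours lie a little more than 5 km away, so only the centre cell and its north, south, east and west neighbours are averaged.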

Other options are:

- to extract values interpolated from the four nearest raster cells, with method='bilinear' (see below);
- to use the function aggregate() before extracting the data (see the reference manual and the vignette of the «raster» package for details);
- to download the climate data at a coarser resolution (e.g. 5 arc-minutes).

tminCM.IP.bil <- extract(tminCM.IP, chamaerops, method='bilinear', buffer=NULL, fun=NULL, df=TRUE)
head(tminCM.IP.bil)
  ID     bio6
1  1 56.97496
2  2 49.14000
3  3 57.64173
4  4 45.44764
5  5 50.76502
6  6 47.14752
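The «raster» documentation describes method='bilinear' as interpolating from the values of the four nearest raster cells. The sketch below shows the underlying arithmetic: two linear interpolations along x, then one along y. The cell centres and BIO6 values are hypothetical, not taken from WorldClim, and «raster» may handle edges and missing values differently.

```r
## Bilinear interpolation at (x, y) from the four surrounding cell centres:
## (x1, y1) lower-left and (x2, y2) upper-right corners;
## q11 = f(x1, y1), q21 = f(x2, y1), q12 = f(x1, y2), q22 = f(x2, y2).
bilinear <- function(x, y, x1, x2, y1, y2, q11, q21, q12, q22) {
  tx <- (x - x1) / (x2 - x1)            # fractional position along x
  ty <- (y - y1) / (y2 - y1)            # fractional position along y
  bottom <- q11 * (1 - tx) + q21 * tx   # interpolate along the lower row
  top    <- q12 * (1 - tx) + q22 * tx   # interpolate along the upper row
  bottom * (1 - ty) + top * ty          # then interpolate between the rows
}

## Hypothetical 2.5 arc-minute cell centres around the first presence point
res <- 2.5 / 60
x1 <- -0.925; x2 <- x1 + res
y1 <- 37.600; y2 <- y1 + res
est <- bilinear(-0.90395, 37.61351, x1, x2, y1, y2,
                q11 = 55, q21 = 58, q12 = 54, q22 = 60)
est   # a weighted mean of the four cell values (degC x 10)
```

Unlike the buffer mean, the result changes smoothly as the point moves within a cell, and it always lies between the smallest and largest of the four values.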

After extracting the data, whatever the method used to obtain them, we may then combine the temperature values with the coordinates of the points and save them with the function write.table().

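cham.tminCM.dat <- cbind(coordinates(chamaerops), tminCM.IP.bil) ## coordinates() returns the lon/lat matrix of the SpatialPointsDataFrame
head(cham.tminCM.dat)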
       lon      lat ID     bio6
1 -0.90395 37.61351  1 56.97496
2 -0.89631 37.88372  2 49.14000
3 -0.78267 37.88164  3 57.64173
4 -1.00753 37.97578  4 45.44764
5 -1.00509 38.06585  5 50.76502
6 -1.00263 38.15592  6 47.14752
write.table(cham.tminCM.dat, file = "C:/.../Chamaerops_tminCM_coords.txt",
            quote = TRUE, sep = "\t", eol = "\n",
            na = "NA", dec = ",", row.names = TRUE,
            qmethod = c("escape", "double"))
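The file above is written with commas as decimal marks (dec = ","), so it must be read back with the same decimal mark. The sketch below shows the round trip with a temporary file instead of the elided "C:/..." path. The three rows are copied from the output above.

```r
## A few rows like those printed above
dat <- data.frame(lon  = c(-0.90395, -0.89631, -0.78267),
                  lat  = c(37.61351, 37.88372, 37.88164),
                  ID   = 1:3,
                  bio6 = c(56.97496, 49.14000, 57.64173))

f <- tempfile(fileext = ".txt")   # temporary file instead of "C:/.../..."
write.table(dat, file = f, quote = TRUE, sep = "\t", eol = "\n",
            na = "NA", dec = ",", row.names = TRUE)

## Read back with the same separator and decimal mark
back <- read.table(f, header = TRUE, sep = "\t", dec = ",")
str(back)

## With the default dec = ".", the comma-decimal columns are not read as numbers
wrong <- read.table(f, header = TRUE, sep = "\t")
is.numeric(wrong$bio6)   # FALSE
```

Because the header line has one field fewer than the data lines (row.names = TRUE), read.table() uses the first column as row names.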



Most modelling techniques used to investigate the distribution of species and their responses to environmental gradients, as well as to define the shapes of species response curves, are ideally based on presence/absence data or on performance measures recorded at sampled locations (2, 3, 4). Early studies using this approach were done by Mike Austin and colleagues (5, 6); more recent examples are those by Ole Vetaas (7) and by Sonia Rabasa and others (8). Collecting these data is expensive, and may be difficult, if not impossible, at broad spatial scales. However, vast sets of presence-only occurrence records from herbarium collections and/or surveys without planned sampling schemes are increasingly accessible online (9, 10). This is the case with the dwarf palm data used in this blog.

Some algorithms use presences only (see for example here). When absences were not recorded, however, most of the currently available modelling techniques require «background» data, that is, a set of values of the predictor variable(s) for a random sample of locations taken from the full extent of the study area (9, 11, 12). With R, there are several ways to obtain these point locations. One of them is the function randomPoints() from the «dismo» package, as explained by Robert Hijmans and Jane Elith (12). Another is the function spsample() from the «sp» package, which samples point locations within square areas, grids, polygons or spatial lines, using regular or random sampling methods.
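To show the idea behind random sampling within a polygon without «sp», here is a base-R rejection sampler. This is a swapped-in technique, not a description of what spsample() or randomPoints() do internally. Candidate points are drawn uniformly in the polygon's bounding box (in degrees, so the sample is not equal-area) and are kept if a ray-casting test places them inside the polygon. The polygon vertices and function names are hypothetical, not the author's Iberian mask.

```r
## Ray-casting test: is each point (px, py) inside the polygon with vertices (vx, vy)?
point_in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- logical(length(px))
  j <- n
  for (i in seq_len(n)) {
    ## Does edge j -> i cross the horizontal ray going right from each point?
    crosses <- ((vy[i] > py) != (vy[j] > py)) &
      (px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i])
    inside <- xor(inside, crosses)   # each crossing toggles inside/outside
    j <- i
  }
  inside
}

## Rejection sampling: draw in the bounding box, keep candidates inside the polygon
random_points_in_polygon <- function(n, vx, vy) {
  out_x <- numeric(0); out_y <- numeric(0)
  while (length(out_x) < n) {
    cx <- runif(n, min(vx), max(vx))
    cy <- runif(n, min(vy), max(vy))
    keep <- point_in_polygon(cx, cy, vx, vy)
    out_x <- c(out_x, cx[keep]); out_y <- c(out_y, cy[keep])
  }
  cbind(x = out_x[seq_len(n)], y = out_y[seq_len(n)])
}

## Hypothetical anticlockwise pentagon (lon/lat), a crude stand-in for a study area
vx <- c(-9, -2, 3, 0, -8)
vy <- c(37, 36.5, 42, 43.5, 43)
set.seed(3925)   # makes the sample reproducible
pts <- random_points_in_polygon(500, vx, vy)
head(pts)
```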

For the Iberian Peninsula, I used the worldHires database from the «mapdata» package to build a mask that excludes all areas outside this domain. However, the map returned from this database cannot be used as it is to get random points, because it is neither a Polygon nor a SpatialLines object. It is made of lines, that is, sequences of points with an NA (value Not Available) at the end of each line. Therefore, I took the following steps:

1. Converted these lines to vectors.
2. Plotted the vectors to identify them.
3. Arranged them in anticlockwise order.
4. Linked all the lines together.
5. Built a Polygon object with the function Polygon() from the «sp» package (the code and values used to construct this polygon are available here).
6. Selected 500 random points and saved them.
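The conversion described above can be sketched in base R, assuming the map is stored as in objects returned by map() from «maps»: x and y vectors in which an NA marks the end of each line. The sketch splits the vectors at the NAs. It then uses the shoelace formula to check ring direction: a positive signed area means the vertices run anticlockwise. The function names and toy coordinates are hypothetical; the author's own code and values are those linked in the text.

```r
## Split x/y vectors in which NA marks the end of each line
split_at_na <- function(x, y) {
  id   <- cumsum(is.na(x))            # line counter, incremented at each NA
  keep <- !is.na(x) & !is.na(y)
  lapply(split(seq_along(x)[keep], id[keep]),
         function(i) cbind(x = x[i], y = y[i]))
}

## Signed area by the shoelace formula: > 0 anticlockwise, < 0 clockwise
signed_area <- function(xy) {
  x <- xy[, 1]; y <- xy[, 2]
  x2 <- c(x[-1], x[1]); y2 <- c(y[-1], y[1])
  sum(x * y2 - x2 * y) / 2
}

## Toy example: two lines separated by NA, together outlining a unit square
x <- c(0, 1, 1, NA, 1, 0, NA)
y <- c(0, 0, 1, NA, 1, 1, NA)
segs <- split_at_na(x, y)
length(segs)                       # 2 lines
ring <- do.call(rbind, segs)       # link the lines in order
signed_area(ring)                  # 1: anticlockwise
signed_area(ring[nrow(ring):1, ])  # -1: the same ring reversed is clockwise
```

The end point shared by two consecutive lines appears twice in the linked ring; this adds nothing to the shoelace sum, so it does not affect the orientation test.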

library(sp)
library(maps)
library(mapdata) ## provides the worldHires database
IberiaPol <- Polygon(cbind(longI, latI)) ## 'Polygon' object to exclude all areas outside the Iberian Peninsula
set.seed(3925) ## Not needed to get random points, but it lets us obtain exactly the same point locations again
random.Spoints <- spsample(IberiaPol, 500, type="random")

iberia <- map("worldHires", regions=c("Spain", "Spain:Cabo de Palos", "Portugal", "Andorra"), exact=TRUE, interior=FALSE, col='blue')
plot(random.Spoints, add=TRUE)


             x        y
[1,] -3.331095 39.81229
Coordinate Reference System (CRS) arguments: NA

random.coords <- coordinates(random.Spoints) ## matrix of spatial coordinates: x and y are the longitude and latitude of the locations, respectively
head(random.coords)
             x        y
[1,] -3.331095 39.81229
[2,] -2.178605 39.14185
[3,] -2.691741 39.56952
[4,] -4.745309 42.74008
[5,] -1.802870 41.25185
[6,] -7.640102 38.37539

write.table(random.coords, file = "C:/.../Ibe_random_coords.txt",
            quote = TRUE, sep = "\t", eol = "\n",
            na = "NA", dec = ".", row.names = TRUE,
            qmethod = c("escape", "double")) 

Once we have the background points, we can extract the corresponding values of the predictor variable(s) as done above. Note, however, that all the data should have the same spatial resolution and projection (12). The original occurrence data of the dwarf palm were projected to the Military Grid Reference System (MGRS) with a spatial resolution of 10 km. I therefore edited the coordinates of the background points to get new ones comparable to those of the dwarf palm presences (for explanations see here).

Iberia.background <- read.table("C:/.../Ibe_backgr_newcoords.txt", header=TRUE, row.names=1)
chamaerops <- read.table("C:/.../Chamaerops_Anthos.txt", header=TRUE, row.names=1) ## re-read as a plain data frame
iberia <- map("worldHires", regions=c("Spain", "Spain:Cabo de Palos", "Portugal", "Andorra"), exact=TRUE, interior=FALSE, col='linen', fill=TRUE)
points(Iberia.background$lon, Iberia.background$lat, col='red2', pch=20, cex=0.9)
points(chamaerops$lon, chamaerops$lat, col='darkgreen')


Dwarf palm presences in green (open circles) and background point locations (red) for the Iberian Peninsula
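The author's editing of the background coordinates follows the MGRS procedure explained in the post on converting UTM/MGRS coordinates. When coordinates are already in a projected system measured in metres (e.g. UTM eastings and northings), the same idea can be sketched as snapping each point to the centre of its 10 km × 10 km cell. This is an illustration under that assumption, not the author's code. Converting longitude/latitude to UTM requires a projection tool (e.g. MSP GEOTRANS or a GIS library), which is not shown. The function name and coordinates are hypothetical.

```r
## Snap projected coordinates (metres, e.g. UTM) to the centre of a square grid cell
snap_to_cell_centre <- function(v, cell = 10000) {
  floor(v / cell) * cell + cell / 2
}

## Hypothetical UTM eastings/northings (metres) of three background points
bg <- data.frame(easting  = c(645210, 648930, 702015),
                 northing = c(4163870, 4169990, 4205001))
bg$e_centre <- snap_to_cell_centre(bg$easting)
bg$n_centre <- snap_to_cell_centre(bg$northing)
bg

## Points falling in the same 10 km cell collapse to a single location
unique(bg[, c("e_centre", "n_centre")])
```

For a 10 km cell spanning 40–50 km, the centre lies at 45 km, which is also the bottom left corner of the central 1 km cell. This matches the MGRS trick of turning an easting digit of 4 into 45.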



When we use a grid-cell approach, we must be aware of the unrealistic assumption that both the species and the climate conditions are uniformly distributed within the grid cells (13). Accordingly, we also assume that in each location (grid cell) the species is either present or absent. All background points that coincide with locations where the species is present should therefore be removed (cf. 11, 12).

We may combine all the data into a single dataframe (a list of variables with the same number of rows and unique row names). Its first column (variable chamhum) indicates whether the species is present at a given location (coded as 1) or whether the row corresponds to a background point (0). This combination of recorded presences and a background sample of locations has been referred to as presence-only data (11, cf. 10, 12). We may then remove all the background rows at locations where the species has been observed, after which we are ready to analyse the relationships between the species distribution and the environmental variable(s).

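## tminCM.IP.ran: BIO6 values extracted at the background points, in the same way as for the presences above
IbeB.tminCM.dat <- cbind(Iberia.background, tminCM.IP.ran)
head(IbeB.tminCM.dat)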
       lon      lat ID     bio6
1 -9.17210 38.53328  1 82.77216
2 -9.05780 39.07409  2 75.90320
3 -8.83061 37.36160  3 82.40478
4 -8.60285 37.72159  4 75.06693
5 -8.60237 37.81172  5 74.98047
6 -8.37516 37.81074  6 72.44032

chamhum <- c(rep(1, nrow(cham.tminCM.dat)), rep(0, nrow(IbeB.tminCM.dat)))
chamhum.tminCM.Tdata <- data.frame(cbind(chamhum, rbind(cham.tminCM.dat, IbeB.tminCM.dat)))
head(chamhum.tminCM.Tdata)
  chamhum      lon      lat ID     bio6
1       1 -0.90395 37.61351  1 56.97496
2       1 -0.89631 37.88372  2 49.14000
3       1 -0.78267 37.88164  3 57.64173
4       1 -1.00753 37.97578  4 45.44764
5       1 -1.00509 38.06585  5 50.76502
6       1 -1.00263 38.15592  6 47.14752

tail(chamhum.tminCM.Tdata)
    chamhum     lon      lat  ID      bio6
767       0 1.85859 41.68111 470  46.42652
768       0 1.96712 42.40265 471 -18.98208
769       0 2.45393 42.31594 472   9.71472
770       0 2.57767 41.95621 473  29.13797
771       0 2.69876 41.86652 474  44.19068
772       0 3.06076 42.40728 475  36.10725
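The text mentions removing the background rows that coincide with presences but does not show code for it. A minimal base-R sketch is given below. It assumes that background points have already been snapped to the same grid as the presences, so matching locations have identical coordinates (rounded here to 5 decimals to avoid floating-point noise). The small tables are hypothetical, with values partly copied from the outputs above; the second background row is made to coincide with a presence on purpose.

```r
## Hypothetical presence and background tables with the same columns
pres   <- data.frame(lon  = c(-0.90395, -0.89631, -0.78267),
                     lat  = c(37.61351, 37.88372, 37.88164),
                     bio6 = c(56.97496, 49.14000, 57.64173))
backgr <- data.frame(lon  = c(-9.17210, -0.89631, 2.45393),
                     lat  = c(38.53328, 37.88372, 42.31594),
                     bio6 = c(82.77216, 49.14000, 9.71472))

## Stack them, coding presences as 1 and background points as 0
pa_dat <- rbind(cbind(chamhum = 1, pres), cbind(chamhum = 0, backgr))

## Drop background rows whose (rounded) coordinates match a presence
key  <- paste(round(pa_dat$lon, 5), round(pa_dat$lat, 5))
drop <- pa_dat$chamhum == 0 & key %in% key[pa_dat$chamhum == 1]
pa_dat <- pa_dat[!drop, ]
rownames(pa_dat) <- NULL
table(pa_dat$chamhum)
```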



(1) Hijmans, R.J., Cameron, S.E., Parra, J.L., Jones, P.G. & Jarvis, A. (2005) Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology, 25, 1965–1978.

(2) Ferrer-Castán, D., Calvo, J.F., Esteve-Selma, M.A., Torres-Martínez, A. & Ramírez-Díaz, L. (1995) On the use of three performance measures for fitting species response curves. Journal of Vegetation Science, 6, 57–62.

(3) Pearson, R.G. (2007) Species’ distribution modeling for conservation educators and practitioners. Synthesis. American Museum of Natural History.

(4) Hastie, T. & Fithian, W. (2013) Inference from presence-only data; the ongoing controversy. Ecography, 36, 864–867.

(5) Austin, M.P. & Cunningham, R.B. (1981) Observational analysis of environmental gradients. The Proceedings of the Ecological Society of Australia, 11, 109–119.

(6) Austin, M.P., Cunningham, R.B. & Fleming, P.M. (1984) New approaches to direct gradient analysis using environmental scalars and statistical curve-fitting procedures. Vegetatio, 55, 11–27.

(7) Vetaas, O.R. (2002) Realized and potential climate niches: a comparison of four Rhododendron tree species. Journal of Biogeography, 29, 545–554.

(8) Rabasa, S.G., Granda, E., Benavides, R., Kunstler, G., Espelta, J.M., Ogaya, R., Peñuelas, J., Scherer-Lorenzen, M., Gil, W., Grodzki, W., Ambrozy, S., Bergh, J., Hódar, J.A., Zamora, R. & Valladares, F. (2013) Disparity in elevational shifts of European trees in response to recent climate warming. Global Change Biology, 19, 2490–2499.

(9) Elith, J., Graham, C.H., Anderson, R.P., Dudík, M., Ferrier, S., Guisan, A., Hijmans, R.J., Huettmann, F., Leathwick, J.R., Lehmann, A., Li, J., Lohmann, L.G., Loiselle, B.A., Manion, G., Moritz, C., Nakamura, M., Nakazawa, Y., Overton, J. McC., Peterson, A.T., Phillips, S.J., Richardson, K.S., Scachetti-Pereira, R., Schapire, R.E., Soberón, J., Williams, S., Wisz, M.S. & Zimmermann, N.E. (2006) Novel methods improve prediction of species’ distributions from occurrence data. Ecography, 29, 129–151.

(10) Elith, J. & Leathwick, J.R. (2009) Species distribution models: ecological explanation and prediction across space and time. Annual Review of Ecology, Evolution, and Systematics, 40, 677–697.

(11) Ward, G. (2007) Statistics in ecological modeling; presence-only data and boosted mars. PhD Thesis. Stanford University, Palo Alto, CA.

(12) Hijmans, R.J. & Elith, J. (2013) Species distribution modeling with R.

(13) Vetaas, O.R. & Ferrer-Castán, D. (2008) Patterns of woody plant species richness in the Iberian Peninsula: environmental range and spatial scale. Journal of Biogeography, 35, 1863–1878.

Species occurrence data: converting UTM/MGRS coordinates to geographic coordinates

In my previous post I showed how to import occurrence data of individual species from the Global Biodiversity Information Facility (GBIF), how to clean these data and how to map them with R. The target species was the dwarf palm (Chamaerops humilis). For the purpose of this new post, I downloaded the available data for this species from Anthos, converted these data (UTM/MGRS) to geodetic coordinates (longitude and latitude) using MSP GEOTRANS and obtained a new map.

♣ ♣ ♣

Species occurrence data are often projected to the Universal Transverse Mercator (UTM) coordinate system and the derived Military Grid Reference System (MGRS), as in most of the references recorded by Anthos, a Spanish plant information system. In that case, note that the data do not refer to points on the surface of the Earth. They usually refer to grid cells of, e.g., 10 km × 10 km (i.e. to surface areas), as is the case for the Anthos data. If the raw UTM/MGRS coordinates are converted to geodetic coordinates (longitude and latitude) and represented as “points”, they will therefore be shifted from the centroids of the grid cells to their bottom left corners. Standard geographic information systems take this into account. In our case, to avoid such a shift we may increase the resolution of the grid to, e.g., 1 km and reference a central cell of the finer grid within each 100 km² grid cell.


Grid cells of 100 km² (black borders) and 1 km² (orange borders, solid lines) projected to the Military Grid Reference System (MGRS). To identify a 100 km² grid cell (here, the green cell), the right part of the MGRS coordinate includes only two digits (see below). If this coordinate is converted to geodetic coordinates (longitude, latitude), the new coordinates reference the bottom left corner of the cell. To point at the centre of the green cell, we may increase the resolution to, e.g., 1 km by adding two more digits to the MGRS coordinate, and reference the bottom left corner of the smaller central cell (blue square) within the 100 km² grid cell

Thus, if our target species was observed at, for example, 30TXL43 (datum WGS84), and we replace this coordinate with 30TXL4535 (same datum), the new geographic coordinates (latitude and longitude) will coincide with those of the bottom left corner of the smaller cell (1 km²) located in the centre of the 100 km² grid cell. [The parts of the coordinate are read as follows (for explanations on geodetic datums see here):
- 30T is the grid zone designation (all European MGRS grid zones are shown here).
- The following two letters (XL) identify a 10,000 km² grid square within the grid zone.
- Two digits in the right part of the coordinate reference a 100 km² grid cell within the 10,000 km² grid square (Easting value = 4, Northing value = 3).
- Four digits identify 1 km² grid cells (Easting = 45, Northing = 35), and so on up to 1 m resolution.]
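The coordinate edit described above (30TXL43 to 30TXL4535) is a string operation: append "5" to the single easting digit and to the single northing digit. In a 10 km cell spanning, say, 40–50 km, the 1 km reference 45 points at 45 km, which is the centre of the 10 km cell. The sketch below is a minimal version for 10 km references only. The function name is hypothetical, and its regular expression is a simplification of MGRS syntax: it accepts any non-I/O letters and does not check that the 100 km square letters are valid for the zone. The subsequent conversion to longitude/latitude (done with MSP GEOTRANS in the text) is not shown.

```r
## Refine a 10 km MGRS reference (grid zone + 100 km square + 1 easting digit
## + 1 northing digit) to the 1 km cell whose bottom left corner is the
## centre of the 10 km cell, by appending "5" to each digit.
mgrs_10km_to_centre_1km <- function(ref) {
  pattern <- "^([0-9]{1,2}[C-HJ-NP-X][A-HJ-NP-Z]{2})([0-9])([0-9])$"
  ok <- grepl(pattern, ref)
  if (!all(ok)) stop("Not a 10 km MGRS reference: ", paste(ref[!ok], collapse = ", "))
  sub(pattern, "\\15\\35", sub(pattern, "\\1\\2\\3", ref)) |>
    (\(s) sub(pattern, "\\1\\25\\35", ref))()
}

mgrs_10km_to_centre_1km(c("30TXL43", "30TXL01"))
## "30TXL4535" "30TXL0515"
```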


Grid for the Iberian Peninsula with cells of c. 50 km × 50 km (a grid often used in macroecological studies). Four of these cells form a 10,000 km² MGRS grid square, which is labeled by two letters (these two letters generally appear in the bottom left 2,500 km² cell of each 10,000 km² square).


So I downloaded the available data for the dwarf palm from Anthos (October 3, 2013) and removed the duplicated observations as I did here. I also excluded two records:
- one in Cantabria (northern Spain), because the species was introduced in a mine reclamation area outside its natural range;
- another in the province of Alicante (Anthos code=2208837), because the point falls in the Mediterranean Sea.

I then homogenized the original coordinates using MGRS grid squares of 10 km × 10 km. Note that these coordinates should be verified against the original sources (Anthos provides the references used to obtain the occurrence data). Next, I increased the resolution of the MGRS coordinates to 1 km as explained above, and converted the new coordinates to geodetic coordinates (longitude and latitude) using the geographic translator MSP GEOTRANS (see also here). Finally, I plotted the map with R.

library(maps)
library(mapdata) ## provides the worldHires database
iberia <- map("worldHires", regions=c("Spain", "Spain:Cabo de Palos", "Portugal", "Andorra"), exact=TRUE)
points(chamaerops$lon, chamaerops$lat, col='red', pch=20)


Obtaining (macroecological) data with R: species occurrence data

In 1989, Brown & Maurer applied the term «macroecology» to the study of large-scale patterns of plants and animals (1). Such studies are not new: they date back to the European explorations of the Earth in the 18th and 19th centuries (2, 3). What is more recent is the statistical approach of macroecology to analyzing large amounts of data (1, 3, 4), compiled from sources like those mentioned here. Some of these sources are atlases of species distributions that bring together the findings of numerous local surveys. Others provide environmental data based on interpolations and extrapolations, especially at regional, continental or global scales. In this post and in the next one [edited: October 13, 2013] I will focus on the collection and preparation of species occurrence data. In another post I will show how to obtain climate data for species distribution models, and later on I will concentrate on species richness (i.e. the number of different species found in a given area).


Johann Reinhold Forster (1729–1798) had already observed the «latitudinal gradient» in species richness (5), and Alexander von Humboldt (1769–1859; pictured), among others, explained this gradient in terms of heat (6, p. 348)

♣ ♣ ♣

«Importing occurrence data into R is easy. But collecting, georeferencing, and cross-checking coordinate data is tedious», as stated by Hijmans & Elith (7). If we are concerned with the distribution of a species and its response to some environmental factors (i.e. ecological niche modelling), we need lots of observations of that species. If our goal is to study the spatial variation in species richness of a given taxonomic group (e.g. vascular plants, vertebrates…), we will need observations for a number of species. In both cases, if we want to use spatial statistical techniques, we should also get the geographic coordinates (i.e. longitude and latitude) of the locations where the observations were made.

Once we have obtained the data, a preliminary investigation before starting with any statistical analysis can be done by putting the data on a map. An alternative method is the use of simple scatter-plots (8). Both approaches can be useful for detecting trends in the data and errors as well.


1| Obtaining occurrence data of individual species

We may download occurrence data of individual species from the Global Biodiversity Information Facility (GBIF) with the function «gbif» of the «dismo» package. So we need to load several packages:

library(dismo) ## gbif(); also loads «raster», which provides drawExtent()
library(maptools) ## checks «rgeos» availability (if so, it will be the option used)

EDITED: First, we may need to load the world map included in «maptools»; then we can plot a map such as this:

data(wrld_simpl)
plot(wrld_simpl, xlim=c(-10,5), ylim=c(34,46),
     col='light yellow')
box() ## restores the box around the map


Although this is a low-resolution map, for now it is enough to ensure that the observations are, at least roughly, in the right location (7). Moreover, with this map we can use the «drawExtent» function to visually determine the bounding box of our study area and create an «extent» object:

eo <- drawExtent()
eo ## print the extent


class       : Extent
xmin        : -10.17082
xmax        : 4.712419
ymin        : 34.16029
ymax        : 44.69502

Then we use the «gbif» function to download the data of our target species, for instance, the dwarf palm Chamaerops humilis; plot the data, and save them.

Chamaerops humilis

The dwarf palm (Chamaerops humilis) is the only palm species native to continental Europe. At present, it is found in the western Mediterranean Basin. (Photo taken at Cabezo del Horno, eastern end of the Cartagena range, southeastern Spain)


palmito <- gbif("chamaerops", "humilis", ext=eo, geo=TRUE, download=TRUE)
## if download = FALSE, only the number of records is reported and nothing is downloaded
## console output: chamaerops humilis : 2913 occurrences found
points(palmito$lon, palmito$lat, col='red', pch=20, cex=0.9)
write.table(palmito, file = "C:/.../Palmito_dat_GBIF.txt",
            quote = TRUE, sep = "\t", eol = "\n",
            na = "NA", dec = ",", row.names = TRUE,
            qmethod = c("escape", "double"))



Just looking at the map, do we see any errors? The answer should be: yes, no doubt, if only because several points are located in the middle of the Mediterranean Sea!


2| Data cleaning

Therefore, the next step should be to clean the data. But first let’s see how many rows and columns we have downloaded, and which variables we have obtained.

dim(palmito)
[1] 2913   25
colnames(palmito)
 [1] "species"               "continent"             "country"
 [4] "adm1"                  "adm2"                  "locality"
 [7] "lat"                   "lon"                   "coordUncertaintyM"
[10] "alt"                   "institution"           "collection"
[13] "catalogNumber"         "basisOfRecord"         "collector"
[16] "earliestDateCollected" "latestDateCollected"   "gbifNotes"
[19] "downloadDate"          "maxElevationM"         "minElevationM"
[22] "maxDepthM"             "minDepthM"             "ISO2"
[25] "cloc"

Now let’s have a look at the observations with latitude around 35°

lat35 <- subset(palmito, lat<=35.5)
lat35[, c("lat", "lon", "cloc")] ## show some columns only
      lat  lon                                                cloc
683  35.1 -5.1                  San Carlos del Tiradero, Ca, Spain
687  35.1 -5.1 Zahara de los Atunes, Faro del Camarinal, Ca, Spain
1950 35.1 -5.1                                   Tarifa, Ca, Spain
1960 35.1 -5.1                            Punta Palomas, Ca, Spain
1963 35.1 -5.1              Puerto del Bujeo, Algeciras, Ca, Spain
1973 35.1 -5.1       Cerca de Torre Guadiaro, San Roque, Ca, Spain
1986 35.1 -5.1      Sierra del Retín, Barbate de Franco, Ca, Spain
2881 35.1 -5.1      Sierra del Retín, Barbate de Franco, Ca, Spain
2887 35.1 -5.1                  Punta Camarinal, Tarifa, Ca, Spain

All these records are clearly wrong, so we may delete them or may try to correct the coordinates.

If, for instance, we look at the observations corresponding to Tarifa, we find this:

tarifa = subset(palmito, adm2=="Tarifa")
tarifa[1:5, c(4,7,8,25)] ## show some values only
    adm1      lat       lon                      cloc
15    Ca 36.06739 -5.712036 Ca, Tarifa, Spain, Europa
401   Ca 36.10276 -5.820712 Ca, Tarifa, Spain, Europa
430   Ca 36.19866 -5.708252 Ca, Tarifa, Spain, Europa
455   Ca 36.06741 -5.712070 Ca, Tarifa, Spain, Europa
528   Ca 36.10672 -5.800984 Ca, Tarifa, Spain, Europa

So it seems safe to omit all the records with latitude 35.1.

palmito_nlat35 <- subset(palmito, !(lat==35.1))
## the records are excluded from the new object; the original data frame is kept unchanged
dim(palmito_nlat35)
 [1] 2904   25


In the same way, we may check all other doubtful points and exclude all observations that are in error. My suggestion would be to:
- preserve the original GBIF file;
- create a new data file from which the wrong observations are removed;
- save the wrong records in another file for further checking (this may take a while).

An example: I know that Chamaerops humilis is frequent in the province of Murcia (see, e.g., here). If we look at the GBIF data, and also at the information provided by Anthos about this species (both the distribution map and the list of references), it becomes evident that the GBIF coordinates are incorrect. However, in this case, rather than omitting these observations we should correct their coordinates.

murcia <- subset(palmito, adm1=="Mu")
murcia[, c("adm1", "lat", "lon", "collection", "cloc")] ## show some columns only
     adm1  lat  lon collection                                                                                                            cloc
680    Mu 37.1 -1.1     ANTHOS                                                                                  Sierra de Carrascoy, Mu, Spain
686    Mu 37.1 -0.1     ANTHOS                                       La Union. Fruticedas del Cabezo de la Galera y Cola de Caballo, Mu, Spain
696    Mu 37.1 -0.1     ANTHOS                                                                                          Cabo Tiñoso, Mu, Spain
703    Mu 37.1 -0.1     ANTHOS                                             Cartagena. Llano del Beal. Pastizales del Lalno del Beal, Mu, Spain
711    Mu 37.1 -0.1     ANTHOS                                                                           Cartagena. Isla del Ciervo, Mu, Spain
721    Mu 37.1 -1.1     ANTHOS                                                             Aguilas. Cabo Cope. Sabinar de Cabo Cope, Mu, Spain
723    Mu 37.1 -0.1     ANTHOS                                                                                            La Azohía, Mu, Spain
735    Mu 37.1 -0.1     ANTHOS                                                                           Cartagena. Isla del Ciervo, Mu, Spain
737    Mu 37.1 -0.1     ANTHOS Cartagena. P. R. Calblanque, Monte de las Cenizas y Pena del Aguila. Tomillar-fruticeda de Atamaria-, Mu, Spain
742    Mu 37.1 -1.1     ANTHOS                                                  Mazarron. Sierra de las Moreras. Solana de Bolnuevo, Mu, Spain
1941   Mu 37.1 -0.1     ANTHOS                                                            Cartagena. Galeras. Litosuelos de Galeras, Mu, Spain
1945   Mu 37.1 -0.1     ANTHOS                                                         Entre campo de golf de Los Belones y Portman, Mu, Spain
1951   Mu 37.1 -0.1     ANTHOS                                                          La Union. Las Lajas. La Cuesta de las Lajas, Mu, Spain
1958   Mu 37.1 -0.1     ANTHOS                                                         Entre La Unión y Portman, cerca de Cartagena, Mu, Spain
1983   Mu 37.1 -0.1     ANTHOS                                                                                            Cartagena, Mu, Spain
1993   Mu 37.1 -1.1     ANTHOS                                                                                   Barranco del Sordo, Mu, Spain
1997   Mu 37.1 -0.1     ANTHOS                                                                                              Portman, Mu, Spain
2866   Mu 37.1 -0.1     ANTHOS                                                Murcia. Algezares. Pastizales y roquedos de Los Lages, Mu, Spain
2893   Mu 37.1 -0.1     ANTHOS                                    Cartagena. Penas Blancas. Fruticedas y roquedos de Penas Blancas., Mu, Spain
2904   Mu 38.1 -0.1     ANTHOS                                                                                               Murcia, Mu, Spain
2913   Mu 37.1 -0.1     ANTHOS                                                   Cartagena/La Union. Sabinar de cipres de Cartagena, Mu, Spain
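The workflow suggested above (keep the original file, write cleaned records to a new file, and set aside suspect records for checking) can be sketched in base R. Two flags are used. The first is the latitude of 35.1 identified above. The second is a heuristic of our own: a single coordinate pair shared by three or more different localities, as with the Murcia records, suggests coarse or default georeferencing. Flagged records are written out for checking, not corrected. The threshold, the small table (values copied from the outputs above) and the temporary files are assumptions for illustration.

```r
## Hypothetical extract of a GBIF table (values copied from the outputs above)
recs <- data.frame(
  lat  = c(35.1, 36.06739, 37.1, 37.1, 37.1),
  lon  = c(-5.1, -5.712036, -0.1, -0.1, -0.1),
  cloc = c("Tarifa, Ca, Spain", "Ca, Tarifa, Spain, Europa",
           "Cartagena. Isla del Ciervo, Mu, Spain", "Cartagena, Mu, Spain",
           "Portman, Mu, Spain"),
  stringsAsFactors = FALSE)

## Flag 1: the latitude shown above to be wrong
flag_lat <- recs$lat == 35.1
## Flag 2 (heuristic): one coordinate pair shared by >= 3 different localities
pair  <- paste(recs$lat, recs$lon)
n_loc <- tapply(recs$cloc, pair, function(v) length(unique(v)))
flag_shared <- as.vector(n_loc[pair] >= 3)
suspect <- flag_lat | flag_shared

## Keep 'recs' untouched; write cleaned and suspect records to separate files
clean_file   <- tempfile(fileext = ".txt")
suspect_file <- tempfile(fileext = ".txt")
write.table(recs[!suspect, ], clean_file,   sep = "\t", row.names = TRUE)
write.table(recs[suspect, ],  suspect_file, sep = "\t", row.names = TRUE)
```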


Another important step is to remove all duplicated records, i.e. those with the same coordinates. For further reading on this and related issues, see the vignette by Robert Hijmans & Jane Elith on «species distribution modeling with R» (7).
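A minimal base-R way to do this is the function duplicated() applied to the coordinate columns. It marks every repeat of an earlier row as TRUE, so keeping the rows marked FALSE retains the first occurrence of each location. The records below are hypothetical.

```r
## Hypothetical records: rows 1 and 3 share the same coordinates
occ <- data.frame(lon = c(-5.712036, -0.90395, -5.712036, -1.00753),
                  lat = c(36.06739, 37.61351, 36.06739, 37.97578))
dup <- duplicated(occ[, c("lon", "lat")])
dup                      # FALSE FALSE TRUE FALSE
occ_nodup <- occ[!dup, ]
nrow(occ_nodup)          # 3
```

Only exact matches are detected. Records that differ only in the last decimals can be rounded first (e.g. with round()) if they should count as the same location.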


3| Mapping occurrence data

After removing the most obvious errors and the duplicates (the modified GBIF data file is here), let’s see the result on a map with somewhat higher resolution. We load some additional packages and…

library(maps)
library(mapdata) ## provides the worldHires database
palmito2 <- read.table("C:/.../Chamaerops_GBIF_modified.txt", dec=",", header=TRUE, row.names=1)
## note that the decimal marks are commas
dim(palmito2)
 [1] 2061   25
iberia <-map("worldHires", regions=c("Spain", "Spain:Cabo de Palos", "Portugal", "Andorra"), exact=TRUE)
points(palmito2$lon, palmito2$lat, col='red', pch=20, cex=0.9)



This new map looks better, doesn’t it? Nevertheless, compared with the map by Anthos (see here [edited: October 15, 2013]), many errors must still remain, especially among the observations from southwestern Spain, and many points are missing. Furthermore, comparing these two maps raises a question: if the Anthos database is available to the GBIF, why are there these differences?

In my studies (for references, see here), atlases with geocoded data (i.e. projected to the Military Grid Reference System) are not always available for the entire Iberian Peninsula. In that case, I first examine general maps of distribution (i.e. range maps) of the individual species, and also consider the administrative provinces where they are found according to, e.g., Flora Iberica. This examination gives an overview of the area covered by the species and is useful for assessing the data as they are obtained (species occurrences based on records) (9). I then use several sources:
- databases compiled in journals, short notes and original papers (most of them are available here);
- local and regional chorological atlases…;
- and, of course, the databases of both Anthos and the GBIF.

An additional point to consider is that Anthos does not specify whether the plants occur naturally or whether they are cultivated, naturalized… (9), and neither does the GBIF. However, most studies of species diversity include native species only. So, in any case, the data obtained from these databases (not only the coordinates) must be assessed against the original sources. This is unavoidable, but there is no doubt that data handling (collection, mapping, cross-checking, cleaning…) is much easier with R, the «dismo» package and other essential packages such as those used in this post. These powerful tools are simply fantastic!



(1) Brown, J.H. & Maurer, B.A. (1989) Macroecology – the division of food and space among species on continents. Science, 243, 1145–1150.

(2) Ricklefs, R.E. (2004) A comprehensive framework for global patterns in biodiversity. Ecology Letters, 7, 1–15.

(3) Rahbek, C. (2005) The role of spatial scale and the perception of large-scale species-richness patterns. Ecology Letters, 8, 224–239.

(4) Brown, J.H. (1999) Macroecology: progress and prospect. Oikos, 87, 3–14.

(5) Briggs, J.C. & Humphries, C.J. (2004) Early classics. Foundations of Biogeography. Classic papers with commentaries (ed. by Lomolino, M.V., Sax, D.F. & Brown, J.H.), pp. 5–13. University of Chicago Press, Chicago.

(6) von Humboldt, A. (1849) Cosmos: A sketch of a physical description of the universe, vol 1. Harper and Brothers, New York. [Translated from the German by E.C. Otté in 1866]

(7) Hijmans, R.J. & Elith, J. (2013) Species distribution modeling with R.

(8) Ruggiero, A. & Hawkins, B.A. (2006) Mapping macroecology. Global Ecology and Biogeography, 15, 433–437.

(9) Vetaas, O.R. & Ferrer-Castán, D. (2008) Patterns of woody plant species richness in the Iberian Peninsula: environmental range and spatial scale. Journal of Biogeography, 35, 1863–1878.

Spatial ecology and macroecology? A few brief introductions

Let's not confuse ecology with environmentalism, nor spatial ecology with astroecology, even though in some respects the boundaries are blurred. In macroecology, what matters are large-scale ecological patterns and processes.

♣ ♣ ♣

After the initial greetings (the customary two kisses or, as the case may be, the handshake that comes with an introduction), if someone asks me what I do, I usually reply that I work at the University, and the reactions I perceive are generally positive. [Even so, I never imagined that, as the preliminary results of this year's February Barometer by the Centro de Investigaciones Sociológicas (CIS) show, university lecturer could rank among the professions most highly valued by Spanish society]

The white and green tides

Could the «white» and «green» tides of recent times (the protest movements in defence of public healthcare and public education, respectively) have influenced Spanish citizens, so that doctor and university lecturer are the professions they value most, according to last February's CIS Barometer? Could Spaniards' recent rating of these two professions be a wake-up call to the powers that favour private models of healthcare and education?


Be that as it may, if after saying that I work at the University I add, or state outright, that I work in the field of ecology, the next thing I usually hear is more or less this:

"An environmentalist? Are you an environmentalist? Recycling packaging really is a good thing. I do it. Recycling, using public transport, switching off the lights when you leave a room…"

"Well, I'm not so much an environmentalist as an ecologist," I reply, and to clarify things a little, or perhaps the opposite, I add: "actually, I study the distribution of living beings and the factors that determine that distribution, with the aim, mind you, of doing my own small bit for the conservation of biodiversity."

The thing is that, at least in Spain, «ecología» (ecology) is often confused with «ecologismo» (environmentalism). Ecology is a scientific discipline («the science of ecosystems», as Professor González Bernáldez put it (1), or, if you prefer, «the biology of ecosystems», which is how Professor Ramón Margalef defined it (2)), whereas environmentalism is a social and political movement that advocates the protection of the environment.

«Ecology» comes from the Greek «oikos» (house or habitat), a root it shares with «economy», and «logos» (treatise). It literally means «science of the habitat».

The term «ecology» was introduced in 1866 by the German scientist Ernst Haeckel (1834–1919).


Why this confusion between ecology and environmentalism? On the one hand, it may stem from the unfortunate way the media translate the English term «ecologist». Ecologists are «ecólogos» in Spanish, whereas the greens are the «ecologistas» (environmentalists). I would venture to say that, in general, we ecologists feel considerable sympathy for environmental movements and sometimes even become fully involved with them in many undertakings related to the defence of nature. Moreover, one can be an ecologist and an environmentalist at the same time, although this is not always the case. In fact, some ecologists use destructive study methods that have nothing to do with safeguarding the subjects of their research, just as there are self-styled environmentalists who lack the most basic notions of ecology and who, for instance, with the best of intentions, release exotic plants and animals into the wild (that is, species from regions that have nothing to do with the place where they are released), completely unaware of the damage those species can cause to native flora and fauna and to ecosystems. On the other hand, it must also be acknowledged that the boundaries between ecology and environmentalism are sometimes blurred, particularly in conservation biology, a scientific discipline whose main goal is «to provide principles and tools for preserving biological diversity» (3).


The International Union for Conservation of Nature (IUCN) was founded in 1948. Among other things, it compiles red lists of threatened species.




The World Wide Fund for Nature (WWF, originally the World Wildlife Fund) was founded in 1961. It is currently the largest international conservation organization.



But, to return to that newly begun conversation, once it has been made clear (or not) that I work in ecology and what that means, the comments that follow vary enormously. I would single out two examples that could be considered extremes:

"Ugh! Living beings? Bugs and plants, nature… Sorry, but I'm allergic to pollen, to cats… Until they pave over the countryside, count me out…"

"Oh! How lovely! So you study living beings and try to preserve biodiversity! I love the documentaries on La 2 (the second channel of Spanish public television). I used to have a dog and now… now we have a turtle, an aquarium…"

And if, instead of simply saying that my work has to do with ecology, I add that I work in «ecología espacial» (in Spanish, «espacial» means both «spatial» and «space», as in outer space), then the surprise is complete:

"Space ecology?! Don't tell me you're involved in some NASA project exploring life on Mars or in outer space?!"

"I wish!" I reply. "No, what I do is rather more prosaic and down-to-earth. Spatial ecology is about «studying the relationships between ecological patterns and processes across a range of scales, taking space explicitly into account». I mean three-dimensional (3D) space, here on Earth, not outer space! Well, actually, space and time cannot be separated, but… Still, we often do use images and data provided by satellites…"

"So earlier you were referring to the space-time continuum, right? Interesting…" Something like this is what usually follows, but by then the looks of disappointment I often see are quite obvious. Oh well!

Astrobiology, also called exobiology, is a branch of biology that studies the possible existence of life elsewhere in the Universe, excluding Earth. Likewise, we could also speak of an astroecology or exoecology, which should not be confused with spatial ecology, a field closely related to landscape ecology. In its ongoing exploration of the red planet, the Curiosity rover has found water but, so far, no trace of organic compounds.

There may be other planets teeming with life; why not? But of one thing we can be sure: however hard we search for planets identical to ours, we will not find any. That is why, at this point, and since we are talking about space and time, it may be worth recalling what the astronomer Carl Sagan told us in the episode «Journeys in Space and Time» of his legendary documentary series «Cosmos: A Personal Voyage». More than 30 years have passed since then, but the message remains fully relevant.


At other times I choose to say that I work in macroecology, and then…

"Macroecology? You do recycling in a big way? Large-scale recycling?"

Questions like these bring me back to what I said at the beginning of this post about ecology as such, and lead me to concede the questioner a point because, indeed, macroecology is concerned with «patterns and processes that occur at large scales». The concept of scale is truly interesting, but better to leave it for another day.


The Earth, the only habitable and inhabited planet we know of so far, seen at a large scale.


After these brief introductions, I would like to add that I have come across ignorance of the meaning of the term «ecology» as a scientific discipline, and confusion between ecology and environmentalism, or between spatial ecology and astroecology, outside the University, and not only outside it. It is understandable that people in the street are unaware of many of the things discussed here. It is more serious, however, that some university students, even in the fourth year of a Biology degree or in a master's programme such as Biology and Conservation of Biodiversity, lack knowledge that would be elementary in these fields and, far worse still, that politicians and managers responsible for the environment share these and other shortcomings (after all, the latter are professionals).



(1) González Bernáldez, F. (1970) Ecología. Graellsia, 25, 339–346.

(2) Margalef, R. (1968) Perspectives in ecological theory. University of Chicago Press, Chicago. [Spanish translation: Perspectivas de la teoría ecológica. Blume, Barcelona, 1978]

(3) Soulé, M.E. (1985) What is conservation biology? BioScience, 35, 727–734.

This afternoon I saw Jeremy Fox's post about the Google Books Ngram Viewer, and the first thing I did was try the terms “spatial ecology” and “macroecology”; the result was interesting (see the comments following the post).

I ♥ R! (R as in the R Project, of course)

After an unavoidable break of several months, I have finally been able to add a new page. It deals with (free) software and, for now, is devoted entirely to R, a programming language and environment for the description, statistical analysis and visualization of data.


I have included a brief introduction and a section on resources, which already gathers manuals, tutorials and documents available online, as well as some websites, blogs, blog aggregators and other resources, and which I hope to expand little by little.

I don't rule out adding another «static» page, but from now on new content will come in the form of posts, I hope.