Our DataFrame is the list of French mayors in 2014:

https://www.data.gouv.fr/storage/f/2014-04-25T17-51-58/maires-25-04-2014.xlsx

This file is also available locally as data/maires-25-04-2014.xlsx, so there is no need to download it again.

Load the file in a DataFrame¶

In [ ]:
 

Correct the dataframe¶

We can see there are issues:

  • the first 3 lines are just comments to ignore
  • the last line holds sums, which we don't want
  • the column names are in the fourth line
  • the column names are too long (e.g. 'Code du département (Maire)'), so let's define our own names and ignore line 4 (the title line) too

Show head of the resulting DataFrame.

Note: it can be useful to reload the DataFrame with the right arguments.
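One possible way to apply these corrections by hand is sketched below on a tiny synthetic table that only mimics the layout described above (3 comment rows, 1 title row, data, 1 row of sums). The values, the placeholder titles and the short column names 'dept', 'city' and 'population' are assumptions made for illustration, not the content of the real file.

```python
import pandas as pd

# Synthetic stand-in for the raw sheet (NOT the real data):
# 3 comment rows, 1 title row, 2 data rows, 1 row of sums.
raw = pd.DataFrame([
    ["comment 1", None, None],
    ["comment 2", None, None],
    ["comment 3", None, None],
    ["Code du département (Maire)", "long title 2", "long title 3"],
    ["01", "Town A", "1200"],
    ["02", "Town B", "800"],
    ["Total", None, "2000"],
])

# Keep only the data rows: drop the first 4 rows and the last one.
mayors = raw.iloc[4:-1].reset_index(drop=True)

# Replace the long original titles with short names (hypothetical choices).
mayors.columns = ["dept", "city", "population"]

print(mayors.head())
```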

In [ ]:
 

Read the documentation of read_excel and reload the table with the right options to get the perfect table directly, without needing any of the previous corrections.
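A minimal sketch of such a call, assuming the layout described earlier (3 comment rows, then the title row, and one final row of sums). The helper load_mayors and the READ_OPTIONS dictionary are hypothetical names; skiprows, skipfooter, header and names are existing read_excel parameters. The list of short names is left to the caller because it must contain exactly one name per column of the sheet.

```python
import pandas as pd

# Options matching the issues listed for this file.
READ_OPTIONS = {
    "skiprows": 4,    # 3 comment rows + the row of long column titles
    "skipfooter": 1,  # the last row holds sums
    "header": None,   # no header row is left once the titles are skipped
}


def load_mayors(path, names):
    """Read the sheet straight into a clean DataFrame.

    `names` must list one short name per column of the sheet.
    """
    return pd.read_excel(path, names=names, **READ_OPTIONS)
```

Calling load_mayors with the path data/maires-25-04-2014.xlsx and your own list of short names should then give the table without any manual slicing.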

In [ ]:
 

Cast birth and population¶

Birth and population are stored as strings, which is useless for computation; cast them to what they should be (a date and a number).
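The casts could look like the sketch below, shown on synthetic rows. The column names 'birth' and 'population' and the day/month/year text format are assumptions; adapt the format string to whatever the real column contains.

```python
import pandas as pd

# Synthetic rows (NOT the real data); both columns start as strings.
mayors = pd.DataFrame({
    "city": ["Town A", "Town B"],
    "birth": ["14/07/1950", "01/02/1975"],
    "population": ["1200", "800"],
})

# Parse the text dates into real datetime values (assumed format).
mayors["birth"] = pd.to_datetime(mayors["birth"], format="%d/%m/%Y")

# Convert the population strings into numbers.
mayors["population"] = pd.to_numeric(mayors["population"])

print(mayors.dtypes)
```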

In [ ]:
 

Add a column 'age'¶

Use the birthdate to add a column 'age'. You may need to convert the result to years, since subtracting two dates gives a Timedelta expressed in days.
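A possible sketch on synthetic rows: subtract the birthdate from a reference date, then convert the number of days to years. The reference date (taken here from the date in the file name) and the division by 365.25 are assumptions; other conventions work just as well.

```python
import pandas as pd

# Synthetic rows (NOT the real data), with 'birth' already cast to dates.
mayors = pd.DataFrame({
    "city": ["Town A", "Town B"],
    "birth": pd.to_datetime(["1950-07-14", "1975-02-01"]),
})

# Assumed reference date: the date that appears in the file name.
reference = pd.Timestamp("2014-04-25")

# The subtraction yields Timedelta values; .dt.days gives whole days,
# which are then converted to (fractional) years.
mayors["age"] = (reference - mayors["birth"]).dt.days / 365.25

print(mayors)
```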

Looking at data¶

  • display the line of Paris
  • sort all cities per population, largest first
  • give total population
  • give percentage of male mayors
  • give statistics on the age of mayors
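These five queries can be sketched as follows on a small synthetic table. All values are made up, and the column names 'city', 'population', 'gender' and 'age', as well as the 'M'/'F' coding, are assumptions; the real file may spell the city name or code the gender differently.

```python
import pandas as pd

# Synthetic rows (NOT the real data).
mayors = pd.DataFrame({
    "city": ["Paris", "Town A", "Town B", "Town C"],
    "population": [2000000, 1200, 800, 5000],
    "gender": ["F", "M", "M", "F"],
    "age": [54.0, 63.5, 39.0, 47.5],
})

# Row of Paris, selected with a boolean mask.
paris = mayors[mayors["city"] == "Paris"]

# All cities sorted by population, largest first.
by_population = mayors.sort_values("population", ascending=False)

# Total population.
total_population = mayors["population"].sum()

# Percentage of male mayors: the mean of a boolean column is a ratio.
pct_male = (mayors["gender"] == "M").mean() * 100

# Summary statistics (count, mean, std, min, quartiles, max) on the age.
age_stats = mayors["age"].describe()

print(paris, by_population, total_population, pct_male, age_stats, sep="\n\n")
```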
In [ ]:
 

Group data¶

Let's group all cities of the same department and

  • sum the population with np.sum
  • average the age of mayors with np.mean
  • count the number of cities with np.size
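A minimal sketch of this grouping on synthetic rows, using named aggregation. The column names are assumptions, and the string names "sum", "mean" and "size" stand in for np.sum, np.mean and np.size: they compute the same results, and recent pandas releases tend to warn when the NumPy functions are passed directly.

```python
import pandas as pd

# Synthetic rows (NOT the real data).
mayors = pd.DataFrame({
    "dept": ["01", "01", "02"],
    "city": ["Town A", "Town B", "Town C"],
    "population": [1200, 800, 5000],
    "age": [60.0, 40.0, 50.0],
})

# One output column per (input column, aggregation) pair.
by_dept = mayors.groupby("dept").agg(
    population=("population", "sum"),  # total population per department
    mean_age=("age", "mean"),          # average age of mayors
    n_cities=("city", "size"),         # number of cities
)

print(by_dept)
```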
In [ ]: