pd90 exercice -- Maires de France.ipynb

Our DataFrame is the list of French mayors in 2014:

https://www.data.gouv.fr/storage/f/2014-04-25T17-51-58/maires-25-04-2014.xlsx

This file is also available locally as data/maires-25-04-2014.xlsx, so there is no need to download it again.

Load the file into a DataFrame

In [ ]:
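A minimal sketch of the loading step, assuming pandas is installed together with an Excel engine such as openpyxl. The helper name `load_raw` is hypothetical, and `header=None` is a choice made here so that the comment lines stay visible as ordinary rows.

```python
from pathlib import Path

import pandas as pd

# Local copy mentioned above; the path is relative to the notebook directory.
DATA_PATH = Path("data/maires-25-04-2014.xlsx")


def load_raw(path=DATA_PATH):
    """Read the first sheet as-is, without treating any row as a header."""
    return pd.read_excel(path, header=None)


# Only read the file when it is actually present.
if DATA_PATH.exists():
    raw = load_raw()
    print(raw.shape)
    print(raw.head(6))  # first rows, including the comment and title lines
```

Looking at the first and last rows of this raw frame is a quick way to spot the issues discussed next.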

Correct the DataFrame

We can see several issues:

The first 3 lines are only comments and should be ignored.
The last line holds sums, which we don't want.
The column names are on the fourth line.
The column names are too long (e.g. 'Code du département (Maire)'), so let's define our own names and ignore line 4 (the title row) as well.

Show head of the resulting DataFrame.

Note: it can be useful to reload the DataFrame with the right arguments.

In [ ]:
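The corrections listed above can be sketched on a toy stand-in for the raw sheet. The three-column layout, the values, and the short names `dept`, `city` and `population` are all invented for illustration; the real file has more columns.

```python
import pandas as pd

# Toy raw sheet: 3 comment rows, 1 title row, 2 data rows, 1 totals row.
# All values are made up for the example.
raw = pd.DataFrame([
    ["comment 1", None, None],
    ["comment 2", None, None],
    ["comment 3", None, None],
    ["Code du département (Maire)", "Libellé de la commune", "Population de la commune"],
    ["01", "Ville A", "14000"],
    ["75", "Paris", "2200000"],
    ["Total", None, "2214000"],
])

# Drop the 4 leading rows (comments + long titles) and the trailing totals row.
clean = raw.iloc[4:-1].reset_index(drop=True)

# Replace the long titles with short names of our own (hypothetical names).
clean.columns = ["dept", "city", "population"]

print(clean.head())
```

Slicing with `iloc[4:-1]` handles both ends at once; `reset_index(drop=True)` makes the remaining rows start at 0 again.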

Read the documentation of read_excel and reload the table with the right options, so that you get a clean table directly, without needing any of the previous corrections.

In [ ]:
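One possible set of `read_excel` options is sketched below. The short column names are hypothetical, and the assumption that the sheet has 11 columns should be checked against the actual file, since `names` needs one entry per column.

```python
from pathlib import Path

import pandas as pd

DATA_PATH = Path("data/maires-25-04-2014.xlsx")

# Hypothetical short names; the count (11) is an assumption to verify.
COLUMNS = [
    "dept_code", "dept_name", "city_code", "city", "population",
    "last_name", "first_name", "gender", "birth", "job_code", "job",
]

READ_OPTIONS = dict(
    header=None,    # do not take column names from the file
    skiprows=4,     # skip the 3 comment lines and the title line
    skipfooter=1,   # skip the final line of sums
    names=COLUMNS,  # use our own short names instead
)

# Only read the file when it is actually present.
if DATA_PATH.exists():
    mayors = pd.read_excel(DATA_PATH, **READ_OPTIONS)
    print(mayors.head())
```

Keeping the options in a dictionary makes it easy to adjust one value at a time while comparing the result with the manually corrected table.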

Cast birth and population

The birth date and population columns are read as strings, which is of little use; cast them to appropriate types (a datetime and an integer, respectively).

In [ ]:
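A minimal sketch of the casts on invented data. The column names `birth` and `population` and the day/month/year date format are assumptions; adapt the `format` argument to what the file actually contains.

```python
import pandas as pd

# Toy data with both columns stored as strings (values are made up).
mayors = pd.DataFrame({
    "city": ["Ville A", "Ville B"],
    "birth": ["15/03/1950", "02/11/1972"],
    "population": ["14000", "500"],
})

# Parse the dates; the explicit format avoids day/month ambiguity.
mayors["birth"] = pd.to_datetime(mayors["birth"], format="%d/%m/%Y")

# Convert the digit strings to integers.
mayors["population"] = pd.to_numeric(mayors["population"])

print(mayors.dtypes)
```

If some cells cannot be parsed, both `pd.to_datetime` and `pd.to_numeric` accept `errors="coerce"` to turn them into missing values instead of raising.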

Add a column 'age'

Use the birth date to add an 'age' column. Subtracting two dates yields a Timedelta, which is expressed in days, so convert the result to years.
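A sketch of the age computation on invented birth dates. The reference date 2014-04-25 is an assumption taken from the file name, and dividing by 365.25 is an approximation of the year length.

```python
import pandas as pd

# Toy data: birth dates already cast to datetime (values are made up).
mayors = pd.DataFrame({"birth": pd.to_datetime(["1950-03-15", "1972-11-02"])})

# Assumed reference date; pd.Timestamp.today() would give the current age.
reference = pd.Timestamp("2014-04-25")

# The difference is a Timedelta Series; take whole days, then convert to years.
delta = reference - mayors["birth"]
mayors["age"] = delta.dt.days / 365.25

print(mayors)
```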

Looking at data

display the row for Paris
sort all cities by population, largest first
give the total population
give the percentage of male mayors
give statistics on the age of mayors

In [ ]:
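The queries above can be sketched on a small invented table. The column names and the "M"/"F" gender coding are assumptions; check how the real file encodes them.

```python
import pandas as pd

# Toy data (all values are made up).
mayors = pd.DataFrame({
    "city": ["Paris", "Ville A", "Ville B", "Ville C"],
    "population": [2_200_000, 500, 12_000, 80_000],
    "gender": ["F", "M", "M", "F"],
    "age": [54.0, 61.5, 47.2, 58.3],
})

# Row for Paris, selected with a boolean mask.
paris = mayors[mayors["city"] == "Paris"]

# Cities sorted by population, largest first.
by_size = mayors.sort_values("population", ascending=False)

# Total population.
total_population = mayors["population"].sum()

# Percentage of male mayors: the mean of a boolean Series is a proportion.
pct_male = (mayors["gender"] == "M").mean() * 100

# Summary statistics (count, mean, std, min, quartiles, max) on age.
age_stats = mayors["age"].describe()

print(paris, by_size, total_population, pct_male, age_stats, sep="\n\n")
```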

Group data

Let's group all cities of the same department, then:

sum the population with np.sum
average the age of mayors with np.mean
count the number of cities with np.size

In [ ]:
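A sketch of the grouping on invented data, using named aggregation. The string names "sum", "mean" and "size" are used here in place of np.sum, np.mean and np.size, because recent pandas versions may emit a deprecation warning when NumPy callables are passed to `agg`; the column names are assumptions.

```python
import pandas as pd

# Toy data (all values are made up).
mayors = pd.DataFrame({
    "dept": ["01", "01", "75"],
    "city": ["Ville A", "Ville B", "Paris"],
    "population": [100, 300, 1000],
    "age": [40.0, 60.0, 50.0],
})

# One output column per (input column, aggregation) pair.
by_dept = mayors.groupby("dept").agg(
    population=("population", "sum"),  # total population per department
    mean_age=("age", "mean"),          # average age of mayors
    n_cities=("city", "size"),         # number of cities
)

print(by_dept)
```

The result is indexed by department, so a single department can be inspected with `by_dept.loc["01"]`.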