Summary statistics
07 May 2015
The summary() function returns summary statistics if the column contains numeric values, and the count and frequency of the five most common values if the column contains non-numeric values. summary() can be applied either to a single column or to a whole dataframe.
It can also return stratified summary statistics using the by argument.
import epipy
import pandas as pd
df = pd.DataFrame({'Age' : [10, 12, 14], 'Group' : ['A', 'B', 'B'] })
epipy.summary(df.Age)
returns:
count 3
missing 0
min 10
median 12
mean 12
std 2
max 14
dtype: float64
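For readers who want to see where these numbers come from, here is a rough pandas-only sketch that reproduces the same fields. It is not epipy's implementation; numeric_summary is a hypothetical helper name, and using the sample standard deviation (ddof=1, the pandas default) is an assumption that happens to match the std of 2 shown above.

```python
import pandas as pd

def numeric_summary(col):
    """Hypothetical helper (not part of epipy): basic statistics for a numeric column."""
    return pd.Series({
        'count': col.count(),            # number of non-missing values
        'missing': col.isnull().sum(),   # number of missing values
        'min': col.min(),
        'median': col.median(),
        'mean': col.mean(),
        'std': col.std(),                # sample standard deviation (ddof=1)
        'max': col.max(),
    }, dtype='float64')

df = pd.DataFrame({'Age': [10, 12, 14], 'Group': ['A', 'B', 'B']})
result = numeric_summary(df.Age)
print(result)
```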
For a non-numeric column:
epipy.summary(df.Group)
returns:
   count      freq
B      2  0.666667
A      1  0.333333
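A comparable count/frequency table can be built with plain pandas, as sketched below. This is an illustration rather than epipy's own code: categorical_summary and its n parameter are hypothetical, and dividing by the number of non-missing values is an assumption (the example data has no missing values, so it cannot confirm which denominator epipy uses).

```python
import pandas as pd

def categorical_summary(col, n=5):
    """Hypothetical helper (not part of epipy): count and frequency of the n most common values."""
    counts = col.value_counts().head(n)   # sorted from most to least common
    freq = counts / col.count()           # assumption: share of non-missing values
    return pd.DataFrame({'count': counts, 'freq': freq})

df = pd.DataFrame({'Age': [10, 12, 14], 'Group': ['A', 'B', 'B']})
result = categorical_summary(df.Group)
print(result)
```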
Finally, to stratify by group:
epipy.summary(df.Age, by=df.Group)
returns:
   count  missing  min  median  mean       std  max
A      1        0   10      10    10       NaN   10
B      2        0   12      13    13  1.414214   14
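The stratified table can be approximated with a pandas groupby, which also shows why the std for group A is NaN: the sample standard deviation is undefined for a single observation. This is a sketch under the assumption that the by argument behaves like grouping one column by another; it does not show how epipy implements it.

```python
import pandas as pd

df = pd.DataFrame({'Age': [10, 12, 14], 'Group': ['A', 'B', 'B']})

# Group the Age values by the Group column, then compute each statistic per group.
grouped = df.Age.groupby(df.Group)
result = pd.DataFrame({
    'count': grouped.count(),
    'missing': grouped.apply(lambda s: s.isnull().sum()),
    'min': grouped.min(),
    'median': grouped.median(),
    'mean': grouped.mean(),
    'std': grouped.std(),   # NaN when a group has only one value
    'max': grouped.max(),
})
print(result)
```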