Bellabeat Case Study

August 2022

Introduction

Bellabeat is a high-tech manufacturer that focuses on women's health and wellness with a collection of wearable and non-wearable tech. Bella beat's innovative products focus on delivering key fitness insights based on different time points during a woman's cycle. It is a successful small company that may have the potential to become a larger player in the global smart device market. Bellabeat invests year-round in Google Search, maintains active Facebook and Instagram pages, and consistently engages consumers on Twitter. Additionally, Bellabeat runs video ads on Youtube and display ads on the Google Display Network to support campaigns around key marketing dates. Co-founder and CCO Urška Srše, believes that further analysis of smart device fitness data could help unlock new growth opportunities for the company.


Business Task

Utilizing the Fitbit Fitness Tracker Data, the goal is to gain further insight into how consumers are using non-Bellabeat smart devices. Applying these insights to help influence Bellabeat's customers and the overall marketing strategy. The key questions we are looking to resolve in our analysis are the following:

1. What are some trends in smart device usage?

2. How could these trends apply to Bellabeat customers?

3. How could these trends help influence Bellabeat marketing strategy?

Key stakeholders:

- Urška Sršen: Bellabeat’s co-founder and Chief Creative Officer

- Sando Mur: Bellabeat’s co-founder and a key member of the Bellabeat executive team


Data Summary

  • The data source used for this case study is FitBit Fitness Tracker Data.

  • The data contains personal fitness trackers from thirty Fitbit users collected between 03.12.2016-05.12.2016 via a survey using Amazon Mechanical Turk.

  • Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring.

  • This dataset is under CC0: Public Domain license meaning the creator has waived his right to the work under copyright law. Möbius cited the dataset from Zendo: Furberg, Robert; Brinton, Julia; Keating, Michael; Ortiz, Alexa [Source Here].

Data Reliability/Limitations

  • Considering that this data was collected in 2016 for a period of one month, 4.12 to 5.12 (after data cleaning), the data is quite outdated to be comparing it to a 2022 trend.

  • Furthermore, it would be much more ideal if we could use a larger sample size since our sample size of 30 ( 33 after data cleaning ) is not representative of the entire market.

  • When looking into the more specific datasets I noticed the weight and sleep datasets are lacking participation, more participation from our sample size would be needed to develop a sound conclusion

  • There is no specification whether the participants are male or female, considering we are looking to market to women, this data may not be the most reliable in determining/finding key trends for our target audience.



Process

Databases being used for this analysis include

  • dailyActivity

  • weightLog

  • sleepLog

  • hourlyCalories

  • hourlySteps


The focus is on daily and hourly usage of the Fitbit device to identify possible patterns of smart device usage.

The data was cleaned using google sheets and was transferred to SQL (Google BigQuery) for further analysis.

Data Cleaning

  • Imported all CSV files to a spreadsheet, in this case, google sheets, for cleanup

  • Formatted all dates to yyyy.mm.dd format and split the hours to a separate column as a String to include AM/PM on the hourlyCalories and hourlySteps tables

  • Converted hourlyCalories and hourlySteps tables to 24 hour clock format to better be able to compare timeframes as Integer in SQL

  • Added and converted dates to a "weekday" column as a String to dailyActivity for further weekday analysis

  • Ensured all User ID fields were a consistent 10 characters using the LEN function coupled with conditional formatting to spot out any inconsistencies, none found.

  • Trimmed whitespace and removed duplicates for all tables using the trim function and duplicates tool. Three duplicates were found and removed from the sleepLog table.

  • Sorted by ID and Date, formatted/rounded all decimal values to .00

  • Eliminated all NULL value columns

Hypothesis

Fitness trackers are conventionally used to track daily activity, people are more willing to purchase a product that provides insights into their preferred daily exercises or activities.


Role Call

Once data cleaning was completed I downloaded all the cleaned databases as .csv files and uploaded them to Big Query for analysis. I first ran an "Id" count on each database to double-check how many distinct participants submitted data.


-- Lists the unique number of users in each table

SELECT

COUNT(Distinct Id)

FROM `abrahdata.bellabeat.dailyActivity`;

SELECT

COUNT(Distinct Id)

FROM `abrahdata.bellabeat.hourlyCalories`;

SELECT

COUNT(Distinct Id)

FROM `abrahdata.bellabeat.hourlySteps`;

SELECT

COUNT(Distinct Id)

FROM `abrahdata.bellabeat.sleepLog`;

SELECT

COUNT(Distinct Id)

FROM `abrahdata.bellabeat.weightLog`;



  • DailyActivity: 33

  • HourlySteps: 33

  • HourlyCalories: 33

  • DailySleep: 24

  • WeightLog: 8

Usage

Before we can find trends in usage, we get an accurate idea of how often the Fitbits were worn and what they were mostly used for during this data period. First I chose to accurately define the timeframe

SELECT

Max(ActivityDate) AS last_day,

Min(ActivityDate) AS first_day

FROM `abrahdata.bellabeat.dailyActivity`


Result:

last_day || first_day

2016-05-12 || 2016-04-12

The total days used in this study were 31.


Next, I wanted to get an idea of the usage rate of the participants I classified them into 4 different groups based on the percentage usage.


-- Calculates the frequency of usage separated by 4 categories

WITH user_r AS (SELECT

Distinct days_used,

COUNT(Id) AS u_rate_num,

CASE When days_used >= 29 Then "Frequent Usage 93%"

When days_used >= 26 Then "Often Usage 84%"

WHen days_used >= 20 then "Sometimes Usage 65%"

Else "Sparse Usage Less than 65%" END AS usage

From (SELECT

Id,

Count(Id)As days_used,

FROM `abrahdata.bellabeat.dailyActivity`

Group by Id )

GROUP BY days_used

ORDER BY days_used DESC)


SELECT

usage,

SUM(user_r.u_rate_num) AS num_users

FROM user_r

GROUP BY usage

ORDER BY num_users DESC;

29 = 93% usage or more "Frequent Usage" - 26 people

26-28 = 84% usage at least "Often Usage" - 3 people

20-25 = 65% usage at least "Sometimes Usage" - 1 person

Anything under 65% "Sparse Usage" - 3 people


It seems that 93.5% of the participants used the Fitbit device more often than not.

I next measured the minutes used grouped by intensity per weekday. in each column. The results did not seem clear enough to determine a trend so I instead compared the average weekly active minutes vs the sedentary minutes.


SUM:

/* Lists sum of active minutes per intensity and the sum of sedentary minutes all per weekdaay */

SELECT

Weekday,

SUM(VeryActiveMinutes) AS very_active,

SUM(LightlyActiveMinutes) AS lightly_active,

SUM(FairlyActiveMinutes) AS fairly_active,

SUM(SedentaryMinutes) AS sedentary,

FROM `abrahdata.bellabeat.dailyActivity`

GROUP BY Weekday

ORDER BY sedentary DESC

;

Average:

-- Lists average active minutes and average sedentary minutes by weekday rounded to the nearest 100th

SELECT

Weekday,

ROUND(AVG(VeryActiveMinutes)+ AVG(LightlyActiveMinutes)+ AVG(FairlyActiveMinutes),2) AS avg_total_active,

ROUND(AVG(SedentaryMinutes),2) AS sedentary,

FROM `abrahdata.bellabeat.dailyActivity`

GROUP BY Weekday

ORDER BY avg_total_active DESC

;

Visualization:

It appears that the top Active days are Saturday, Friday, and Tuesday

Top days for Sedentary days are Monday, Tuesday, and Friday


From here I decided to see what the most active hours were on average. I joined the tables using SQL and prepared them for visualization using tableau. Averages were calculated using tableau. As shown below the most active hours are hours 12,13,14 (12:00 pm - 2:00pm) and hours 17,18,19 (5:00pm-7:00pm) respectively.


--Links the hourly steps and calories to prepare for visualization

SELECT

step.Id,

step.Date,

step.Hour,

step.StepTotal,

calories.Calories

FROM `abrahdata.bellabeat.hourlySteps` AS step

Left JOIN `abrahdata.bellabeat.hourlyCalories` AS calories

ON step.Id = calories.Id and step.Date = calories.Date and step.Hour= calories.Hour

ORDER BY step.Id, step.Date;


The visualization above also points out that the most active caloric hours coincide with the most active steps hours. The scatterplot below further confirms that assumption given the positive correlation between steps and calories.


Benefits- Weight and sleep


Lastly I took a look at some of the other data available in the zip file to see if there is any correlations between daily calories to weight and daily calories to sleep.


Sleep:

There was an interesting finding while observing the minutes slept. Although there appears to be no correlation between calories burned and minutes slept, there does seem to be a negative correlation between minutes slept and sedentary minutes. The data suggests that the more time the subjects spent sedentary, the less they slept.

--Lists the minutes slept and calories per participant on each recorded date

SELECT

DISTINCT s.Id,

ActivityDate,

Calories,

s.TotalMinutesAsleep

FROM `abrahdata.bellabeat.dailyActivity` AS daily

INNER JOIN `abrahdata.bellabeat.sleepLog` AS s

ON daily.Id = s.Id AND daily.ActivityDate = s.SleepDay

WHERE s.TotalMinutesAsleep is not NULL

ORDER BY s.ID, ActivityDate;


--Lists the minutes slept and sedentary minutes per participant on each recorded date

SELECT

DISTINCT s.Id,

ActivityDate,

SedentaryMinutes,

s.TotalMinutesAsleep

FROM `abrahdata.bellabeat.dailyActivity` AS daily

INNER JOIN `abrahdata.bellabeat.sleepLog` AS s

ON daily.Id = s.Id AND daily.ActivityDate = s.SleepDay

WHERE s.TotalMinutesAsleep is not NULL

ORDER BY s.ID, ActivityDate;

Weight:


Weight is the one measure that received the least amount of participants (8) since we have such low participation It is best not to use this data for the analysis. Below is a visualization showing the data of those who participated. No correlation is expressed on any of the participants.


-- Lists the weights and calories per participant on each recorded date

SELECT

DISTINCT w.Id,

ActivityDate,

Calories,

w.WeightPounds

FROM `abrahdata.bellabeat.dailyActivity` AS daily

INNER JOIN `abrahdata.bellabeat.weightLog` AS w

ON daily.Id = w.Id AND daily.ActivityDate = w.Date

WHERE WeightPounds is not NULL

ORDER BY w.ID, ActivityDate;


Conclusions

Summary of Findings:


  • Given the high amount of sedentary data, it is fair to conclude that people do not only wear smart devices for tracking workouts and activity, but they also wear them casually throughout the day

  • Users recorded the most active data between the hours of (12:00 pm - 2:00 pm) and (5:00 pm-7:00 pm). These hours are regarded as "lunchtime hours" and "just got off work" hours, for 9-5 workers, which also happen to have the highest amount of steps.

  • Fridays and Saturdays tended to be the most active days of the week. This may suggest that most Fitbit wearers tend to be 9-5 workers with a Monday-Friday schedule.

  • Higher calories burned do not correlate to more or fewer minutes slept, however, the more minutes spent sedentary the fewer sleep minutes each user recorded. This suggests that users who are more active tend to have more consistency with their sleep habits than sedentary users.

  • Sleep tracking received less participation at 72%, this can be due to a variety of factors. Possibly has to do with the comfort of the device at sleep but more data would be required for an accurate reason.

  • Weight received hardly any participation and was inconsistent. The raw weight_log data table had a boolean column labeled "IsManualReport" which detailed whether this data was inputted manually or automatically, possibly using a Fitbit Smart Scale that syncs up with the watch. It would be best to conduct a survey of current Bellabeat customers to see if that is a feature they would be interested in.

  • There is insufficient demographic data to differentiate which users are male or female, more demographic information would be needed to draw accurate conclusions about the target market.

Recommendations:


My recommendations for the Bellabeat Ivy and Bellabeat Leaf using the data available are as follows:

  1. Given the data it is clear that users tend to wear smart devices throughout the day whether active or inactive. When marketing a smart device product for women, aesthetics are paramount. I recommend directing resources to market the aesthetic of the Bellabeat device, like customizability as well as comfort. After the aesthetics, I recommend marketing the tracking features of day-to-day activities such as steps, heart rate, cycle tracking, and sleep tracking.

  2. Activity tracking shows that the most active timeframes are (12:00 pm - 2:00 pm) and (5:00 pm-7:00 pm), with the most active days being Friday and Saturday. This data suggests that most smartdevice wearers tend to be 9-5 workers, therefore, it would be wise to develop a marketing strategy for women who fall into that category. Perhaps a survey of current Bellabeat users can confirm or clear up this finding, before directing significant resources to this strategy.

  3. Sleep tracking is a relatively new feature for smart devices and Bellabeat has an edge on insights with its new Ivy device. Given the correlation data from Fitbit users, we know that a more sedentary lifestyle leads to worse sleep. Adding a feature that notifies the