Primary Identifier Mapping for Belleville

Student

Starting from internal records (prod db), we match Email (prod db, dbo.ClassLinkUsers) to STUDENTS EMAIL in PowerSchool (PS), and then match STUDENTS STUDENT_NUMBER in PS to student_identifier in Renaissance.

The integration draws on four data sources: internal records (prod db), dbo.ClassLinkUsers, PowerSchool, and Renaissance. Records are matched across these sources on specific fields. Step by step:

  1. Internal records (prod db): The internal RethinkEd database.

  2. dbo.ClassLinkUsers: A table within prod db that holds user data sourced from ClassLink, an external system.

  3. PowerSchool (PS): External data delivered via SFTP. It contains information about individual students, such as names, student numbers, and other relevant data.

  4. Renaissance: External data delivered via SFTP. It contains ELA and math assessment scores in the data files SEL.csv, SM.csv, and SR.csv.

The integration proceeds in two steps:

  1. Matching on Email: Match the "Email" field in the dbo.ClassLinkUsers table of the RethinkEd prod db to the "STUDENTS EMAIL" field in the PowerSchool (PS) data. This associates each student in dbo.ClassLinkUsers with the corresponding PS record.

  2. Matching on Student Number: For each student identified in step 1, match the STUDENTS STUDENT_NUMBER field from PowerSchool (PS) to the student_identifier field in the Renaissance data. This links the student's PS record to their Renaissance assessment scores.
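
The two matching steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the production process: the function name map_students is hypothetical, each source is assumed to be already loaded as a list of dictionaries, the PS columns are assumed to be keyed as EMAIL and STUDENT_NUMBER, and comparing emails case-insensitively is an assumption that should be confirmed against the real data.

```python
def map_students(classlink_users, ps_students, renaissance_rows):
    """Chain Email -> PS EMAIL, then PS STUDENT_NUMBER -> Renaissance student_identifier.

    Hypothetical helper: input shapes and key names are assumptions.
    """
    def norm_email(value):
        # Assumption: emails are compared case-insensitively, ignoring padding.
        return str(value).strip().lower()

    # Step 1 lookup: PS rows indexed by normalized email.
    ps_by_email = {
        norm_email(row["EMAIL"]): row for row in ps_students if row.get("EMAIL")
    }
    # Step 2 lookup: the set of identifiers present in the Renaissance files.
    renaissance_ids = {
        str(row["student_identifier"]).strip() for row in renaissance_rows
    }

    matched, unmatched = [], []
    for user in classlink_users:
        ps_row = ps_by_email.get(norm_email(user["Email"]))
        if ps_row is None:
            # No PS record for this email; leave for manual review.
            unmatched.append(user["Email"])
            continue
        student_number = str(ps_row["STUDENT_NUMBER"]).strip()
        matched.append({
            "Email": user["Email"],
            "STUDENT_NUMBER": student_number,
            "in_renaissance": student_number in renaissance_ids,
        })
    return matched, unmatched
```

Returning the unmatched emails separately keeps students who fail step 1 visible for review instead of dropping them silently.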

By following these matching steps, we can ensure that students are properly mapped for Belleville. When we sign on another district that uses PowerSchool and/or Renaissance, we will need to reevaluate the student identifiers to ensure that we meet FERPA standards. FERPA (Family Educational Rights and Privacy Act) sets guidelines for protecting students' privacy and the confidentiality of their education records, so compliance must be assessed whenever we integrate data from external sources or partner with other districts.

This integration can provide a comprehensive view of student information, including user details, assessment scores, attendance, and incidents, which facilitates analysis and reporting to better support students throughout a district.

District

Specification for Mapping Integration with Data File Identifier Validation

Note: The district identifier for Belleville in our data (RethinkEd’s EDU Data Warehouse) is NcesId = 3401350. We can also use a second district identifier for Belleville, AccountInfoId = 107873. Both identifiers are in the dm.AccountInfo table in the EDU Data Warehouse, and AccountInfoId is used as an identifier across multiple data tables in the EDU Data Warehouse.

After reviewing all data brought in for Belleville from both PowerSchool and Renaissance, we found no numeric district identifier that is shared between Belleville’s data and our own internal data (either PROD Database or EDU Data Warehouse). For now, our only option for a 1-to-1 mapping is to match on district name by searching the AccountOrganizationName field/column of the dm.AccountInfo table in the EDU Data Warehouse. This match will likely have to be “fuzzy”. For Belleville, the district name varied across sources: (1) in the Renaissance data files (SM.csv, SEL.csv, and SR.csv), column/field name DistrictName, it is “Belleville Public Schools”; (2) in the dm.AccountInfo table in the EDU Data Warehouse, column/field name AccountOrganizationName, it is “BELLEVILLE SCHOOL DISTRICT”; and (3) in what we get from the ClassLink API, it is “Belleville PD”.
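
As a rough illustration of what a “fuzzy” name match could look like, the sketch below normalizes each name and scores it with Python's standard difflib module. Everything specific here is an assumption rather than the approach actually adopted: the function names, the list of generic words to ignore (GENERIC_TOKENS), and the 0.85 cutoff are all hypothetical and would need tuning against real district names.

```python
import difflib
import re

# Hypothetical list of generic words dropped before comparing names.
GENERIC_TOKENS = {"public", "school", "schools", "district", "pd"}


def normalize_district_name(name):
    """Lowercase, strip punctuation, and drop generic words."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in GENERIC_TOKENS)


def candidate_district_matches(external_name, account_names, cutoff=0.85):
    """Return internal names similar to external_name, best match first.

    The cutoff is an assumed threshold, not a validated one.
    """
    target = normalize_district_name(external_name)
    if not target:
        # Nothing distinctive left to compare; require manual lookup.
        return []
    scored = []
    for name in account_names:
        score = difflib.SequenceMatcher(
            None, target, normalize_district_name(name)
        ).ratio()
        if score >= cutoff:
            scored.append((score, name))
    return [name for score, name in sorted(scored, key=lambda pair: -pair[0])]
```

The function returns a list rather than a single answer because two different districts can normalize to the same string, so a person should still confirm the candidate before it is used as the district mapping.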

Additional note: Only a few of the data files supplied by Belleville contain any district identifier at all. RethinkEd will have to “bundle” the data files supplied by the district so that the district identifier is attached to every data file, using whichever district identifier is ultimately chosen.
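
One possible form of this “bundling” is to stamp the chosen identifier onto every row of each file before ingestion. The sketch below does this for CSV text using Python's standard csv module. The function name attach_district_id and the choice of AccountInfoId as the added column are assumptions for illustration only, since the identifier to use has not been decided.

```python
import csv
import io


def attach_district_id(csv_text, district_id, column="AccountInfoId"):
    """Return csv_text with a district identifier column set on every row.

    Hypothetical helper; the column name is an assumption.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    if column not in fieldnames:
        fieldnames.append(column)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Overwrites any existing value so that all rows agree.
        row[column] = district_id
        writer.writerow(row)
    return out.getvalue()
```

For example, a file with the header student_identifier,score would come back with a third column, AccountInfoId, holding the same value on every row.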

Until there is an automated process for setting up the initial data ingestion, a person will need to go through the data sources (internal and external/provided by the district) and locate and/or verify the best way to identify the given district across all of them. The specific fields/columns used to identify the district may vary across districts, so this person's job will be to document the best identifiers for each data source (e.g., the district’s NCES ID, AccountInfoId, district name, etc.).

The above process, with fuzzy matching based on district name, will be used only for Belleville for now. We are working to find a common way of identifying each new district that joins as an MTSS customer across internal and external data sources. Additionally, as we get more data from other districts and assessment providers, we may find a field/column that was absent from Belleville’s data but is present in other districts' data and can be used to identify districts across all data sources.

 

More Generalized Information for Data Ingestion for MTSS Districts

(Will possibly change as we work with and get data from more district partners and other edtech vendors)

 

  1. Overview

     This specification outlines the requirements for a mapping integration system that performs data updates only when the district ID in an ingested data file matches a district in the target system. It aims to prevent data inconsistencies and maintain the integrity of the integrated system.

  2. Integration Components

     The mapping integration system consists of the following components:

     a. Ingestion Module: Responsible for receiving and processing the data files.
     b. Mapping Module: Maps the data we ingest from other applications to our own data.
     c. Data Validation Module: Verifies the district ID in each data file.

  3. Requirements

     The mapping integration system must adhere to the following requirements:

     3.1. Data Ingestion

     a. The ingestion module should accept data files containing records with various attributes, including a district ID.
     b. The ingestion module should validate the format and structure of the incoming data files to ensure they comply with the expected schema.
     c. If a data file fails validation, an error should be logged, and the ingestion process should be halted for that file.

     3.2. Mapping Process

     a. The mapping module should perform the necessary transformations to map the data from the source system to the target system.
     b. The mapping module should not perform any updates or modifications to the target system until the district ID in the data file has been verified.

     3.3. Data Validation

     a. The data validation module should compare the district ID from the ingested data file with the existing records in the target system.
     b. If the district ID is not found in the target system, an error should be logged, and the data updates should not be performed.
     c. If the district ID is found in the target system, the mapping module can proceed with the data updates.

  4. Error Handling and Logging

     a. The integration system should maintain a comprehensive log of all errors, including validation failures and mapping errors.
     b. When errors occur during the ingestion, mapping, or data validation process, they should be logged with relevant details, such as the data file name, district ID, and timestamp.

  5. Performance Considerations

     a. The mapping integration system should be designed to handle a high volume of data files efficiently.
     b. The data validation process should be optimized to minimize the impact on system performance while ensuring accurate verification.

  6. Security Considerations

     a. The mapping integration system should adhere to security best practices, such as secure data transmission, access controls, and data encryption, to protect sensitive information.

  7. Documentation and Support

     a. The mapping integration system should be thoroughly documented, including configuration instructions and troubleshooting guidelines.
     b. Adequate support channels should be established to address any issues or questions related to the mapping integration system.
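
To make the Data Validation and Error Handling and Logging requirements above more concrete, the following Python sketch shows one way the district ID check could gate updates. It is a minimal sketch under assumptions, not a design decision: the function names, the logger name, the use of an in-memory set for known district IDs, and the apply_update callback are all hypothetical stand-ins for the real target system.

```python
import logging
from datetime import datetime, timezone

logger = logging.getLogger("mtss_ingestion")  # hypothetical logger name


def validate_district_id(file_name, file_district_id, known_district_ids):
    """Return True if the file's district ID exists in the target system.

    known_district_ids stands in for a lookup against the target system.
    """
    if file_district_id in known_district_ids:
        return True
    # Log the details the specification asks for: file, district ID, timestamp.
    logger.error(
        "district ID validation failed: file=%s district_id=%s timestamp=%s",
        file_name,
        file_district_id,
        datetime.now(timezone.utc).isoformat(),
    )
    return False


def ingest_file(file_name, file_district_id, rows, known_district_ids, apply_update):
    """Apply updates only after the district ID has been verified.

    Returns the number of rows applied (0 when validation fails).
    """
    if not validate_district_id(file_name, file_district_id, known_district_ids):
        return 0  # no updates are performed for this file
    applied = 0
    for row in rows:
        apply_update(row)  # hypothetical callback that writes to the target
        applied += 1
    return applied
```

Checking the district ID once per file, before any row is touched, keeps a mismatched file from producing partial updates.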