Data Mashups in R: A Case Study in Real-World Data Analysis by Jeremy Leipzig

How do you use R to import, manage, visualize, and analyze real-world data? With this short, hands-on tutorial, you learn how to collect online data, massage it into a reasonable form, and work with it using R facilities to interact with web servers, parse HTML and XML, and more. Rather than use canned sample data, you will plot and analyze current home foreclosure auctions in Philadelphia. This practical mashup exercise shows you how to access spatial data in several formats, locally and over the web, to produce a map of home foreclosures. It is an excellent way to explore how the R environment works with R packages and performs statistical analysis.

Figure 2-1. The Census Bureau page containing all census tracts data; Pennsylvania and Philadelphia County are selected from the drop-down menu

Figure 2-2. years"...

Examining our downloaded data, we see that the first line of the text file contains IDs that make little sense, while the second line describes those IDs. ... table allows us to skip the first line. By skipping the first line, the headers of censusTable are extracted from the second line. Also keep one of R’s quirks in mind: it likes to replace spaces with a period.
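As a minimal sketch of the header handling described above, the snippet below reads a small inline table whose first line holds opaque IDs and whose second line holds descriptive names. The column names and values are invented for illustration and are not the actual Census download; the use of read.table() with skip and header is an assumption about how such a file could be read, not a reconstruction of the book's exact call.

```r
# Hypothetical stand-in for the downloaded file: line 1 holds opaque IDs,
# line 2 holds human-readable descriptions, then the data rows (made-up values).
raw <- c(
  "GEO_ID,P001001,P015001",
  "Geography Identifier,Total population,Total households",
  "tractA,2576,1365",
  "tractB,1355,624"
)

# skip = 1 drops the ID line, so header = TRUE takes the column names
# from the descriptive second line.
censusTable <- read.table(text = raw, sep = ",", skip = 1, header = TRUE,
                          stringsAsFactors = FALSE)

# check.names = TRUE (the default) runs the headers through make.names(),
# which is why spaces in the headers come back as periods.
print(names(censusTable))
```

Passing check.names = FALSE would keep the spaces, at the cost of needing backticks around the column names in later code.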

... FIPSSTCO: Factor w/ 1 level "42101": 1 1 1 1 1 1 1 1 1 1 ...
... : 1 2 3 4 5 6 7 ...
... : 1 2 3 4 5 6 7 8 9 10 ...
... info

Now we have a connection between the tracts and our census data. We also need to include the foreclosure data.

... y="PID")

Changing the names for each column will facilitate scripting later on.

... Identifier", "totalPop", "totalHousehold", "familyHousehold", "nonfamilyHousehold", "TravelTime", "TravelTime90+minutes", "totalDisabled", "medianHouseholdIncome", "povertyStatus", "BelowPoverty", "OccupiedHousing", "ownedOccupied", "rentOccupied", "FCS")

Descriptive Statistics

The calculation of mean, median, and standard deviation is performed with mean(), median(), and sd(), respectively.
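The join, rename, and summary steps above can be sketched on toy data as follows. The data frames, their values, and the short column names tractId and totalPop used here are hypothetical; joining with merge() and its by.x/by.y arguments is one plausible approach, and the truncated excerpt does not confirm that the book's call looked exactly like this.

```r
# Toy stand-ins for the census table and the per-tract foreclosure counts.
# All identifiers and numbers are invented for illustration.
census <- data.frame(Geography.Identifier = c("t1", "t2", "t3"),
                     Total.population = c(1200, 3400, 2600),
                     stringsAsFactors = FALSE)
foreclosures <- data.frame(PID = c("t1", "t2", "t3"),
                           FCS = c(4, 10, 7),
                           stringsAsFactors = FALSE)

# Join the two tables on key columns that have different names in each table.
joined <- merge(x = census, y = foreclosures,
                by.x = "Geography.Identifier", by.y = "PID")

# Replace the long, period-laden names with short ones that are easier to script.
colnames(joined) <- c("tractId", "totalPop", "FCS")

# Descriptive statistics for the foreclosure counts.
fcsMean <- mean(joined$FCS)
fcsMedian <- median(joined$FCS)
fcsSd <- sd(joined$FCS)
print(c(mean = fcsMean, median = fcsMedian, sd = fcsSd))
```

Note that sd() computes the sample standard deviation, dividing by n - 1 rather than n.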

Using these packages effectively often requires some trial and error, but R package usage patterns will typically resemble what has been covered in this tutorial. In addition to reviewing the internal help and examples, it is good practice to closely examine each package’s data structures using str(). The interactive nature of R allows a beginner to attempt to solve complex problems by trying different strategies in real time, without the hassles of compilation. A spatial mashup cannot cover R’s extensive statistical capabilities, but hopefully this book will spark some interest in programmers who want to incorporate statistical analysis into their data pipelines.
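As a small illustration of the str() habit mentioned above, the sketch below inspects a toy data frame; the object and its columns are hypothetical and stand in for whatever structure a package returns.

```r
# A hypothetical object standing in for a package's return value.
tractInfo <- data.frame(tractId = c("t1", "t2"),
                        totalPop = c(1200L, 3400L),
                        stringsAsFactors = TRUE)

# str() prints a compact summary: the class, the dimensions, and the type
# and first few values of each component.
str(tractInfo)

# The same summary can be captured as text, for example to log it.
structureLines <- capture.output(str(tractInfo))
```

The same call works on lists, S4 objects, and spatial classes, which is what makes it useful when a package's documentation is thin.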

